Internal Cluster Quality Indexes for Classification of Symbolic Data

Dudek, Andrzej

Article details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2009 | 225 Methodological Aspects and Applications of Multivariate Statistical Analysis | 91--100

Article title

Internal Cluster Quality Indexes for Classification of Symbolic Data

Authors

Andrzej Dudek

Title variants

Mierniki jakości klasyfikacji dla danych symbolicznych (Polish; English: "Measures of classification quality for symbolic data")

Abstracts

This paper describes the main classification methods used for symbolic data (e.g. data in the form of a single quantitative value, a categorical value, an interval, a multivalued variable, or a multivalued variable with weights), presents the difficulties of measuring clustering quality for symbolic data (such as the lack of a "traditional" data matrix), and shows which of the known indexes, such as the Silhouette index, Caliński and Harabasz index, Baker and Hubert index, Hubert and Levine index, Ratkowski index, Ball index, Hartigan index, Krzanowski and Lai index, Scott index, Marriot index, Rubin index and Friedman index, may be used for validation of this type of data, and which indexes are specific only to symbolic data. Simulation results are used to propose the most adequate indexes for each classification algorithm. (original abstract)
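The "lack of a traditional data matrix" can be illustrated with interval-valued data: each object is described by intervals rather than single numbers, so clustering and validation commonly start from a dissimilarity matrix instead. The sketch below is a minimal illustration, assuming only interval-valued variables and a Hausdorff-type distance summed over variables (one common choice for interval data, not necessarily the measure used in this paper); the names `hausdorff_interval` and `dissimilarity_matrix` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # (lower bound, upper bound)
SymbolicObject = List[Interval]  # one interval per variable (assumption: intervals only)


def hausdorff_interval(a: Interval, b: Interval) -> float:
    """Hausdorff distance between closed intervals [a0, a1] and [b0, b1]."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def dissimilarity_matrix(objects: List[SymbolicObject]) -> List[List[float]]:
    """Symmetric n x n matrix of per-variable Hausdorff distances, summed over variables."""
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = sum(hausdorff_interval(x, y) for x, y in zip(objects[i], objects[j]))
            d[i][j] = d[j][i] = float(dist)
    return d


# Three hypothetical objects described by two interval-valued variables.
objects = [
    [(1.0, 3.0), (10.0, 12.0)],
    [(2.0, 4.0), (11.0, 15.0)],
    [(20.0, 25.0), (0.0, 1.0)],
]
print(dissimilarity_matrix(objects))
```

Any quality index that needs only a dissimilarity matrix can then be computed on the result, whatever measure was used to compare the symbolic objects; other variable types (e.g. multivalued variables with weights) would need their own component distances.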

The article describes classification procedures that can be used for symbolic data (i.e. data that can be represented as numbers, qualitative data, numerical intervals, sets of values, or sets of values with weights), presents the problems involved in measuring classification quality for these procedures (such as the lack of a "classical" data matrix), and shows which of the known indexes, such as the Silhouette index, the Caliński-Harabasz index, the Baker-Hubert index, the Hubert-Levine index, the Ratkowski index, the Ball index, the Hartigan index, the Krzanowski-Lai index, the Scott index, the Marriot index, the Rubin index and the Friedman index, can be used for this type of data, and which partition quality measures are specific to symbolic data. On the basis of the simulations carried out, indexes that actually reflect the class structure are proposed for the individual classification algorithms. (original abstract, translated from Polish)
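Three of the listed indexes need only a dissimilarity matrix and a partition: the Silhouette index (Rousseeuw 1987), the Baker and Hubert gamma index (Baker, Hubert 1975) and the Hubert and Levine C index (Hubert, Levine 1976). The remaining ones, as usually defined (e.g. Caliński-Harabasz, Hartigan, Friedman), are built from within- and between-cluster sums of squares or scatter matrices, which presuppose a classical data matrix. The following is a minimal sketch of the three distance-based indexes under their textbook definitions, assuming at least two clusters and at least one within-cluster pair; it is not the implementation used in the paper, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def silhouette(d: Matrix, labels: List[int]) -> float:
    """Mean silhouette width (Rousseeuw 1987); needs at least two clusters."""
    n = len(labels)
    clusters = set(labels)
    widths = []
    for i in range(n):
        own = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: s(i) = 0 by convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the object's own cluster
        b = min(  # smallest mean distance to any other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n


def _pairs(d: Matrix, labels: List[int]):
    """Split all pairwise distances into within- and between-cluster lists."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(d[i][j])
    return within, between


def baker_hubert_gamma(d: Matrix, labels: List[int]) -> float:
    """Gamma = (s+ - s-) / (s+ + s-) over all (within, between) distance pairs."""
    within, between = _pairs(d, labels)
    s_plus = sum(1 for w in within for b in between if w < b)   # concordant pairs
    s_minus = sum(1 for w in within for b in between if w > b)  # discordant pairs
    return (s_plus - s_minus) / (s_plus + s_minus)


def c_index(d: Matrix, labels: List[int]) -> float:
    """C = (S_w - S_min) / (S_max - S_min); lower values mean a better partition."""
    within, between = _pairs(d, labels)
    all_d = sorted(within + between)
    k = len(within)  # number of within-cluster pairs (assumed > 0)
    s_min, s_max = sum(all_d[:k]), sum(all_d[-k:])
    return (sum(within) - s_min) / (s_max - s_min)


# Four objects at positions 0, 1, 10, 11 on a line; labels give two clear clusters.
pos = [0, 1, 10, 11]
d = [[abs(p - q) for q in pos] for p in pos]
labels = [0, 0, 1, 1]
print(silhouette(d, labels), baker_hubert_gamma(d, labels), c_index(d, labels))
```

Higher Silhouette and gamma values and a lower C index point to a better partition. Because each function reads only `d` and `labels`, the same code applies to a dissimilarity matrix computed from symbolic objects.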

Keywords

Classification; Classification methods; Measures of clustering quality

Polish keywords (translated): Classification; Classification methods; Measures of classification quality

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

Year

2009

Volume

225 Methodological Aspects and Applications of Multivariate Statistical Analysis

Pages

91--100

Contributors

Author

Andrzej Dudek

Wrocław University of Economics, Poland

Bibliography

Baker F. B., Hubert L. J. (1975), Measuring the power of hierarchical cluster analysis, "Journal of the American Statistical Association", 70, 349, 31-38.
Bock H.-H., Diday E. (eds) (2000), Analysis of symbolic data. Exploratory methods for extracting statistical information from complex data, Springer-Verlag, Berlin.
Caliński T., Harabasz J. (1974), A dendrite method for cluster analysis, "Communications in Statistics", 3, 1-27.
Chavent M., De Carvalho F. A. T., Verde R., Lechevallier Y. (2003), Trois nouvelles méthodes de classification automatique de données symboliques de type intervalle, "Revue de Statistique Appliquée", LI, 4, 5-29.
Diday E. (2002), An introduction to symbolic data analysis and the SODAS software, "J.S.D.A., International E-Journal".
Gordon A. D. (1999), Classification, Chapman & Hall/CRC, London.
Hubert L. J. (1974), Approximate evaluation technique for the single-link and complete-link hierarchical clustering procedures, "Journal of the American Statistical Association", 69, 347, 698-704.
Hubert L. J., Levine J. R. (1976), Evaluating object set partitions: free sort analysis and some generalizations, "Journal of Verbal Learning and Verbal Behaviour", 15, 549-570.
Kaufman L., Rousseeuw P. J. (1990), Finding groups in data: an introduction to cluster analysis, Wiley, New York.
Krzanowski W. J., Lai Y. T. (1988), A criterion for determining the number of groups in a data set using sum of squares clustering, "Biometrics", 44, 23-34.
Malerba D., Esposito F., Gioviale V., Tamma V. (2001), Comparing dissimilarity measures for symbolic data analysis, "New Techniques and Technologies for Statistics" (ETK-NTTS'01), 473-481.
McQuitty L. L. (1966), Similarity analysis by reciprocal pairs for discrete and continuous data, "Educational and Psychological Measurement", 26, 825-831.
Milligan G. W., Cooper M. C. (1985), An examination of procedures for determining the number of clusters in a data set, "Psychometrika", 50, 2, 159-179.
Rousseeuw P. J. (1987), Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, "Journal of Computational and Applied Mathematics", 20, 53-65.
Verde R. (2004), Clustering methods in symbolic data analysis, in: Classification, Clustering, and Data Mining Applications, Springer-Verlag, Berlin, 299-318.
Weingessel A., Dimitriadou E., Dolnicar S. (1999), An examination of indexes for determining the number of clusters in binary data sets, available at URL: http://www.wu-wien.ac.at/am/wp99.htm

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000165203525
