Comparison of Clustering Accuracy in Ensemble Approach Based on Co-Occurence Data

Rozmus, Dorota

doi:11089/351

Article - details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2010 | 235 Multivariate Statistical Analysis | 177--184

Article title

Comparison of Clustering Accuracy in Ensemble Approach Based on Co-Occurence Data

Authors

Dorota Rozmus

Content

Full texts:

http://dspace.uni.lodz.pl:8080/xmlui/bitstream/handle/11089/351/177-184.pdf?sequence=1 [remote]

Title variants

Porównanie dokładności metod taksonomicznych w podejściu wielomodelowym opartym na macierzy współwystąpień

Publication languages

Abstracts

The ensemble approach has been successfully applied in supervised learning to increase the accuracy and stability of classification. Recently, analogous techniques have been proposed for cluster analysis, and research has shown that combining a collection of different clusterings can yield an improved solution. Traditionally, classifiers learned from a data set are built in a feature space. An alternative is to construct decision rules on dissimilarity representations, in which each object is described by its similarities or distances to the remaining training samples. This research focuses on exploiting the additional information provided by a collection of diverse clusterings to generate a co-association (co-occurrence) matrix (Fred and Jain, 2002). Taking the co-occurrences of pairs of patterns in the same cluster as votes for their association, the data partitions are mapped into a co-association matrix of patterns. This n x n matrix represents a new similarity measure between patterns. The final data partition is obtained by clustering this matrix. The experiments study the behavior of partitions built on co-occurrence data with different clustering methods. (original abstract)

The ensemble approach has so far been applied with great success in classification and regression to improve prediction accuracy. In recent years analogous proposals have also appeared in cluster analysis, and numerous studies have shown that aggregating the differing results of repeated clustering improves classification accuracy. This study focuses on exploiting the additional information provided by a set of repeated clustering results in order to construct a so-called co-occurrence matrix. Treating the joint occurrence of a pair of objects in the same class as an indication of a relationship between them, the original set of observations is transformed into an n x n matrix that describes the similarity between objects. The final clustering is performed on the resulting co-occurrence matrix. The aim of the paper is to compare how accurately the proposed ensemble approach recovers the correct class structure when different clustering algorithms are used to construct the co-occurrence matrix and then to partition it into classes of mutually similar objects. (original abstract)
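
The co-association idea described in the abstracts can be illustrated with a minimal sketch. It is not the author's implementation: the function names `co_association` and `consensus_partition`, the toy labels, and the final step (linking pairs whose co-association exceeds a 0.5 threshold and taking connected components) are assumptions chosen for brevity. The paper itself compares several clustering methods for that final step.

```python
from itertools import combinations


def co_association(partitions):
    """Return the n x n matrix of co-occurrence frequencies.

    partitions: list of label sequences, one per base clustering,
    each of length n. Entry [i][j] is the fraction of clusterings
    that place objects i and j in the same cluster.
    """
    n = len(partitions[0])
    votes = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same n objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1  # one vote for the pair (i, j)
    m = len(partitions)
    return [[v / m for v in row] for row in votes]


def consensus_partition(matrix, threshold=0.5):
    """Assumed final step: link pairs above threshold, return component labels."""
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three hypothetical base clusterings of six objects (made-up labels).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
C = co_association(partitions)
final = consensus_partition(C)
print(final)  # [0, 0, 0, 1, 1, 1]
```

The matrix depends only on whether two objects share a label within each base clustering, so the label names used by the individual clusterings do not need to be aligned.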

Keywords

Taxonomic methods, Taxonomy, Matrix, Classification, Statistical methods

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

Year

2010

Volume

235 Multivariate Statistical Analysis

Pages

177--184

Physical description

Contributors

author

Dorota Rozmus

The Karol Adamiecki University of Economics in Katowice, Poland

Bibliography

Bezdek J. C. (1981), Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, New York.
Blake C., Keogh E., Merz C. J. (1998), UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine.
Breiman L. (1996), Bagging Predictors, Machine Learning, 24 (2): 123-140.
Fred A. (2002), Finding Consistent Clusters in Data Partitions, in Roli F., Kittler J., editors, Proceedings of the International Workshop on Multiple Classifier Systems, pages 309-318, LNCS.
Fred A., Jain A. K. (2002), Data Clustering Using Evidence Accumulation, Proceedings of the Sixteenth International Conference on Pattern Recognition, pages 276-280, ICPR, Canada.
Jain A., Murty M. N., Flynn P. (1999), Data Clustering: A Review, ACM Computing Surveys, 31 (3): 264-323.
Kaufman L., Rousseeuw P. J. (1990), Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.
Freund Y. (1990), Boosting a Weak Learning Algorithm by Majority, Proceedings of the Third Annual Workshop on Computational Learning Theory, pages 202-216.
Kuncheva L. I., Hadjitodorov S. T., Todorova L. P. (2006), Experimental Comparison of Cluster Ensemble Methods, Ninth International Conference on Information Fusion, pages 1-7, Florence.
Pekalska E., Duin R. P. W. (2000), Classifiers for Dissimilarity-based Pattern Recognition, in Sanfeliu A., Villanueva J. J., Vanrell M., Alquezar R., Jain A. K., Kittler J., editors, Proceedings of the Fifteenth International Conference on Pattern Recognition, pages 12-16, IEEE Computer Society Press, Los Alamitos.
Rand W. M. (1971), Objective Criteria for the Evaluation of Clustering Methods, Journal of the American Statistical Association, 66: 846-850.
Strehl A., Ghosh J. (2002), Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions, Journal of Machine Learning Research, 3: 583-618.
Tsymbal A., Pechenizkiy M., Cunningham P. (2003), Diversity in Ensemble Feature Selection, Technical Report, Trinity College Dublin.

Document type

Bibliography

Identifiers

DOI

11089/351

YADDA identifier

bwmeta1.element.ekon-element-000169658208
