Measures of Diversity and the Classification Error in the Multiple-model Approach

Gatnar, Eugeniusz

Article - details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2009 | 225 Methodological Aspects and Applications of Multivariate Statistical Analysis | 101--109

Article title

Measures of Diversity and the Classification Error in the Multiple-model Approach

Authors

Eugeniusz Gatnar

Title variants

Miary zróżnicowania modeli a błąd klasyfikacji w podejściu wielomodelowym

Publication languages

Abstracts

The multiple-model approach (model aggregation, model fusion) is most commonly used in classification and regression. In this approach, K component (single) models C₁(x), C₂(x), …, C_K(x) are combined into one global model (ensemble) C^*(x), for example by majority voting:
C^*(x) = \arg\max_{y} \sum_{k=1}^{K} I(C_k(x) = y)            (1)
Tumer and Ghosh (1996) proved that the classification error of the ensemble C^*(x) depends on the diversity of the ensemble members. In other words, the higher the diversity of the component models, the lower the classification error of the combined model. Since several diversity measures for classifier ensembles have been proposed so far, in this paper we compare the ability of selected diversity measures to predict the accuracy of classifier ensembles. (original abstract)
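
As an illustration of the majority-voting rule in (1), the following minimal Python sketch counts how many component models predict each label for a single object x and returns the most frequent one. The function name `majority_vote` and the tie-breaking rule (the label encountered first wins) are assumptions made for this example; the abstract does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine the labels C_1(x), ..., C_K(x) predicted for one object x.

    Counting label occurrences is equivalent to summing the indicator
    I(C_k(x) = y) over k for every candidate label y, as in Eq. (1).
    Assumption: ties go to the label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("at least one component prediction is required")
    votes = Counter(predictions)          # votes[y] = sum_k I(C_k(x) = y)
    return votes.most_common(1)[0][0]     # arg max over y


# Three hypothetical component models vote on one object.
print(majority_vote(["spam", "ham", "spam"]))
```

Here two of the three hypothetical models predict "spam", so the ensemble prediction is "spam".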

The multiple-model approach (model aggregation), used most often in discriminant and regression analysis, consists in combining M component models C₁(x), ..., C_M(x) into one global model C^*(x):
C^*(x) = \arg\max_{y} \sum_{m=1}^{M} I(C_m(x) = y)
Tumer and Ghosh (1996) proved that the classification error of the aggregated model C^*(x) depends on the degree of similarity (diversity) of the component models. In other words, the most accurate model C^*(x) consists of the models that are most dissimilar to one another, i.e. that classify the same objects in entirely different ways. Several measures have been proposed in the literature for assessing the similarity (diversity) of component models in the multiple-model approach. The article discusses the relationship between known diversity measures and the estimated classification error of the aggregated model. (original abstract)
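
The abstract does not name the diversity measures it compares. Purely as an illustration of what such a measure looks like, the sketch below computes the pairwise disagreement measure, one of the measures surveyed by Kuncheva and Whitaker (2003): the share of objects on which exactly one of two models is correct, averaged over all pairs of component models. The function names and the input layout (one list of correct/incorrect flags per model) are assumptions made for this example, not the author's procedure.

```python
from itertools import combinations


def disagreement(correct_i, correct_j):
    """Share of objects on which exactly one of the two models is correct."""
    if len(correct_i) != len(correct_j) or not correct_i:
        raise ValueError("both models must be scored on the same non-empty set")
    differing = sum(a != b for a, b in zip(correct_i, correct_j))
    return differing / len(correct_i)


def mean_pairwise_disagreement(correct):
    """Average the disagreement measure over all pairs of component models.

    `correct[m][n]` is True when model m classifies object n correctly
    (an assumed input layout for this sketch).
    """
    pairs = list(combinations(correct, 2))
    if not pairs:
        raise ValueError("at least two component models are required")
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical models scored on four objects.
flags = [
    [True, True, False, False],
    [True, False, True, False],
    [True, True, True, False],
]
print(mean_pairwise_disagreement(flags))
```

A value of 0 means every pair of models is right and wrong on exactly the same objects; larger values indicate a more diverse ensemble under this particular measure.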

Keywords

Multiple-model approach; Classification error; Diversity measures; Aggregation models

Podejście wielomodelowe; Błąd klasyfikacji; Miary zróżnicowania; Agregacja modeli

Year

2009

Volume

225 Methodological Aspects and Applications of Multivariate Statistical Analysis

Pages

101--109

Physical description

Contributors

author

Eugeniusz Gatnar

The Karol Adamiecki University of Economics in Katowice, Poland

References

Breiman L. (1996), Bagging predictors, "Machine Learning", 24, 123-140.
Breiman L. (1998), Arcing classifiers, "Annals of Statistics", 26, 801-849.
Breiman L. (1999), Using adaptive bagging to debias regressions. Technical Report 547, Department of Statistics, University of California, Berkeley.
Breiman L. (2001), Random forests, "Machine Learning", 45, 5-32.
Cunningham P., Carney J. (2000), Diversity versus quality in classification ensembles based on feature selection, [in:] Proceedings of European Conference on Machine Learning, LNCS, vol. 1810, Springer, Berlin, 109-116.
Dietterich T., Bakiri G. (1995), Solving multiclass learning problems via error-correcting output codes, "Journal of Artificial Intelligence Research", 2, 263-286.
Fleiss J. L. (1981), Statistical methods for rates and proportions, John Wiley and Sons, New York.
Freund Y., Schapire R. E. (1997), A decision-theoretic generalization of on-line learning and an application to boosting, "Journal of Computer and System Sciences", 55, 119-139.
Gatnar E. (2001), Nonparametric method for classification and regression, PWN, Warszawa (in Polish).
Gatnar E. (2005), A diversity measure for tree-based classifier ensembles, [in:] Data analysis and decision support, eds D. Baier, R. Decker, L. Schmidt-Thieme, Springer-Verlag, Heidelberg-Berlin, 30-38.
Giacinto G., Roli F. (2001), Design of effective neural network ensembles for image classification processes, "Image and Vision Computing", 19, 699-707.
Hansen L. K., Salamon P. (1990), Neural network ensembles, "IEEE Transactions on Pattern Analysis and Machine Intelligence", 12, 993-1001.
Ho T. K. (1998), The random subspace method for constructing decision forests, "IEEE Transactions on Pattern Analysis and Machine Intelligence", 20, 832-844.
Kuncheva L., Whitaker C., Shipp D., Duin R. (2000), Is independence good for combining classifiers?, [in:] Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, 168-171.
Kuncheva L., Whitaker C. (2003), Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, "Machine Learning", 51, 181-207.
Margineantu D. D., Dietterich T. G. (1997), Pruning adaptive boosting, [in:] Proceedings of the 14th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, 211-218.
Oza N. C., Tumer K., Dimensionality reduction through classifier ensembles, Technical Report, NASA-ARC-IC-1999-126, Computational Sciences Division, NASA Ames Research Center.
Partridge D., Krzanowski W. J. (1997), Software diversity: practical statistics for its measurement and exploitation, "Information and Software Technology", 39, 707-717.
Partridge D., Yates W. B. (1996), Engineering multiversion neural-net systems, "Neural Computation", 8, 869-893.
Sharkey A., Sharkey N. (1997), Diversity, selection, and ensembles of artificial neural nets, [in:] Neural Networks and their applications, NEURAP-97, 205-212.
Skalak D. B. (1996), The sources of increased accuracy for two proposed boosting algorithms, [in:] Proceedings of the American Association for Artificial Intelligence AAAI-96, Morgan Kaufmann, San Mateo.
Tumer K., Ghosh J. (1996), Analysis of decision boundaries in linearly combined neural classifiers, "Pattern Recognition", 29, 341-348.
Wolpert D. (1992), Stacked generalization, "Neural Networks", 5, 241-259.

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000165203605
