Random Selection of Variables for Aggregated Tree-Based Models

Gatnar, Eugeniusz; Rozmus, Dorota

Article details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2006 | 196 Multivariate Statistical Analysis : Methods and Applications | 103--111

Abstracts

Tree-based models are popular and widely used because they are simple, flexible, and powerful tools for classification. Unfortunately, they are not stable classifiers. Model stability and prediction accuracy can be improved significantly by aggregating multiple classification trees. The proposed methods, i.e. bagging, adaptive bagging, and arcing, are based on sampling cases from the training set, while boosting uses a system of weights for cases. The result is called a committee of trees, an ensemble, or a forest. Recent developments in this field have shown that randomization (random selection of variables) in aggregated tree-based classifiers leads to consistent models, while boosting can overfit. In this paper we discuss optimal parameter values for the method of random selection of variables (RandomForest) for an aggregated tree-based model (i.e. the number of trees in the forest and the number of variables selected for each split). (original abstract)

Because of their simplicity, flexibility, and effectiveness, classification trees are becoming an increasingly widely used classification method. Despite its many advantages, the method has one drawback: a lack of stability. Stability and prediction accuracy can be improved by aggregating many classification trees into a single model. The aggregation methods proposed in the literature, such as bagging, adaptive bagging, and arcing, are based on sampling objects from the training set, whereas boosting additionally applies a system of weights. The result is a set of classification trees that together form an aggregated model. Because sampling objects can change the distribution of the variables in the training set, prediction accuracy can be improved by randomly selecting variables for the training samples on which the component models of the aggregate are built. This article considers the estimation of optimal parameter values for the RandomForest procedure, which implements random selection of variables for a model in the form of a set of aggregated classification trees. (original abstract, translated from Polish)
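
The two parameters named in the abstracts, the number of trees in the forest and the number of variables selected for each split, can be illustrated with a minimal sketch. It is not the authors' procedure or Breiman's RandomForest implementation: the names `n_trees` and `mtry`, the depth limit, the Gini criterion, and the toy data are all assumptions made for illustration. Each tree is grown on a bootstrap sample of cases, every split looks only at `mtry` randomly drawn variables, and the committee predicts by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent class label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]

def grow_tree(X, y, mtry, rng, depth=0, max_depth=3):
    """Grow a small tree; every split considers only `mtry` randomly drawn variables."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)  # leaf node: a class label
    best = None
    for j in rng.sample(range(len(X[0])), mtry):  # random selection of variables
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint between two neighbouring observed values
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:  # guard against float rounding of the midpoint
                continue
            score = sum(len(s) * gini([y[i] for i in s]) for s in (left, right))
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # every drawn variable is constant in this node
        return majority(y)
    _, j, t, left, right = best
    children = [grow_tree([X[i] for i in s], [y[i] for i in s], mtry, rng, depth + 1, max_depth)
                for s in (left, right)]
    return (j, t, children[0], children[1])

def random_forest(X, y, n_trees, mtry, seed=0):
    """Aggregate `n_trees` trees, each grown on a bootstrap sample of the cases."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest

def predict(forest, row):
    """Majority vote of the committee of trees."""
    votes = []
    for node in forest:
        while isinstance(node, tuple):  # descend until a leaf label is reached
            j, t, left, right = node
            node = left if row[j] <= t else right
        votes.append(node)
    return majority(votes)

def make_data(n, rng):
    """Hypothetical toy data: variables 0-1 carry the class signal, 2-3 are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + rng.gauss(0, 0.7), label + rng.gauss(0, 0.7),
                  rng.random(), rng.random()])
        y.append(label)
    return X, y

data_rng = random.Random(1)
X_train, y_train = make_data(60, data_rng)
X_test, y_test = make_data(200, data_rng)
for n_trees in (1, 5, 25):
    for mtry in (1, 2, 4):
        forest = random_forest(X_train, y_train, n_trees, mtry)
        acc = sum(predict(forest, r) == c for r, c in zip(X_test, y_test)) / len(y_test)
        print(f"n_trees={n_trees:2d} mtry={mtry} test accuracy={acc:.2f}")
```

The loop at the end prints held-out accuracy over a small grid of the two parameters. On real data the same kind of grid would normally be scored with cross-validation or out-of-bag error rather than a single toy test set.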

Keywords

Classification trees; Random variable; Econometric models; Model with random parameters; Variables selection

Contributors

author

Eugeniusz Gatnar

The Karol Adamiecki University of Economics in Katowice, Poland

author

Dorota Rozmus

The Karol Adamiecki University of Economics in Katowice, Poland

References

Blake C., Keogh E., Merz C. J. (1998), UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine, CA.
Bauer E., Kohavi R. (1999), "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants", Machine Learning, 36, 105-142.
Breiman L. (2003), Manual on Setting Up, Using, and Understanding Random Forests, http://oz.berkeley.EDU/users/breiman/UsingrandomforestsV3.l.
Breiman L. (2001), "Random Forests", Machine Learning, 45, 5-32.
Breiman L. (1999), "Using Adaptive Bagging to Debias Regressions", Technical Report 547, Statistics Department, University of California, Berkeley.
Breiman L. (1998), "Arcing Classifiers", Annals of Statistics, 26, 801-849.
Breiman L. (1996), "Bagging Predictors", Machine Learning, 24, 123-140.
Dietterich T., Kong E. (1995), "Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms", Technical Report, Department of Computer Science, Oregon State University.
Freund Y., Schapire R. E. (1997), "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting", Journal of Computer and System Sciences, 55, 119-139.
Gatnar E. (2001), Nonparametric Method for Discrimination and Regression (in Polish), Wydawnictwo Naukowe PWN, Warszawa.
Ho T. K. (1998), "The Random Subspace Method for Constructing Decision Forests", IEEE Trans. on Pattern Analysis and Machine Intelligence, 20, 832-844.
Quinlan J. R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo.
Wolpert D. (1992), "Stacked Generalization", Neural Networks, 5, 241-259.

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000168679891
