Losowy dobór cech a agregacja drzew klasyfikacyjnych

Gatnar, Eugeniusz

Article - details

Journal

Studia Ekonomiczne / Akademia Ekonomiczna w Katowicach

2003 | no. 29 Metody wnioskowania statystycznego w badaniach ekonomicznych | 57-69

Article title

Losowy dobór cech a agregacja drzew klasyfikacyjnych

Authors

Eugeniusz Gatnar

Title variants

Random Feature Selection and Aggregation of the Classification Trees.

Abstracts

The article addresses the assessment of the predictive value of models built on a classification tree. Because the predictive value of a single classification tree is not very high, the author proposes eliminating this model's instability by aggregating many individual discriminant models into one. Instead of randomly selecting objects for the training samples, the variables entering each model are selected at random, which results in a clear reduction of the classification error.

A single classification tree model depends on the contents of the training set: small changes in the data lead to major changes in the response y, so it is not a stable classifier. As a result, it often gives a high classification error for the set of cases to be classified. A substantial reduction of the classification error is possible by aggregating multiple classification trees. The methods proposed so far, i.e. bagging, boosting and adaptive bagging (a hybrid method), are based on bootstrap sampling from the training set. They succeed in reducing the classification error but, on the other hand, resampling leads to a major modification of the training set. Randomization can also be used in tree-based classifiers in a different way. Instead of weighting cases and sampling them for the training samples, it is possible to use training samples with randomly chosen subsets of the variables. In addition, this method does not modify the distribution of the predictors in the training set. (original abstract)
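The idea in the abstract, growing each tree on a randomly chosen subset of the variables while leaving the training cases untouched, can be illustrated with a minimal sketch. This is not the author's implementation: the function names (`build_tree`, `fit_random_subspace`, `predict_ensemble`), the Gini-based splitting, the depth limit, the majority-vote aggregation rule and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, features, depth=0, max_depth=3):
    """Grow a small classification tree that may split only on `features`."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)  # leaf: a plain class label
    best = None
    for f in features:
        # Candidate thresholds: every observed value except the largest,
        # so both child nodes are guaranteed to be non-empty.
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split: all selected features are constant
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       features, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       features, depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are 4-tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_random_subspace(X, y, n_trees=25, n_features=2, seed=0):
    """Every tree sees ALL training cases but only a random subset of variables."""
    rng = random.Random(seed)
    p = len(X[0])
    return [build_tree(X, y, rng.sample(range(p), n_features))
            for _ in range(n_trees)]


def predict_ensemble(trees, row):
    """Aggregate the individual trees by majority vote."""
    return majority([predict_tree(tree, row) for tree in trees])


# Toy data (an assumption): 4 variables, class 1 is shifted by +2 on each.
data_rng = random.Random(1)
y = [label for label in (0, 1) for _ in range(20)]
X = [[data_rng.random() + 2 * label for _ in range(4)] for label in y]

trees = fit_random_subspace(X, y, n_trees=15, n_features=2, seed=0)
print(predict_ensemble(trees, [0.3, 0.1, 0.4, 0.2]),
      predict_ensemble(trees, [2.5, 2.7, 2.1, 2.9]))
```

Unlike bagging or boosting, no case is resampled or reweighted here, so each tree is fitted to the original empirical distribution of the predictors it was given. The trees differ only in which variables they were allowed to use.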

Keywords

Econometric models; Discriminant analysis; Model with random parameters; Variables selection; Bootstrap

Journal

Studia Ekonomiczne / Akademia Ekonomiczna w Katowicach

Year

2003

Issue

no. 29 Metody wnioskowania statystycznego w badaniach ekonomicznych

Pages

57-69

Creators

author

Eugeniusz Gatnar

Bibliography

Amit Y., Geman D. (1997). Shape Quantization and Recognition with Randomized Trees. "Neural Computation" 9, pp. 1545-1588.
Blake C., Keogh E., Merz C.J. (1998). UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine, CA.
Breiman L. (1996). Bagging Predictors. "Machine Learning" 24, pp. 123-140.
Breiman L. (1998). Arcing Classifiers. "Annals of Statistics" 26, pp. 801-849.
Breiman L. (1999). Using Adaptive Bagging to Debias Regressions. Technical Report, Department of Statistics, University of California, Berkeley.
Breiman L. (2001). Random Forests. "Machine Learning" 45, pp. 5-32.
Breiman L., Friedman J., Olshen R., Stone C. (1984). Classification and Regression Trees. Chapman & Hall/CRC Press, London.
Freund Y., Schapire R.E. (1997). A Decision-theoretic Generalization of On-line Learning and an Application to Boosting. "Journal of Computer and System Sciences" 55, pp. 119-139.
Gatnar E. (2001). Nieparametryczna metoda dyskryminacji i regresji. PWN, Warszawa.
Gatnar E. (2002). Agregacja modeli dyskryminacyjnych. "Taksonomia". Prace Naukowe Akademii Ekonomicznej we Wrocławiu, nr 942, pp. 217-226.
Hastie T., Tibshirani R., Friedman J. (2001). The Elements of Statistical Learning. Springer, New York.
Ho T.K. (1998). The Random Subspace Method for Constructing Decision Forests. "IEEE Trans. on Pattern Analysis and Machine Intelligence" 20, pp. 832-844.
Kohavi R., Wolpert D.H. (1996). Bias Plus Variance Decomposition for Zero-One Loss Functions. In: Saitta L. (Ed.). Machine Learning: Proceedings of the 13th International Conference. Morgan Kaufmann, pp. 313-321.
Quinlan J.R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo.

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000064116452
