The Influence of Irrelevant Variables on Classification Error in Rules Induction

Kubus, Mariusz

doi:11089/696

Article - details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2011 | 255 Methodological Aspects of Multivariate Statistical Analysis : Statistical Models and Applications | 167--173

Article title

The Influence of Irrelevant Variables on Classification Error in Rules Induction

Authors

Mariusz Kubus

Content

Full texts:

http://dspace.uni.lodz.pl:8080/xmlui/bitstream/handle/11089/696/167-173.pdf?sequence=1 [remote]

Title variants

Wpływ zmiennych nieistotnych na błąd klasyfikacji w indukcji reguł

Publication languages

Abstracts

A typical data mining task is to extract unsuspected and systematic relations from data when there are no prior expectations about the nature of these relations. When data sets are large and were not collected to answer a particular question, they usually contain many irrelevant variables, which may degrade the quality of the discrimination model. In such situations, feature selection methods are applied. In adaptive and nonparametric methods of discrimination (classification trees, rule induction), feature selection is part of the learning algorithm. Using simulations, the influence of irrelevant variables on classification error in these methods is examined. (original abstract)

Typowym zadaniem data mining jest wykrycie niespodziewanych i systematycznych relacji w danych, gdy nie ma wcześniejszych oczekiwań co do natury tych relacji. W dużych zbiorach, które nie były zgromadzone w celu prowadzonej przez badacza analizy, zwykle występuje wiele zmiennych nieistotnych, co może obniżyć jakość modelu dyskryminacyjnego. W takich sytuacjach stosowane są metody selekcji zmiennych. W nieparametrycznych i adaptacyjnych metodach dyskryminacji (drzewa klasyfikacyjne, indukcja reguł) selekcja zmiennych jest częścią algorytmu uczącego. Za pomocą symulacji badany jest wpływ zmiennych nieistotnych na błąd klasyfikacji w tych metodach. (abstrakt oryginalny)

Keywords

Logic of induction, Data Mining, Discriminant analysis, Algorithms

Logika indukcji, Data Mining, Analiza dyskryminacyjna, Algorytmy

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

Year

2011

Volume

255 Methodological Aspects of Multivariate Statistical Analysis : Statistical Models and Applications

Pages

167--173

Physical description

Creators

author

Mariusz Kubus

Opole University of Technology, Poland

Bibliography

Breiman L. (2001). Random forests. "Machine Learning", 45. p. 5-32.
Clark P., Boswell R. (1991). Rule induction with CN2: some recent improvements, [in:] Kodratoff Y. (ed.) Machine learning - EWSL-91, European working session on learning, p. 151-163. Springer Verlag. Berlin.
Clark P., Niblett T. (1989). The CN2 induction algorithm. "Machine Learning", 3(4). p. 261-283. Kluwer.
Cohen W.W. (1995). Fast effective rule induction. In Prieditis A., Russell S. (Eds.) Proceedings of the 12th International Conference on Machine Learning.
Cohen W.W., Singer Y. (1999). A Simple, Fast, and Effective Rule Learner. In Proceedings of Annual Conference of American Association for Artificial Intelligence (p.335-342).
Freund Y., Schapire R. E. (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. "Journal of Computer and System Sciences". No 55. p. 119-139.
Friedman J. H., Popescu B. E. (2004). Gradient directed regularization for linear regression and classification. (Technical Report). Dept. of Statistics. Stanford University
Friedman J. H., Popescu B. E. (2005). Predictive learning via rule ensembles. (Technical Report). Dept. of Statistics. Stanford University
Fürnkranz J. (1999). Separate-and-Conquer Rule Learning. Artificial Intelligence Review 13(1).
Gatnar E. (2008). Podejście wielomodelowe w zagadnieniach dyskryminacji i regresji. PWN. Warszawa
Hastie T., Tibshirani R., Friedman J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. New York
Kohavi R., John G. (1997). Wrappers for feature selection. Artificial Intelligence. 97( 1-2): 273-324.
Kubus M. (2009). Porównanie indukcji reguł z wybranymi metodami dyskryminacji, [in:] K. Jajuga, M. Walesiak (eds.). Taksonomia 16, Klasyfikacja i analiza danych - teoria i zastosowania. Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu. No 47. p. 367-374.
Michalski R.S. (1969). On the quasi-minimal solution of the covering problem. In Proceedings of the 5th International Symposium on Information Processing (FCIP-69). Vol. A3 (Switching Circuits), p. 125-128. Bled, Yugoslavia.
Quinlan J.R. (1993). C4.5 programs for machine learning. Morgan Kaufmann. San Mateo.
Rissanen J. (1978). Modeling by shortest data description. Automatica. 14. p. 465-471.

Document type

Bibliography

Identifiers

DOI

11089/696

YADDA identifier

bwmeta1.element.ekon-element-000171193921
