Feature Selection and the Chessboard Problem

Kubus, Mariusz

doi:11089/14486

Article - details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2015 | vol. 1, t. 311 Statistical Analysis in Theory and Practice | 17--25

Article title

Feature Selection and the Chessboard Problem

Authors

Mariusz Kubus

Content

Full texts:

http://repozytorium.uni.lodz.pl:8080/xmlui/bitstream/handle/11089/14486/3-Kubus.pdf?sequence=3&isAllowed=y [remote]

Title variants

Selekcja zmiennych a problem szachownicy

Publication languages

Abstracts

Feature selection methods are usually classified into three groups: filters, wrappers and embedded methods. The second important criterion of their classification is an individual or multivariate approach to evaluation of the feature relevance. The chessboard problem is an illustrative example, where two variables which have no individual influence on the dependent variable can be essential to separate the classes. The classifiers which deal well with such data structure are sensitive to irrelevant variables. The generalization error increases with the number of noisy variables. We discuss the feature selection methods in the context of chessboard-like structure in the data with numerous irrelevant variables. (original abstract)
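The two claims in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the author's experiment: it assumes a 4 x 4 chessboard on the unit square, uniform noise variables, and a plain 1-nearest-neighbour rule as the classifier that "deals well" with the structure. All function names, sample sizes, and the seed are hypothetical choices made for the illustration.

```python
import random


def chessboard_label(x, y, cells=4):
    """Class (0 or 1) of a point in [0, 1)^2 on a cells x cells chessboard."""
    return (int(x * cells) + int(y * cells)) % 2


def make_data(n, n_noise, rng, cells=4):
    """Two chessboard variables followed by n_noise uniform noise variables."""
    X, labels = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(2 + n_noise)]
        X.append(row)
        labels.append(chessboard_label(row[0], row[1], cells))
    return X, labels


def class_mean_gap(X, labels, j):
    """Difference between the class means of variable j."""
    ones = [row[j] for row, c in zip(X, labels) if c == 1]
    zeros = [row[j] for row, c in zip(X, labels) if c == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def one_nn_accuracy(X_train, y_train, X_test, y_test, cols):
    """Test accuracy of a 1-nearest-neighbour rule using only the columns in cols."""
    correct = 0
    for row, target in zip(X_test, y_test):
        best_dist, best_label = float("inf"), None
        for other, label in zip(X_train, y_train):
            dist = sum((row[j] - other[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, label
        correct += best_label == target
    return correct / len(y_test)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n_noise = 20            # assumed number of irrelevant variables
X_train, y_train = make_data(400, n_noise, rng)
X_test, y_test = make_data(200, n_noise, rng)

# Individually, each chessboard variable has the same distribution in both classes.
gaps = [class_mean_gap(X_train, y_train, j) for j in (0, 1)]

acc_x1_only = one_nn_accuracy(X_train, y_train, X_test, y_test, [0])
acc_pair = one_nn_accuracy(X_train, y_train, X_test, y_test, [0, 1])
acc_all = one_nn_accuracy(X_train, y_train, X_test, y_test, range(2 + n_noise))

print("class mean gaps of the two chessboard variables:", gaps)
print("1-NN accuracy, first variable only:", acc_x1_only)
print("1-NN accuracy, both chessboard variables:", acc_pair)
print("1-NN accuracy, chessboard plus noise variables:", acc_all)
```

Under these assumptions the class mean gaps should be close to zero and the single-variable classifier close to chance, while the pair of variables separates the classes well. Adding the noise columns to the distance computation should pull the accuracy back down, which is the sensitivity to irrelevant variables that the abstract describes.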

The article discusses the search aspect of feature selection methods. It uses the chessboard example known from the literature, in which variables that individually have no discriminative power (they have identical distributions in the classes) can span a space in which the classes are well separable. Generalizing this example, a data set with a three-dimensional chessboard structure and noise variables was generated, and feature selection methods were then verified on it. The possibility of using cluster analysis as a tool supporting the discrimination stage was also considered. (original abstract)
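The search aspect can be sketched on a data set of the kind described here. This is an illustrative reconstruction under stated assumptions, not the procedure used in the article: the three-dimensional chessboard has 2 cells per axis, there are 5 uniform noise variables, the individual criterion is the gap between class means, and the multivariate criterion is the training accuracy of a majority vote on a grid whose bins are deliberately aligned with the chessboard cells. Function names, sizes, and the seed are hypothetical.

```python
import random
from itertools import combinations


def make_chessboard_3d(n, n_noise, rng, cells=2):
    """Three chessboard variables (class = parity of cell indices) plus noise."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(3 + n_noise)]
        X.append(row)
        y.append(sum(int(v * cells) for v in row[:3]) % 2)
    return X, y


def univariate_score(X, y, j):
    """Individual criterion: absolute gap between the class means of variable j."""
    ones = [row[j] for row, c in zip(X, y) if c == 1]
    zeros = [row[j] for row, c in zip(X, y) if c == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def subset_score(X, y, cols, bins=2):
    """Multivariate criterion: training accuracy of a majority vote taken
    inside each grid cell spanned by the variables in cols."""
    counts = {}
    for row, label in zip(X, y):
        cell = tuple(int(row[j] * bins) for j in cols)
        counts.setdefault(cell, [0, 0])[label] += 1
    return sum(max(tally) for tally in counts.values()) / len(y)


rng = random.Random(1)  # fixed seed, chosen arbitrarily
n_noise = 5             # assumed number of noise variables
X, y = make_chessboard_3d(600, n_noise, rng)
n_vars = 3 + n_noise

# Individual evaluation: no variable stands out, relevant or not.
individual = [univariate_score(X, y, j) for j in range(n_vars)]

# Exhaustive search over all triples, evaluated jointly.
best_triple = max(combinations(range(n_vars), 3),
                  key=lambda cols: subset_score(X, y, cols))

print("individual scores:", [round(s, 3) for s in individual])
print("best triple found by joint evaluation:", best_triple)
print("its score:", subset_score(X, y, best_triple))
```

In this setup the individual scores give no reason to prefer variables 0, 1 and 2, whereas evaluating triples jointly singles them out. The exhaustive search is affordable only because 8 variables yield 56 triples; with more noise variables the number of subsets grows combinatorially, which is why the choice of search strategy matters.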

Keywords

Variables selection; Cluster analysis; Simulation

Dobór zmiennych; Analiza skupień; Symulacja

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

Year

2015

Issue

vol. 1, t. 311 Statistical Analysis in Theory and Practice

Pages

17--25

Physical description

Creators

author

Mariusz Kubus

Opole University of Technology

Bibliography

Blum A.L., Langley P. (1997), Selection of relevant features and examples in machine learning, Artificial Intelligence, v. 97 n. 1-2, p. 245-271.
Caruana R.A., Freitag D. (1994), How useful is relevance? Working Notes of the AAAI Fall Symposium on Relevance (pp. 25-29). New Orleans, LA: AAAI Press.
Forman G. (2003), An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3: 1289-1305.
Gatnar E. (2005), Dobór zmiennych do zagregowanych modeli dyskryminacyjnych, in: Jajuga K., Walesiak M. (Eds.), Taksonomia 12, Klasyfikacja i analiza danych - teoria i zastosowania, Prace Naukowe Akademii Ekonomicznej we Wrocławiu, n. 1076, p. 79-85.
Guyon I., Elisseeff A. (2006), An introduction to feature extraction, in I. Guyon, S. Gunn, M. Nikravesh, L. Zadeh (Eds.), Feature Extraction: Foundations and Applications, Springer, New York.
Guyon I., Weston J., Barnhill S., Vapnik V. (2002), Gene Selection for Cancer Classification using Support Vector Machines, Machine Learning, 46: 389-422.
Hall M. (2000), Correlation-based feature selection for discrete and numeric class machine learning, Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann, San Francisco.
Hellwig Z. (1969), Problem optymalnego wyboru predykant, "Przegląd Statystyczny", n. 3-4.
Jensen D. D., Cohen P. R. (2000), Multiple comparisons in induction algorithms. Machine Learning, 38(3): p. 309-338.
John G.H., Kohavi R., Pfleger K. (1994), Irrelevant features and the subset selection problem. In Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann, p. 121-129.
Kira K., Rendell L. A. (1992), The feature selection problem: Traditional methods and a new algorithm. In Proc. AAAI-92, p. 129-134. MIT Press.
Koller D., Sahami M. (1996), Toward optimal feature selection. In 13th International Conference on Machine Learning, p. 284-292.
Kononenko I. (1994), Estimating attributes: Analysis and extensions of RELIEF, In Proceedings European Conference on Machine Learning, p. 171-182.
Ng K. S., Liu H. (2000), Customer retention via data mining. AI Review, 14(6): 569 - 590.
Quinlan J.R., Cameron-Jones R.M. (1995), Oversearching and layered search in empirical learning. In Mellish C. (ed.), Proceedings of the 14th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, p. 1019-1024.
Xing E., Jordan M., Karp R. (2001), Feature selection for high-dimensional genomic microarray data. In Proceedings of the Eighteenth International Conference on Machine Learning, p. 601-608.
Yu L., Liu H. (2004), Redundancy based feature selection for microarray data. In Proceedings of the Tenth ACM SIGKDD Conference on Knowledge Discovery and Data Mining, p. 737-742.

Document type

Bibliography

Identifiers

DOI

11089/14486

YADDA identifier

bwmeta1.element.ekon-element-000171393117
