Some Remarks on the Data Imputation Using "missForest" Method

Misztal, Małgorzata

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2013 | 285 Multivariate Statistical Analysis Theory and Practice | 169--179

Article title

Some Remarks on the Data Imputation Using "missForest" Method

Authors

Małgorzata Misztal

Title variants

Kilka uwag o imputacji danych z wykorzystaniem metody "missForest"

Abstracts

Missing data are common in practical applications of statistical methods, and imputation is a general statistical approach to the analysis of incomplete data sets. Stekhoven and Bühlmann (2012) proposed an iterative imputation method, called "missForest", based on Random Forests (Breiman 2001) to cope with missing values. The paper gives a short description of "missForest" and compares it with selected missing data techniques by artificially simulating different proportions and mechanisms of missing data in complete data sets from the UCI repository of machine learning databases. (original abstract)

Stekhoven and Bühlmann (2012) proposed a new iterative imputation method (called "missForest") based on Breiman's (2001) Random Forests method. This article discusses the "missForest" method and compares several selected techniques for handling missing data with "missForest". To this end, a simulation approach was used, generating different proportions and mechanisms of missing data in data sets drawn mainly from the machine learning database repository at the University of California, Irvine. (original abstract, translated from Polish)

Keywords

Data processing; Data analysis; Machine learning

Year

2013

Volume

285 Multivariate Statistical Analysis Theory and Practice

Pages

169--179

Contributors

author

Małgorzata Misztal

University of Lodz, Poland

References

Allison P. D. (2002), Missing data, Series: Quantitative Applications in the Social Sciences 07-136, SAGE Publications, Thousand Oaks, London, New Delhi.
Blake C., Keogh E., Merz C. J. (1988), UCI Repository of Machine Learning Datasets, Department of Information and Computer Science, University of California, Irvine.
Breiman L. (2001), Random Forests, "Machine Learning" 45(1): 5-32.
Little R. J. A., Rubin D. B. (2002), Statistical Analysis with Missing Data, Second Edition, Wiley, New Jersey.
Oba S., Sato M., Takemasa I., Monden M., Matsubara K., Ishii S. (2003), A Bayesian Missing Value Estimation Method for Gene Expression Profile Data, "Bioinformatics" 19(16): 2088-2096.
Städler N., Bühlmann P. (2010), Pattern Alternating Maximization Algorithm for High-Dimensional Missing Data, arXiv preprint arXiv:1005.0366.
Stekhoven D. J., Bühlmann P. (2012), MissForest - Nonparametric Missing Value Imputation for Mixed-Type Data, "Bioinformatics" 28(1): 112-118.
Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., Botstein D., Altman R. (2001), Missing Value Estimation Methods for DNA Microarrays, "Bioinformatics" 17(6): 520-525.
van Buuren S., Groothuis-Oudshoorn K. (2011), MICE: Multivariate Imputation by Chained Equations in R, "Journal of Statistical Software" 45(3): 1-67.

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000171258221
