Abstracts
When data are missing from a statistical survey or from administrative sources, imputation is frequently used to fill the gaps and to reduce much of the bias that these gaps would otherwise introduce into aggregated estimates. This paper presents research on the efficiency of model-based imputation in business statistics, where the explanatory variable is a complex measure constructed by taxonomic methods. The proposed approach selects, from a set of candidate explanatory variables for the imputed information, those that fit best in terms of variation and correlation, and then replaces them with a single complex measure (meta-feature) that exploits their whole informational potential. This meta-feature is constructed as a function of the median distance of the given objects from the benchmark of development. A simulation study and an empirical study were used to verify the efficiency of the proposed approach. The paper also presents five types of similar techniques: ratio imputation, regression imputation, regression imputation with iteration, predictive mean matching and the propensity score method. The second study involved a simulation of missing data using IT business data from California State University, Los Angeles, USA. The results show that models that depend strongly on functional form assumptions can be improved by using a complex measure to summarize the predictor variables instead of the variables themselves (raw or normalized). (original abstract)
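The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the formulas, so the normalization (coordinate-wise median and median absolute deviation), the benchmark (column-wise maximum, which assumes every predictor is a stimulant), the scaling constant 2.5 and the function names `composite_measure` and `regression_impute` are all assumptions made for illustration. The sketch collapses several predictors into one meta-feature based on the median distance from a benchmark, then uses that meta-feature as the single regressor in regression imputation.

```python
import numpy as np

def composite_measure(X):
    """Hypothetical meta-feature: 1 - d_i / (med(d) + 2.5 * mad(d)),
    where d_i is the median distance of object i from the benchmark."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    Z = (X - med) / mad                  # assumes every column has mad > 0
    benchmark = Z.max(axis=0)            # assumes all features are stimulants
    d = np.median(np.abs(Z - benchmark), axis=1)
    d_med = np.median(d)
    d_mad = np.median(np.abs(d - d_med))
    return 1.0 - d / (d_med + 2.5 * d_mad)

def regression_impute(y, m):
    """Fill NaN entries of y with predictions from a simple linear
    regression of y on the meta-feature m, fitted on observed units."""
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    observed = ~np.isnan(y)
    slope, intercept = np.polyfit(m[observed], y[observed], deg=1)
    filled = y.copy()
    filled[~observed] = intercept + slope * m[~observed]
    return filled

# Toy data: five units, two predictors, one missing target value.
X = np.array([[1, 2], [2, 3], [3, 5], [4, 7], [5, 11]])
m = composite_measure(X)
y = 1.0 + 2.0 * m
y[1] = np.nan
y_filled = regression_impute(y, m)
```

Because the toy target is an exact linear function of the meta-feature, the imputed value coincides with the true one; with real data the quality of the imputation depends on how well the single measure summarizes the predictors.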
Authors
author
- Statistical Office in Poznan, Poland
Bibliography
- ALLISON, P. D., (2000). Multiple Imputation for Missing Data: A Cautionary Tale, Sociological Methods and Research, Vol. 28, pp. 301-309.
- ANDRIDGE, R. R. and LITTLE, R. J. A., (2010). A Review of Hot Deck Imputation for Survey Non-response, International Statistical Review, Vol. 78, pp. 40-64.
- ARCARO, C. and YUNG, W., (2001). Variance estimation in the presence of imputation, SSC Annual Meeting, Proceedings of the Survey Method Section, pp. 75-80.
- CHAUVET, G., DEVILLE, J.-C. and HAZIZA, D., (2011). On Balanced Random Imputation in Surveys, Biometrika, Vol. 98, pp. 459-471.
- DE WAAL, T., PANNEKOEK, J. and SCHOLTUS, S., (2011). Handbook of Statistical Data Editing and Imputation, Wiley Handbooks in Survey Methodology, John Wiley & Sons, Inc., Hoboken, New Jersey.
- DUROCHER, S. and KIRKPATRICK, D., (2009). The projection median of a set of points, Computational Geometry, Vol. 42, pp. 364-375.
- HORTON, N. J. and LIPSITZ, S. R., (2001). Multiple Imputation in Practice: Comparison of Software Packages for Regression Models with Missing Variables, The American Statistician, Vol. 55, pp. 244-254.
- HUNDEPOOL, A., DOMINGO-FERRER, J., FRANCONI, L., GIESSING, S., NORDHOLT, E. S., SPICER, K., DE WOLF, P.-P., (2012). Statistical Disclosure Control, Series: Wiley Series in Survey Methodology, John Wiley & Sons, Ltd.
- JOLLIFFE, I. T., (2002). Principal Component Analysis. Second Edition. Springer-Verlag, New York, Berlin, Heidelberg.
- KIM, K., (2000). Variance estimation under regression imputation model, Proceedings of the Survey Research Methods Section, American Statistical Association.
- KIM, J. K., BRICK, M., FULLER, W. A. and KALTON, G., (2006). On the bias of the multiple-imputation variance estimator in survey sampling, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 68, pp. 509-521.
- LAVORI, P. W., DAWSON, R. and SHERA, D., (1995). A Multiple Imputation Strategy for Clinical Trials with Truncation of Patient Data, Statistics in Medicine, Vol. 14, pp. 1913-1925.
- LITTLE, R. J. A. and RUBIN, D. B., (2002). Statistical Analysis with Missing Data. Second Edition, John Wiley & Sons, Inc., New York.
- MALINA, A. and ZELIAŚ, A., (1998). On Building Taxonometric Measures on Living Conditions, Statistics in Transition, Vol. 3, No. 3, pp. 523-544.
- MILASEVIC, P. and DUCHARME, G. R., (1987). Uniqueness of the Spatial Median, The Annals of Statistics, Vol. 15, No. 3, pp. 1332-1333.
- MŁODAK, A., (2014). On the construction of an aggregated measure of the development of interval data, Computational Statistics, Vol. 29, pp. 895-929.
- MŁODAK, A., (2006). Multilateral normalisations of diagnostic features, Statistics in Transition, Vol. 7, pp. 1125-1139.
- NETER, J., WASSERMAN, W. and KUTNER, M. H., (1985). Applied Linear Statistical Models: Regression, Analysis of Variance and Experimental Designs, 2nd edition, Homewood, IL: Richard D. Irwin, Inc., U.S.A.
- PAMPAKA, M., HUTCHESON, G. and WILLIAMS, J., (2016). Handling missing data: analysis of a challenging data set using multiple imputation, International Journal of Research & Method in Education, Vol. 39, No. 1, pp. 19-37.
- ROUSSEEUW, P. J. and LEROY, A. M., (1987). Robust Regression and Outlier Detection, John Wiley & Sons, New York.
- RUBIN, D. B., (1987). Multiple Imputation for Nonresponse in Surveys, John Wiley & Sons, New York.
- SÄRNDAL, C. E., (1992). Methods for estimating the precision of survey estimates when imputation has been used, Survey Methodology, Vol. 18, pp. 241-252.
- SCHAFER, J. L., (1997). Analysis of Incomplete Multivariate Data, New York: Chapman and Hall.
- TIBSHIRANI, R., (1996). Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, Series B (Methodological), Vol. 58, No. 1, pp. 267-288.
- VANDEV, D. L., (2002). Computing of Trimmed L1 - Median, Laboratory of Computer Stochastics, Institute of Mathematics, Bulgarian Academy of Sciences, (preprint), available at http://www.fmi.uni-sofia.bg/fmi/statist/Personal/Vandev/papers/aspap.pdf .
- YUAN, Y. C., (2010). Multiple Imputation for Missing Data: Concepts and New Development (Version 9.0), SAS Institute Inc, Rockville, MD, U.S.A.
- ZELIAŚ, A., (2002). Some Notes on the Selection of Normalization of Diagnostic Variables, Statistics in Transition, Vol. 5, No. 5, pp. 787-802.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.ekon-element-000171634014