Robust Method of Sparse Feature Selection for Multi-Label Classification with Naive Bayes
The explosive growth of big data challenges predictive systems in terms of both data size and dimensionality. Generating features from text often yields many thousands of sparse features that rarely take non-zero values. In this work we propose a very fast and robust feature selection method that is optimised with the Naive Bayes classifier. The method takes advantage of the sparse feature representation and uses a diversified backward-forward greedy search to arrive at a highly competitive solution in minimal processing time. It promotes the paradigm of shifting the complexity of predictive systems away from the model and towards careful data preprocessing and filtering, which makes it possible to accomplish predictive big data tasks on a single processor even though billions of data examples are nominally exposed for processing. The method was applied to the AAIA Data Mining Competition 2014, which concerned predicting human injuries resulting from fire incidents based on nearly 12,000 risk factors extracted from thousands of fire incident reports; it took second place with a predictive accuracy of 96%.
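The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a backward-forward greedy search around a Naive Bayes scorer over sparse binary features might look. Everything in it is an assumption made for illustration: the function names, the representation of each example as a set of active feature indices, the simplification of scoring only active features, the single binary label, and the use of training accuracy as the selection criterion. It does not attempt the "diversified" aspect of the search, and a real application would score candidates on held-out data.

```python
import math
from collections import Counter


def fit_counts(rows, labels):
    """Count, per class, how often each sparse binary feature is active."""
    class_n = Counter(labels)
    feat_n = {c: Counter() for c in class_n}
    for active, y in zip(rows, labels):
        feat_n[y].update(active)
    return class_n, feat_n


def predict(active, selected, class_n, feat_n):
    """Naive Bayes score using only active, selected features.

    Assumption: absent features are ignored, so the cost of a prediction
    depends on the number of non-zero entries, not on the dimensionality.
    """
    total = sum(class_n.values())
    best_class, best_score = None, -math.inf
    for c in sorted(class_n):
        score = math.log(class_n[c] / total)
        for f in active:
            if f in selected:
                # Laplace-smoothed estimate of P(feature active | class)
                score += math.log((feat_n[c][f] + 1) / (class_n[c] + 2))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(rows, labels, selected, class_n, feat_n):
    hits = sum(predict(r, selected, class_n, feat_n) == y
               for r, y in zip(rows, labels))
    return hits / len(labels)


def backward_forward_select(rows, labels, features):
    """Greedy search: drop features that do not help, re-add ones that do."""
    class_n, feat_n = fit_counts(rows, labels)  # counts are computed once
    selected = set(features)
    best = accuracy(rows, labels, selected, class_n, feat_n)
    changed = True
    while changed:
        changed = False
        # Backward pass: remove a feature if accuracy does not get worse.
        for f in sorted(selected):
            acc = accuracy(rows, labels, selected - {f}, class_n, feat_n)
            if acc >= best:
                selected.discard(f)
                best, changed = acc, True
        # Forward pass: re-add a feature only if accuracy strictly improves.
        for f in sorted(set(features) - selected):
            acc = accuracy(rows, labels, selected | {f}, class_n, feat_n)
            if acc > best:
                selected.add(f)
                best, changed = acc, True
    return selected, best


# Toy data: feature 0 tracks the label, features 1 and 2 are noise.
rows = [{0, 1}, {0}, {0, 2}, {1}, {2}, set()]
labels = [1, 1, 1, 0, 0, 0]
selected, acc = backward_forward_select(rows, labels, features={0, 1, 2})
print(selected, acc)
```

Because the per-class counts are gathered once and removing or adding a feature only changes which terms enter the sum, each candidate subset can be scored without refitting the model, which is one plausible reason a search of this kind pairs well with Naive Bayes.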