New Developments in Data Analysis and Classification

Bock, Hans-Hermann

Article details

Journal

Studia i Prace Uniwersytetu Ekonomicznego w Krakowie

2010 | no. 11 | pp. 7-37

Article title

New Developments in Data Analysis and Classification

Authors

Hans-Hermann Bock

Abstracts

In this article we concentrate on a few topics and methods in data analysis where new developments and approaches can be illustrated. We focus mainly on methods from discrimination (section 2) and clustering (section 3). In section 4 we describe more recent problems in the classification domain (ensemble methods, two-way clustering, and clustering of time series) and point to some new methods in this area. Relevant monographs include Hastie, Tibshirani & Friedman (2001), Gentle, Härdle & Mori (2004), and Izenman (2008). (fragment of text)

The article focuses on several data analysis methods that illustrate new approaches and directions of development. It concentrates mainly on methods of discrimination (section 2) and clustering (section 3). Section 4 describes current problems in the classification domain (ensemble classifiers, two-way clustering, clustering of time series) and points to new methods used in this area.

Keywords

Data analysis; Statistical data analysis; Data classification

Contributors

author

Hans-Hermann Bock

RWTH Aachen University

Bibliography

Aggarwal, C. C., Han, J. and Wang, J. (2003) "A Framework for Clustering Evolving Data Streams" in Proceedings of the 29th Very Large Data Bases (VLDB) Conference, Berlin, Germany.
Barthélemy, J.-P. and Guénoche, A. (1988) Trees and Proximity Relations. Chichester: Wiley.
Belanche, L., Vazquez, J. L. and Vazquez, M. (2008) "Distance-based Kernels for Real-valued Data" in C. Preisach et al. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Benzécri, J.-P. (1973) L'analyse des données. Vol. 1: La Taxonomie. Vol. 2: L'analyse des correspondances. Paris: Dunod.
Biau, G. et al. (2008) "Consistency of Random Forests and Other Averaging Classifiers". Journal of Machine Learning Research 9.
Bock, H.-H. (1974) Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Cluster-Analyse). Göttingen: Vandenhoeck & Ruprecht.
Bock, H.-H. (1979) "1. Clustering by Density Estimation. 2. Simultaneous Clustering of Objects and Variables. 3. Fuzzy Clustering Procedures" in R. Tomassone (ed.) Analyse des données et informatique. Institut de Recherche en Informatique et en Automatique (INRIA), Le Chesnay, France.
Bock, H.-H. (1991) "A Clustering Technique for Maximizing Phi-divergence, Noncentrality and Discriminating Power" in M. Schader (ed.) Analyzing and Modeling Data and Knowledge. Heidelberg: Springer.
Bock, H.-H. (1996a) "Probabilistic Models in Partitional Cluster Analysis" in A. Ferligoj and A. Kramberger (eds) Developments in Data Analysis. FDV, Metodološki zvezki 12, Ljubljana, Slovenia.
Bock, H.-H. (1996b) "Probabilistic Models in Cluster Analysis". Computational Statistics and Data Analysis 23.
Bock, H.-H. (2003) "Two-way Clustering for Contingency Tables: Maximizing a Dependence Measure" in M. Schader, W. Gaul and M. Vichi (eds) Between Data Science and Applied Data Analysis. Heidelberg: Springer.
Bock, H.-H. (2004) "Convexity-based Clustering Criteria: Theory, Algorithms, and Applications in Statistics". Statistical Methods & Applications 12.
Bock, H.-H. (2005) "Optimization in Symbolic Data Analysis: Dissimilarities, Class Centers, and Clustering" in D. Baier, R. Decker, L. Schmidt-Thieme (eds) Data Analysis and Decision Support. Heidelberg: Springer.
Bock, H.-H. (2008) "Origins and Extensions of the k-means Algorithm in Cluster Analysis". Special Issue on "Contribution à l'histoire de l'analyse des données". Electronic Journal for History of Probability and Statistics (JEHPS) 4(2), www.emis.de/journals/JEHPS/decembre2008.html.
Bock, H.-H. (2009) "Analyzing Symbolic Data: Problems, Methods, and Perspectives" in A. Okada et al. (eds) Cooperation in Classification and Data Analysis. Heidelberg: Springer.
Bock, H.-H., Diday, E. (2000) Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer-Verlag.
Boets, J. et al. (2005) "Clustering Time Series, Subspace Identification and Cepstral Distances". Communications in Information and Systems 5.
Breiman, L. (2001) "Random Forests". Machine Learning 45.
Bühlmann, P. (2002) "Analyzing Bagging". Annals of Statistics 30.
Bühlmann, P. (2004) "Bagging, Boosting and Ensemble Methods" in J. Gentle, W. Härdle, Y. Mori (eds) Handbook of Computational Statistics. Berlin: Springer.
Celeux, G. (2007) "Mixture Models for Classification" in R. Decker, H.-J. Lenz (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Cuevas, A., Febrero, M. and Fraiman, R. (2001) "Cluster Analysis: A Further Approach Based on Density Estimation". Computational Statistics and Data Analysis 36.
Czogiel, I. et al. (2007) "Localized Linear Discrimination Analysis" in R. Decker, H.-J. Lenz (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Decker, R. and Lenz, H.-J. (eds) (2007) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Devroye, L., Györfi, L. and Lugosi, G. (1996) A Probabilistic Theory of Pattern Recognition. Berlin: Springer.
Dhillon, I. (2001) "Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning" in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). New York: ACM Press.
Diday, E. and Noirhomme, M. (eds) (2008) Symbolic Data Analysis and the SODAS Software. New York: Wiley.
Dietterich, T. G. (2000) "Ensemble Methods in Machine Learning" in Multiple Classifier Systems. First International Workshop, MCS 2000, Cagliari, Italy.
Douzal Chouakria, A. and Naidu Nagabushan, P. (2007) "Adaptive Dissimilarity Index for Measuring Time Series Proximity". Advances in Data Analysis and Classification 1.
Duda, R. O., Hart, P. E. and Stork, D. G. (2000) Pattern Classification. 2nd edition. New York: Wiley.
Fraley, Ch. and Raftery, A. E. (2002) "Model-based Clustering, Discriminant Analysis and Density Estimation". Journal of the American Statistical Association 97(458).
Frame, S. J. and Jammalamadaka, S. R. (2007) "Generalized Mixture Models, Semi-supervised Learning, and Unknown Class Inference". Advances in Data Analysis and Classification 1.
Fukunaga, K. (1990) Introduction to Statistical Pattern Recognition. 2nd edition. New York: Academic Press.
Gatnar, E. (2008) "Fusion of Multiple Statistical Classifiers" in C. Preisach et al. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Gaul, W. and Schader, M. (1988) "Clusterwise Aggregation of Relations". Applied Stochastic Models and Data Analysis 4.
Gentle, J., Härdle, W. and Mori, Y. (eds) (2004) Handbook of Computational Statistics. Berlin: Springer.
Gordon, A. (1999) Classification. Boca Raton: Chapman & Hall/CRC Press.
Gordon, A. D. and Vichi, M. (2001) "Fuzzy Partition Models for Fitting a Set of Partitions". Psychometrika 66.
Groenen, P. J. F., Nalbantov, G. and Bioch, J. C. (2008) "SVM-Maj: a Majorization Approach to Linear Support Vector Machines with Different Hinge Errors". Advances in Data Analysis and Classification 2.
Hastie, T., Tibshirani, R. and Friedman, J. (2001) The Elements of Statistical Learning. New York: Springer.
Hébrail, G. (2007) Statistical Challenges in Data Stream Applications. Presented at the Conference ISI 2007, Lisbon, Portugal.
Hébrail, G. (2008) Summarizing Data Streams by Sampling and Clustering. International Workshop on Data Stream Management and Mining. Beijing.
Horenko, I. (2010a) "Finite Element Approach to Clustering of Multidimensional Time Series". SIAM Journal on Scientific Computing 32(1).
Horenko, I. (2010b) "On Clustering of Non-stationary Meteorological Time Series". Dynamics of Atmospheres and Oceans 49(2-3). In print. DOI 10.1016/j.dynatmoce.2009.04.003.
Hornik, K. and Böhm, W. (2008) "Hard and Soft Euclidean Consensus Partitions" in C. Preisach et al. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Izenman, A. J. (2008) Modern Multivariate Statistical Techniques. New York: Springer.
Jain, A. K. and Dubes, R. C. (1988) Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice Hall.
Karatzoglou, A. and Feinerer, I. (2007) "Text Clustering with String Kernels in R" in R. Decker, H.-J. Lenz (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Kaufman, L. and Rousseeuw, P. J. (1990) Finding Groups in Data. New York: Wiley.
Lebart, L. (1994) Statistique textuelle. Paris: Dunod.
Lebart, L., Morineau, A. and Piron, M. (1995) Statistique exploratoire multidimensionnelle. Paris: Dunod.
Le Thi, H. A., Le, H. M. and Pham Dinh, T. (2007) "Fuzzy Clustering Based on Nonconvex Optimisation Approaches Using Difference of Convex (DC) Functions Algorithms". Advances in Data Analysis and Classification 1.
Lee, J. W. et al. (2005) "An Extensive Comparison of Recent Classification Tools Applied to Microarray Data". Computational Statistics and Data Analysis 48.
Maranzana, F. E. (1963) "On the Location of Supply Points to Minimize Transportation Costs". IBM Systems Journal 2.
McLachlan, G. J. and Krishnan, T. (2008) The EM Algorithm and Extensions. 2nd edition. Hoboken, NJ: Wiley.
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models. New York: Wiley.
Mitchell, T. (1997) Machine Learning. New York: McGraw Hill.
Miyamoto, S., Ichihashi, H. and Honda, K. (2008) Algorithms for Fuzzy Clustering. Berlin-Heidelberg: Springer.
Muthukrishnan, S. (2005) "Data Streams: Algorithms and Applications". Foundations and Trends in Theoretical Computer Science 1(2).
Ng, A., Jordan, M. and Weiss, Y. (2002) "On Spectral Clustering: Analysis and an Algorithm" in T. Dietterich, S. Becker, and Z. Ghahramani (eds) Advances in Neural Information Processing Systems. Vol. 14, MIT Press.
Preisach, C. et al. (eds) (2008) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Ripley, B. (1996) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Sato-Ilic, M. and Jain, L. C. (2006) Innovations in Fuzzy Clustering. Berlin-Heidelberg: Springer.
Schiffner, J. and Weihs, C. (2008) "Comparison of Local Classification Methods" in C. Preisach et al. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Heidelberg: Springer.
Seewald, A. K. and Kleedorfer, F. (2007) "Lambda Pruning: an Approximation of the String Subsequence Kernel for Practical SVM Classification and Redundancy Clustering". Advances in Data Analysis and Classification 1.
Strehl, A. and Ghosh, J. (2002) "Cluster Ensembles - a Knowledge Reuse Framework for Combining Multiple Partitions". Journal of Machine Learning Research 3.
Tan, P.-N., Steinbach, M. and Kumar, V. (2006) Introduction to Data Mining. New York: Addison-Wesley.
Tukey, J. W. (1977) Exploratory Data Analysis. Reading, MA: Addison-Wesley.
Valentini, G. and Dietterich, T. G. (2002) "Ensembles of Learning Machines" in Neural Nets WIRN Vietri 2002.
Valentini, G. and Dietterich, T. G. (2004) "Bias-variance Analysis of Support Vector Machines for the Development of SVM-based Ensemble Methods". Journal of Machine Learning Research 5.
Van Cutsem, B. (ed.) (1994) Classification and Dissimilarity Analysis. New York: Springer.
Van Mechelen, I., Bock, H.-H. and De Boeck, P. (2004) "Two-mode Clustering Methods: A Structured Review". Statistical Methods in Medical Research 13.
Van Os, B. J. (2000) Dynamic Programming for Partitioning in Multivariate Data Analysis. Thesis. Leiden University, The Netherlands.
Vichi, M. (2008) "Fitting Semiparametric Clustering Models to Dissimilarity Data". Advances in Data Analysis and Classification 2.
Von Luxburg, U., Belkin, M. and Bousquet, O. (2008) "Consistency of Spectral Clustering". Annals of Statistics 36(2).
Von Luxburg, U., Bousquet, O. and Belkin, M. (2004) "On the Convergence of Spectral Clustering on Random Samples: the Normalized Case" in J. Shawe-Taylor and Y. Singer (eds) Learning Theory. 17th Annual Conference on Learning Theory, COLT 2004. Banff, Canada, July. Proceedings. New York: Springer.
Weihs, C. et al. (2007) "Classification in Music Research". Advances in Data Analysis and Classification 1.

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000168536998
