Symulacyjna optymalizacja wyboru procedury klasyfikacyjnej dla danego typu danych - oprogramowanie komputerowe i wyniki badań

Walesiak, Marek; Dudek, Andrzej

Article details

Journal

Prace Naukowe Akademii Ekonomicznej we Wrocławiu. Taksonomia

2006 | 13 | no. 1126: Klasyfikacja i analiza danych - teoria i zastosowania [Classification and Data Analysis: Theory and Applications] | pp. 120-129

Article title

Symulacyjna optymalizacja wyboru procedury klasyfikacyjnej dla danego typu danych - oprogramowanie komputerowe i wyniki badań

Authors

Marek Walesiak, Andrzej Dudek

Title variants

Determination of Optimal Clustering Procedure for a Data Set - Computer Program and Empirical Results

Abstracts

The article briefly characterizes nine paths in the simulation-based optimization of the choice of a clustering procedure for a given data type (proposed in Walesiak and Dudek [2005]). It then presents the basic functions of the clusterSim computer program, which implements the identified paths, together with selected results of simulation computations for an increasing number of objects and variables in the data matrix. All procedures were developed in R, with C++ used as an auxiliary language. (text fragment)

In a typical cluster analysis study, eight major steps are distinguished. Four of them are critical: the choice of the variable normalization formula, the selection of a distance measure, the selection of a clustering method, and the determination of the number of clusters.
The article presents:
a) the determination of the optimal clustering procedure for a data set by varying all combinations of normalization formulas, distance measures, and clustering methods; nine simulation paths were distinguished, depending on the measurement scale of the variables in the data set;
b) the clusterSim computer program, written in R and C++;
c) selected empirical results of a simulation study based on data matrices with a growing number of objects and variables. (original abstract)
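The exhaustive search described in point a) can be illustrated with a small sketch. It is written in Python rather than R, does not use or reproduce the clusterSim API, and ignores the nine measurement-scale paths: it simply assumes metric (interval or ratio) data. The three normalization formulas, the three distance measures, the three agglomerative linkages standing in for clustering methods, the candidate numbers of clusters, and the use of the average silhouette width (Rousseeuw 1987) as the selection criterion are all illustrative assumptions, as are every function name and the toy data.

```python
"""Hypothetical sketch: exhaustive search over clustering procedures.

A procedure is a (normalization, distance, linkage, k) combination. Every
combination is applied to the data, scored by the average silhouette width
(Rousseeuw 1987), and the highest-scoring one is returned. This is NOT the
clusterSim API; all names below are illustrative assumptions.
"""
import math
from itertools import product
from statistics import mean, pstdev


def z_score(col):
    m, s = mean(col), pstdev(col)
    return [(x - m) / s if s else 0.0 for x in col]


def min_max(col):
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in col]


NORMALIZATIONS = {"none": list, "z-score": z_score, "min-max": min_max}
DISTANCES = {
    "euclidean": math.dist,
    "manhattan": lambda p, q: sum(abs(a - b) for a, b in zip(p, q)),
    "chebyshev": lambda p, q: max(abs(a - b) for a, b in zip(p, q)),
}
LINKAGES = {"single": min, "complete": max, "average": mean}


def normalize(data, fn):
    """Apply a normalization formula column by column (i.e. per variable)."""
    cols = [fn(list(col)) for col in zip(*data)]
    return [list(row) for row in zip(*cols)]


def agglomerate(d, k, linkage):
    """Naive agglomerative clustering on distance matrix d; one label per object."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = linkage([d[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette(d, labels):
    """Average silhouette width (needs >= 2 clusters); singletons get s(i) = 0."""
    n, widths = len(d), []
    for i in range(n):
        own = [d[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = mean(own)
        b = min(mean([d[i][j] for j in range(n) if labels[j] == c])
                for c in set(labels) if c != labels[i])
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


def best_procedure(data, ks=(2, 3, 4)):
    """Score every procedure and return (silhouette, norm, distance, linkage, k)."""
    results = []
    for (nn, nf), (dn, df), (ln, lf), k in product(
        NORMALIZATIONS.items(), DISTANCES.items(), LINKAGES.items(), ks
    ):
        x = normalize(data, nf)
        d = [[df(p, q) for q in x] for p in x]
        results.append((silhouette(d, agglomerate(d, k, lf)), nn, dn, ln, k))
    return max(results, key=lambda r: r[0])


# Hypothetical toy data: two well-separated groups of three objects each.
DATA = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [8.0, 9.0], [8.3, 8.8], [7.9, 9.1]]
score, norm, dist, link, k = best_procedure(DATA)
print(f"best: {norm} / {dist} / {link} / k={k} (silhouette {score:.3f})")
```

The number of procedures grows multiplicatively with each option list (here 3 × 3 × 3 × 3 = 81), and the cost of evaluating each procedure grows with the numbers of objects and variables, which is why such a search is naturally studied on data matrices of increasing size.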

Keywords

Computer software, Simulation

Contributors

Author

Marek Walesiak

Akademia Ekonomiczna we Wrocławiu

Author

Andrzej Dudek

Akademia Ekonomiczna we Wrocławiu

References

Baker F.B., Hubert L.J. (1975), Measuring the Power of Hierarchical Cluster Analysis, "Journal of the American Statistical Association" vol. 70, no. 349, pp. 31-38.
Caliński T., Harabasz J. (1974), A Dendrite Method for Cluster Analysis, "Communications in Statistics" vol. 3, pp. 1-27.
Dudoit S., Fridlyand J. (2002), A Prediction-Based Resampling Method for Estimating the Number of Clusters in a Dataset, "Genome Biology" vol. 3, no. 7, pp. 1-20.
Everitt B.S., Landau S., Leese M. (2001), Cluster Analysis, Edward Arnold, London.
Gatnar E., Walesiak M. (eds.) (2004), Metody statystycznej analizy wielowymiarowej w badaniach marketingowych [Methods of Multivariate Statistical Analysis in Marketing Research], AE, Wrocław.
Gordon A.D. (1999), Classification, Chapman and Hall/CRC, London.
Hubert L.J. (1974), Approximate Evaluation Technique for the Single-Link and Complete-Link Hierarchical Clustering Procedures, "Journal of the American Statistical Association" vol. 69, no. 347, pp. 698-704.
Hubert L.J., Levine J.R. (1976), Evaluating Object Set Partitions: Free Sort Analysis and Some Generalizations, "Journal of Verbal Learning and Verbal Behaviour" vol. 15, pp. 549-570.
Kaufman L., Rousseeuw P.J. (1990), Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.
Krzanowski W.J., Lai Y.T. (1988), A Criterion for Determining the Number of Groups in a Data Set Using Sum of Squares Clustering, "Biometrics" vol. 44, pp. 23-34.
Milligan G.W. (1996), Clustering Validation: Results and Implications for Applied Analyses, in: P. Arabie, L.J. Hubert, G. De Soete (eds.), Clustering and Classification, World Scientific, Singapore, pp. 341-375.
Milligan G.W., Cooper M.C. (1985), An Examination of Procedures for Determining the Number of Clusters in a Data Set, "Psychometrika" vol. 50, no. 2, pp. 159-179.
Mufti G.B., Bertrand P., El Moubarki L. (2005), Determining the Number of Groups from Measures of Cluster Stability, in: J. Janssen, P. Lenca (eds.), Applied Stochastic Models and Data Analysis, ENST Bretagne, Brest, pp. 404-413.
Rousseeuw P.J. (1987), Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis, "Journal of Computational and Applied Mathematics" vol. 20, pp. 53-65.
Sugar C.A., James G.M. (2003), Finding the Number of Clusters in a Dataset: An Information-Theoretic Approach, "Journal of the American Statistical Association" vol. 98, no. 463, pp. 750-763.
Tibshirani R., Walther G., Hastie T. (2001), Estimating the Number of Clusters in a Data Set via the Gap Statistic, "Journal of the Royal Statistical Society", Series B, vol. 63, part 2, pp. 411-423.
Walesiak M. (2002), Uogólniona miara odległości w statystycznej analizie wielowymiarowej [Generalized Distance Measure in Multivariate Statistical Analysis], AE, Wrocław.
Walesiak M. (2005), Rekomendacje w zakresie strategii postępowania w procesie klasyfikacji zbioru obiektów [Recommendations on the Strategy for Classifying a Set of Objects], in: A. Zeliaś (ed.), Przestrzenno-czasowe modelowanie i prognozowanie zjawisk gospodarczych [Spatio-Temporal Modelling and Forecasting of Economic Phenomena], AE, Kraków, pp. 185-203.
Walesiak M., Dudek A. (2005), Symulacyjna optymalizacja wyboru procedury klasyfikacyjnej dla danego typu danych - charakterystyka problemu [Simulation-Based Optimization of the Choice of a Clustering Procedure for a Given Data Type: Characterization of the Problem], "Zeszyty Naukowe Uniwersytetu Szczecińskiego" (in press).

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000171558330
