On the optimal division of an empirical distribution (and some related problems)

Owsiński, Jan W.

Article details

Journal

Przegląd Statystyczny

2012 | special issue 1 | 109--122

Article title

On the optimal division of an empirical distribution (and some related problems)

Authors

Jan W. Owsiński

Content

Full texts:

http://keii.ue.wroc.pl/przeglad/Rok%202012/Zeszyt%20Specjalny%201/2012_spec_1_109-122.pdf [remote]

Title variants

Optymalny podział rozkładu empirycznego (i kilka problemów z tym związanych)

Abstracts

We consider the division of an empirical distribution of x_i, where i is the index of a unit for which we observe x_i (e.g., province i, for which x_i is the GDP per capita). The values x_i are ordered non-decreasingly. We analyse the cumulative distribution, z_i = Σ_{i'=1,...,i} x_{i'}. The sequence z_i is convex, since its successive increments x_i are non-decreasing. We want to divide the distribution of z_i into subsets of i such that the shape of the distribution {z_i} is approximated as well as possible by straight-line segments determined for the subsets, which together form a piecewise linear contour, while keeping the number of segments as small as possible. This corresponds to the categorisations frequently used for similar distributions (e.g., "developed", "developing", ... countries). For such categorisations, usually no formal methods are applied and "substantive" premises are used instead. Alternatively, the methods applied are limited to establishing quantiles of the distribution, without considering its shape or any objective premises for determining a different number of segments, including optimisation of the criterion mentioned above. A general approach is proposed for optimising the division of such distributions according to this criterion. A general objective function and its concrete realisation are proposed, together with algorithms. The proposed methodology makes it possible to obtain the optimum divisions into categories for arbitrary distributions. Yet, on the basis of concrete empirical distributions, problems are outlined. These arise because the distributions obtained often display features that call into question the foundations of the proposed methodology and the very sense of such categorisations. Examples of distributions of this kind, and the consequences for potential categorisations, are discussed.
In summary, the proposed methodology, including the criterion function, constitutes a basis for categorisation with respect to the cumulative distribution, and a tool for evaluating the rationality of the way in which the distributions are obtained (original abstract)

The paper deals with the division of the empirical distribution of a quantity x_i, where i is the index of the unit for which this quantity is observed (e.g., x_i is the GDP per capita of the i-th country). The values x_i are ordered non-decreasingly. We analyse the cumulative distribution, i.e., the values z_i = Σ_{i'=1,...,i} x_{i'}, which form a convex sequence. We want to divide the cumulative distribution into subsets so that the shape of the distribution {z_i} is approximated with as small an error as possible by straight-line segments corresponding to the subsets, while keeping the number of segments as small as possible. This corresponds to the categorisation of similar distributions (e.g., "developed", "developing", ... countries). In such categorisations statistical methods are usually not applied, only "substantive" premises. Where statistical methods are used, they are limited to establishing, e.g., quantiles of the distribution, without taking into account the shape or other premises for a solution that optimises the aforementioned criterion. A general methodology for optimising the division of such distributions in the spirit of this criterion is proposed, together with an objective function, its concrete realisation, and algorithms. On the basis of examples of concrete distributions, problems are also outlined. They arise because empirical distributions often have a character that calls into question the foundations of the adopted methodology and the very sense of such tasks. The possible origin of these distributions and the consequences for potential categorisation are analysed. The proposed methodology provides a basis for the categorisation of empirical cumulative distributions and a tool for assessing the rationality of the way they are obtained. (original abstract)
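The abstracts describe the task without stating the objective function. That task is to sort x_i, form the partial sums z_i, and cover {z_i} with a few straight-line segments. Below is a minimal Python sketch of one standard way to pose it; it is not the author's objective function or algorithm. It assumes the following:

* Segments are contiguous runs of i.
* Each segment is fitted by its own least-squares line.
* The criterion is total squared error plus a hypothetical per-segment `penalty`.

The criterion is minimised by dynamic programming over split points. All function names are illustrative.

```python
from itertools import accumulate


def cumulative(values):
    """Sort x_i non-decreasingly and return the partial sums z_i."""
    return list(accumulate(sorted(values)))


def is_convex(z):
    """A sequence is convex when its first differences never decrease."""
    diffs = [b - a for a, b in zip(z, z[1:])]
    return all(d2 >= d1 for d1, d2 in zip(diffs, diffs[1:]))


def line_fit_error(z, start, end):
    """Sum of squared residuals of the least-squares line through the
    points (i, z_i) for start <= i < end (0-based, end exclusive)."""
    n = end - start
    if n <= 2:
        return 0.0  # one or two points always lie exactly on a line
    xs = range(start, end)
    ys = z[start:end]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def segment(z, penalty):
    """Assumed criterion: total squared error + penalty * number_of_segments.
    Returns the optimal list of (start, end) index pairs, end exclusive."""
    n = len(z)
    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost for z[0:j]
    prev = [0] * (n + 1)                # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + line_fit_error(z, i, j) + penalty
            if cost < best[j]:
                best[j] = cost
                prev[j] = i
    # Walk back through the stored split points to recover the segments.
    segments = []
    j = n
    while j > 0:
        segments.append((prev[j], j))
        j = prev[j]
    return segments[::-1]


if __name__ == "__main__":
    z = cumulative([5, 1, 20, 1, 5, 20, 1, 5, 20])
    print(z, is_convex(z))
    print(segment(z, penalty=1.0))
```

With `penalty=1.0` this toy input yields three segments, one per group of equal x_i. A very large penalty collapses the result to a single segment. Because the point just before each break also lies on the next segment's line, equally good splits may differ by one index. The double loop calls a linear-time fit, so the sketch is O(n^3) and meant only for small n.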

Keywords

Mathematical statistics; Statistical feature distribution; Optimisation; Mathematical programming

Contributors

Author

Jan W. Owsiński

Instytut Badań Systemowych PAN (Systems Research Institute, Polish Academy of Sciences), Warsaw

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000171230537
