Econometric modeling of panel data using parallel computing with Apache Spark

Bernardelli, Michał

Article details

Journal

Roczniki Kolegium Analiz Ekonomicznych / Szkoła Główna Handlowa

2016 | no. 41 | 198-202

Article title

Econometric modeling of panel data using parallel computing with Apache Spark

Authors

Michał Bernardelli

Content

Full text:

http://rocznikikae.sgh.waw.pl/p/roczniki_kae_z41_12.pdf [remote]

Title variants

Ekonometryczne modelowanie danych panelowych z wykorzystaniem obliczeń równoległych na Apache Spark

Abstracts

The aim of this article is to present a method for computing fixed effects estimators using the MapReduce programming model as implemented in Apache Spark. Of the many known algorithms, two common approaches were used: the within transformation and the least squares dummy variable (LSDV) method. The efficiency of the computations was demonstrated by solving a specially constructed example on a randomly generated data sample. Based on theoretical analysis and computer experiments, it can be stated that Apache Spark is an efficient tool for modeling panel data, especially when it comes to Big Data. (original abstract)
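The record does not include the article's Spark code, so the following is only a minimal local sketch in plain Python of how the two estimators named in the abstract decompose into map and reduce steps. It assumes a panel with a single regressor stored as (entity, x, y) tuples; the helpers `reduce_by_key`, `within_estimator`, `solve` and `lsdv_estimator` are hypothetical names, and `reduce_by_key` merely imitates on one machine what an operation such as Spark's `RDD.reduceByKey` would do on a cluster.

```python
from functools import reduce


def reduce_by_key(pairs, combine):
    """Local stand-in for a keyed reduce: merge all values sharing a key."""
    out = {}
    for key, value in pairs:
        out[key] = combine(out[key], value) if key in out else value
    return out


def within_estimator(records):
    """Fixed effects slope via the within transformation; records are (i, x, y)."""
    # Job 1 (map + reduce by key): per-entity sums of x, y and the count.
    sums = reduce_by_key(
        ((i, (x, y, 1)) for i, x, y in records),
        lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]),
    )
    means = {i: (sx / n, sy / n) for i, (sx, sy, n) in sums.items()}
    # Job 2 (map + global reduce): demean each record using the small table of
    # entity means, then sum the cross-products needed for the OLS slope.
    sxy, sxx = reduce(
        lambda a, b: (a[0] + b[0], a[1] + b[1]),
        (((x - means[i][0]) * (y - means[i][1]), (x - means[i][0]) ** 2)
         for i, x, y in records),
        (0.0, 0.0),
    )
    beta = sxy / sxx
    # Recover the individual effects from the entity means.
    alphas = {i: my - beta * mx for i, (mx, my) in means.items()}
    return beta, alphas


def solve(a, b):
    """Gaussian elimination with partial pivoting for the normal equations."""
    n = len(b)
    m = [list(arow) + [b[k]] for k, arow in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def lsdv_estimator(records):
    """Regress y on x plus one dummy per entity (no common intercept)."""
    ids = sorted({i for i, _, _ in records})
    col = {i: k + 1 for k, i in enumerate(ids)}  # column 0 holds x
    p = len(ids) + 1

    def contrib(rec):
        # Map: one record -> its (r r^T, r*y) contribution to X'X and X'y.
        i, x, y = rec
        r = [0.0] * p
        r[0], r[col[i]] = x, 1.0
        return [[u * v for v in r] for u in r], [u * y for u in r]

    def add(a, b):
        # Reduce: elementwise sum of two (X'X, X'y) partial results.
        return ([[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])],
                [u + v for u, v in zip(a[1], b[1])])

    xtx, xty = reduce(add, map(contrib, records))
    coef = solve(xtx, xty)
    return coef[0], {i: coef[col[i]] for i in ids}


# Toy panel generated from y = alpha_i + 2x with alpha = {a: 1, b: 5, c: -3}.
panel = [("a", 1.0, 3.0), ("a", 2.0, 5.0), ("a", 3.0, 7.0),
         ("b", 0.0, 5.0), ("b", 4.0, 13.0),
         ("c", 1.0, -1.0), ("c", 3.0, 3.0), ("c", 5.0, 7.0)]
beta_w, alpha_w = within_estimator(panel)
beta_l, alpha_l = lsdv_estimator(panel)
print(beta_w, beta_l)
```

On this noise-free toy panel both routes recover a slope of 2 and the effects 1, 5 and -3, up to floating-point rounding in the LSDV solve. The design trade-off is visible in the sketch: the within route needs two passes and a table of entity means, whereas LSDV builds an (N+1) x (N+1) system whose size grows with the number of entities, even though for a single regressor both give the same slope.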

Keywords

Econometric modeling; Panel data; Big Data; Algorithms

Contributors

author

Michał Bernardelli

Warsaw School of Economics, Poland

Bibliography

Arellano M., Panel Data Econometrics, Oxford University Press, Oxford 2003.
Baltagi B., Econometric Analysis of Panel Data, Wiley, Chippenham 2013.
Cheney E., Kincaid D., Numerical Mathematics and Computing, Cengage Learning, Boston 2007.
Dean J., Ghemawat S., MapReduce: Simplified Data Processing on Large Clusters, "Communications of the ACM" 2008, vol. 51, issue 1, pp. 107-113.
Diggle P., Heagerty P., Liang K. Y., Zeger S., Analysis of Longitudinal Data, Oxford University Press, Oxford 2013.
Gardiner J. C., Luo Z., Roman L. A., Fixed effects, random effects and GEE: What are the differences?, "Statistics in Medicine" 2009, vol. 28, pp. 221-239.
Hsiao Ch., Analysis of Panel Data, Cambridge University Press, New York 2003.
Strub M., Cieszewski Ch. J., Base-age invariance properties of two techniques for estimating the parameters of site index models, "Forest Science" 2006, vol. 52 (2), pp. 182-186.
Tait D., Cieszewski Ch. J., Bella I. E., The stand dynamics of lodgepole pine, "Canadian Journal of Forest Research" 1986, vol. 18, pp. 1255-1260.
Ullman J. D., Designing good MapReduce algorithms, "XRDS: Crossroads, The ACM Magazine for Students" 2012, vol. 19, pp. 30-34.
Wooldridge J. M., Introductory Econometrics: A Modern Approach, South-Western, Mason 2013.
Laney D., 3D Data Management: Controlling data volume, velocity, and variety, 2001, http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf (retrieved 2016.04.05).
http://hadoop.apache.org (retrieved 2016.04.05).
http://sortbenchmark.org (retrieved 2016.04.05).
http://spark.apache.org (retrieved 2016.04.05).
http://www.numpy.org (retrieved 2016.04.05).
https://www.python.org (retrieved 2016.04.05).

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000171446452
