Development of the Algorithm of Polish Language Film Reviews Preprocessing

Rizun, Nina; Taranenko, Yurii

Article - details

Journal

Rocznik Naukowy Wydziału Zarządzania w Ciechanowie

2017 | 11 | no. 1-4 | 167-188

Article title

Development of the Algorithm of Polish Language Film Reviews Preprocessing

Authors

Nina Rizun, Yurii Taranenko

Title variants

Opracowanie algorytmu wstępnego przetwarzania tekstów recenzji filmów w języku polskim

Publication languages

Abstracts

The algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in (original abstract)

An algorithm and software for preprocessing film reviews in the Polish language were developed. The algorithm comprises the following steps: text adaptation procedure; tokenization procedure; procedure of transforming words into the byte format; part-of-speech tagging; stemming / lemmatization procedure; presentation of documents in the vector format (Vector Space Model); procedure of creating a database of document models. Experiments with the proposed algorithm were carried out on a test sample of film reviews, and the main conclusions were formulated. (translated from the original Polish abstract)
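The steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' software: the lemma dictionary `LEMMAS`, the stop-word set `STOP_WORDS`, all function names, and the choice of UTF-8 for the byte format are assumptions made for illustration. Part-of-speech tagging and the document-model database are omitted, because they would require a real Polish morphological resource and a storage layer that this record does not describe.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary mapping inflected Polish forms to lemmas.
# A real system would rely on a full morphological dictionary instead.
LEMMAS = {
    "filmy": "film",
    "filmu": "film",
    "aktorzy": "aktor",
    "aktora": "aktor",
    "świetne": "świetny",
    "świetni": "świetny",
}

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"i", "w", "na", "to", "jest", "są"}


def adapt_text(text):
    """Text adaptation: lowercase, then replace punctuation, digits and
    underscores with spaces so that only letters and whitespace remain."""
    return re.sub(r"[^\w\s]|\d|_", " ", text.lower())


def tokenize(text):
    """Tokenization: split the adapted text on whitespace."""
    return text.split()


def to_bytes(tokens):
    """Byte format: encode each token as UTF-8 bytes (encoding is assumed)."""
    return [token.encode("utf-8") for token in tokens]


def lemmatize(tokens):
    """Lemmatization: drop stop words and map known forms to their lemma;
    unknown forms are passed through unchanged."""
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    """Run adaptation, tokenization and lemmatization on one review."""
    return lemmatize(tokenize(adapt_text(text)))


def vectorize(documents):
    """Vector Space Model: build a sorted vocabulary and one term-frequency
    vector per document over that vocabulary."""
    counts = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted({term for c in counts for term in c})
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


reviews = ["Filmy są świetne, aktorzy świetni!", "To jest film."]
vocabulary, vectors = vectorize(reviews)
```

In this sketch each review becomes a row of raw term counts; a weighting scheme such as TF-IDF could be applied to the same vectors, but the record does not say which weighting the authors used.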

Keywords

Film industry; Review; Translator; Text analysis

Film industry; Review; Professional translators; Text analysis

Journal

Rocznik Naukowy Wydziału Zarządzania w Ciechanowie

Year

2017

Volume

Issue

no. 1-4

Pages

167-188

Physical description

Contributors

author

Nina Rizun

Gdansk University of Technology, Poland

author

Yurii Taranenko

Alfred Nobel University, Dnipropetrovsk


Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000171546437
