Automatic Indexer for Polish Agricultural Texts
Today most information resources are available in digital form, and finding information means searching through collections of documents. This paper describes text indexing, which can improve such searching. It then presents the Agrotagger, an indexing tool for documents in the field of agriculture; two available versions of the Agrotagger are tested and discussed. Although the Agrotagger uses the multilingual thesaurus Agrovoc, it works only for English. Because the Agrotagger cannot be used for texts in Polish, a similar tool suited to the Polish language is needed. The problems that extensive inflection causes for indexing in languages such as Polish are discussed. The final part of the paper presents the design and implementation of a system based on a Polish language dictionary and the Agrovoc. Some tests of the implemented system are also discussed. (original abstract)
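The general idea of dictionary-based indexing for an inflected language can be illustrated with a minimal sketch. It is not the system described in the paper: the function `index_text`, the tiny `LEMMAS` dictionary, the `THESAURUS` excerpt, and the concept identifiers are all hypothetical stand-ins for a full Polish dictionary and the Agrovoc thesaurus. The sketch only shows why inflected forms must be reduced to base forms before thesaurus terms can be matched and counted.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: inflected Polish form -> base form (lemma).
# A real system would use a full Polish language dictionary instead.
LEMMAS = {
    "pszenica": "pszenica",
    "pszenicy": "pszenica",
    "pszenicę": "pszenica",
    "gleba": "gleba",
    "gleby": "gleba",
    "glebie": "gleba",
    "nawozy": "nawóz",
    "nawozów": "nawóz",
}

# Hypothetical thesaurus excerpt: Polish label -> concept identifier.
# The identifiers are invented for illustration; they are not Agrovoc URIs.
THESAURUS = {
    "pszenica": "concept:wheat",
    "gleba": "concept:soil",
    "nawóz": "concept:fertilizer",
}


def index_text(text, top_n=3):
    """Return up to top_n (lemma, concept, count) tuples for a text."""
    # Split into lowercase word tokens; \w matches Unicode letters in Python 3.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        # Map the inflected form to its lemma; unknown words stay unchanged.
        lemma = LEMMAS.get(token, token)
        # Count only lemmas that are terms of the thesaurus.
        if lemma in THESAURUS:
            counts[lemma] += 1
    return [(lemma, THESAURUS[lemma], n) for lemma, n in counts.most_common(top_n)]


if __name__ == "__main__":
    sample = ("Uprawa pszenicy wymaga żyznej gleby. "
              "Pszenicę sieje się na glebie po zastosowaniu nawozów.")
    for lemma, concept, n in index_text(sample):
        print(lemma, concept, n)
```

Without the lemma lookup, the forms "pszenicy" and "pszenicę" would not match the thesaurus label "pszenica" at all, which is the core difficulty that inflection poses for indexing.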