2014 | 2 | 261--268
Article title

Extracting Semantic Prototypes and Factual Information from a Large Scale Corpus Using Variable Size Window Topic Modelling

This paper proposes a model of textual events as a mixture of semantic stereotypes and factual information. It introduces a method for automatically distinguishing general semantic prototypes, which describe broad categories of events, from factual elements specific to a given event. The paper then presents the results of an unsupervised topic extraction experiment performed on documents from a large-scale corpus with an additional temporal structure. The experiment compares the nature of the information provided by Latent Dirichlet Allocation with that provided by vector space modelling based on log-entropy weights. The impact of using different time windows of the corpus on the results of topic modelling is also presented. Finally, the paper discusses whether unsupervised topic modelling can reflect deeper semantic information, such as elements describing a given event or its causes and results, and separate it from purely factual data. (original abstract)
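For readers unfamiliar with the log-entropy weighting mentioned in the abstract, the following is a minimal sketch of the commonly used formulation: local weight log(1 + tf) multiplied by a global weight 1 + sum_j(p_ij log p_ij) / log N, where p_ij is the share of a term's occurrences that fall in document j and N is the number of documents. The function name, the input layout, and the toy documents are assumptions made for illustration; the record does not state which exact variant or implementation the authors used.

```python
import math


def log_entropy_weights(counts):
    """Apply log-entropy weighting to raw term counts.

    counts: list of documents, each a dict mapping term -> raw frequency.
    Returns a list of dicts mapping term -> weighted value.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    n_docs = len(counts)
    if n_docs < 2:
        raise ValueError("log-entropy weighting needs at least two documents")

    # Total occurrences of each term across the whole corpus.
    global_freq = {}
    for doc in counts:
        for term, tf in doc.items():
            global_freq[term] = global_freq.get(term, 0) + tf

    # Sum of p * log(p) per term, where p is the share of the term's
    # occurrences found in one document.
    entropy_sum = {term: 0.0 for term in global_freq}
    for doc in counts:
        for term, tf in doc.items():
            p = tf / global_freq[term]
            entropy_sum[term] += p * math.log(p)

    # Global weight: 1 for a term confined to one document,
    # 0 for a term spread evenly over all documents.
    global_weight = {
        term: 1.0 + s / math.log(n_docs) for term, s in entropy_sum.items()
    }

    return [
        {term: math.log(1 + tf) * global_weight[term] for term, tf in doc.items()}
        for doc in counts
    ]


# Toy corpus (invented data): "the" occurs evenly in both documents,
# the other terms are specific to one document each.
docs = [{"flood": 2, "the": 1}, {"the": 1, "election": 3}]
weighted = log_entropy_weights(docs)
```

In this toy corpus the evenly spread term "the" receives a weight of zero, while the document-specific terms keep their full local weight, which is the behaviour that makes the scheme favour distinctive vocabulary.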
  • AGH University of Science and Technology Kraków, Poland
  • AGH University of Science and Technology Kraków, Poland