2016 | 5 | no. 2 | 205--214
Article title

A Search of Significant Phrases for Building Topic Models in Text Documents

Publication languages
EN
Abstracts
EN
The large number of documents in digital libraries calls for efficient methods of exploring the information they contain. Topic modeling is considered one of the most effective of these methods. In contrast to the commonly used approaches, which look for occurrences of single words, this paper considers building topic models based on phrases. We propose a methodology that creates a set of significant word sequences and thereby limits the search space to phrases that contain them. The methodology is evaluated in experiments on real text datasets, and the results are compared with those obtained using the LDA algorithm. (original abstract)
Year
2016
Volume
5
Issue
2
Pages
205--214
Authors
  • Lodz University of Technology, Poland
  • Lodz University of Technology, Poland
Bibliography
  • [1] Papadimitriou C., Raghavan P., Tamaki H., Vempala S. (2000) Latent Semantic Indexing: A probabilistic analysis, Journal of Computer and System Sciences, Vol. 61 (2), 217-235.
  • [2] Blei D., Ng A., Jordan M. (2003) Latent Dirichlet allocation, Journal of Machine Learning Research, 3, 993-1022.
  • [3] Blei D. (2012) Probabilistic topic models, Communications of the ACM, 55 (4), 77-84.
  • [4] Danilevsky M., Wang C., Desai N., Ren X., Guo J., Han J. (2014) Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents, SDM '14.
  • [5] Han J., Pei J., Yin Y., Mao R. (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data Min. Knowl. Discov., 8 (1), 53-87.
  • [6] El-Kishky A., Song Y., Wang C., Voss C., Han J. (2014) Scalable Topical Phrase Mining from Text Corpora, Proceedings of the VLDB Endowment, Vol. 8 (3), 305-316.
  • [7] Agrawal R., Srikant R. (1994) Fast algorithms for mining association rules in large databases, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94), 487-499, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
  • [8] Machine Learning for Language Toolkit http://mallet.cs.umass.edu/
  • [9] Hamming R.W. (1950) Error detecting and error correcting codes, The Bell System Technical Journal, Vol. 29 (2).
  • [10] ftp://medir.ohsu.edu/pub/ohsumed
  • [11] http://www.ai.mit.edu/people/jrennie/20Newsgroups/
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.ekon-element-000171431608
