Journal
Article title
Title variants
Publication languages
Abstracts
We investigate whether language models used in automatic speech recognition (ASR) should be trained on speech transcripts rather than on written texts. By calculating the log-likelihood statistic for part-of-speech (POS) n-grams, we show that there are significant differences between written texts and speech transcripts. We also test the ASR performance of language models trained on speech transcripts and on written texts, and show that the former yield greater word error reduction rates (WERR), even when trained on much smaller corpora. For our experiments we used the manually labeled one-million-word subcorpus of the National Corpus of Polish and an HTK acoustic model. (author's abstract)
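The abstract does not spell out how the log-likelihood statistic is computed. As a minimal sketch, assuming the standard two-corpus log-likelihood (G2) commonly used in corpus comparison, the comparison of POS n-gram frequencies could look as follows. The function names (`ngrams`, `log_likelihood`, `compare`) and the tag sequences passed in are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def ngrams(tags, n):
    """Return the list of n-grams (as tuples) over a sequence of POS tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def log_likelihood(a, b, total_a, total_b):
    """Two-corpus log-likelihood (G2) for a single n-gram.

    a, b: observed counts of the n-gram in corpus A and corpus B
    total_a, total_b: total number of n-grams in each corpus
    (Assumed formulation; the article may use a different variant.)
    """
    # Expected counts if the n-gram were equally frequent in both corpora.
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2.0 * g2


def compare(tags_a, tags_b, n=2):
    """Rank POS n-grams by how strongly their frequency differs between corpora."""
    counts_a = Counter(ngrams(tags_a, n))
    counts_b = Counter(ngrams(tags_b, n))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    scores = {
        gram: log_likelihood(counts_a[gram], counts_b[gram], total_a, total_b)
        for gram in set(counts_a) | set(counts_b)
    }
    # Highest score first: the n-grams that best separate the two corpora.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under this formulation, a score above 3.84 corresponds to a difference significant at p < 0.05 for one degree of freedom; an n-gram with the same relative frequency in both corpora scores 0.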
Keywords
Year
Volume
Pages
193--197
Physical description
Authors
author
- AGH University of Science and Technology Kraków, Poland
author
- AGH University of Science and Technology Kraków, Poland
- AGH University of Science and Technology Kraków, Poland
author
- AGH University of Science and Technology Kraków, Poland
Bibliography
- Bardoel, T. "Comparing n-gram frequency distributions". Tilburg University School of Humanities. Tilburg center for Cognition and Communication. 2012.
- Bengio, Yoshua, Ducharme, Réjean, Vincent, Pascal, Jauvin, Christian. "A neural probabilistic language model". Journal of Machine Learning Research. vol. 3. pp. 1137-1155. 2003.
- Biber, Douglas. "Variation across speech and writing". Cambridge University Press. 1991.
- Chelba, Ciprian, Bikel, Dan, Shugrina, Maria, Nguyen, Patrick, Kumar, Shankar. "Large scale language modelling in automatic speech recognition". Google Research. 2012.
- Hirsimaki, T., Pylkkonen, J., Kurimo, M. "Importance of high-order n-gram models in morph-based speech recognition". IEEE Trans. Speech and Language Processing. 17(4):724-32. 2009. http://dx.doi.org/10.1109/TASL.2008.2012323
- Janicki, A., Wawer, D. "Automatic Speech Recognition of Polish in a Computer Game Interface". Proceedings of the Federated Conference on Computer Science and Information Systems 2011. pp. 711-716. 2011.
- Jurafsky, D., Martin, J. H. "Speech and language processing. 2nd edition". Prentice-Hall, Inc., New Jersey. 2008.
- Karpov, A., Ronzhin, A., Markov, K., Kipyatkova, I., Vazhenina, D. "Large vocabulary Russian speech recognition using syntactico-statistical language modelling". Speech Communication. 56. pp. 213-228. 2014. http://dx.doi.org/10.1016/j.specom.2013.07.004
- Kilgarriff, Adam. "Comparing Corpora". International Journal of Corpus Linguistics. 6:1. 97-133. 2001.
- Marciniak, M. "Anotowany korpus dialogów telefonicznych". Akademicka Oficyna Wydawnicza EXIT. 2011.
- Pohl, A., Ziółko, B. "Using part of speech n-grams for improving automatic speech recognition of Polish". 9th International Conference on Machine Learning and Data Mining MLDM. 2013. http://dx.doi.org/10.1007/978-3-642-39712-7_38
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.ekon-element-000171419402