Web augmentation of language models for continuous speech recognition of SMS text messages

Mathias Johan Philip Creutz, Sami Virpioja, Anna Kovaleva

Research output: Chapter in book/report/conference proceedings › Conference article › Scientific › Peer-reviewed

Abstract

In this paper, we present an efficient query selection algorithm for the retrieval of web text data to augment a statistical language model (LM). The number of retrieved relevant documents is optimized with respect to the number of queries submitted. The querying scheme is applied in the domain of SMS text messages. Continuous speech recognition experiments are conducted on three languages: English, Spanish, and French. The web data is utilized for augmenting in-domain LMs in general and for adapting the LMs to a user-specific vocabulary. Word error rate reductions of up to 6.6 % (in LM augmentation) and 26.0 % (in LM adaptation) are obtained in setups, where the size of the web mixture LM is limited to the size of the baseline in-domain LM.

Original language: English
Title of host publication: Proceedings of the 12th Conference of the European Chapter of the ACL
Number of pages: 9
Place of publication: Athens
Publisher: Association for Computational Linguistics
Publication date: April 2009
Pages: 157-165
Publication status: Published - April 2009
Ministry of Education (OKM) publication type: A4 Article in conference proceedings
Event: 12th Conference of the European Chapter of the ACL (EACL-09) - Athens, Greece
Duration: 30 March 2009 to 3 April 2009

Cite this

Creutz, M. J. P., Virpioja, S., & Kovaleva, A. (2009). Web augmentation of language models for continuous speech recognition of SMS text messages. In Proceedings of the 12th Conference of the European Chapter of the ACL (pp. 157-165). Athens: Association for Computational Linguistics.
Creutz, Mathias Johan Philip; Virpioja, Sami; Kovaleva, Anna. / Web augmentation of language models for continuous speech recognition of SMS text messages. Proceedings of the 12th Conference of the European Chapter of the ACL. Athens: Association for Computational Linguistics, 2009. pp. 157-165.
@inproceedings{4f637be49ae345e0801cc7b59b30f34d,
title = "Web augmentation of language models for continuous speech recognition of SMS text messages",
abstract = "In this paper, we present an efficient query selection algorithm for the retrieval of web text data to augment a statistical language model (LM). The number of retrieved relevant documents is optimized with respect to the number of queries submitted. The querying scheme is applied in the domain of SMS text messages. Continuous speech recognition experiments are conducted on three languages: English, Spanish, and French. The web data is utilized for augmenting in-domain LMs in general and for adapting the LMs to a user-specific vocabulary. Word error rate reductions of up to 6.6 {\%} (in LM augmentation) and 26.0 {\%} (in LM adaptation) are obtained in setups, where the size of the web mixture LM is limited to the size of the baseline in-domain LM.",
author = "Creutz, {Mathias Johan Philip} and Sami Virpioja and Anna Kovaleva",
year = "2009",
month = "4",
language = "English",
pages = "157--165",
booktitle = "Proceedings of the 12th Conference of the European Chapter of the ACL",
publisher = "Association for Computational Linguistics",
address = "International",

}

Creutz, MJP, Virpioja, S & Kovaleva, A 2009, Web augmentation of language models for continuous speech recognition of SMS text messages. in Proceedings of the 12th Conference of the European Chapter of the ACL. Association for Computational Linguistics, Athens, pp. 157-165, 12th Conference of the European Chapter of the ACL (EACL-09), Athens, Greece, 30/03/2009.

Web augmentation of language models for continuous speech recognition of SMS text messages. / Creutz, Mathias Johan Philip; Virpioja, Sami; Kovaleva, Anna.

Proceedings of the 12th Conference of the European Chapter of the ACL. Athens: Association for Computational Linguistics, 2009. pp. 157-165.

TY - GEN

T1 - Web augmentation of language models for continuous speech recognition of SMS text messages

AU - Creutz, Mathias Johan Philip

AU - Virpioja, Sami

AU - Kovaleva, Anna

PY - 2009/4

Y1 - 2009/4

AB - In this paper, we present an efficient query selection algorithm for the retrieval of web text data to augment a statistical language model (LM). The number of retrieved relevant documents is optimized with respect to the number of queries submitted. The querying scheme is applied in the domain of SMS text messages. Continuous speech recognition experiments are conducted on three languages: English, Spanish, and French. The web data is utilized for augmenting in-domain LMs in general and for adapting the LMs to a user-specific vocabulary. Word error rate reductions of up to 6.6 % (in LM augmentation) and 26.0 % (in LM adaptation) are obtained in setups, where the size of the web mixture LM is limited to the size of the baseline in-domain LM.

M3 - Conference contribution

SP - 157

EP - 165

BT - Proceedings of the 12th Conference of the European Chapter of the ACL

PB - Association for Computational Linguistics

CY - Athens

ER -

Creutz MJP, Virpioja S, Kovaleva A. Web augmentation of language models for continuous speech recognition of SMS text messages. In Proceedings of the 12th Conference of the European Chapter of the ACL. Athens: Association for Computational Linguistics. 2009. p. 157-165