Abstract
This paper presents two systems for spelling correction formulated as a sequence labeling task. One of the systems is an unstructured classifier and the other one is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Eger et al. (2016), even though the system presented in the paper is simpler than AliSeTra because it does not include a model for input segmentation. In addition to experiments on tweet normalization, we present experiments on OCR post-processing using an Early Modern Finnish corpus of OCR-processed newspaper text.
Original language | English |
---|---|
Title of host publication | The 54th Annual Meeting of the Association for Computational Linguistics : Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata |
Number of pages | 9 |
Place of publication | Stroudsburg, PA |
Publisher | ACL |
Publication date | 12 Aug 2016 |
Pages | 51-59 |
ISBN (print) | 978-1-945626-13-5, 1-945626-13-5 |
DOI | |
Status | Published - 12 Aug 2016 |
MoE publication type | A4 Article in conference proceedings |
Event | Annual Meeting of the Association for Computational Linguistics - Berlin, Germany Duration: 12 Oct 2016 → 12 Oct 2016 Conference number: 54 |
Fields of science
- 6121 Languages
- 113 Computer and information sciences
Equipment
- CLARIN - Finnish language resources in shared use
  Linden, K. (Manager)
  Department of Digital Humanities
  Equipment/facility: Coordination office
Activities
- 1 Supervisor or co-supervisor of doctoral thesis
- Supervisor of doctoral thesis in Language Technology / Miikka Silfverberg
  Linden, B. K. J. (Supervisor) & Yli-Jyrä, A. (Supervisor)
  1 Jan 2010 → 22 Oct 2016
  Activity: Types of examination › Supervisor or co-supervisor of doctoral thesis