Data-Driven Spelling Correction using Weighted Finite-State Methods

Miikka Silfverberg, Pekka Kauppinen, Krister Linden

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, Peer-reviewed

Abstract

This paper presents two systems for spelling correction formulated as a sequence labeling task. One of the systems is an unstructured classifier and the other is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Eger et al. (2016), even though the system presented in the paper is simpler than AliSeTra because it does not include a model for input segmentation. In addition to experiments on tweet normalization, we present experiments on OCR post-processing using an Early Modern Finnish corpus of OCR-processed newspaper text.

Original language: English
Title of host publication: The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata
Number of pages: 9
Place of publication: Stroudsburg, PA
Publisher: ACL
Publication date: 12 Aug 2016
Pages: 51-59
ISBN (print): 978-1-945626-13-5, 1-945626-13-5
DOI
Status: Published - 12 Aug 2016
MoE publication type: A4 Article in conference proceedings
Event: Annual Meeting of the Association for Computational Linguistics - Berlin, Germany
Duration: 12 Oct 2016 to 12 Oct 2016
Conference number: 54

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
