Data-Driven Spelling Correction using Weighted Finite-State Methods

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


This paper presents two systems for spelling correction formulated as a sequence
labeling task. One of the systems is an unstructured classifier and the other
is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Eger et al. (2016), even though it is simpler than AliSeTra because it does not include a model for input segmentation. In addition to experiments on tweet normalization, we present experiments on OCR post-processing using an Early Modern Finnish corpus of OCR-processed newspaper text.
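The difference between an unstructured and a structured labeler can be illustrated with a small sketch. It assumes character-level labeling where each input character is mapped to an output string (possibly empty, i.e. a deletion), and a structured model that adds costs between consecutive output labels. All weights and function names (`EMISSION`, `TRANSITION`, `unstructured`, `structured`) are hypothetical toy choices for illustration, not the models or features used in the paper.

```python
# Hypothetical toy weights, NOT taken from the paper. Costs act like negative
# log-probabilities, so lower is better (tropical semiring: min, +).
# Emission costs: (input character, output label) -> cost.
# The label "" deletes the character; unlisted characters map to themselves at cost 0.
EMISSION = {
    ("l", "l"): 0.2, ("l", "h"): 1.0,
    ("i", "i"): 0.2, ("i", ""): 1.0,
}
# Costs between consecutive output labels; unlisted pairs cost DEFAULT_TRANSITION.
TRANSITION = {("t", "h"): 0.0, ("h", ""): 0.0}
DEFAULT_TRANSITION = 1.0


def labels_for(ch):
    """Candidate output labels and their emission costs for one input character."""
    found = {out: cost for (c, out), cost in EMISSION.items() if c == ch}
    return found or {ch: 0.0}


def unstructured(word):
    """Label each character independently with its cheapest emission."""
    return "".join(min(labels_for(ch).items(), key=lambda kv: kv[1])[0] for ch in word)


def structured(word):
    """Viterbi search: minimize emission plus label-transition cost over the whole word."""
    # best[label] = (total cost, output so far) for the cheapest path ending in that label
    best = {lab: (cost, lab) for lab, cost in labels_for(word[0]).items()}
    for ch in word[1:]:
        new_best = {}
        for lab, emit in labels_for(ch).items():
            new_best[lab] = min(
                (prev_cost + TRANSITION.get((prev, lab), DEFAULT_TRANSITION) + emit,
                 out + lab)
                for prev, (prev_cost, out) in best.items()
            )
        best = new_best
    return min(best.values())[1]


print(unstructured("tlie"))  # tlie
print(structured("tlie"))    # the
```

With these toy weights, the per-character classifier keeps the OCR-style error "tlie" because "l" and "i" are individually most likely to be correct, while the structured search accepts two costlier labels ("l" to "h", "i" deleted) because the label sequence as a whole is cheaper. In a weighted finite-state setting, this kind of search roughly corresponds to a shortest-path computation over the composition of the input with a weighted transducer.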
Original language: English
Title of host publication: The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata
Number of pages: 9
Place of publication: Stroudsburg, PA
Publication date: 12 Aug 2016
ISBN (Print): 978-1-945626-13-5, 1-945626-13-5
Publication status: Published, 12 Aug 2016
MoE publication type: A4 Article in conference proceedings
Event: Annual Meeting of the Association for Computational Linguistics, Berlin, Germany
Duration: 12 Aug 2016
Conference number: 54

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
