Data-Driven Spelling Correction using Weighted Finite-State Methods

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

Abstract

This paper presents two systems for spelling correction formulated as a sequence labeling task. One system is an unstructured classifier; the other is structured. Both are implemented using weighted finite-state methods. On the task of tweet normalization, the structured system delivers state-of-the-art results when compared with the recent AliSeTra system introduced by Eger et al. (2016), even though it is simpler than AliSeTra: it does not include a model for input segmentation. In addition to the tweet normalization experiments, we present experiments on OCR post-processing using a corpus of OCR-processed Early Modern Finnish newspaper text.
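The contrast between an unstructured and a structured sequence labeler can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' system: it uses plain Python dictionaries instead of weighted finite-state transducers, hand-set hypothetical scores instead of learned weights, and the names `EMISSION`, `TRANSITION`, `unstructured` and `structured` are illustrative only. Each input character is labeled with an output string. The unstructured labeler chooses each label independently, while the structured labeler runs a Viterbi search that also scores adjacent label pairs, roughly the kind of best-path computation a weighted finite-state implementation performs over a lattice of candidate corrections.

```python
from typing import Dict, List, Tuple

# Hypothetical toy scores (higher is better); NOT taken from the paper.
# EMISSION[input_char][output_label]: how well a label fits a character.
EMISSION: Dict[str, Dict[str, float]] = {
    "t": {"t": 1.0},
    "e": {"e": 1.0, "h": 0.6},
    "h": {"h": 1.0, "e": 0.6},
}
# TRANSITION[(previous_label, label)]: bonus for adjacent label pairs;
# unlisted pairs score 0.0.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("t", "h"): 1.0,
    ("h", "e"): 1.0,
}


def unstructured(word: str) -> str:
    """Pick each character's best label independently (no label context)."""
    return "".join(max(EMISSION[c], key=EMISSION[c].get) for c in word)


def structured(word: str) -> str:
    """Viterbi search maximizing summed emission and transition scores."""
    if not word:
        return ""
    # best[label] = (score, labels) for the best path ending in `label`.
    best: Dict[str, Tuple[float, List[str]]] = {
        y: (s, [y]) for y, s in EMISSION[word[0]].items()
    }
    for c in word[1:]:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for y, s in EMISSION[c].items():
            # Extend every previous path with label y and keep the best one.
            new_best[y] = max(
                (prev_score + TRANSITION.get((prev_y, y), 0.0) + s, path + [y])
                for prev_y, (prev_score, path) in best.items()
            )
        best = new_best
    return "".join(max(best.values())[1])


print(unstructured("teh"))  # prints: teh
print(structured("teh"))    # prints: the
```

With these toy weights the unstructured labeler keeps the transposition, while the structured labeler recovers "the" because the transition scores reward the label pairs `t`,`h` and `h`,`e` enough to outweigh the weaker per-character scores. Since labels are strings, empty or multi-character outputs would fit the same scheme.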
Original language: English
Title of host publication: The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata
Number of pages: 9
Place of publication: Stroudsburg, PA
Publisher: ACL
Publication date: 12 Aug 2016
Pages: 51-59
ISBN (Print): 978-1-945626-13-5, 1-945626-13-5
Publication status: Published - 12 Aug 2016
MoE publication type: A4 Article in conference proceedings
Event: Annual Meeting of the Association for Computational Linguistics - Berlin, Germany
Duration: 12 Oct 2016 - 12 Oct 2016
Conference number: 54

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences

Cite this

Silfverberg, M., Kauppinen, P., & Linden, K. (2016). Data-Driven Spelling Correction using Weighted Finite-State Methods. In The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata (pp. 51-59). Stroudsburg, PA: ACL.
Silfverberg, Miikka ; Kauppinen, Pekka ; Linden, Krister. / Data-Driven Spelling Correction using Weighted Finite-State Methods. The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata. Stroudsburg, PA : ACL, 2016. pp. 51-59
@inproceedings{2eedf63901c74f73816d6330b18c7ba8,
title = "Data-Driven Spelling Correction using Weighted Finite-State Methods",
abstract = "This paper presents two systems for spelling correction formulated as a sequence labeling task. One of the systems is an unstructured classifier and the other one is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Eger et al. (2016), even though the system presented in the paper is simpler than AliSeTra because it does not include a model for input segmentation. In addition to experiments on tweet normalization, we present experiments on OCR post-processing using an Early Modern Finnish corpus of OCR-processed newspaper text.",
keywords = "6121 Languages, 113 Computer and information sciences",
author = "Miikka Silfverberg and Pekka Kauppinen and Krister Linden",
year = "2016",
month = "8",
day = "12",
language = "English",
isbn = "978-1-945626-13-5",
pages = "51--59",
booktitle = "The 54th Annual Meeting of the Association for Computational Linguistics",
publisher = "ACL",
address = "United States",

}

Silfverberg, M, Kauppinen, P & Linden, K 2016, Data-Driven Spelling Correction using Weighted Finite-State Methods. in The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata. ACL, Stroudsburg, PA, pp. 51-59, Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 12/10/2016.

Data-Driven Spelling Correction using Weighted Finite-State Methods. / Silfverberg, Miikka; Kauppinen, Pekka; Linden, Krister.

The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata. Stroudsburg, PA : ACL, 2016. p. 51-59.


TY  - GEN
T1  - Data-Driven Spelling Correction using Weighted Finite-State Methods
AU  - Silfverberg, Miikka
AU  - Kauppinen, Pekka
AU  - Linden, Krister
PY  - 2016/8/12
Y1  - 2016/8/12
N2  - This paper presents two systems for spelling correction formulated as a sequence labeling task. One of the systems is an unstructured classifier and the other one is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Eger et al. (2016), even though the system presented in the paper is simpler than AliSeTra because it does not include a model for input segmentation. In addition to experiments on tweet normalization, we present experiments on OCR post-processing using an Early Modern Finnish corpus of OCR-processed newspaper text.
AB  - This paper presents two systems for spelling correction formulated as a sequence labeling task. One of the systems is an unstructured classifier and the other one is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Eger et al. (2016), even though the system presented in the paper is simpler than AliSeTra because it does not include a model for input segmentation. In addition to experiments on tweet normalization, we present experiments on OCR post-processing using an Early Modern Finnish corpus of OCR-processed newspaper text.
KW  - 6121 Languages
KW  - 113 Computer and information sciences
M3  - Conference contribution
SN  - 978-1-945626-13-5
SN  - 1-945626-13-5
SP  - 51
EP  - 59
BT  - The 54th Annual Meeting of the Association for Computational Linguistics
PB  - ACL
CY  - Stroudsburg, PA
ER  -

Silfverberg M, Kauppinen P, Linden K. Data-Driven Spelling Correction using Weighted Finite-State Methods. In The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata. Stroudsburg, PA: ACL. 2016. p. 51-59