An OCR system for the Unified Northern Alphabet

Niko Partanen, Michael Rießler

Research output: Article in a book/report/conference proceedings › Book chapter or article › Scientific › Peer-reviewed

Abstract

This paper presents experiments conducted to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model exceeds 98%, and the model clearly shows cross-linguistic applicability. The tests described here therefore also provide general guidelines for the amount of training data needed to bootstrap an OCR system under similar conditions.
Original language: English
Title of host publication: Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages
Number of pages: 13
Publisher: The Association for Computational Linguistics
Publication date: 2019
Pages: 77-89
ISBN (electronic): 978-1-948087-92-6
State: Published - 2019
Externally published: Yes
OKM publication type: A3 Part of a book or other compilation
Event: International Workshop on Computational Linguistics for Uralic Languages - Tartu, Estonia
Duration: 7 January 2019 to 9 January 2019
Conference number: 5

Fields of science

  • 6121 Languages

Cite this

Partanen, N., & Rießler, M. (2019). An OCR system for the Unified Northern Alphabet. In Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages (pp. 77-89). The Association for Computational Linguistics.
Partanen, Niko ; Rießler, Michael. / An OCR system for the Unified Northern Alphabet. Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, 2019. pp. 77-89
@inproceedings{d8417bb8d5f44dd29c6621b041a5cff3,
title = "An OCR system for the Unified Northern Alphabet",
abstract = "This paper presents experiments done in order to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model reaches more than 98{\%} and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to boot-strap an OCR system under similar conditions.",
keywords = "6121 Languages",
author = "Niko Partanen and Michael Rie{\ss}ler",
year = "2019",
language = "English",
pages = "77--89",
booktitle = "Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages",
publisher = "The Association for Computational Linguistics",
address = "United States",

}

Partanen, N & Rießler, M 2019, An OCR system for the Unified Northern Alphabet. in Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, pp. 77-89, International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia, 07/01/2019.

An OCR system for the Unified Northern Alphabet. / Partanen, Niko; Rießler, Michael.

Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, 2019. p. 77-89.


TY  - CHAP
T1  - An OCR system for the Unified Northern Alphabet
AU  - Partanen, Niko
AU  - Rießler, Michael
PY  - 2019
Y1  - 2019
N2  - This paper presents experiments done in order to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model reaches more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to boot-strap an OCR system under similar conditions.
AB  - This paper presents experiments done in order to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model reaches more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to boot-strap an OCR system under similar conditions.
KW  - 6121 Languages
M3  - Chapter
SP  - 77
EP  - 89
BT  - Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages
PB  - The Association for Computational Linguistics
ER  -

Partanen N, Rießler M. An OCR system for the Unified Northern Alphabet. In Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics. 2019. p. 77-89