An OCR system for the Unified Northern Alphabet

Research output: Chapter in book/report/conference proceedings; Chapter; Scientific; Peer-reviewed

Abstract

This paper presents experiments done in order to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model reaches more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to boot-strap an OCR system under similar conditions.
Original language: English
Title of host publication: Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages
Number of pages: 13
Publisher: The Association for Computational Linguistics
Publication date: 2019
Pages: 77-89
ISBN (electronic): 978-1-948087-92-6
Status: Published - 2019
MoE publication type: A3 Part of a book or other research book
Event: International Workshop on Computational Linguistics for Uralic Languages - Tartu, Estonia
Duration: 7 Jan 2019 to 9 Jan 2019
Conference number: 5

Fields of science

  • 6121 Languages

Cite this

Partanen, N., & Rießler, M. (2019). An OCR system for the Unified Northern Alphabet. In Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages (pp. 77-89). The Association for Computational Linguistics.
Partanen, Niko ; Rießler, Michael. / An OCR system for the Unified Northern Alphabet. Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, 2019. pp. 77-89
@inbook{d8417bb8d5f44dd29c6621b041a5cff3,
title = "An OCR system for the Unified Northern Alphabet",
abstract = "This paper presents experiments done in order to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model reaches more than 98{\%} and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to boot-strap an OCR system under similar conditions.",
keywords = "6121 Languages",
author = "Niko Partanen and Michael Rie{\ss}ler",
year = "2019",
language = "English",
pages = "77--89",
booktitle = "Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages",
publisher = "The Association for Computational Linguistics",
address = "United States",

}

Partanen, N & Rießler, M 2019, An OCR system for the Unified Northern Alphabet. in Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, pp. 77-89, International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia, 07/01/2019.

An OCR system for the Unified Northern Alphabet. / Partanen, Niko; Rießler, Michael.

Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, 2019. pp. 77-89.

TY - CHAP
T1 - An OCR system for the Unified Northern Alphabet
AU - Partanen, Niko
AU - Rießler, Michael
PY - 2019
Y1 - 2019
N2 - This paper presents experiments done in order to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model reaches more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to boot-strap an OCR system under similar conditions.
AB - This paper presents experiments done in order to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model reaches more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to boot-strap an OCR system under similar conditions.
KW - 6121 Languages
M3 - Chapter
SP - 77
EP - 89
BT - Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages
PB - The Association for Computational Linguistics
ER -

Partanen N, Rießler M. An OCR system for the Unified Northern Alphabet. In Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics. 2019. p. 77-89