An OCR system for the Unified Northern Alphabet

Niko Partanen, Michael Rießler

Research output: Chapter in book/report/conference proceeding, Chapter, Scientific, Peer-reviewed

Abstract

This paper presents experiments conducted to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The developed model reaches a character accuracy of more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to bootstrap an OCR system under similar conditions.
Original language: English
Title of host publication: Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages
Number of pages: 13
Publisher: The Association for Computational Linguistics
Publication date: 2019
Pages: 77-89
ISBN (electronic): 978-1-948087-92-6
Status: Published - 2019
Externally published: Yes
MoE publication type: A3 Part of a book or another research book
Event: International Workshop on Computational Linguistics for Uralic Languages
- Tartu, Estonia
Duration: 7 Jan 2019 → 9 Jan 2019
Conference number: 5

Fields of science

  • 6121 Languages
