An OCR system for the Unified Northern Alphabet

Niko Partanen, Michael Rießler

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review


This paper presents experiments conducted to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The developed model reaches a character accuracy of more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also provide general guidelines for the amount of training data needed to bootstrap an OCR system under similar conditions.
Original language: English
Title of host publication: Proceedings of the fifth Workshop on Computational Linguistics for Uralic Languages
Number of pages: 13
Publisher: The Association for Computational Linguistics
Publication date: 2019
ISBN (Electronic): 978-1-948087-92-6
Publication status: Published - 2019
Externally published: Yes
MoE publication type: A3 Book chapter
Event: International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia
Duration: 7 Jan 2019 to 9 Jan 2019
Conference number: 5

Fields of Science

  • 6121 Languages
