
Towards the Corpus of Latvian Romani Texts: Deciphering the Manuscripts in Jānis Leimanis' Archive

  • Natalia Perkova
  • Kirill Kozhanov

Research output: Contribution to journal › Conference article › Scientific › Peer review

Abstract

Latvian Romani is a Northeastern Romani dialect with a limited number of publicly available sources. Two large archival collections of texts in Latvian Romani, compiled primarily in the 1930s in Latvia and Estonia, have recently been digitized as images and made available online to a wider public. In our study, we focus on one of these collections: the Latvian Romani folklore texts collected by Jānis Leimanis in interwar Latvia. In this paper, we describe how the initial manual transcriptions, most of which were created with the help of a dedicated crowdsourcing platform, were integrated into the handwritten text recognition (HTR) workflow in Transkribus. We present two HTR models trained on Leimanis' collection and discuss various issues related to working with these texts.

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3232
Pages (from-to): 381-389
Number of pages: 9
ISSN: 1613-0073
Status: Published - 2022
MoE publication type: A4 Article in conference proceedings
Event: 6th Digital Humanities in the Nordic and Baltic Countries Conference, DHNB 2022 - Uppsala, Sweden
Duration: 15 March 2022 - 18 March 2022

Bibliographical note

Publisher Copyright:
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

Fields of science

  • 6122 Literature studies
  • 6121 Languages
  • 113 Computer and information sciences
