Abstract
Latvian Romani is a Northeastern Romani dialect with a limited number of publicly available sources. Two large archival collections of texts in Latvian Romani, compiled primarily in the 1930s in Latvia and Estonia, have recently been digitized as images and made available online to a wider public. In our study, we focus on one of these collections, the Latvian Romani folklore texts collected by Jānis Leimanis in interwar Latvia. In this paper, we describe how initial manual transcriptions, most of which were created with the help of a dedicated crowdsourcing platform, were integrated into the handwritten text recognition (HTR) workflow in Transkribus. We present two HTR models trained on Leimanis' collection and discuss various issues related to the work on these texts.
| Original language | English |
|---|---|
| Journal | CEUR Workshop Proceedings |
| Volume | 3232 |
| Pages (from-to) | 381-389 |
| Number of pages | 9 |
| ISSN | 1613-0073 |
| Status | Published - 2022 |
| MoE publication type | A4 Article in conference proceedings |
| Event | 6th Digital Humanities in the Nordic and Baltic Countries Conference, DHNB 2022 - Uppsala, Sweden. Duration: 15 March 2022 → 18 March 2022 |
Bibliographical note
Publisher Copyright: © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Fields of science
- 6122 Literature studies
- 6121 Languages
- 113 Computer and information sciences