Transforming Archived Resources with Language Technology: From Manuscripts to Language Documentation

Research output: Chapter in book/report/conference proceedings; Conference article; Scientific; Peer-reviewed


Transcriptions in different languages are a ubiquitous data format in linguistics and in many other fields in the humanities. However, the majority of these resources remain both under-used and under-studied. This may be the case even when the materials have been published in print, and it is certainly the case for the majority of unpublished transcriptions. Our paper presents a workflow adopted in the research project Language Documentation Meets Language Technology. The workflow combines text recognition, automatic transliteration and forced alignment into a process that converts earlier transcribed documents into a structure comparable with contemporary language documentation corpora. The process involves complex practical and methodological considerations.

Title: Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022)
Editors: Karl Berglund, Matti La Mela, Inge Zwart
Status: Published - 2022
MoEC publication type: A4 Article in conference proceedings
Event: Digital Humanities in the Nordic and Baltic Countries 6th Conference - Uppsala, Sweden
Duration: 15 Mar 2022 to 18 Mar 2022
Conference number: 6


Name: CEUR Workshop Proceedings
ISSN (electronic): 1613-0073


Publisher Copyright:
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)


  • 6121 Languages
  • 113 Computer and information sciences
