Using Graph-Based Methods to Augment Online Dictionaries of Endangered Languages

Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

Tutkimustuotos: Artikkeli kirjassa/raportissa/konferenssijulkaisussaKonferenssiartikkeliTieteellinenvertaisarvioitu

Abstrakti

Many endangered Uralic languages have multilingual machine readable dictionaries saved in an XML format. However, the dictionaries cover translations very inconsistently between language pairs, for instance, the Livonian dictionary has some translations to Finnish, Latvian and Estonian, and the Komi-Zyrian dictionary has some translations to Finnish, English and Russian. We utilize graph-based approaches to augment such dictionaries by predicting new translations to existing and new languages based on different dictionaries for endangered languages and Wiktionaries. Our study focuses on the lexical resources for Komi-Zyrian (kpv), Erzya (myv) and Livonian (liv). We evaluate our approach by human judges fluent in the three endangered languages in question. Based on the evaluation, the method predicted good or acceptable translations 77% of the time. Furthermore, we train a neural prediction model to predict the quality of the automatically predicted translations with an 81% accuracy. The resulting extensions to the dictionaries are made available on the online dictionary platform used by the speakers of these languages.

Alkuperäiskielienglanti
OtsikkoProceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
ToimittajatSarah Moeller, Antonios Anastasopoulos, Antti Arppe, Aditi Chaudhary, Atticus Harrigan, Josh Holden, Jordan Lachler, Alexis Palmer, Shruti Rijhwani, Lane Schwartz
Sivumäärä10
JulkaisupaikkaStroudsburg
KustantajaThe Association for Computational Linguistics
Julkaisupäivä2022
Sivut139-148
ISBN (elektroninen)978-1-955917-30-8
DOI - pysyväislinkit
TilaJulkaistu - 2022
OKM-julkaisutyyppiA4 Artikkeli konferenssijulkaisuussa
TapahtumaWorkshop on the Use of Computational Methods in the Study of Endangered Languages - Dublin, Irlanti
Kesto: 26 toukok. 202227 toukok. 2022
Konferenssinumero: 5

Lisätietoja

Publisher Copyright:
© 2022 Association for Computational Linguistics.

Tieteenalat

  • 6121 Kielitieteet
  • 113 Tietojenkäsittely- ja informaatiotieteet

Siteeraa tätä