Abstract
We present, to our knowledge, the first published morphological analyser and generator for Sakha, a marginalised language of Siberia. The transducer, developed using HFST, has coverage solidly above 90% and high precision. In developing the analyser, we have expanded linguistic knowledge about Sakha and developed strategies for handling complex grammatical patterns. The transducer is already being used in downstream tasks, including computer-assisted language learning applications for linguistic maintenance and computational linguistic shared tasks.
Original language | English |
---|---|
Title of host publication | LREC 2022, THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION : LREC 2022 Conference Proceedings |
Number of pages | 6 |
Publisher | European Languages Resources Association (ELRA) |
Publication date | June 2022 |
Pages | 5137-5142 |
ISBN (electronic) | 979-10-95546-72-6 |
Publication status | Published - June 2022 |
MoE publication type | A4 Article in conference proceedings |
Event | Language Resources and Evaluation Conference - Marseille, France; Duration: 21 June 2022 → 23 June 2022; Conference number: 13; https://lrec2022.lrec-conf.org/en/ |
Fields of Science
- 113 Computer and information sciences