Phonotactics as an Aid in Low Resource Loan Word Detection and Morphological Analysis in Sakha

Petter Mahlum, Sardana Ivanova

Research output: Article in book/report/conference proceedings › Conference article › Scientific › peer-reviewed

Abstract

Obtaining information about loan words and irregular morphological patterns can be difficult for low-resource languages. Using Sakha as an example, we show that it is possible to exploit known phonemic regularities such as vowel harmony and consonant distributions to identify loan words and irregular patterns, which can be helpful in rule-based downstream tasks such as parsing and POS-tagging. We evaluate phonemically inspired methods for loanword detection, combined with bigram vowel transition probabilities to inspect irregularities in the morphology of loanwords. We show that both these techniques can be useful for the detection of such patterns. Finally, we inspect the plural suffix -ЛАр [-LAr] to observe some of the variation in morphology between native and foreign words.
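As a rough illustration of the two ideas named in the abstract (flagging words that break vowel harmony, and estimating bigram vowel transition probabilities), a minimal Python sketch might look like the following. It is not the authors' implementation. The function names, the simplified front/back grouping of the eight short Sakha vowel letters, and the example words are all assumptions made for illustration; rounding harmony, consonant distributions, and Russian-only vowel letters such as е, ё, ю, я are ignored.

```python
from collections import Counter, defaultdict

# Assumption: short Sakha vowel letters grouped by backness (Cyrillic orthography).
FRONT = set("иүэө")
BACK = set("ыуао")
VOWELS = FRONT | BACK


def vowel_sequence(word):
    """Return the vowels of a word in order, after lowercasing."""
    return [ch for ch in word.lower() if ch in VOWELS]


def violates_backness_harmony(word):
    """True if the word mixes front and back vowels (a loanword cue, not proof)."""
    seen = set(vowel_sequence(word))
    return bool(seen & FRONT) and bool(seen & BACK)


def vowel_bigram_probs(words):
    """Estimate P(next vowel | current vowel) by relative frequency, unsmoothed."""
    counts = defaultdict(Counter)
    for word in words:
        vs = vowel_sequence(word)
        for current, following in zip(vs, vs[1:]):
            counts[current][following] += 1
    return {
        current: {v: c / sum(nxt.values()) for v, c in nxt.items()}
        for current, nxt in counts.items()
    }


# Hypothetical toy word list, chosen only to exercise the functions.
toy_words = ["оҕолор", "киһи", "машина"]
flagged = [w for w in toy_words if violates_backness_harmony(w)]
probs = vowel_bigram_probs(toy_words)
```

With a realistic word list, transitions that receive low probability under a model trained on native vocabulary could serve as a signal of irregular, possibly borrowed, forms; any threshold for "low" would have to be chosen empirically.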

Original language: English
Title of host publication: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)
Editors: Nikolai Ilinykh, Felix Morger, Dana Dannells, Simon Dobnik, Beata Megyesi, Joakim Nivre
Number of pages: 10
Place of publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 2023
Pages: 111-120
ISBN (electronic): 978-1-959429-73-9
Status: Published - 2023
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Resources and Representations for Under-Resourced Languages and Domains - Torshavn, Faroe Islands
Duration: 22 May 2023 → 22 May 2023
Conference number: 2

Additional information

Publisher Copyright:
© 2023 Association for Computational Linguistics.

Fields of Science

  • 113 Computer and information sciences
