Abstract
We present algorithms that learn to segment words in morphologically
rich languages in an unsupervised fashion. The morphology of many languages can
be modeled by finite state machines (FSMs). We start with a baseline learning
algorithm based on minimum description length (MDL). We then formulate well-motivated, general linguistic
principles about morphology and incorporate them into the algorithm as heuristics
to constrain the search space. We evaluate the algorithm on three highly inflecting
languages. The segmentation evaluation shows gains in performance over
the state of the art. We conclude with a discussion of how the learned model
relates to a morphological FSM, which is the ultimate goal.
Original language | English |
---|---|
Title | Statistical Language and Speech Processing : 5th International Conference, SLSP 2017, Le Mans, France, October 23-25, 2017, Proceedings |
Editors | Nathalie Camelin, Yannick Estève, Carlos Martín-Vide |
Place of publication | Cham |
Publisher | Springer International Publishing AG |
Publication date | 27 Sep 2017 |
Pages | 44-57 |
ISBN (print) | 978-3-319-68455-0 |
ISBN (electronic) | 978-3-319-68456-7 |
DOI - permanent links | |
Status | Published - 27 Sep 2017 |
Ministry of Education and Culture (OKM) publication type | A4 Article in conference proceedings |
Event | International Conference on Statistical Language and Speech Processing - Le Mans, France Duration: 23 Oct 2017 → 25 Oct 2017 Conference number: 5 |
Publication series
Name | Lecture Notes in Artificial Intelligence |
---|---|
Publisher | Springer International Publishing AG |
Volume | 10583 |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Fields of science
- 113 Computer and information sciences
Projects
- LLL: Language Learning Lab
  Yangarber, R. (Project manager), Katinskaia, A. (Participant), Hou, J. (Participant), Furlan, G. (Participant) & Kylliäinen, I. P. (Participant)
  Project: Research project
- Revita: Language learning and AI
  Yangarber, R. (Project manager), Katinskaia, A. (Participant), Hou, J. (Participant), Furlan, G. (Participant) & Kylliäinen, I. P. (Participant)
  Project: Research project