Abstract
We present algorithms that learn to segment words in morphologically
rich languages, in an unsupervised fashion. Morphology of many languages can
be modeled by finite state machines (FSMs). We start with a baseline MDL-based
learning algorithm. We then formulate well-motivated and general linguistic
principles about morphology, and incorporate them into the algorithm as heuristics,
to constrain the search space. We evaluate the algorithm on three highly-inflecting
languages. Evaluation of segmentation shows gains in performance compared to
the state of the art. We conclude with a discussion about how the learned model
relates to a morphological FSM, which is the ultimate goal.
Original language | English |
---|---|
Title of host publication | Statistical Language and Speech Processing : 5th International Conference, SLSP 2017, Le Mans, France, October 23-25, 2017, Proceedings |
Editors | Nathalie Camelin, Yannick Estève, Carlos Martín-Vide |
Place of publication | Cham |
Publisher | Springer International Publishing AG |
Publication date | 27 Sep 2017 |
Pages | 44-57 |
ISBN (print) | 978-3-319-68455-0 |
ISBN (electronic) | 978-3-319-68456-7 |
DOI | |
Status | Published - 27 Sep 2017 |
MoE publication type | A4 Article in conference proceedings |
Event | International Conference on Statistical Language and Speech Processing, Le Mans, France. Duration: 23 Oct 2017 → 25 Oct 2017. Conference number: 5 |
Publication series
Name | Lecture Notes in Artificial Intelligence |
---|---|
Publisher | Springer International Publishing AG |
Volume | 10583 |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Fields of science
- 113 Computer and information sciences
Projects
- Revita: Language learning and AI
  Yangarber, R., Katinskaia, A., Hou, J., Furlan, G. & Kylliäinen, I. P.
  Project: Research project
- LLL: Language Learning Lab
  Yangarber, R., Katinskaia, A., Hou, J., Furlan, G. & Kylliäinen, I. P.
  Project: Research project