Learning Morphology of Natural Language as a Finite-state Grammar

Javad Nouri, Roman Yangarber

Research output: Chapter in book/report/conference proceedings; Conference contribution; Scientific; Peer-reviewed

Abstract

We present algorithms that learn, without supervision, to segment words in morphologically rich languages. The morphology of many languages can be modeled by finite-state machines (FSMs). We start with a baseline learning algorithm based on the Minimum Description Length (MDL) principle. We then formulate well-motivated, general linguistic principles about morphology and incorporate them into the algorithm as heuristics that constrain the search space. We evaluate the algorithm on three highly inflecting languages, and the evaluation of segmentation shows gains in performance over the state of the art. We conclude with a discussion of how the learned model relates to a morphological FSM, which is the ultimate goal.
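The abstract names an MDL-based baseline but does not give its cost function. As a rough illustration only, the sketch below computes a generic two-part MDL code length (lexicon cost plus corpus cost) for a candidate segmentation of a word list. It is not the authors' cost function: the function name mdl_cost, the uniform per-character code for spelling morphs, the end-of-morph marker, the unigram code for the corpus, and the example words are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def mdl_cost(segmentation: list[list[str]]) -> tuple[float, float]:
    """Hypothetical two-part MDL cost: return (lexicon_bits, corpus_bits).

    segmentation holds one list of morphs per word token,
    e.g. [["talo"], ["talo", "ssa"], ...].
    """
    # Count morph tokens across the whole segmented corpus.
    counts = Counter(m for word in segmentation for m in word)
    total = sum(counts.values())

    # Assumed lexicon code: every character of the observed alphabet, plus one
    # end-of-morph marker, costs the same number of bits.
    alphabet = {ch for m in counts for ch in m}
    bits_per_symbol = math.log2(len(alphabet) + 1)

    # Lexicon: spell each distinct morph once, followed by the end marker.
    lexicon_bits = sum((len(m) + 1) * bits_per_symbol for m in counts)

    # Corpus: encode each morph token with its maximum-likelihood unigram probability.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits, corpus_bits


# Illustrative word list (assumed, not from the paper's data).
words = ["talo", "talossa", "talon", "auto", "autossa", "auton"]
unsegmented = [[w] for w in words]
segmented = [["talo"], ["talo", "ssa"], ["talo", "n"],
             ["auto"], ["auto", "ssa"], ["auto", "n"]]

print(sum(mdl_cost(unsegmented)))  # about 129.5 bits
print(sum(mdl_cost(segmented)))    # about 67.7 bits
```

Under these assumptions, splitting off the shared stems and suffixes shrinks the lexicon enough to outweigh the extra corpus tokens, so the segmented analysis has the shorter total description. A search procedure would compare such costs across candidate segmentations. The paper's heuristics, which constrain that search space, are not modeled here.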
Original language: English
Title of host publication: Statistical Language and Speech Processing: 5th International Conference, SLSP 2017, Le Mans, France, October 23-25, 2017, Proceedings
Editors: Nathalie Camelin, Yannick Estève, Carlos Martín-Vide
Place of publication: Cham
Publisher: Springer International Publishing AG
Publication date: 27 Sep 2017
Pages: 44-57
ISBN (print): 978-3-319-68455-0
ISBN (electronic): 978-3-319-68456-7
DOI
Status: Published, 27 Sep 2017
MoE publication type: A4 Article in conference proceedings
Event: International Conference on Statistical Language and Speech Processing, Le Mans, France
Duration: 23 Oct 2017 to 25 Oct 2017
Conference number: 5

Publication series

Name: Lecture Notes in Artificial Intelligence
Publisher: Springer International Publishing AG
Volume: 10583
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Fields of science

  • 113 Computer and information sciences
