Abstract
Unsupervised learning of morphological segmentation of words in a language, based only on a large corpus of words, is a challenging task. Evaluation of the learned segmentations is a challenge in itself, due to the inherent ambiguity of the segmentation task. There is no objective way to posit a unique "correct" segmentation for a set of data. Two models may arrive at different ways of segmenting the data, both of which may nonetheless be valid. Several evaluation methods have been proposed to date, but they do not insist on consistency of the evaluated model. We introduce a new evaluation methodology, which enforces correctness of segmentation boundaries while also assuring consistency of segmentation decisions across the corpus.
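For readers unfamiliar with boundary-based segmentation evaluation, the following sketch illustrates the kind of per-boundary precision/recall comparison that conventional evaluation methods perform. This is a generic illustration of boundary scoring, not the consistency-enforcing methodology proposed in the paper; the function names and example words are hypothetical.

```python
def boundaries(segmentation):
    """Return the set of internal boundary positions of a segmented word.
    E.g. ["un", "lock", "able"] -> {2, 6}."""
    positions, offset = set(), 0
    for morph in segmentation[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions

def boundary_prf(predicted, gold):
    """Micro-averaged boundary precision, recall, and F1 over
    parallel lists of predicted and gold segmentations."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p, g = boundaries(pred), boundaries(ref)
        tp += len(p & g)   # boundaries found in both
        fp += len(p - g)   # spurious predicted boundaries
        fn += len(g - p)   # missed gold boundaries
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Note that a score of this kind is computed word by word against a single gold standard, which is exactly the limitation the abstract points out: it neither accommodates alternative valid segmentations nor checks that the model segments recurring morphemes consistently across the corpus.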
Original language | English |
---|---|
Title of host publication | LREC 2016, Tenth International Conference on Language Resources and Evaluation |
Editors | Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis |
Number of pages | 9 |
Place of Publication | Paris |
Publisher | European Language Resources Association (ELRA) |
Publication date | 2016 |
Pages | 3102-3109 |
ISBN (Electronic) | 978-2-9517408-9-1 |
Publication status | Published - 2016 |
MoE publication type | A4 Article in conference proceedings |
Event | International Conference on Language Resources and Evaluation (Conference number: 10) - Portorož, Slovenia. Duration: 23 May 2016 → 28 May 2016 |
Fields of Science
- 113 Computer and information sciences
Projects
- LLL: Language Learning Lab
Yangarber, R. (Project manager), Katinskaia, A. (Participant), Hou, J. (Participant), Furlan, G. (Participant) & Kylliäinen, I. P. (Participant)
Project: Research project
- Revita: Language learning and AI
Yangarber, R. (Project manager), Katinskaia, A. (Participant), Hou, J. (Participant), Furlan, G. (Participant) & Kylliäinen, I. P. (Participant)
Project: Research project