Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

Jörg Tiedemann, Johanna Nichols, Ronald Sprouse

Research output: Chapter in book/report/conference proceeding; Conference contribution; Scientific; Peer-reviewed

Abstract

This paper presents ongoing work on creating NLP tools for under-resourced languages from very sparse training data obtained through linguistic field work. We focus on Ingush, a Nakh-Daghestanian language spoken by about 300,000 people in the Russian republics of Ingushetia and Chechnya. We present work on morphosyntactic taggers trained on transcribed and linguistically analyzed recordings, and on dependency parsers that use English glosses to project annotation for creating synthetic treebanks. Our preliminary results are promising, supporting the goal of bootstrapping efficient NLP tools when limited or no task-specific annotated data resources are available.
Original language: English
Title of host publication: Language Technology Resources and Tools for Digital Humanities (LT4DH): Proceedings of the Workshop
Number of pages: 8
Place of publication: Osaka
Publication date: 1 Dec 2016
Pages: 148-155
ISBN (electronic): 978-4-87974-708-2
Status: Published - 1 Dec 2016
MoE publication type: A4 Article in conference proceedings
Event: COLING Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH) - Osaka, Japan
Duration: 11 Dec 2016 to 11 Dec 2016

Fields of science

  • 6121 Languages
