Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

Jörg Tiedemann, Johanna Nichols, Ronald Sprouse

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

This paper presents ongoing work on creating NLP tools for under-resourced languages from very sparse training data obtained through linguistic field work. We focus on Ingush, a Nakh-Daghestanian language spoken by about 300,000 people in the Russian republics of Ingushetia and Chechnya. We present morphosyntactic taggers trained on transcribed and linguistically analyzed recordings, as well as dependency parsers trained on synthetic treebanks that are created by projecting annotation through English glosses. Our preliminary results are promising and support the goal of bootstrapping efficient NLP tools when little or no task-specific annotated data is available.
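
The abstract does not spell out how annotation is projected through glosses. As a rough illustration of the general idea only (not the authors' implementation), the sketch below assumes that each transcribed token carries a Leipzig-style English gloss, that the lexical part of each gloss can be matched one-to-one to a lemma of an English sentence, and that some external English parser has already produced dependency arcs for that sentence. `Token`, `Arc`, `align_by_gloss` and `project_dependencies` are hypothetical names, and the token forms in the example are placeholders, not real Ingush data.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    form: str   # source-language word form (placeholder here, not real Ingush)
    gloss: str  # English gloss from interlinear annotation, e.g. "boy-ERG"


@dataclass
class Arc:
    head: int   # 1-based index of the head token; 0 marks the root
    label: str  # dependency relation label


def align_by_gloss(tokens: List[Token], english_lemmas: List[str]) -> Dict[int, int]:
    """Hypothetical one-to-one alignment: source index -> English index (both 1-based)."""
    used = set()
    alignment: Dict[int, int] = {}
    for s, tok in enumerate(tokens, start=1):
        # Keep only the lexical part of a gloss such as "eat.PST" or "boy-ERG".
        lemma = re.split(r"[.\-=]", tok.gloss)[0].lower()
        for e, eng in enumerate(english_lemmas, start=1):
            if e not in used and eng == lemma:
                alignment[s] = e
                used.add(e)
                break
    return alignment


def project_dependencies(tokens: List[Token],
                         english_parse: Dict[int, Arc],
                         alignment: Dict[int, int]) -> List[Optional[Arc]]:
    """Copy English arcs onto source tokens; None where an arc cannot be projected."""
    eng_to_src = {e: s for s, e in alignment.items()}  # invert the one-to-one alignment
    projected: List[Optional[Arc]] = []
    for s in range(1, len(tokens) + 1):
        e = alignment.get(s)
        if e is None or e not in english_parse:
            projected.append(None)  # token has no aligned English word
            continue
        arc = english_parse[e]
        if arc.head == 0:
            projected.append(Arc(0, arc.label))  # root stays root
        elif arc.head in eng_to_src:
            projected.append(Arc(eng_to_src[arc.head], arc.label))
        else:
            projected.append(None)  # English head has no source counterpart
    return projected


if __name__ == "__main__":
    # Placeholder tokens in verb-final order with Leipzig-style glosses.
    tokens = [Token("w1", "boy-ERG"), Token("w2", "apple"), Token("w3", "eat.PST")]
    english_lemmas = ["boy", "eat", "apple"]  # lemmas of "boy ate apple"
    english_parse = {1: Arc(2, "nsubj"), 2: Arc(0, "root"), 3: Arc(2, "obj")}
    alignment = align_by_gloss(tokens, english_lemmas)
    for tok, arc in zip(tokens, project_dependencies(tokens, english_parse, alignment)):
        print(tok.form, arc)
```

Tokens whose gloss has no lexical match (for example purely grammatical glosses) are left unattached (`None`), so the projected trees may be partial; how such gaps are handled when building a synthetic treebank is left open in this sketch.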
Original language: English
Title of host publication: Language Technology Resources and Tools for Digital Humanities (LT4DH): Proceedings of the Workshop
Number of pages: 8
Place of publication: Osaka
Publication date: 1 Dec 2016
Pages: 148-155
ISBN (Electronic): 978-4-87974-708-2
Publication status: Published - 1 Dec 2016
MoE publication type: A4 Article in conference proceedings
Event: COLING Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), Osaka, Japan
Duration: 11 Dec 2016 → 11 Dec 2016

Fields of Science

  • 6121 Languages

Cite this

Tiedemann, J., Nichols, J., & Sprouse, R. (2016). Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work. In Language Technology Resources and Tools for Digital Humanities (LT4DH): Proceedings of the Workshop (pp. 148-155). Osaka.
Tiedemann, Jörg; Nichols, Johanna; Sprouse, Ronald. / Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work. Language Technology Resources and Tools for Digital Humanities (LT4DH): Proceedings of the Workshop. Osaka, 2016. pp. 148-155
@inproceedings{83134ad5fbee4b7e9cb974032c523162,
title = "Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work",
keywords = "6121 Languages",
author = "J{\"o}rg Tiedemann and Johanna Nichols and Ronald Sprouse",
year = "2016",
month = "12",
day = "1",
language = "English",
pages = "148--155",
booktitle = "Language Technology Resources and Tools for Digital Humanities (LT4DH)",

}

Tiedemann, J, Nichols, J & Sprouse, R 2016, Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work. in Language Technology Resources and Tools for Digital Humanities (LT4DH): Proceedings of the Workshop. Osaka, pp. 148-155, COLING Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), Osaka, Japan, 11/12/2016.

Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work. / Tiedemann, Jörg; Nichols, Johanna; Sprouse, Ronald.

Language Technology Resources and Tools for Digital Humanities (LT4DH): Proceedings of the Workshop. Osaka, 2016. p. 148-155.

TY - GEN

T1 - Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

AU - Tiedemann, Jörg

AU - Nichols, Johanna

AU - Sprouse, Ronald

PY - 2016/12/1

Y1 - 2016/12/1

KW - 6121 Languages

M3 - Conference contribution

SP - 148

EP - 155

BT - Language Technology Resources and Tools for Digital Humanities (LT4DH)

CY - Osaka

ER -

Tiedemann J, Nichols J, Sprouse R. Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work. In Language Technology Resources and Tools for Digital Humanities (LT4DH): Proceedings of the Workshop. Osaka. 2016. p. 148-155