Towards an open-source universal-dependency treebank for Erzya

Jack Michael Rueter, Francis M. Tyers

Research output: Conference materials › Paper

Abstract

This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public domain original Erzya sources, which ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.
Original language: English
Number of pages: 13
Publication status: Published - 2018
Event: International Workshop for Computational Linguistics of Uralic Languages - University of Helsinki, Department of Modern Languages, Helsinki, Finland
Duration: 8 Jan 2018 to 9 Jan 2018
Conference number: 4
http://blogs.helsinki.fi/language-technology/iwclul-2018/

Workshop

Workshop: International Workshop for Computational Linguistics of Uralic Languages
Abbreviated title: IWCLUL
Country: Finland
City: Helsinki
Period: 08/01/2018 to 09/01/2018

Fields of Science

  • 6121 Languages

Cite this

Rueter, J. M., & Tyers, F. M. (2018). Towards an open-source universal-dependency treebank for Erzya. Paper presented at International Workshop for Computational Linguistics of Uralic Languages, Helsinki, Finland.
