Towards an open-source universal-dependency treebank for Erzya

Jack Michael Rueter, Francis M. Tyers

Research output: Conference contribution › Conference paper

Abstract

This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.
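The size figures quoted in the abstract (sentences and tokens) are the kind of counts that can be recomputed from a treebank file. The following is a minimal sketch, assuming the treebank is stored in the CoNLL-U format used by UD releases and that "tokens" means syntactic words (multiword-token ranges and empty nodes are skipped). The function name `count_conllu`, the helper `row`, and the sample data are hypothetical placeholders, not material from the treebank or the paper.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentence_count, token_count) for CoNLL-U formatted lines."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) carry no tokens.
            continue
        token_id = line.split("\t", 1)[0]
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in token_id or "." in token_id:
            continue
        tokens += 1
        in_sentence = True
    if in_sentence:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return sentences, tokens


def row(idx: int, form: str, upos: str, head: int, deprel: str) -> str:
    """Build one 10-column CoNLL-U token line (unused columns set to '_')."""
    return "\t".join(
        [str(idx), form, form, upos, "_", "_", str(head), deprel, "_", "_"]
    )


# Made-up placeholder sentences, used only to exercise the counter.
SAMPLE = [
    "# sent_id = demo-1",
    row(1, "w1", "NOUN", 2, "nsubj"),
    row(2, "w2", "VERB", 0, "root"),
    row(3, ".", "PUNCT", 2, "punct"),
    "",
    "# sent_id = demo-2",
    row(1, "w3", "VERB", 0, "root"),
    row(2, ".", "PUNCT", 1, "punct"),
]

print(count_conllu(SAMPLE))  # (2, 5)
```

To apply the same counter to a real file, the lines can be passed from an open file handle, for example `count_conllu(open(path, encoding="utf-8"))`, where `path` is whatever local path the treebank file has.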
Original language: English
Number of pages: 13
Status: Published - 2018
Event: International Workshop for Computational Linguistics of Uralic Languages - University of Helsinki, Department of Modern Languages, Helsinki, Finland
Duration: 8 Jan 2018 - 9 Jan 2018
Conference number: 4
http://blogs.helsinki.fi/language-technology/iwclul-2018/

Workshop

Workshop: International Workshop for Computational Linguistics of Uralic Languages
Abbreviated title: IWCLUL
Country: Finland
City: Helsinki
Period: 08/01/2018 - 09/01/2018

Fields of science

  • 6121 Languages

Cite this

Rueter, J. M., & Tyers, F. M. (2018). Towards an open-source universal-dependency treebank for Erzya. Paper presented at International Workshop for Computational Linguistics of Uralic Languages, Helsinki, Finland.
@conference{be657b287b984b5ab3cf8d67eab23d4c,
title = "Towards an open-source universal-dependency treebank for Erzya",
abstract = "This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.",
keywords = "6121 Languages",
author = "Rueter, {Jack Michael} and Tyers, {Francis M.}",
year = "2018",
language = "English",
note = "International Workshop for Computational Linguistics of Uralic Languages, IWCLUL ; Conference date: 08-01-2018 Through 09-01-2018",
url = "http://blogs.helsinki.fi/language-technology/iwclul-2018/",

}
