Towards an open-source universal-dependency treebank for Erzya

Jack Michael Rueter, Francis M. Tyers

Research output: Conference materials › Conference presentation › peer-reviewed

Abstract

This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public-domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.
Original language: English
Number of pages: 13
Status: Published - 2018
Event: International Workshop for Computational Linguistics of Uralic Languages - University of Helsinki, Department of Modern Languages, Helsinki, Finland
Duration: 8 Jan 2018 - 9 Jan 2018
Conference number: 4
http://blogs.helsinki.fi/language-technology/iwclul-2018/

Workshop

Workshop: International Workshop for Computational Linguistics of Uralic Languages
Abbreviated title: IWCLUL
Country/Territory: Finland
City: Helsinki
Period: 08/01/2018 - 09/01/2018

Fields of science

  • 6121 Languages
