Towards an open-source universal-dependency treebank for Erzya

Jack Michael Rueter, Francis M. Tyers

Research output: Conference materials › Paper › peer-review


This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.
Original language: English
Number of pages: 13
Publication status: Published - 2018
Event: International Workshop for Computational Linguistics of Uralic Languages - University of Helsinki, Department of Modern Languages, Helsinki, Finland
Duration: 8 Jan 2018 to 9 Jan 2018
Conference number: 4


Workshop: International Workshop for Computational Linguistics of Uralic Languages
Abbreviated title: IWCLUL
Fields of Science

  • 6121 Languages
