Towards an open-source universal-dependency treebank for Erzya

Jack Michael Rueter, Francis M. Tyers

Research output: Conference contribution, Conference paper, Peer-reviewed

Abstract

This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.
Original language: English
Number of pages: 13
Status: Published - 2018
Event: International Workshop for Computational Linguistics of Uralic Languages - University of Helsinki, Department of Modern Languages, Helsinki, Finland
Duration: 8 Jan 2018 to 9 Jan 2018
Conference number: 4
http://blogs.helsinki.fi/language-technology/iwclul-2018/

Workshop

Workshop: International Workshop for Computational Linguistics of Uralic Languages
Abbreviated title: IWCLUL
Country: Finland
City: Helsinki
Period: 08/01/2018 to 09/01/2018
Internet address

Fields of science

  • 6121 Languages
