Designing and evaluating Russian tagset

Mikhail Kopotev, Serge Sharoff, Tomaž Erjavec, Anna Feldman, Dagmar Divjak

    Research output: Chapter in book/report/conference proceedings › Conference contribution › Scientific › Peer-reviewed

    Abstract

    This paper reports the principles behind the design of a tagset covering Russian morphosyntactic phenomena, the modifications made to the core tagset, and its evaluation. The tagset is based on the MULTEXT-East framework; the design decisions aimed to balance the parameters that matter to linguists against the feasibility of detecting and disambiguating them automatically. The final tagset contains about 500 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set that can be shared with other researchers.
    Original language: English
    Title of host publication: LREC 2008: the Language Resources and Evaluation Conference
    Number of pages: 6
    Publication date: 2008
    Pages: 279-285
    Status: Published - 2008
    MoE publication type: A4 Article in conference proceedings

    Cite this

    Kopotev, M., Sharoff, S., Erjavec, T., Feldman, A., & Divjak, D. (2008). Designing and evaluating Russian tagset. In LREC 2008: the Language Resources and Evaluation Conference (pp. 279-285).