Semi-automatically Annotated Learner Corpus for Russian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › Peer-reviewed

Abstract

We present ReLCo, the Revita Learner Corpus, a new semi-automatically annotated
learner corpus for Russian. The corpus was collected while several hundred L2 learners
were performing exercises in the Revita language-learning system. All errors were
detected automatically by the system and annotated by type. Part of the corpus was
also annotated manually; this part was created for further experiments on automatic
assessment of grammatical correctness. The corpus provides valuable data for
studying patterns of grammatical errors, experimenting with grammatical error detection
and grammatical error correction, and developing new exercises for language
learners. Automating collection and annotation makes building the learner corpus
much cheaper and faster than the traditional approach to building learner corpora.
We make the data publicly available.
Original language: English
Title of host publication: Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022)
Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA
Status: Accepted/In press - 2022
MoE publication type: A4 Article in a conference publication
Event: LREC 2022 - Marseille, France
Duration: 20 June 2022 to 25 June 2022
https://lrec2022.lrec-conf.org/en/
