Semi-automatically Annotated Learner Corpus for Russian

Research output: Article in book/report/conference proceedings › Conference article › Scientific › Peer-reviewed

Abstract

We present ReLCo (the Revita Learner Corpus), a new semi-automatically annotated learner corpus for Russian. The corpus was collected while several hundred L2 learners were performing exercises using the Revita language-learning system. All errors were detected automatically by the system and annotated by type. Part of the corpus was annotated manually; this part was created for further experiments on automatic assessment of grammatical correctness. The Learner Corpus provides valuable data for studying patterns of grammatical errors, experimenting with grammatical error detection and grammatical error correction, and developing new exercises for language learners. Automating the collection and annotation makes building the learner corpus much cheaper and faster than the traditional approach to building learner corpora. We make the data publicly available.

Original language: English
Title of host publication: Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022)
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, et al.
Number of pages: 8
Place of publication: Paris
Publisher: European Language Resources Association (ELRA)
Publication date: 2022
Pages: 832-839
ISBN (electronic): 9791095546726
Publication status: Published - 2022
OKM publication type: A4 Article in conference proceedings
Event: LREC 2022 - Marseille, France
Duration: 20 Jun 2022 to 25 Jun 2022
Conference number: 13
https://lrec2022.lrec-conf.org/en/

Fields of science

  • 113 Computer and information sciences
