On Erzya and Moksha Corpora and Analyzer Development, ERME-PSLA 1950s

Jack Rueter, Olga Erina, Nadezhda Kabaeva

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

This paper describes materials and annotation facilitation pertinent to the «Erzya-Moksha Electronic Resources and Linguistic Diversity» (EMERALD) project. It addresses work following the construction of finite-state analyzers for the Mordvin languages, the gathering of test corpora, and the development of metadata strategies for descriptive research. In this paper, we provide three descriptors for a set of new Erzya and Moksha research materials at the Language Bank of Finland. The descriptors illustrate (1) a low-annotation subcorpora set of the «Electronic Resources for Moksha and Erzya» (ERME); (2) the state of the open-source analyzers used in their automatic annotation; and (3) the development of metadata documentation for the «EMERALD» project, associated with this endeavor. Outcomes of the article include an introduction to new research materials, an illustration of the state of the Mordvin annotation pipeline, and perspectives for the further enhancement of the annotation pipeline.
Original language: English
Title of host publication: Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages
Editors: Mika Hämäläinen, Flammie Pirinen, Melany Macias, Mario Crespo Avila
Number of pages: 9
Place of Publication: Kerrville
Publisher: The Association for Computational Linguistics
Publication date: Dec 2024
Pages: 67–75
ISBN (Electronic): 979-8-89176-128-5
Publication status: Published - Dec 2024
MoE publication type: A4 Article in conference proceedings
Event: International Workshop on Computational Linguistics for Uralic Languages - Metropolia University of Applied Science, Helsinki, Finland
Duration: 28 Nov 2024 – 29 Nov 2024
Conference number: 9
http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=180706

Fields of Science

  • 6121 Languages
  • Erzya language
  • Moksha language
  • Mordvin languages
  • text corpora
  • finite-state morphological analysis
