Abstract
The texts are scrambled at the paragraph level.
This new version contains the literature found in the older instance and has grown markedly. While the old version was merely text segmented into sentences, the new version adds lemmatization and dependency annotation. At the sentence level, a contextual translation (into English or Finnish) may be present; at the word level, each token carries the morphological encoding that corresponds to its context. Preliminary morpho-syntactic analysis is carried out with HFST-based transducers, followed by Constraint Grammar disambiguation and function and dependency tagging, all of which have been developed in the Giellatekno infrastructure of the University of Tromsø.
The grammatical analysis and labeling comply with the practices developed in the Giellatekno infrastructure of the University of Tromsø. These practices are applied in the documentation of several Uralic languages.
Amount of processed material: more than 2.8 million words.
The amount of processed material will be increased in later versions. Future versions will also aim to improve the morphological disambiguation of the corpus texts, the Constraint Grammar assignment of functions, and the conversion from CG output to UD-type dependencies.
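To make the annotation layers described above more concrete, the following is a minimal sketch of how Constraint Grammar style output (a word form line followed by an indented reading with lemma, morphological tags, a function label, and a dependency arc) could be read into flat token records. The sample text, the tag names, and the names `Token`, `parse_cg`, and `to_rows` are illustrative assumptions; they are not taken from the corpus or from the Giellatekno tools, and no mapping to actual UD relation labels is attempted.

```python
import re
from dataclasses import dataclass, field

# Invented two-token sample in CG-3 style stream format (not corpus data).
SAMPLE = '''"<dogs>"
\t"dog" N Pl Nom @SUBJ> #1->2
"<bark>"
\t"bark" V Ind Prs Pl3 @FMV #2->0
'''

WORDFORM = re.compile(r'^"<(.*)>"\s*$')          # e.g. "<dogs>"
READING = re.compile(r'^\s+"([^"]*)"\s*(.*)$')   # indented "lemma" TAGS...
DEP = re.compile(r'^#(\d+)->(\d+)$')             # e.g. #1->2 (self -> head)


@dataclass
class Token:
    form: str
    lemma: str = ""
    tags: list = field(default_factory=list)
    function: str = ""
    index: int = 0
    head: int = 0


def parse_cg(text):
    """Collect one Token per word form, keeping only its first reading."""
    tokens = []
    for line in text.splitlines():
        m = WORDFORM.match(line)
        if m:
            tokens.append(Token(form=m.group(1)))
            continue
        m = READING.match(line)
        if m and tokens and not tokens[-1].lemma:
            tok = tokens[-1]
            tok.lemma = m.group(1)
            for item in m.group(2).split():
                dep = DEP.match(item)
                if dep:
                    tok.index, tok.head = int(dep.group(1)), int(dep.group(2))
                elif item.startswith("@"):
                    tok.function = item      # function label
                else:
                    tok.tags.append(item)    # morphological tag
    return tokens


def to_rows(tokens):
    """Flatten tokens into (index, form, lemma, tags, head, function) tuples."""
    return [(t.index, t.form, t.lemma, " ".join(t.tags), t.head, t.function)
            for t in tokens]


if __name__ == "__main__":
    for row in to_rows(parse_cg(SAMPLE)):
        print(row)
```

Keeping only the first reading per word form is a simplification that assumes the input has already been disambiguated; ambiguous cohorts with several readings would need a policy of their own.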
Original language | English |
---|---|
Place of Publication | Helsinki |
Publisher | Kielipankki |
Media of output | internet |
Size | 289 735 sentences |
Publication status | Published - Mar 2023 |
MoE publication type | I2 ICT software |
Fields of Science
- 6121 Languages
- Erzya language
- Moksha language
- finite-state morphology
- HFST
- GiellaLT
- Native literature
- constraint grammar
- Uralic languages
-
Erzya-Moksha shallow-transfer machine translation for measurement of language diversity
Rueter, J. (Project manager), Erina, O. (Project manager) & Kabaeva, N. (Project manager)
26/06/2019 → …
Project: Research project
-
Experimental Treebanking for the Minority Moksha Language and Finite-State Descriptions
Rueter, J. (Project manager), Levina, M. (Participant) & Kabaeva, N. (Participant)
07/12/2018 → …
Project: Other project
-
Experimental Treebanking for Minority Languages with Finite-State Descriptions
Rueter, J. (Project manager), Tyers, F. M. (Participant), Klementeva, J. (Participant) & Erina, O. (Project manager)
01/10/2017 → …
Project: Other project
Activities
-
Acta Linguistica Academica (Journal)
Rueter, J. (Reviewer)
Nov 2024 → Dec 2024
Activity: Publication peer-review and editorial work types › Peer review of manuscripts
-
Verbs of ingestion in Erzya, the ablative object?
Rueter, J. (Speaker)
Aug 2022
Activity: Talk or presentation types › Oral presentation
-
University of Turku, Department of Finnish and Finno-Ugric Languages
Rueter, J. (Visiting researcher)
1 Aug 2021 → 31 Jul 2022
Activity: Visiting an external institution types › Academic visit to other institution