Abstract
We present lemmatization experiments on the unstandardized low-resourced languages Low Saxon and Occitan using two machine-learning-based approaches represented by MaChAmp and Stanza. We show different ways to increase training data by leveraging historical corpora, small amounts of gold data and dictionary information, and discuss the usefulness of this additional data. In the results, we find some differences in the performance of the models depending on the language. This variation is likely to be partly due to differences in the corpora we used, such as the amount of internal variation. However, we also observe common tendencies, for instance that sequential models trained only on gold-annotated data often yield the best overall performance and generalize better to unknown tokens.
Original language | English |
---|---|
Title of host publication | Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023) : Proceedings of the Workshop |
Editors | Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri |
Number of pages | 11 |
Place of publication | Stroudsburg |
Publisher | The Association for Computational Linguistics |
Publication date | 5 May 2023 |
Pages | 163-173 |
ISBN (electronic) | 978-1-959429-50-0 |
DOI | |
Status | Published - 5 May 2023 |
MoE publication type | A4 Article in conference proceedings |
Event | Workshop on NLP for Similar Languages, Varieties and Dialects - Dubrovnik, Croatia. Duration: 5 May 2023 → 6 May 2023. Conference number: 10. https://sites.google.com/view/vardial-2023 |
Fields of science
- 6121 Languages
- 113 Computer and information sciences
Projects
- 1 Active
- CorCoDial: Corpus-based computational dialectology: exploiting machine translation techniques to extract, visualize and interpret dialectal patterns
  Scherrer, Y., Tiedemann, J., Kuparinen, O. V., Miletic Haddad, A., Siewert, J. & Siewert, J.
  Suomen Akatemia Projektilaskutus (Academy of Finland, project invoicing)
  01/09/2021 → 31/08/2025
  Project: Academy of Finland: Academy Project funding