Four Approaches to Low-Resource Multilingual NMT: The Helsinki Submission to the AmericasNLP 2023 Shared Task

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-reviewed

Abstract

The Helsinki-NLP team participated in the AmericasNLP 2023 Shared Task with 6 submissions for all 11 language pairs arising from 4 different multilingual systems. We provide a detailed look at the work that went into collecting and preprocessing the data that led to our submissions. We explore various setups for multilingual Neural Machine Translation (NMT), namely knowledge distillation and transfer learning, multilingual NMT including a high-resource language (English), language-specific fine-tuning, and multilingual NMT exclusively using low-resource data. Our multilingual Model B ranks first in 4 out of the 11 language pairs.
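As a purely illustrative sketch (not the authors' actual pipeline), the snippet below shows the standard target-language-tag setup for training a single multilingual NMT model on several low-resource pairs: every source sentence is prefixed with a token naming its target language, following the ">>xxx<<" convention used by Helsinki-NLP's OPUS-MT models. The file names, language codes, and function names here are hypothetical.

# Illustrative sketch only (not the authors' pipeline): the target-language-tag setup
# for multilingual NMT, where one model is trained on several language pairs at once
# and each source sentence is prefixed with a token naming its target language.
# The ">>xxx<<" tag format follows the convention used by Helsinki-NLP's OPUS-MT models;
# the file names and language codes below are hypothetical examples.

def tag_pair(src_path, tgt_path, tgt_lang):
    """Yield (tagged source, target) sentence pairs for one language pair."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for s, t in zip(src, tgt):
            s, t = s.strip(), t.strip()
            if s and t:
                yield f">>{tgt_lang}<< {s}", t

def build_multilingual_corpus(pairs, out_src="train.src", out_tgt="train.tgt"):
    """Concatenate several tagged language pairs into one mixed training corpus."""
    with open(out_src, "w", encoding="utf-8") as fs, open(out_tgt, "w", encoding="utf-8") as ft:
        for src_path, tgt_path, tgt_lang in pairs:
            for s, t in tag_pair(src_path, tgt_path, tgt_lang):
                fs.write(s + "\n")
                ft.write(t + "\n")

if __name__ == "__main__":
    # Hypothetical Spanish-to-Indigenous-language training files (quy = Quechua, grn = Guarani).
    build_multilingual_corpus([
        ("es-quy.es", "es-quy.quy", "quy"),
        ("es-grn.es", "es-grn.grn", "grn"),
    ])

With this kind of tagging, language-specific fine-tuning amounts to continuing training on the tagged data of a single language pair.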
Original language: English
Title of host publication: Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Editors: Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, et al.
Number of pages: 15
Place of publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 1 Jul 2023
Pages: 177-191
ISBN (Electronic): 978-1-959429-91-3
DOIs
Publication status: Published - 1 Jul 2023
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Natural Language Processing for Indigenous Languages of the Americas - Toronto, Canada
Duration: 14 Jun 2023 - 14 Jun 2023
Conference number: 3
https://turing.iimas.unam.mx/americasnlp/2023_workshop.html

Fields of Science

  • 6121 Languages
