Dozens of Translation Directions or Millions of Shared Parameters? Comparing Two Types of Multilinguality in Modular Machine Translation

Research output: Article in book/report/conference proceedings; Conference article; Scientific; Peer-reviewed

Abstract

There are several ways of implementing multilingual NLP systems but little consensus as to whether different approaches exhibit similar effects. Are the trends that we observe when adding more languages the same as those we observe when sharing more parameters? We focus on encoder representations drawn from modular multilingual machine translation systems in an English-centric scenario, and study their quality from multiple aspects: how adequate they are for machine translation, how independent of the source language they are, and what semantic information they convey. Adding translation directions in English-centric scenarios does not conclusively lead to an increase in translation quality. Shared layers increase performance on zero-shot translation pairs and lead to more language-independent representations, but these improvements do not systematically align with more semantically accurate representations, from a monolingual standpoint.
Original language: English
Title of host publication: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Editors: Tanel Alumäe, Mark Fishel
Number of pages: 10
Place of publication: Tartu
Publisher: University of Tartu Library
Publication date: May 2023
Pages: 238–247
ISBN (electronic): 978-9916-21-999-7
Status: Published - May 2023
OKM publication type: A4 Article in conference proceedings
Event: Nordic Conference on Computational Linguistics - Tórshavn, Faroe Islands
Duration: 22 May 2023 to 24 May 2023
Conference number: 24

Publication series

Name: NEALT Proceedings Series
Publisher: University of Tartu Library
Number: 52
ISSN (print): 1736-8197
ISSN (electronic): 1736-6305

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
