Evaluating Morphological Generalisation in Machine Translation by Distribution-Based Compositionality Assessment

Anssi Moisio, Mathias Creutz, Mikko Kurimo

Research output: Article in book/report/conference proceedings › Conference article › Scientific › Peer-reviewed

Abstract

Compositional generalisation refers to the ability to understand and generate a potentially infinite number of novel meanings using a finite group of known primitives and a set of rules to combine them. The degree to which artificial neural networks can learn this ability is an open question. Recently, some evaluation methods and benchmarks have been proposed to test compositional generalisation, but not many have focused on the morphological level of language. We propose an application of the previously developed distribution-based compositionality assessment method to assess morphological generalisation in NLP tasks, such as machine translation or paraphrase detection. We demonstrate the use of our method by comparing translation systems with different BPE vocabulary sizes. The evaluation method we propose suggests that small vocabularies help with morphological generalisation in NMT.
Original language: English
Title of host publication: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Editors: Tanel Alumäe, Mark Fishel
Place of publication: Tartu
Publisher: Tartu Ülikool
Publication date: 24 May 2023
ISBN (electronic): 978-9916-21-999-7
Status: Published - 24 May 2023
MoE publication type: A4 Article in conference proceedings
Event: Nordic Conference on Computational Linguistics - Tórshavn, Faroe Islands
Duration: 22 May 2023 to 24 May 2023
Conference number: 24

Publication series

Name: NEALT proceedings series
Publisher: University of Tartu Library
Number: 52
ISSN (print): 1736-8197
ISSN (electronic): 1736-6305

Additional information

Nordic Conference on Computational Linguistics, NoDaLiDa ; Conference date: 22-05-2023 Through 24-05-2023

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
