Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

Research output: Article in a book/report/conference proceedings › Conference article › Scientific › peer-reviewed

Abstract

In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language, even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each case, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we include at the end of the paper.
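As a rough illustration of the evaluation setup described in the abstract, the sketch below shows how sentence- and corpus-level perplexity can be computed from the per-token log-probabilities that a trained NMT decoder assigns to a paraphrase of its source sentence. The function names and the example log-probabilities are hypothetical and are not taken from the paper.

```python
import math

def sentence_perplexity(token_logprobs):
    """Perplexity of one target sentence, given the per-token
    natural-log probabilities produced by a trained NMT decoder."""
    # Perplexity = exp of the negative mean token log-probability.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def corpus_perplexity(sentences_logprobs):
    """Token-level perplexity over a set of sentences: pool all
    token log-probabilities before averaging."""
    total_logprob = sum(sum(s) for s in sentences_logprobs)
    total_tokens = sum(len(s) for s in sentences_logprobs)
    return math.exp(-total_logprob / total_tokens)

# Hypothetical usage: log-probabilities the model assigns to an English
# paraphrase of an English source sentence (lower perplexity would
# indicate better recognition of the synonymous sentence).
print(sentence_perplexity([-0.4, -1.2, -0.7, -2.1]))
print(corpus_perplexity([[-0.4, -1.2, -0.7], [-0.9, -1.5]]))
```

A lower perplexity on paraphrases, relative to the bilingual baseline, is read as evidence that the multilingual encoder-decoder has learned a more language-independent representation of meaning.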
Original language: English
Title of host publication: Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
Editors: Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Yoav Goldberg
Number of pages: 8
Place of publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 1 Jun 2019
Pages: 35-42
ISBN (electronic): 978-1-950737-05-5
Status: Published - 1 Jun 2019
Ministry of Education publication type: A4 Article in conference proceedings
Event: Workshop on Evaluating Vector Space Representations for NLP - Minneapolis, United States
Duration: 6 Jun 2019 → 6 Jun 2019
Conference number: 3

Fields of science

  • 113 Computer and information sciences
  • 6121 Languages
