Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each of the cases, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we also include at the end of the paper.
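
The measurement itself is easy to reproduce with off-the-shelf tools. The sketch below is a minimal illustration, not the paper's actual setup: a publicly available many-to-many model (facebook/m2m100_418M) stands in for the authors' own English-French models with auxiliary languages. The original English sentence goes through the encoder, the English paraphrase is force-decoded via the target-language token, and the mean token-level cross-entropy is exponentiated into a perplexity.

import math
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Placeholder model: the paper trains its own multilingual NMT systems; any
# seq2seq model that accepts English as a decoding target plays the same role.
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M").eval()
tokenizer.src_lang = "en"  # language of the original sentence
tokenizer.tgt_lang = "en"  # force English as the decoding target

def paraphrase_perplexity(source: str, paraphrase: str) -> float:
    """Perplexity of `paraphrase` under the decoder, conditioned on `source`."""
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(text_target=paraphrase, return_tensors="pt").input_ids
    with torch.no_grad():
        # The seq2seq loss is the mean cross-entropy of the forced decoding;
        # exponentiating it yields a perplexity of the kind the paper measures.
        loss = model(**inputs, labels=labels).loss
    return math.exp(loss.item())

print(paraphrase_perplexity(
    "The committee approved the proposal yesterday.",
    "Yesterday, the proposal was approved by the committee.",
))

Under the paper's hypothesis, a model trained with more auxiliary languages should assign such paraphrases a lower perplexity than the bilingual baseline does.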
Original language: English
Title of host publication: Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
Editors: Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Yoav Goldberg
Number of pages: 8
Place of publication: Stroudsburg
Publisher: Association for Computational Linguistics
Publication date: 1 Jun 2019
Pages: 35-42
ISBN (Electronic): 978-1-950737-05-5
Publication status: Published - 1 Jun 2019
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Evaluating Vector Space Representations for NLP - Minneapolis, United States
Duration: 6 Jun 2019 → 6 Jun 2019
Conference number: 3

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages

Cite this

Tiedemann, J., & Scherrer, Y. (2019). Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks. In A. Rogers, A. Drozd, A. Rumshisky, & Y. Goldberg (Eds.), Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP (pp. 35-42). Stroudsburg: Association for Computational Linguistics.
Tiedemann, Jörg ; Scherrer, Yves. / Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks. Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP. editor / Anna Rogers ; Aleksandr Drozd ; Anna Rumshisky ; Yoav Goldberg. Stroudsburg : Association for Computational Linguistics, 2019. pp. 35-42
@inproceedings{e29c3705c35b42e384c3fd5d14313429,
title = "Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks",
abstract = "In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each of the cases, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we also include at the end of the paper.",
keywords = "113 Computer and information sciences, 6121 Languages",
author = "J{\"o}rg Tiedemann and Yves Scherrer",
year = "2019",
month = "6",
day = "1",
language = "English",
pages = "35--42",
editor = "Anna Rogers and Aleksandr Drozd and Anna Rumshisky and Yoav Goldberg",
booktitle = "Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP",
publisher = "Association for Computational Linguistics",
address = "United States",

}

Tiedemann, J & Scherrer, Y 2019, Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks. in A Rogers, A Drozd, A Rumshisky & Y Goldberg (eds), Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, Stroudsburg, pp. 35-42, Workshop on Evaluating Vector Space Representations for NLP, Minneapolis, United States, 06/06/2019.

Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks. / Tiedemann, Jörg; Scherrer, Yves.

Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP. ed. / Anna Rogers; Aleksandr Drozd; Anna Rumshisky; Yoav Goldberg. Stroudsburg : Association for Computational Linguistics, 2019. p. 35-42.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

TY - GEN

T1 - Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

AU - Tiedemann, Jörg

AU - Scherrer, Yves

PY - 2019/6/1

Y1 - 2019/6/1

N2 - In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each of the cases, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we also include at the end of the paper.

AB - In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each of the cases, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we also include at the end of the paper.

KW - 113 Computer and information sciences

KW - 6121 Languages

M3 - Conference contribution

SP - 35

EP - 42

BT - Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

A2 - Rogers, Anna

A2 - Drozd, Aleksandr

A2 - Rumshisky, Anna

A2 - Goldberg, Yoav

PB - Association for Computational Linguistics

CY - Stroudsburg

ER -

Tiedemann J, Scherrer Y. Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks. In Rogers A, Drozd A, Rumshisky A, Goldberg Y, editors, Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP. Stroudsburg: Association for Computational Linguistics. 2019. p. 35-42