The MuCoW Test Suite at WMT 2019

Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › Peer-reviewed

Abstract

Supervised Neural Machine Translation (NMT) systems currently achieve impressive translation quality for many language pairs. One of the key features of a correct translation is the ability to perform word sense disambiguation (WSD), i.e., to translate an ambiguous word in its correct sense. Existing benchmarks for evaluating the WSD capabilities of translation systems rely heavily on manual work and cover only a few language pairs and word types. We present MuCoW, a multilingual contrastive test suite that covers 16 language pairs with more than 200 thousand contrastive sentence pairs, automatically built from word-aligned parallel corpora and the wide-coverage multilingual sense inventory of BabelNet. We evaluate the quality of the ambiguity lexicons and of the resulting test suite on all submissions from 9 language pairs presented in the WMT19 news shared translation task, plus on 5 other language pairs using pretrained NMT models. The MuCoW test suite is available at http://github.com/Helsinki-NLP/MuCoW.
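The abstract combines two ideas: an ambiguity lexicon harvested from word-aligned parallel data plus a sense inventory, and contrastive evaluation, where a model should prefer the correct translation over variants that use a wrong sense. The Python sketch that follows illustrates both under simplifying assumptions; it is not the MuCoW code. BabelNet synsets are replaced by a hypothetical dictionary of made-up sense labels, the aligned word pairs and model scores are toy values, and the function names (build_ambiguity_lexicon, contrastive_variants, contrastive_accuracy) are illustrative only. The scores stand in for a sentence-level score such as a log-probability that an NMT model would assign, assumed to be computed elsewhere.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

# Hypothetical lexicon shape: source word -> {sense_id: {target translations}}
Lexicon = Dict[str, Dict[str, Set[str]]]


def build_ambiguity_lexicon(
    aligned_pairs: Iterable[Tuple[str, str]],
    target_sense: Dict[str, str],
) -> Lexicon:
    """Group the aligned translations of each source word by sense and keep
    only source words whose translations cover at least two senses."""
    groups: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(
        lambda: defaultdict(set)
    )
    for src, tgt in aligned_pairs:
        sense = target_sense.get(tgt)  # stand-in for a BabelNet lookup
        if sense is not None:  # skip translations missing from the sense map
            groups[src][sense].add(tgt)
    return {
        src: {sense: set(tgts) for sense, tgts in senses.items()}
        for src, senses in groups.items()
        if len(senses) >= 2
    }


def contrastive_variants(
    reference: List[str], src_word: str, gold_target: str, lexicon: Lexicon
) -> List[List[str]]:
    """Return one copy of the tokenised reference per wrong-sense
    translation of src_word, with that translation replacing gold_target.
    Raises StopIteration if gold_target is not listed under src_word."""
    senses = lexicon[src_word]
    gold_sense = next(s for s, tgts in senses.items() if gold_target in tgts)
    variants = []
    for sense, tgts in senses.items():
        if sense == gold_sense:
            continue
        for wrong in sorted(tgts):  # sorted only to make the output deterministic
            variants.append([wrong if tok == gold_target else tok for tok in reference])
    return variants


def contrastive_accuracy(items: Iterable[Tuple[float, List[float]]]) -> float:
    """items: (reference_score, contrastive_scores) per test case, where a
    higher score means the model prefers that sentence. A case counts as
    solved when the reference outscores every contrastive variant."""
    cases = list(items)
    if not cases:
        return 0.0
    solved = sum(1 for ref, contr in cases if all(ref > c for c in contr))
    return solved / len(cases)


if __name__ == "__main__":
    # Toy English-German data; the sense labels are made up.
    pairs = [("bank", "Bank"), ("bank", "Ufer"), ("bank", "Bank"), ("river", "Fluss")]
    senses = {"Bank": "FINANCE", "Ufer": "RIVERSIDE", "Fluss": "RIVER"}
    lex = build_ambiguity_lexicon(pairs, senses)
    print(lex)  # {'bank': {'FINANCE': {'Bank'}, 'RIVERSIDE': {'Ufer'}}}
    ref = "Sie saß am Ufer".split()
    print(contrastive_variants(ref, "bank", "Ufer", lex))  # [['Sie', 'saß', 'am', 'Bank']]
    print(contrastive_accuracy([(-2.1, [-3.5]), (-4.0, [-3.2])]))  # 0.5
```

In this toy run, "river" is dropped from the lexicon because all of its aligned translations share one sense. The plain token swap in contrastive_variants ignores agreement, which is why the variant "am Bank" is ungrammatical German; the sketch only shows the substitution idea.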
Original language: English
Title of host publication: Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1)
Editors: Ondřej Bojar, Rajen Chatterjee, Christian Federmann, et al.
Number of pages: 11
Place of publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 1 Aug 2019
Pages: 470-480
ISBN (electronic): 9781950737277
Status: Published, 1 Aug 2019
MoE publication type: A4 Article in conference proceedings
Event: Conference on Machine Translation: WMT19, Florence, Italy
Duration: 1 Aug 2019 to 2 Aug 2019
Conference number: 4

Fields of science

  • 113 Computer and information sciences
  • 6121 Languages

Cite this

Raganato, A., Scherrer, Y., & Tiedemann, J. (2019). The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation. In O. Bojar, R. Chatterjee, C. Federmann, et al. (Eds.), Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1) (pp. 470-480). Stroudsburg: The Association for Computational Linguistics.
Raganato, Alessandro; Scherrer, Yves; Tiedemann, Jörg. / The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation. Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1). ed. / Ondřej Bojar; Rajen Chatterjee; Christian Federmann; et al. Stroudsburg: The Association for Computational Linguistics, 2019. pp. 470-480
@inproceedings{e298e990763b44aca8500f2e77979a8c,
title = "The {MuCoW} Test Suite at {WMT} 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation",
keywords = "113 Computer and information sciences, 6121 Languages",
author = "Alessandro Raganato and Yves Scherrer and J{\"o}rg Tiedemann",
year = "2019",
month = aug,
day = "1",
language = "English",
pages = "470--480",
editor = "Ond{\v{r}}ej Bojar and Rajen Chatterjee and Christian Federmann and others",
booktitle = "Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1)",
publisher = "The Association for Computational Linguistics",
address = "Stroudsburg, United States",
isbn = "9781950737277",
}

Raganato, A, Scherrer, Y & Tiedemann, J 2019, The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation. in O Bojar, R Chatterjee, C Federmann et al. (eds), Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1). The Association for Computational Linguistics, Stroudsburg, pp. 470-480, Conference on Machine Translation, Florence, Italy, 01/08/2019.

The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation. / Raganato, Alessandro; Scherrer, Yves; Tiedemann, Jörg.

Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1). ed. / Ondřej Bojar; Rajen Chatterjee; Christian Federmann; et al. Stroudsburg: The Association for Computational Linguistics, 2019. pp. 470-480.

TY  - GEN
T1  - The MuCoW Test Suite at WMT 2019
T2  - Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation
AU  - Raganato, Alessandro
AU  - Scherrer, Yves
AU  - Tiedemann, Jörg
PY  - 2019/8/1
Y1  - 2019/8/1
KW  - 113 Computer and information sciences
KW  - 6121 Languages
M3  - Conference contribution
SP  - 470
EP  - 480
BT  - Fourth Conference on Machine Translation
A2  - Bojar, Ondřej
A2  - Chatterjee, Rajen
A2  - Federmann, Christian
A2  - et al.
PB  - The Association for Computational Linguistics
CY  - Stroudsburg
ER  - 

Raganato A, Scherrer Y, Tiedemann J. The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation. In Bojar O, Chatterjee R, Federmann C, et al., editors, Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1). Stroudsburg: The Association for Computational Linguistics. 2019. p. 470-480