An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review


Lexical ambiguity is one of the many challenging linguistic phenomena involved in translation, since an ambiguous word must be translated according to its correct sense. In this respect, previous work has shown that the translation quality of neural machine translation systems can be improved by explicitly modeling the senses of ambiguous words. Recently, several evaluation test sets have been proposed to measure the word sense disambiguation (WSD) capability of machine translation systems. However, to date, these evaluation test sets do not include any training data that would provide a fair setup for measuring the sense distributions present within the training data itself. In this paper, we present an evaluation benchmark on WSD for machine translation for 10 language pairs, comprising training data with known sense distributions. Our approach for the construction of the benchmark builds upon the wide-coverage multilingual sense inventory of BabelNet, the multilingual neural parsing pipeline TurkuNLP, and the OPUS collection of translated texts from the web. The test suite is available at
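To make "training data with known sense distributions" concrete, the sketch below computes, for each ambiguous word, the relative frequency of each of its senses in a sense-labelled training set. It is a minimal illustration under stated assumptions, not the authors' construction pipeline: the input format of `(lemma, sense_id)` pairs, the function name `sense_distributions`, and the sense labels such as `"bank:finance"` are all hypothetical and do not correspond to real BabelNet identifiers or to the benchmark's released files.

```python
from collections import Counter, defaultdict


def sense_distributions(pairs):
    """Return {lemma: {sense_id: relative_frequency}} from (lemma, sense_id) pairs.

    Hypothetical input format: each training occurrence of an ambiguous word
    is reduced to the word's lemma and a sense label (assumed to have been
    assigned beforehand, e.g. via a sense inventory).
    """
    counts = defaultdict(Counter)
    # Count how often each sense label occurs for each lemma.
    for lemma, sense in pairs:
        counts[lemma][sense] += 1

    # Normalise the counts per lemma into relative frequencies.
    dists = {}
    for lemma, counter in counts.items():
        total = sum(counter.values())
        dists[lemma] = {sense: n / total for sense, n in counter.items()}
    return dists


if __name__ == "__main__":
    # Toy, made-up training occurrences: three "finance" uses, one "river" use.
    training = [
        ("bank", "bank:finance"),
        ("bank", "bank:finance"),
        ("bank", "bank:finance"),
        ("bank", "bank:river"),
    ]
    print(sense_distributions(training))
    # {'bank': {'bank:finance': 0.75, 'bank:river': 0.25}}
```

Knowing such a distribution lets an evaluator distinguish, for example, whether a system fails only on senses that were rare in its training data or also on frequent ones.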
Original language: English
Title of host publication: Proceedings of The 12th Language Resources and Evaluation Conference
Editors: Nicoletta Calzolari [et al.]
Number of pages: 8
Place of Publication: Paris
Publisher: European Language Resources Association (ELRA)
Publication date: 1 May 2020
ISBN (Electronic): 979-10-95546-34-4
Publication status: Published - 1 May 2020
MoE publication type: A4 Article in conference proceedings
Event: Language Resources and Evaluation Conference [LREC 2020 was cancelled]
Duration: 11 May 2020 to 16 May 2020
Conference number: 12

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
