Analysing concatenation approaches to document-level NMT in two different domains

Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems. We describe two popular datasets covering news and movie subtitles and we provide a thorough analysis of the distribution of various document-level features in their domains. Furthermore, we train a set of context-aware MT models on both datasets and propose a comparative evaluation scheme that contrasts coherent context with artificially scrambled documents and absent context, arguing that the impact of discourse-aware MT models will become visible in this way. Our results show that the models are indeed affected by the manipulation of the test data, providing a different view on document-level translation quality than absolute sentence-level scores.
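The three test conditions named in the abstract (coherent context, scrambled documents, absent context) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that context is supplied by concatenating the previous source sentence to the current one, and that scrambling means shuffling sentence order within a document. The separator token `<BRK>` and the function names `concat_previous` and `make_test_conditions` are hypothetical, introduced only for illustration.

```python
import random
from typing import Dict, List

# Hypothetical separator token between context and current sentence
# (an assumption for this sketch, not taken from the paper).
BREAK = "<BRK>"


def concat_previous(sentences: List[str]) -> List[str]:
    """Prefix each sentence with the one preceding it in the list.

    The first sentence has no predecessor and is returned unchanged.
    """
    inputs = []
    for i, sentence in enumerate(sentences):
        if i == 0:
            inputs.append(sentence)
        else:
            inputs.append(f"{sentences[i - 1]} {BREAK} {sentence}")
    return inputs


def make_test_conditions(doc: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Build source-side inputs for one document under three conditions."""
    # Scrambled condition: shuffle sentence order within the document
    # (assumed interpretation of "artificially scrambled documents").
    shuffled = list(doc)
    random.Random(seed).shuffle(shuffled)
    return {
        "coherent": concat_previous(doc),        # original order, real context
        "scrambled": concat_previous(shuffled),  # shuffled order, incoherent context
        "absent": list(doc),                     # no context at all
    }


if __name__ == "__main__":
    document = ["He left.", "She stayed.", "They talked."]
    for name, inputs in make_test_conditions(document).items():
        print(name, inputs)
```

Translating each condition with the same context-aware model and comparing the scores would then show how much the model relies on coherent context, which is the kind of contrast the abstract describes.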
Original language: English
Title of host publication: The Fourth Workshop on Discourse in Machine Translation: Proceedings of the Workshop
Number of pages: 11
Place of Publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 1 Nov 2019
Pages: 51-61
ISBN (Electronic): 978-1-950737-74-1
DOIs: https://doi.org/10.18653/v1/D19-6506
Publication status: Published - 1 Nov 2019
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Discourse in Machine Translation - Hong Kong, China
Duration: 3 Nov 2019 → 3 Nov 2019
Conference number: 4

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages

Cite this

Scherrer, Y., Tiedemann, J., & Loáiciga, S. (2019). Analysing concatenation approaches to document-level NMT in two different domains. In The Fourth Workshop on Discourse in Machine Translation: Proceedings of the Workshop (pp. 51-61). Stroudsburg: The Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-6506