Analysing concatenation approaches to document-level NMT in two different domains

Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga

Research output: Chapter in book/report/conference proceeding; Conference contribution; Scientific; Peer-reviewed

Abstract

In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems. We describe two popular datasets covering news and movie subtitles and we provide a thorough analysis of the distribution of various document-level features in their domains. Furthermore, we train a set of context-aware MT models on both datasets and propose a comparative evaluation scheme that contrasts coherent context with artificially scrambled documents and absent context, arguing that the impact of discourse-aware MT models will become visible in this way. Our results show that the models are indeed affected by the manipulation of the test data, providing a different view on document-level translation quality than absolute sentence-level scores.
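The three test conditions named in the abstract (coherent context, scrambled documents, absent context) can be illustrated with a minimal sketch. This record does not describe the authors' preprocessing, so everything below is an assumption made for illustration: the separator token `<BRK>`, the function names, the context size, and the choice to shuffle sentence order within a document.

```python
import random
from typing import Dict, List

SEP = "<BRK>"  # hypothetical separator token, not taken from the paper


def concat_with_context(doc: List[str], context_size: int = 1, sep: str = SEP) -> List[str]:
    """Prefix each sentence with up to `context_size` preceding sentences."""
    inputs = []
    for i, sent in enumerate(doc):
        context = doc[max(0, i - context_size):i]
        inputs.append(f" {sep} ".join(context + [sent]))
    return inputs


def build_conditions(doc: List[str], context_size: int = 1, seed: int = 0) -> Dict[str, List[str]]:
    """Build coherent, scrambled, and context-free inputs for one document."""
    # Scrambled condition (assumed design): shuffle the sentence order,
    # then take context from the shuffled neighbours.
    order = list(range(len(doc)))
    random.Random(seed).shuffle(order)
    shuffled = [doc[i] for i in order]
    shuffled_inputs = concat_with_context(shuffled, context_size)

    # Put the inputs back in original sentence order so that system
    # outputs still line up with the reference translations.
    scrambled = [""] * len(doc)
    for pos, original_index in enumerate(order):
        scrambled[original_index] = shuffled_inputs[pos]

    return {
        "coherent": concat_with_context(doc, context_size),
        "scrambled": scrambled,
        "absent": list(doc),
    }


if __name__ == "__main__":
    document = ["She bought a car.", "It was red.", "He liked it."]
    for name, inputs in build_conditions(document).items():
        print(name, inputs)
```

In every condition the sentence to be translated stays the same and only its context changes, so score differences between conditions can be attributed to the context rather than to the test sentences.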
Original language: English
Title of host publication: The Fourth Workshop on Discourse in Machine Translation: Proceedings of the Workshop
Number of pages: 11
Place of publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 1 Nov 2019
Pages: 51-61
ISBN (electronic): 978-1-950737-74-1
DOI: https://doi.org/10.18653/v1/D19-6506
Status: Published - 1 Nov 2019
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Discourse in Machine Translation - Hong Kong, China
Duration: 3 Nov 2019 to 3 Nov 2019
Conference number: 4

Fields of science

  • 113 Computer and information sciences
  • 6121 Languages

Cite this

Scherrer, Y., Tiedemann, J., & Loáiciga, S. (2019). Analysing concatenation approaches to document-level NMT in two different domains. In The Fourth Workshop on Discourse in Machine Translation: Proceedings of the Workshop (pp. 51-61). Stroudsburg: The Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-6506
Scherrer, Yves; Tiedemann, Jörg; Loáiciga, Sharid. / Analysing concatenation approaches to document-level NMT in two different domains. The Fourth Workshop on Discourse in Machine Translation: Proceedings of the Workshop. Stroudsburg: The Association for Computational Linguistics, 2019. pp. 51-61.
@inproceedings{3ba0164606504d9d9e4bb71b91298ab8,
title = "Analysing concatenation approaches to document-level NMT in two different domains",
abstract = "In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems. We describe two popular datasets covering news and movie subtitles and we provide a thorough analysis of the distribution of various document-level features in their domains. Furthermore, we train a set of context-aware MT models on both datasets and propose a comparative evaluation scheme that contrasts coherent context with artificially scrambled documents and absent context, arguing that the impact of discourse-aware MT models will become visible in this way. Our results show that the models are indeed affected by the manipulation of the test data, providing a different view on document-level translation quality than absolute sentence-level scores.",
keywords = "113 Computer and information sciences, 6121 Languages",
author = "Yves Scherrer and J{\"o}rg Tiedemann and Sharid Lo{\'a}iciga",
year = "2019",
month = "11",
day = "1",
doi = "10.18653/v1/D19-6506",
language = "English",
pages = "51--61",
booktitle = "The Fourth Workshop on Discourse in Machine Translation",
publisher = "The Association for Computational Linguistics",
address = "United States",

}

Scherrer, Y, Tiedemann, J & Loáiciga, S 2019, Analysing concatenation approaches to document-level NMT in two different domains. in The Fourth Workshop on Discourse in Machine Translation: Proceedings of the Workshop. The Association for Computational Linguistics, Stroudsburg, pp. 51-61, Workshop on Discourse in Machine Translation, Hong Kong, China, 03/11/2019. https://doi.org/10.18653/v1/D19-6506

Analysing concatenation approaches to document-level NMT in two different domains. / Scherrer, Yves; Tiedemann, Jörg; Loáiciga, Sharid.

The Fourth Workshop on Discourse in Machine Translation: Proceedings of the Workshop. Stroudsburg: The Association for Computational Linguistics, 2019. pp. 51-61.

TY  - GEN
T1  - Analysing concatenation approaches to document-level NMT in two different domains
AU  - Scherrer, Yves
AU  - Tiedemann, Jörg
AU  - Loáiciga, Sharid
PY  - 2019/11/1
Y1  - 2019/11/1
N2  - In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems. We describe two popular datasets covering news and movie subtitles and we provide a thorough analysis of the distribution of various document-level features in their domains. Furthermore, we train a set of context-aware MT models on both datasets and propose a comparative evaluation scheme that contrasts coherent context with artificially scrambled documents and absent context, arguing that the impact of discourse-aware MT models will become visible in this way. Our results show that the models are indeed affected by the manipulation of the test data, providing a different view on document-level translation quality than absolute sentence-level scores.
AB  - In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems. We describe two popular datasets covering news and movie subtitles and we provide a thorough analysis of the distribution of various document-level features in their domains. Furthermore, we train a set of context-aware MT models on both datasets and propose a comparative evaluation scheme that contrasts coherent context with artificially scrambled documents and absent context, arguing that the impact of discourse-aware MT models will become visible in this way. Our results show that the models are indeed affected by the manipulation of the test data, providing a different view on document-level translation quality than absolute sentence-level scores.
KW  - 113 Computer and information sciences
KW  - 6121 Languages
U2  - 10.18653/v1/D19-6506
DO  - 10.18653/v1/D19-6506
M3  - Conference contribution
SP  - 51
EP  - 61
BT  - The Fourth Workshop on Discourse in Machine Translation
PB  - The Association for Computational Linguistics
CY  - Stroudsburg
ER  -

Scherrer Y, Tiedemann J, Loáiciga S. Analysing concatenation approaches to document-level NMT in two different domains. In The Fourth Workshop on Discourse in Machine Translation: Proceedings of the Workshop. Stroudsburg: The Association for Computational Linguistics. 2019. p. 51-61. https://doi.org/10.18653/v1/D19-6506