The University of Helsinki submissions to the WMT19 news translation task

Research output: Contribution to journal, Conference article, Scientific, peer-review

Abstract

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish, we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.
Original language: English
Journal: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Publication status: Accepted/In press, 7 Jun 2019
MoE publication type: A4 Article in conference proceedings
Event: Fourth Conference on Machine Translation: WMT19, Firenze, Italy
Duration: 1 Aug 2019 to 2 Aug 2019
Conference number: 4

Cite this

@article{8326a39257134d458dae56c457b37630,
title = "The University of Helsinki submissions to the WMT19 news translation task",
abstract = "In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish, we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.",
author = "Aarne Talman and Umut Sulubacak and Raul Vazquez and Yves Scherrer and Sami Virpioja and Alessandro Raganato and Arvi Hurskainen and J{\"o}rg Tiedemann",
year = "2019",
month = "6",
day = "7",
language = "English",
journal = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",

}

TY - JOUR

T1 - The University of Helsinki submissions to the WMT19 news translation task

AU - Talman, Aarne

AU - Sulubacak, Umut

AU - Vazquez, Raul

AU - Scherrer, Yves

AU - Virpioja, Sami

AU - Raganato, Alessandro

AU - Hurskainen, Arvi

AU - Tiedemann, Jörg

PY - 2019/6/7

Y1 - 2019/6/7

N2 - In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish, we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.

AB - In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish, we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.

M3 - Conference article

JO - Proceedings of the Annual Meeting of the Association for Computational Linguistics

JF - Proceedings of the Annual Meeting of the Association for Computational Linguistics

ER -