The University of Helsinki Submissions to the WMT19 Similar Language Translation Task

Research output: Chapter in book/report/conference proceeding; Conference contribution; Scientific; Peer-reviewed

Abstract

This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task. We trained neural machine translation models for the language pairs Czech <-> Polish and Spanish <-> Portuguese. Our experiments focused on different subword segmentation methods, and in particular on the comparison of a cognate-aware segmentation method, Cognate Morfessor, with character segmentation and unsupervised segmentation methods for which the data from different languages were simply concatenated. We did not observe major benefits from cognate-aware segmentation methods, but further research may be needed to explore larger parts of the parameter space. Character-level models proved to be competitive for translation between Spanish and Portuguese, but they are slower in training and decoding.
Original language: English
Title of host publication: Fourth Conference on Machine Translation: Proceedings of the Conference : Volume 3 (Shared Task Papers, Day 2)
Editors: Ondřej Bojar, Rajen Chatterjee, Christian Federmann, et al.
Number of pages: 9
Place of publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 1 Aug 2019
Pages: 236-244
ISBN (electronic): 978-1-950737-27-7
Publication status: Published - 1 Aug 2019
MoE publication type: A4 Article in conference proceedings
Event: Conference on Machine Translation: WMT19 - Florence, Italy
Duration: 1 Aug 2019 to 2 Aug 2019
Conference number: 4

Fields of science

  • 113 Computer and information sciences
  • 6121 Languages

Cite this

Scherrer, Y., Vázquez, R., & Virpioja, S. (2019). The University of Helsinki Submissions to the WMT19 Similar Language Translation Task. In O. Bojar, R. Chatterjee, C. Federmann, et al. (Eds.), Fourth Conference on Machine Translation: Proceedings of the Conference : Volume 3 (Shared Task Papers, Day 2) (pp. 236-244). Stroudsburg: The Association for Computational Linguistics.
Scherrer, Yves ; Vázquez, Raúl ; Virpioja, Sami. / The University of Helsinki Submissions to the WMT19 Similar Language Translation Task. Fourth Conference on Machine Translation: Proceedings of the Conference : Volume 3 (Shared Task Papers, Day 2). ed. / Ondřej Bojar ; Rajen Chatterjee ; Christian Federmann ; et al. Stroudsburg : The Association for Computational Linguistics, 2019. pp. 236-244
@inproceedings{9b6ef679c8004c519a77d15098fc2433,
title = "The University of Helsinki Submissions to the WMT19 Similar Language Translation Task",
abstract = "This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task. We trained neural machine translation models for the language pairs Czech <-> Polish and Spanish <-> Portuguese. Our experiments focused on different subword segmentation methods, and in particular on the comparison of a cognate-aware segmentation method, Cognate Morfessor, with character segmentation and unsupervised segmentation methods for which the data from different languages were simply concatenated. We did not observe major benefits from cognate-aware segmentation methods, but further research may be needed to explore larger parts of the parameter space. Character-level models proved to be competitive for translation between Spanish and Portuguese, but they are slower in training and decoding.",
keywords = "113 Computer and information sciences, 6121 Languages",
author = "Yves Scherrer and Ra{\'u}l V{\'a}zquez and Sami Virpioja",
year = "2019",
month = "8",
day = "1",
language = "English",
pages = "236--244",
editor = "Ond{\v{r}}ej Bojar and Rajen Chatterjee and Christian Federmann and others",
booktitle = "Fourth Conference on Machine Translation: Proceedings of the Conference",
publisher = "The Association for Computational Linguistics",
address = "United States",

}

Scherrer, Y, Vázquez, R & Virpioja, S 2019, The University of Helsinki Submissions to the WMT19 Similar Language Translation Task. in O Bojar, R Chatterjee, C Federmann et al. (eds), Fourth Conference on Machine Translation: Proceedings of the Conference : Volume 3 (Shared Task Papers, Day 2). The Association for Computational Linguistics, Stroudsburg, pp. 236-244, Conference on Machine Translation, Florence, Italy, 01/08/2019.

The University of Helsinki Submissions to the WMT19 Similar Language Translation Task. / Scherrer, Yves; Vázquez, Raúl; Virpioja, Sami.

Fourth Conference on Machine Translation: Proceedings of the Conference : Volume 3 (Shared Task Papers, Day 2). ed. / Ondřej Bojar; Rajen Chatterjee; Christian Federmann; et al. Stroudsburg : The Association for Computational Linguistics, 2019. pp. 236-244.

TY - GEN

T1 - The University of Helsinki Submissions to the WMT19 Similar Language Translation Task

AU - Scherrer, Yves

AU - Vázquez, Raúl

AU - Virpioja, Sami

PY - 2019/8/1

Y1 - 2019/8/1

N2 - This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task. We trained neural machine translation models for the language pairs Czech <-> Polish and Spanish <-> Portuguese. Our experiments focused on different subword segmentation methods, and in particular on the comparison of a cognate-aware segmentation method, Cognate Morfessor, with character segmentation and unsupervised segmentation methods for which the data from different languages were simply concatenated. We did not observe major benefits from cognate-aware segmentation methods, but further research may be needed to explore larger parts of the parameter space. Character-level models proved to be competitive for translation between Spanish and Portuguese, but they are slower in training and decoding.

AB - This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task. We trained neural machine translation models for the language pairs Czech <-> Polish and Spanish <-> Portuguese. Our experiments focused on different subword segmentation methods, and in particular on the comparison of a cognate-aware segmentation method, Cognate Morfessor, with character segmentation and unsupervised segmentation methods for which the data from different languages were simply concatenated. We did not observe major benefits from cognate-aware segmentation methods, but further research may be needed to explore larger parts of the parameter space. Character-level models proved to be competitive for translation between Spanish and Portuguese, but they are slower in training and decoding.

KW - 113 Computer and information sciences

KW - 6121 Languages

M3 - Conference contribution

SP - 236

EP - 244

BT - Fourth Conference on Machine Translation: Proceedings of the Conference

A2 - Bojar, Ondřej

A2 - Chatterjee, Rajen

A2 - Federmann, Christian

A2 - et al.

PB - The Association for Computational Linguistics

CY - Stroudsburg

ER -

Scherrer Y, Vázquez R, Virpioja S. The University of Helsinki Submissions to the WMT19 Similar Language Translation Task. In Bojar O, Chatterjee R, Federmann C, et al., editors, Fourth Conference on Machine Translation: Proceedings of the Conference : Volume 3 (Shared Task Papers, Day 2). Stroudsburg: The Association for Computational Linguistics. 2019. p. 236-244