Abstract
There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks—English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish—and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling.
Original language | English |
---|---|
Journal | Machine Translation |
Volume | 34 |
Pages (from-to) | 251-286 |
Number of pages | 36 |
ISSN | 0922-6567 |
DOI | |
Status | Published - 30 Jan 2021 |
MoE publication type | A1 Journal article-refereed |
Fields of Science
- 113 Computer and information sciences
- 6121 Languages
Projects
- 1 Active
- FoTran: Found in Translation - Natural Language Understanding with Cross-Lingual Grounding
Tiedemann, J., Celikkanat, H., Raganato, A., Silfverberg, M., Sulubacak, U., Vazquez, R., Apidianaki, M., Attieh, J., Aulamo, M., Boggia, M., De Gibert Bonet, O., Grönroos, S., Mickus, T., Scherrer, Y., Sjöblom, E. I., Talman, A., Virpioja, S. P., Yli-Jyrä, A. & Zosa, E.
01/09/2018 → 29/02/2024
Project: EU Horizon 2020: European Research Council: Consolidator Grant (H2020-ERC-COG)