Abstract
This paper describes the development of a new benchmark for machine translation that provides training and test data for thousands of language pairs covering over 500 languages, along with tools for creating state-of-the-art translation models from that collection. The main goal is to trigger the development of open translation tools and models with a much broader coverage of the world's languages. Using the package, it is possible to work on realistic low-resource scenarios, avoiding the artificially reduced setups that are common when demonstrating zero-shot or few-shot learning. For the first time, this package provides a comprehensive collection of diverse data sets in hundreds of languages, with systematic language and script annotation and data splits, to extend the narrow coverage of existing benchmarks. Together with the data release, we also provide a growing number of pre-trained baseline models for individual language pairs and selected language groups.
Original language | English |
---|---|
Title of host publication | Proceedings of the Fifth Conference on Machine Translation |
Editors | Loïc Barrault [et al.] |
Number of pages | 9 |
Place of Publication | Stroudsburg |
Publisher | The Association for Computational Linguistics |
Publication date | 1 Nov 2020 |
Pages | 1174-1182 |
ISBN (Electronic) | 978-1-948087-81-0 |
Publication status | Published - 1 Nov 2020 |
MoE publication type | A4 Article in conference proceedings |
Event | The 2020 Conference on Empirical Methods in Natural Language Processing - [Virtual conference]; Duration: 16 Nov 2020 → 20 Nov 2020; https://2020.emnlp.org/ |
Fields of Science
- 6121 Languages
- 113 Computer and information sciences
-
FoTran: Found in Translation - Natural Language Understanding with Cross-Lingual Grounding
Tiedemann, J., Celikkanat, H., Raganato, A., Silfverberg, M., Sulubacak, U., Vazquez, R., Apidianaki, M., Aulamo, M., Boggia, M., De Gibert Bonet, O., Grönroos, S., Mickus, T., Scherrer, Y., Sjöblom, E. I., Talman, A., Virpioja, S. P., Yli-Jyrä, A. & Zosa, E.
01/09/2018 → 29/02/2024
Project: EU Horizon 2020: European Research Council: Consolidator Grant (H2020-ERC-COG)
-
fiskmö: Creation of a parallel corpus of translated documents and machine translation for Finnish and Swedish
Tiedemann, J., Ginter, F., Papula, N., Aulamo, M., Nieminen, T., Kanerva, J. & Eskola, K.
01/05/2018 → 31/03/2021
Project: Research project