Abstract
This paper presents our ongoing efforts to develop a comprehensive
data set and benchmark for machine translation beyond high-resource
languages. The current release includes 500 GB of compressed
parallel data for almost 3,000 language pairs covering over 500 languages
and language variants. We present the structure of the data set and
demonstrate its use for systematic studies based on baseline experiments
with multilingual neural machine translation between Uralic languages
and other language groups. Our initial results show that effective
multilingual translation models can be trained with skewed training
data, but they also highlight the shortcomings in low-resource settings and
the difficulty of obtaining sufficient information through straightforward
transfer from related languages.
Original language | English |
---|---|
Title | Multilingual Facilitation |
Editors | Mika Hämäläinen, Niko Partanen, Khalid Alnajjar |
Number of pages | 15 |
Place of publication | Helsinki |
Publisher | University of Helsinki |
Publication date | 2021 |
Pages | 248-262 |
ISBN (print) | 979-871-33-6227-0 |
ISBN (electronic) | 978-951-51-5025-7 |
DOI - permanent links | |
Status | Published - 2021 |
MoEC publication type | A3 Part of a book or other edited volume |
Fields of science
- 113 Computer and information sciences
- 6121 Languages
-
FoTran: Found in Translation - Natural Language Understanding with Cross-Lingual Grounding
Tiedemann, J., Celikkanat, H., Raganato, A., Silfverberg, M., Sulubacak, U., Vazquez, R., Apidianaki, M., Aulamo, M., Boggia, M., De Gibert Bonet, O., Grönroos, S., Mickus, T., Scherrer, Y., Sjöblom, E. I., Talman, A., Virpioja, S. P., Yli-Jyrä, A. & Zosa, E.
01/09/2018 → 29/02/2024
Project: EU Horizon 2020: European Research Council: Consolidator Grant (H2020-ERC-COG)
-
OPUS-MT: Open Translation Models, Tools and Services
Aulamo, M., Nieminen, T. J., Hardwick, S. & Tiedemann, J.
01/08/2020 → 31/08/2021
Project: Research project