Neural machine translation for low-resource and variation-rich languages

Activity: Talk or presentation types: Invited talk

Description

Current machine translation architectures are based on deep neural networks and provide impressive translation quality for the major language pairs. I will start by giving a high-level overview of neural machine translation architectures and then focus on two challenging application scenarios.
The first scenario concerns low-resource languages. I will present our participation in the AmericasNLP shared task, which focuses on machine translation from Spanish to eleven indigenous languages of the Americas. I will describe how a combination of techniques, ranging from data collection to knowledge distillation and post-processing, helps improve translation quality.
In the second scenario, I will investigate the suitability of neural machine translation techniques for the automatic normalization of phonetic transcriptions in multi-dialectal corpora. Here, the focus lies not on optimal normalization performance, but rather on what the model learns during training about the different dialects and their relations to each other. In our case study with large Finnish and Norwegian dialect corpora, the model successfully identified the major dialect areas known from prior dialectological research.
Period: 25 May 2023
Held at: Paris-Lodron University of Salzburg, Austria
Degree of recognition: Local