The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

The Helsinki-NLP team participated in the NADI 2023 shared tasks on Arabic dialect translation with seven submissions. We used statistical (SMT) and neural machine translation (NMT) methods and explored character- and subword-based data preprocessing. Our submissions placed second in both tracks. In the open track, our best-performing submission is a character-level SMT system with additional Modern Standard Arabic language models. In the closed track, our best BLEU scores were obtained with the leave-as-is baseline, a simple copy of the input, narrowly followed by SMT systems. In both tracks, fine-tuning existing multilingual models such as AraT5 or ByT5 did not outperform SMT.
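Two of the simpler ideas in the abstract, character-level preprocessing and the leave-as-is baseline scored with BLEU, can be illustrated with a short Python sketch. This is not the team's code: the `_` word-boundary marker, the helper names (`to_char_level`, `from_char_level`, `leave_as_is`, `corpus_bleu`) and the toy sentences are hypothetical, and `corpus_bleu` is a simplified single-reference corpus BLEU without smoothing or the tokenization rules of standard scorers such as SacreBLEU.

```python
import math
from collections import Counter

# Hypothetical word-boundary marker; assumes "_" never occurs in the raw text.
SPACE_MARKER = "_"


def to_char_level(sentence: str) -> str:
    """Split a sentence into space-separated characters, marking original spaces."""
    return " ".join(SPACE_MARKER if ch == " " else ch for ch in sentence.strip())


def from_char_level(tokens: str) -> str:
    """Invert to_char_level: rejoin characters and restore word boundaries."""
    return "".join(" " if tok == SPACE_MARKER else tok for tok in tokens.split(" "))


def leave_as_is(sources):
    """The copy baseline: every hypothesis is simply the untranslated input."""
    return list(sources)


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU (0-100): clipped n-gram precision plus brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # Counter "&" keeps the minimum count, i.e. clips matches by the reference.
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero n-gram precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    # Toy placeholder sentences, not NADI data.
    sources = ["the cat sat on the mat today", "we went to the market early"]
    references = ["the cat sat on the mat", "we went to the market early"]
    print(to_char_level(sources[0]))
    print(corpus_bleu(leave_as_is(sources), references))
```

Because the copy baseline reuses the input verbatim, it earns BLEU credit for every n-gram that the source shares with the reference, which is why it can be a strong baseline when the source dialect and the target variety overlap heavily in vocabulary.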
Original language: English
Title of host publication: Proceedings of The First Arabic Natural Language Processing Conference (ArabicNLP 2023)
Editors: Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Number of pages: 8
Place of Publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 1 Dec 2023
Pages: 670-677
ISBN (Electronic): 978-1-959429-27-2
Publication status: Published - 1 Dec 2023
MoE publication type: A4 Article in conference proceedings
Event: Arabic Natural Language Processing Conference, Singapore
Duration: 7 Dec 2023 to 7 Dec 2023
https://arabicnlp2023.sigarab.org/

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages
