On the differences between BERT and MT encoder spaces and how to address them in translation tasks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks. This is all the more surprising given the similarities between the two architectures. This paper sheds light on the embedding spaces they create, comparing them with average cosine similarity, contextuality metrics and measures of representational similarity, and reveals that BERT and NMT encoder representations look significantly different from one another. To address this issue, we propose a supervised transformation from one space into the other using explicit alignment and fine-tuning. Our results demonstrate the need for such a transformation to improve the applicability of BERT in MT.
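
As a rough illustration of the kind of comparison and alignment the abstract refers to, the sketch below measures average cosine similarity between row-aligned token representations from two encoders and fits a simple least-squares linear map from one space into the other. It is a minimal NumPy sketch with synthetic stand-in data; the function names, dimensions and the least-squares formulation are illustrative assumptions, not the procedure used in the paper.

import numpy as np

def avg_cosine_similarity(X: np.ndarray, Y: np.ndarray) -> float:
    """Mean cosine similarity between corresponding rows of X and Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return float(np.mean(np.sum(Xn * Yn, axis=1)))

def fit_linear_map(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Least-squares W minimising ||X @ W - Y||^2 (a simple explicit alignment)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for encoder outputs: 2000 tokens, 768-dimensional vectors.
    # In practice X and Y would hold BERT and MT encoder vectors for the same tokens.
    X = rng.normal(size=(2000, 768))
    Y = X @ rng.normal(size=(768, 768)) / np.sqrt(768) + 0.1 * rng.normal(size=(2000, 768))

    print(f"avg cosine before alignment: {avg_cosine_similarity(X, Y):.3f}")
    W = fit_linear_map(X, Y)
    print(f"avg cosine after alignment:  {avg_cosine_similarity(X @ W, Y):.3f}")

With real encoder outputs, a low average cosine similarity before alignment and a high one after a fitted transformation would reflect the kind of gap between the two spaces that the paper reports.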
Original language: English
Title of host publication: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop
Editors: Jad Kabbara, Haitao Lin, Amandalynne Paullada, Jannis Vamvas
Number of pages: 11
Place of publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: Aug 2021
Pages: 337-347
ISBN (Print): 978-1-954085-55-8
Publication status: Published - Aug 2021
MoE publication type: A4 Article in conference proceedings
Event: Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing - Bangkok [Online event]
Duration: 5 Aug 2021 - 6 Aug 2021
Conference number: 59/11

Fields of Science

  • 113 Computer and information sciences
  • 6121 Languages
