A Closer Look at Parameter Contributions When Training Neural Language and Translation Models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › Peer reviewed

Abstract

We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function. In other words, we can observe the contributions of different network components at training time. In this article, we systematically study masked language modeling, causal language modeling, and machine translation. We show that the choice of training objective leads to distinctive optimization procedures, even when performed on comparable Transformer architectures. We demonstrate how the various Transformer parameters are used during training, showing that the feed-forward components of each layer are the main contributors to the optimization procedure. Finally, we find that the learning dynamics are determined by the learning objective rather than by data size and distribution.
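As a rough sketch of the idea behind LCA: the method allocates each training step's change in loss to individual parameters, to first order via ΔL ≈ Σ_i ∂L/∂θ_i · Δθ_i, i.e. each parameter's share of the loss change is its gradient times its update. The PyTorch snippet below illustrates this first-order view only; the function name lca_step, the loss_fn callable, and the plain gradient-times-update product (rather than the more accurate path-integral estimate used in the original LCA formulation) are illustrative assumptions, not the paper's implementation.

import torch

def lca_step(model, loss_fn, batch, optimizer):
    # One optimizer step that also returns a per-tensor loss-change
    # allocation (first-order approximation: gradient * parameter update).
    before = [p.detach().clone() for p in model.parameters()]

    optimizer.zero_grad()
    loss = loss_fn(model, batch)  # loss_fn is an assumed user-supplied callable
    loss.backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    optimizer.step()

    # Negative allocation means the update to that tensor reduced the loss.
    lca = [
        (g * (p.detach() - b)).sum().item()
        for g, p, b in zip(grads, model.parameters(), before)
    ]
    return loss.item(), lca

Summing the allocations per component (e.g. per feed-forward or attention block) over training gives the kind of component-level contribution view described in the abstract.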
Original language: English
Title of host publication: Proceedings of the 29th International Conference on Computational Linguistics
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, et al.
Number of pages: 13
Place of publication: Gyeongju
Publisher: International Committee on Computational Linguistics
Publication date: Oct 2022
Pages: 4788-4800
Status: Published - Oct 2022
MoE publication type: A4 Article in conference proceedings
Event: International Conference on Computational Linguistics - Gyeongju, South Korea
Duration: 12 Oct 2022 - 17 Oct 2022
Conference number: 29
https://coling2022.org/

Publication series

Name: International conference on computational linguistics
Publisher: International Committee on Computational Linguistics
Number: 1
Volume: 29
ISSN (print): 2951-2093

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
