A Closer Look at Parameter Contributions When Training Neural Language and Translation Models

Research output: Article in book/report/conference proceeding › Conference article › Scientific › peer-reviewed

Abstract

We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function. In other words, we can observe the contributions of different network components at training time. In this article, we systematically study masked language modeling, causal language modeling, and machine translation. We show that the choice of training objective leads to distinctive optimization procedures, even when performed on comparable Transformer architectures. We demonstrate how the various Transformer parameters are used during training, showing that the feed-forward components of each layer are the main contributors to the optimization procedure. Finally, we find that the learning dynamics are not affected by data size and distribution but are instead determined by the learning objective.
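For context, LCA (introduced by Lan et al., 2019) decomposes the loss change of each optimization step into per-parameter contributions, so that parameters whose updates reduce the loss receive negative allocations. The snippet below is a minimal first-order sketch for plain SGD in PyTorch; it illustrates the general technique only, not the exact implementation used in the paper (which may rely on higher-order path approximations and different optimizers), and the `loss_fn(model, batch)` signature is a hypothetical convention.

```python
import torch

def loss_change_allocation(model, loss_fn, batch, lr):
    """First-order LCA sketch: allocate the loss change of one SGD step
    to individual parameters via gradient * parameter-update products.

    The sum of the per-parameter allocations approximates the total loss
    change over the step (first-order path approximation).
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the loss at the current parameters.
    loss = loss_fn(model, batch)          # hypothetical loss interface
    grads = torch.autograd.grad(loss, params)

    allocations = []
    with torch.no_grad():
        for p, g in zip(params, grads):
            delta = -lr * g               # plain SGD update for this parameter
            allocations.append(g * delta) # per-parameter share of the loss change
            p.add_(delta)                 # apply the update

    # Negative entries mark parameters whose update helped reduce the loss;
    # summing allocations per module gives component-level contributions.
    return allocations
```

Aggregating such allocations over training steps and grouping them by module (e.g., attention vs. feed-forward blocks) is how component-level contributions of the kind discussed in the abstract can be compared.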
Original language: English
Title: Proceedings of the 29th International Conference on Computational Linguistics
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, et al.
Number of pages: 13
Place of publication: Gyeongju
Publisher: International Committee on Computational Linguistics
Publication date: Oct 2022
Pages: 4788-4800
Status: Published - Oct 2022
OKM publication type: A4 Article in conference proceedings
Event: International Conference on Computational Linguistics - Gyeongju, Republic of Korea (South Korea)
Duration: 12 Oct 2022 – 17 Oct 2022
Conference number: 29
https://coling2022.org/

Publication series

Name: International conference on computational linguistics
Publisher: International Committee on Computational Linguistics
Number: 1
Volume: 29
ISSN (print): 2951-2093

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
