Second-order Document Similarity Metrics for Transformers

Research output: Chapter in book/report/conference proceeding › Conference contribution › Scientific › Peer-reviewed

Abstract

The similarity of documents represented using static word embeddings is best measured using second-order metrics that account for the covariance of the embeddings. Transformers provide superior word representations compared to static embeddings, but document representation and similarity evaluation are currently often done using simple mean pooling. We explain how second-order metrics can also be used with transformers, and evaluate the value of improved metrics in this context.
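
To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes each document is a matrix of token embeddings (one row per token), compares documents first by cosine similarity of mean-pooled vectors, and then by a covariance-aware distance. The Bhattacharyya distance between Gaussians fitted to the token embeddings is used here only as one illustrative second-order measure; the function names, the regularisation constant `eps`, and the random toy data are all assumptions made for this example.

```python
import numpy as np


def mean_pool_cosine(X, Y):
    """First-order similarity: cosine between mean-pooled token embeddings."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    return float(mx @ my / (np.linalg.norm(mx) * np.linalg.norm(my)))


def gaussian_stats(X, eps=1e-3):
    """Mean and regularised covariance of the token embeddings of one document."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # eps * I keeps the covariance invertible for short documents (assumed value).
    cov = Xc.T @ Xc / X.shape[0] + eps * np.eye(X.shape[1])
    return mu, cov


def bhattacharyya_distance(X, Y, eps=1e-3):
    """Illustrative second-order measure: Bhattacharyya distance between
    Gaussians fitted to the two documents' token embeddings."""
    mu_x, cov_x = gaussian_stats(X, eps)
    mu_y, cov_y = gaussian_stats(Y, eps)
    cov = 0.5 * (cov_x + cov_y)
    diff = mu_x - mu_y
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet_x = np.linalg.slogdet(cov_x)
    _, logdet_y = np.linalg.slogdet(cov_y)
    term_cov = 0.5 * (logdet - 0.5 * (logdet_x + logdet_y))
    return float(term_mean + term_cov)


# Toy data standing in for transformer token embeddings (50 tokens, 8 dims).
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=1.0, size=(50, 8))
mu = X.mean(axis=0)
# Y has the same mean as X but three times the spread around it.
Y = mu + 3.0 * (X - mu)

print("mean-pool cosine:", mean_pool_cosine(X, Y))
print("Bhattacharyya distance:", bhattacharyya_distance(X, Y))
```

In this toy setup the two documents share the same mean, so mean pooling treats them as identical, while the covariance-aware distance separates them. With real transformer embeddings the dimensionality is much larger than the token count, so a practical version would need stronger regularisation or a low-rank covariance estimate.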
Original language: English
Title of host publication: Proceedings of the 5th International Conference on Natural Language and Speech Processing
Editors: Mourad Abbas, Abed Alhakim Freihat
Number of pages: 6
Place of publication: Stroudsburg
Publisher: Association for Computational Linguistics (ACL)
Publication date: Dec 2022
Pages: 128-133
ISBN (electronic): 9781959429364
Status: Published - Dec 2022
MoE publication type: A4 Article in conference proceedings
Event: International Conference on Natural Language and Speech Processing - [Virtual event]
Duration: 16 Dec 2022 → 17 Dec 2022
Conference number: 5

Fields of science

  • 113 Computer and information sciences
