Optimizing singular value based similarity measures for document similarity comparisons

Research output: Chapter in book/report/conference proceeding, Conference contribution, Scientific, Peer-reviewed

Abstract

The similarity of documents is typically computed using fairly simple similarity measures, such as mean or maximum pooling of word representations followed by vector cosine similarity. This is fast to compute but loses information compared to second-order or matrix-based similarity measures. In this work, we investigate the value of matrix similarity measures for document similarity comparison in full-length patent retrieval tasks and introduce two new metrics motivated by the Schatten $p$-norm. The new similarity measures are based on singular values and involve learnable parameters that are optimized for a given evaluation task. We show that tuning the similarity measures for a specific task improves the accuracy of the similarity comparison.
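To make the contrast in the abstract concrete, the sketch below compares the pooled-cosine baseline with one possible singular-value-based similarity. It is only an illustration under stated assumptions: the abstract does not give the paper's two metrics, so the function `schatten_similarity`, its normalization, and the use of the exponent `p` as the tunable parameter are hypothetical choices, not the authors' definitions. Documents are assumed to be matrices with one word vector per row.

```python
import numpy as np


def mean_pool_cosine(X, Y):
    """Baseline: mean-pool the word vectors of each document, then take cosine similarity."""
    x, y = X.mean(axis=0), Y.mean(axis=0)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def schatten_norm(M, p):
    """Schatten p-norm: the l_p norm of the singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return float((s ** p).sum() ** (1.0 / p))


def schatten_similarity(X, Y, p=2.0):
    """Hypothetical matrix similarity (an assumption, not the paper's metric).

    Computes ||X Y^T||_p / (||X||_{2p} * ||Y||_{2p}) for p >= 1.
    By Hoelder's inequality for Schatten norms the value lies in [0, 1],
    and it equals 1 when X == Y.
    """
    cross = X @ Y.T  # word-by-word inner products between the two documents
    return schatten_norm(cross, p) / (schatten_norm(X, 2 * p) * schatten_norm(Y, 2 * p))


# Usage with random stand-in embeddings (12 and 9 words, 16 dimensions).
rng = np.random.default_rng(0)
doc_a = rng.normal(size=(12, 16))
doc_b = rng.normal(size=(9, 16))
baseline_score = mean_pool_cosine(doc_a, doc_b)
matrix_score = schatten_similarity(doc_a, doc_b, p=1.5)
```

In this sketch, tuning for a task would amount to choosing `p` (for example by a search over candidate values) so that a retrieval metric on held-out data is maximized; how the paper parameterizes and optimizes its measures is not stated in the abstract.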
Original language: English
Title of host publication: Proceedings of the 5th International Conference on Natural Language and Speech Processing
Editors: Mourad Abbas, Abed Alhakim Freihat
Number of pages: 6
Place of publication: Stroudsburg
Publisher: Association for Computational Linguistics (ACL)
Publication date: Dec 2022
Pages: 113-118
ISBN (electronic): 9781959429364
Status: Published - Dec 2022
MoE publication type: A4 Article in conference proceedings
Event: International Conference on Natural Language and Speech Processing - [Virtual event]
Duration: 16 Dec 2022 to 17 Dec 2022
Conference number: 5

Fields of Science

  • 113 Computer and information sciences
