A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Elaine Zosa, Mark Granroth-Wilding, Lidia Pivovarova

Research output: Chapter in book/report/conference proceeding, Conference contribution, Scientific, Peer-reviewed


We address the problem of linking related documents across languages in a multilingual collection. We evaluate three diverse unsupervised methods to represent and compare documents: (1) multilingual topic model; (2) cross-lingual document embeddings; and (3) Wasserstein distance. We test the performance of these methods in retrieving news articles in Swedish that are known to be related to a given Finnish article. The results show that ensembles of the methods outperform the stand-alone methods, suggesting that they capture complementary characteristics of the documents.
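The abstract does not say how the ensembles are formed, so the following is only a minimal sketch of one common way to combine several distance-based rankers. It assumes each method yields one distance per candidate document (lower means more related), rescales each method's distances to [0, 1], and averages them. The function names `min_max` and `ensemble_rank` and all numeric values are hypothetical and are not taken from the paper.

```python
def min_max(scores):
    """Rescale a list of distances to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates are tied for this method; it contributes nothing.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_rank(distance_lists):
    """Average normalized distances across methods and rank candidates.

    distance_lists: one list per method, each holding one distance per
    candidate document (assumption: lower distance = more related).
    Returns (ranking, combined), where ranking lists candidate indices
    from most to least related.
    """
    normalized = [min_max(d) for d in distance_lists]
    combined = [sum(col) / len(col) for col in zip(*normalized)]
    ranking = sorted(range(len(combined)), key=lambda i: combined[i])
    return ranking, combined


# Made-up distances from three methods for three candidate articles.
topic_dist = [0.9, 0.2, 0.5]     # e.g. from a multilingual topic model
embed_dist = [0.8, 0.3, 0.1]     # e.g. from cross-lingual embeddings
wasser_dist = [4.0, 1.0, 3.0]    # e.g. a Wasserstein distance

ranking, combined = ensemble_rank([topic_dist, embed_dist, wasser_dist])
print(ranking)
```

Normalizing before averaging keeps a method with a larger numeric range (here the hypothetical Wasserstein distances) from dominating the combined score.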
Title of host publication: Proceedings of the LREC 2020 Workshop on Cross-Language Search and Summarization of Text and Speech
Editors: Kathy McKeown, Douglas W. Oard, Elizabeth Boschee, Richard Schwartz
Number of pages: 6
Publisher: European Language Resources Association (ELRA)
Publication date: 16 May 2020
ISBN (print): 978-10-95546-55-9
Status: Published - 16 May 2020
MoE publication type: A4 Article in conference proceedings
Event: LREC 2020 Workshop on Cross-Language Search and Summarization of Text and Speech, Marseilles, France. Originally scheduled for May 16, 2020 at the Palais du Pharo, Marseilles, France. LREC has announced that the conference is cancelled. Reviewing for this workshop will continue, and the proceedings will be published.
Duration: 16 May 2020 → …


  • 113 Computer and information sciences
