A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Elaine Zosa, Mark Granroth-Wilding, Lidia Pivovarova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


We address the problem of linking related documents across languages in a multilingual collection. We evaluate three diverse unsupervised methods for representing and comparing documents: (1) a multilingual topic model; (2) cross-lingual document embeddings; and (3) Wasserstein distance. We test how well these methods retrieve Swedish news articles that are known to be related to a given Finnish article. The results show that ensembles of the methods outperform the stand-alone methods, suggesting that the methods capture complementary characteristics of the documents.
Original language: English
Title of host publication: Proceedings of the LREC 2020 Workshop on Cross-Language Search and Summarization of Text and Speech
Editors: Kathy McKeown, Douglas W. Oard, Elizabeth Boschee, Richard Schwartz
Number of pages: 6
Publisher: European Language Resources Association (ELRA)
Publication date: 16 May 2020
ISBN (Print): 978-10-95546-55-9
Publication status: Published - 16 May 2020
MoE publication type: A4 Article in conference proceedings
Event: LREC 2020 Workshop on Cross-Language Search and Summarization of Text and Speech, Marseilles, France. Originally scheduled for May 16, 2020 at the Palais du Pharo, Marseilles, France. LREC has announced that the conference is cancelled; reviewing for this workshop will continue, and the proceedings will be published.
Duration: 16 May 2020 → …

Fields of Science

  • 113 Computer and information sciences
