A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Elaine Zosa, Mark Granroth-Wilding, Lidia Pivovarova

Research output: Chapter in book/report/conference proceedings; Conference article; Scientific; Peer-reviewed

Abstract

We address the problem of linking related documents across languages in a multilingual collection. We evaluate three diverse unsupervised methods to represent and compare documents: (1) multilingual topic model; (2) cross-lingual document embeddings; and (3) Wasserstein distance. We test the performance of these methods in retrieving news articles in Swedish that are known to be related to a given Finnish article. The results show that ensembles of the methods outperform the stand-alone methods, suggesting that they capture complementary characteristics of the documents.
Original language: English
Title of host publication: Proceedings of the LREC 2020 Workshop on Cross-Language Search and Summarization of Text and Speech
Editors: Kathy McKeown, Douglas W. Oard, Elizabeth Boschee, Richard Schwartz
Number of pages: 6
Publisher: European Language Resources Association (ELRA)
Publication date: 16 May 2020
Pages: 32-37
ISBN (print): 978-10-95546-55-9
Status: Published - 16 May 2020
MoE publication type: A4 Article in conference proceedings
Event: LREC 2020 Workshop on Cross-Language Search and Summarization of Text and Speech, Marseilles, France. Originally scheduled for May 16, 2020, at the Palais du Pharo, Marseilles, France. LREC has announced that the conference is cancelled. Reviewing for this workshop will continue, and the proceedings will be published.
Duration: 16 May 2020 → …

Fields of science

  • 113 Computer and information sciences
