Targeted Query Expansions as a Method for Searching: Mixed Quality Digitized Cultural Heritage Documents

Heikki Keskustalo, Kimmo Tapio Kettunen, Sanna Kumpulainen, Nicola Ferro, Gianmaria Silvello, Anni Järvelin, Jaana Kekäläinen, Paavo Arvola, Miamaria Saastamoinen, Eero Sormunen, Kalervo Järvelin

Research output: Chapter in book/report/conference proceedings › Conference contribution › Scientific › Peer-reviewed

Abstract

Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach with Finnish, a highly inflectional, compounding language, although such variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.
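
As a rough illustration of the idea of targeted QE described in the abstract, the following Python sketch replaces one search term with a set of expansion keys chosen according to the type of variation expected in the text. It is not the authors' implementation: the suffix list, the v-to-w spelling rule, the OCR confusion pairs, and all function names are assumptions made for this example, and real Finnish morphology is far richer than simple suffixing.

```python
from typing import Callable, Dict, Iterable, List

# Toy case endings (assumption); ignores consonant gradation and vowel harmony.
TOY_SUFFIXES = ["", "n", "a", "ssa", "sta", "lla"]

# Hypothetical character confusions standing in for OCR misreadings.
OCR_CONFUSIONS = [("l", "1"), ("ä", "a"), ("m", "rn")]


def _dedupe(keys: Iterable[str]) -> List[str]:
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(keys))


def expand_morphological(term: str) -> List[str]:
    """Append each toy suffix to the base form."""
    return _dedupe(term + suffix for suffix in TOY_SUFFIXES)


def expand_historical(term: str) -> List[str]:
    """Add an older spelling variant that writes 'w' for 'v' (assumed rule)."""
    return _dedupe([term, term.replace("v", "w")])


def expand_ocr(term: str) -> List[str]:
    """Add one variant per confusion pair whose source string occurs in the term."""
    keys = [term]
    for good, bad in OCR_CONFUSIONS:
        if good in term:
            keys.append(term.replace(good, bad, 1))
    return _dedupe(keys)


EXPANDERS: Dict[str, Callable[[str], List[str]]] = {
    "morphological": expand_morphological,
    "historical": expand_historical,
    "ocr": expand_ocr,
}


def targeted_expansion(term: str, variation_types: List[str]) -> List[str]:
    """Build expansion keys using only the expanders selected for this text."""
    keys: List[str] = []
    for variation in variation_types:
        keys.extend(EXPANDERS[variation](term))
    return _dedupe(keys)


if __name__ == "__main__":
    # A clean modern text might need only morphological expansion.
    print(" OR ".join(targeted_expansion("talo", ["morphological"])))
    # An old, noisily scanned text might need historical and OCR expansion.
    print(" OR ".join(targeted_expansion("vapaus", ["historical", "ocr"])))
```

The point of the sketch is the selection step in `targeted_expansion`: the caller decides which kinds of variation apply to the documents being searched, so a clean text does not receive unnecessary OCR or historical variants.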
Original language: English
Title of host publication: iConference 2015 Proceedings
Number of pages: 7
Publisher: iSchools
Publication date: 2015
Status: Published - 2015
MoE publication type: A4 Article in conference proceedings
Event: iConference - Newport Beach, United States
Duration: 24 Mar 2015 to 27 Mar 2015
Conference number: 2015

Publication series

Name: iConference
Publisher: iSchools
ISSN (print): 2325-6850

Fields of science

  • 113 Computer and information sciences

Cite this

Keskustalo, H., Kettunen, K. T., Kumpulainen, S., Ferro, N., Silvello, G., Järvelin, A., ... Järvelin, K. (2015). Targeted Query Expansions as a Method for Searching: Mixed Quality Digitized Cultural Heritage Documents. In iConference 2015 Proceedings (iConference). iSchools.
@inproceedings{fd19c4091b19414bb64c009d94ffa7b2,
title = "Targeted Query Expansions as a Method for Searching: Mixed Quality Digitized Cultural Heritage Documents",
abstract = "Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach with Finnish, a highly inflectional, compounding language, although such variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.",
keywords = "113 Computer and information sciences",
author = "Heikki Keskustalo and Kettunen, {Kimmo Tapio} and Sanna Kumpulainen and Nicola Ferro and Gianmaria Silvello and Anni J{\"a}rvelin and Jaana Kek{\"a}l{\"a}inen and Paavo Arvola and Miamaria Saastamoinen and Eero Sormunen and Kalervo J{\"a}rvelin",
note = "Volume: Proceeding volume:",
year = "2015",
language = "English",
series = "iConference",
publisher = "iSchools",
booktitle = "iConference 2015 Proceedings",
address = "United Kingdom",

}

TY - GEN
T1 - Targeted Query Expansions as a Method for Searching: Mixed Quality Digitized Cultural Heritage Documents
AU - Keskustalo, Heikki
AU - Kettunen, Kimmo Tapio
AU - Kumpulainen, Sanna
AU - Ferro, Nicola
AU - Silvello, Gianmaria
AU - Järvelin, Anni
AU - Kekäläinen, Jaana
AU - Arvola, Paavo
AU - Saastamoinen, Miamaria
AU - Sormunen, Eero
AU - Järvelin, Kalervo
N1 - Volume: Proceeding volume:
PY - 2015
Y1 - 2015
AB - Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach with Finnish, a highly inflectional, compounding language, although such variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.
KW - 113 Computer and information sciences
UR - http://hdl.handle.net/2142/73430
M3 - Conference contribution
T3 - iConference
BT - iConference 2015 Proceedings
PB - iSchools
ER -
