Targeted Query Expansions as a Method for Searching: Mixed Quality Digitized Cultural Heritage Documents

Heikki Keskustalo, Kimmo Tapio Kettunen, Sanna Kumpulainen, Nicola Ferro, Gianmaria Silvello, Anni Järvelin, Jaana Kekäläinen, Paavo Arvola, Miamaria Saastamoinen, Eero Sormunen, Kalervo Järvelin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach in a highly inflectional compounding language, Finnish, although the variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.
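
The idea of targeted QE described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the suffix list, and the character substitutions below are hypothetical toy assumptions, chosen only to show how the set of expansion keys could depend on the type of variation expected in the text being searched.

```python
from typing import Callable, Dict, List


def morphological_variants(term: str) -> List[str]:
    # Toy assumption: a few Finnish case endings appended by plain concatenation.
    # Real inflection (consonant gradation, vowel harmony) is not handled.
    suffixes = ["", "n", "ssa", "sta", "lla", "lta"]
    return [term + suffix for suffix in suffixes]


def historical_variants(term: str) -> List[str]:
    # Toy assumption: older spelling with "w" in place of modern "v".
    return sorted({term, term.replace("v", "w")})


def ocr_variants(term: str) -> List[str]:
    # Toy assumption: a tiny table of character confusions an OCR engine might make.
    confusions = {"l": "1", "ä": "a", "ö": "o"}
    variants = {term}
    for correct, misread in confusions.items():
        if correct in term:
            variants.add(term.replace(correct, misread))
    return sorted(variants)


# Maps a variation type to the (hypothetical) expander that targets it.
EXPANDERS: Dict[str, Callable[[str], List[str]]] = {
    "morphological": morphological_variants,
    "historical": historical_variants,
    "ocr": ocr_variants,
}


def targeted_expand(term: str, variation_types: List[str]) -> List[str]:
    """Replace one search term by expansion keys for the given variation types."""
    keys: List[str] = []
    for variation_type in variation_types:
        for key in EXPANDERS[variation_type](term):
            if key not in keys:  # keep first occurrence, drop duplicates
                keys.append(key)
    return keys


if __name__ == "__main__":
    # A clean modern text might only need morphological expansion,
    # while a noisy historical scan might need all three types.
    print(targeted_expand("talo", ["morphological"]))
    print(targeted_expand("talo", ["morphological", "historical", "ocr"]))
```

In this sketch the caller chooses the variation types per document or collection; how that choice would be exposed in a search interface is left open here.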
Original language: English
Title of host publication: iConference 2015 Proceedings
Number of pages: 7
Publisher: iSchools
Publication date: 2015
Publication status: Published - 2015
MoE publication type: A4 Article in conference proceedings
Event: iConference - Newport Beach, United States
Duration: 24 Mar 2015 to 27 Mar 2015
Conference number: 2015

Publication series

Name: iConference
Publisher: iSchools
ISSN (Print): 2325-6850

Fields of Science

  • 113 Computer and information sciences

Cite this

Keskustalo, H., Kettunen, K. T., Kumpulainen, S., Ferro, N., Silvello, G., Järvelin, A., ... Järvelin, K. (2015). Targeted Query Expansions as a Method for Searching: Mixed Quality Digitized Cultural Heritage Documents. In iConference 2015 Proceedings (iConference). iSchools.
@inproceedings{fd19c4091b19414bb64c009d94ffa7b2,
title = "Targeted Query Expansions as a Method for Searching: Mixed Quality Digitized Cultural Heritage Documents",
abstract = "Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach in a highly inflectional compounding language, Finnish, although the variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.",
keywords = "113 Computer and information sciences",
author = "Heikki Keskustalo and Kettunen, {Kimmo Tapio} and Sanna Kumpulainen and Nicola Ferro and Gianmaria Silvello and Anni J{\"a}rvelin and Jaana Kek{\"a}l{\"a}inen and Paavo Arvola and Miamaria Saastamoinen and Eero Sormunen and Kalervo J{\"a}rvelin",
note = "Volume: Proceeding volume:",
year = "2015",
language = "English",
series = "iConference",
publisher = "iSchools",
booktitle = "iConference 2015 Proceedings",
address = "United Kingdom",

}


TY - GEN

T1 - Targeted Query Expansions as a Method for Searching: Mixed Quality Digitized Cultural Heritage Documents

AU - Keskustalo, Heikki

AU - Kettunen, Kimmo Tapio

AU - Kumpulainen, Sanna

AU - Ferro, Nicola

AU - Silvello, Gianmaria

AU - Järvelin, Anni

AU - Kekäläinen, Jaana

AU - Arvola, Paavo

AU - Saastamoinen, Miamaria

AU - Sormunen, Eero

AU - Järvelin, Kalervo

N1 - Volume: Proceeding volume:

PY - 2015

Y1 - 2015

N2 - Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach in a highly inflectional compounding language, Finnish, although the variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.

AB - Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach in a highly inflectional compounding language, Finnish, although the variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.

KW - 113 Computer and information sciences

UR - http://hdl.handle.net/2142/73430

M3 - Conference contribution

T3 - iConference

BT - iConference 2015 Proceedings

PB - iSchools

ER -
