Multilingual Dictionary Linking and Aggregation: Quality from Consistency

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

The growth of Web-accessible dictionaries and term data has led to a proliferation of platforms distributing the same lexical resources in different combinations and packagings. Finding the right word or translation is like finding a needle in a haystack. The quantity of the data is undercut by the doubtful quality of the resources. Our aim is to cut down the quantity and raise the quality by matching and aggregating entries within and across dictionaries. In this exploratory paper, our goal is to see how far we can get by using information extracted from multiple dictionaries themselves. Our hypothesis is that the more limited quantity of data in dictionaries is compensated by their richer structure and more concentrated information content. We hope to take advantage of the structure of dictionaries by basing quality criteria and measures on linguistic and terminological considerations. The plan of campaign is to derive quality criteria to recognize well-constructed dictionary entries from a model dictionary, and then attempt to convert the criteria into language-independent frequency-based measures. As a model dictionary we use the Princeton WordNet. The measures derived from it are tested against data extracted from BabelNet.

Original language: English
Title of host publication: 15th International Semantic Web Conference (ISWC 2016): the Fourth International Workshop on Linked Data for Information Extraction (LD4IE 2016)
Editors: Anna Lisa Gentile, Claudia d'Amato, Ziqi Zhang, Heiko Paulheim
Number of pages: 12
Volume: 1699
Place of Publication: Kobe
Publisher: CEUR-WS.org
Publication date: 4 Oct 2016
Pages: 51-62
Publication status: Published - 4 Oct 2016
MoE publication type: A4 Article in conference proceedings
Event: International Semantic Web Conference - Kobe, Japan

Publication series

Name: CEUR Workshop Proceedings
ISSN (Electronic): 1613-0073

Fields of Science

  • 113 Computer and information sciences
  • Information extraction
  • Linked data
  • Edit distance
  • 6160 Other humanities
  • Quality checking
  • Terminology
  • Aggregation

Cite this

Ji, K., Wang, S., & Carlson, L. H. (2016). Multilingual Dictionary Linking and Aggregation: Quality from Consistency. In A. L. Gentile, C. d'Amato, Z. Zhang, & H. Paulheim (Eds.), 15th International Semantic Web Conference (ISWC 2016): the Fourth International Workshop on Linked Data for Information Extraction (LD4IE 2016) (Vol. 1699, pp. 51-62). (CEUR Workshop Proceedings). Kobe: CEUR-WS.org.
Ji, Kun ; Wang, Shanshan ; Carlson, Lauri Henrik. / Multilingual Dictionary Linking and Aggregation: Quality from Consistency. 15th International Semantic Web Conference (ISWC 2016): the Fourth International Workshop on Linked Data for Information Extraction (LD4IE 2016). editor / Anna Lisa Gentile ; Claudia d'Amato ; Ziqi Zhang ; Heiko Paulheim. Vol. 1699 Kobe : CEUR-WS.org, 2016. pp. 51-62 (CEUR Workshop Proceedings).
@inproceedings{dee98c82a4174feba19c951a5bfd9d59,
title = "Multilingual Dictionary Linking and Aggregation: Quality from Consistency",
abstract = "The growth of Web-accessible dictionaries and term data has led to a proliferation of platforms distributing the same lexical resources in different combinations and packagings. Finding the right word or translation is like finding a needle in a haystack. The quantity of the data is undercut by the doubtful quality of the resources. Our aim is to cut down the quantity and raise the quality by matching and aggregating entries within and across dictionaries. In this exploratory paper, our goal is to see how far we can get by using information extracted from multiple dictionaries themselves. Our hypothesis is that the more limited quantity of data in dictionaries is compensated by their richer structure and more concentrated information content. We hope to take advantage of the structure of dictionaries by basing quality criteria and measures on linguistic and terminological considerations. The plan of campaign is to derive quality criteria to recognize well-constructed dictionary entries from a model dictionary, and then attempt to convert the criteria into language-independent frequency-based measures. As a model dictionary we use the Princeton WordNet. The measures derived from it are tested against data extracted from BabelNet.",
keywords = "113 Computer and information sciences, Information extraction, Linked data, Edit distance, 6160 Other humanities, Quality checking, Terminology, Aggregation",
author = "Kun Ji and Shanshan Wang and Carlson, {Lauri Henrik}",
note = "Volume: Proceeding volume: 1699",
year = "2016",
month = "10",
day = "4",
language = "English",
volume = "1699",
series = "CEUR Workshop Proceedings",
publisher = "CEUR-WS.org",
pages = "51--62",
editor = "Gentile, {Anna Lisa} and Claudia d'Amato and Ziqi Zhang and Heiko Paulheim",
booktitle = "15th International Semantic Web Conference (ISWC 2016)",
address = "Germany",

}

Ji, K, Wang, S & Carlson, LH 2016, Multilingual Dictionary Linking and Aggregation: Quality from Consistency. in AL Gentile, C d'Amato, Z Zhang & H Paulheim (eds), 15th International Semantic Web Conference (ISWC 2016): the Fourth International Workshop on Linked Data for Information Extraction (LD4IE 2016). vol. 1699, CEUR Workshop Proceedings, CEUR-WS.org, Kobe, pp. 51-62, International Semantic Web Conference, Kobe, Japan.

Multilingual Dictionary Linking and Aggregation: Quality from Consistency. / Ji, Kun; Wang, Shanshan; Carlson, Lauri Henrik.

15th International Semantic Web Conference (ISWC 2016): the Fourth International Workshop on Linked Data for Information Extraction (LD4IE 2016). ed. / Anna Lisa Gentile; Claudia d'Amato; Ziqi Zhang; Heiko Paulheim. Vol. 1699 Kobe : CEUR-WS.org, 2016. p. 51-62 (CEUR Workshop Proceedings).

TY - GEN

T1 - Multilingual Dictionary Linking and Aggregation: Quality from Consistency

AU - Ji, Kun

AU - Wang, Shanshan

AU - Carlson, Lauri Henrik

N1 - Volume: Proceeding volume: 1699

PY - 2016/10/4

Y1 - 2016/10/4

AB - The growth of Web-accessible dictionaries and term data has led to a proliferation of platforms distributing the same lexical resources in different combinations and packagings. Finding the right word or translation is like finding a needle in a haystack. The quantity of the data is undercut by the doubtful quality of the resources. Our aim is to cut down the quantity and raise the quality by matching and aggregating entries within and across dictionaries. In this exploratory paper, our goal is to see how far we can get by using information extracted from multiple dictionaries themselves. Our hypothesis is that the more limited quantity of data in dictionaries is compensated by their richer structure and more concentrated information content. We hope to take advantage of the structure of dictionaries by basing quality criteria and measures on linguistic and terminological considerations. The plan of campaign is to derive quality criteria to recognize well-constructed dictionary entries from a model dictionary, and then attempt to convert the criteria into language-independent frequency-based measures. As a model dictionary we use the Princeton WordNet. The measures derived from it are tested against data extracted from BabelNet.

KW - 113 Computer and information sciences

KW - Information extraction

KW - Linked data

KW - Edit distance

KW - 6160 Other humanities

KW - Quality checking

KW - Terminology

KW - Aggregation

M3 - Conference contribution

VL - 1699

T3 - CEUR Workshop Proceedings

SP - 51

EP - 62

BT - 15th International Semantic Web Conference (ISWC 2016)

A2 - Gentile, Anna Lisa

A2 - d'Amato, Claudia

A2 - Zhang, Ziqi

A2 - Paulheim, Heiko

PB - CEUR-WS.org

CY - Kobe

ER -

Ji K, Wang S, Carlson LH. Multilingual Dictionary Linking and Aggregation: Quality from Consistency. In Gentile AL, d'Amato C, Zhang Z, Paulheim H, editors, 15th International Semantic Web Conference (ISWC 2016): the Fourth International Workshop on Linked Data for Information Extraction (LD4IE 2016). Vol. 1699. Kobe: CEUR-WS.org. 2016. p. 51-62. (CEUR Workshop Proceedings).