Quality Checking and Matching Linked Dictionary Data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

The growth of web-accessible dictionary and term data has led to a proliferation of platforms distributing the same lexical resources in different combinations and packagings. Finding the right word or translation is like finding a needle in a haystack: the sheer quantity of the data is undercut by the redundancy and doubtful quality of the resources. In this paper, we develop ways to assess the quality of multilingual lexical web and linked data resources by their internal consistency. Concretely, we deconstruct Princeton WordNet [1] into its component word senses, or word labels, together with the properties they have or inherit from their synsets, and test to what extent these properties allow reconstructing the synsets they came from. The methods developed should then be applicable to the aggregation of term data coming from different sources: to find which entries from different sources could be similarly pooled together, cutting redundancy and improving coverage and reliability. The multilingual dictionary BabelNet [2] can be used for evaluation. We restrict our current research to dictionary data and to improving language models rather than introducing external sources.
Original language: English
Title of host publication: 15th International Semantic Web Conference (ISWC 2016): the 11th International Workshop on Ontology Matching
Editors: Pavel Shvaiko, Jérôme Euzenat, Ernesto Jiménez-Ruiz, Michelle Cheatham, Oktie Hassanzadeh, Ryutaro Ichise
Number of pages: 2
Volume: 1766
Publication date: 22 Oct 2016
Pages: 241-242
Publication status: Published - 22 Oct 2016
MoE publication type: A4 Article in conference proceedings
Event: International Semantic Web Conference - Kobe, Japan
Conference number: 15

Publication series

Name: CEUR workshop proceedings
ISSN (Electronic): 1613-0073

Fields of Science

  • 113 Computer and information sciences
  • Information extraction
  • Linked data
  • Edit distance
  • 6160 Other humanities
  • Quality checking
  • Terminology
  • Aggregation

Cite this

Ji, K., Wang, S., & Carlson, L. H. (2016). Quality Checking and Matching Linked Dictionary Data. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham, O. Hassanzadeh, & R. Ichise (Eds.), 15th International Semantic Web Conference (ISWC 2016): the 11th International Workshop on Ontology Matching (Vol. 1766, pp. 241-242). (CEUR workshop proceedings).