Abstract
The growth of web-accessible dictionary and term data has led to a proliferation
of platforms distributing the same lexical resources in different combinations
and packages. Finding the right word or translation is like finding a needle in
a haystack. The quantity of the data is undercut by the redundancy and doubtful quality of the resources. In this paper, we develop ways to assess the quality of multilingual lexical web and linked data resources by their internal consistency. Concretely, we deconstruct Princeton WordNet [1] into its component word senses or word labels, with the properties they have or inherit from their synsets, and see to what extent these properties allow reconstructing the synsets they came from. The methods developed should then be applicable to the aggregation of term data from different term sources: to find which entries from different sources could be similarly pooled together, to cut redundancy and improve coverage and reliability. The multilingual dictionary BabelNet [2] can be used for evaluation. We restrict our current research to dictionary data and to improving language models, rather than introducing external sources.
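The deconstruct-and-reconstruct idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy synsets, the property names (`pos`, `hypernym`, `gloss`), and the functions `deconstruct`, `reconstruct`, and `recovery_rate` are all hypothetical, and exact-match grouping stands in for whatever matching the paper actually uses.

```python
from collections import defaultdict

# Toy synsets (hypothetical data, not taken from Princeton WordNet).
SYNSETS = {
    "s1": {"lemmas": ["car", "automobile"], "pos": "n",
           "hypernym": "motor_vehicle", "gloss": "four-wheeled passenger vehicle"},
    "s2": {"lemmas": ["truck", "lorry"], "pos": "n",
           "hypernym": "motor_vehicle", "gloss": "vehicle for hauling goods"},
    "s3": {"lemmas": ["run", "sprint"], "pos": "v",
           "hypernym": "move", "gloss": "move fast on foot"},
}


def deconstruct(synsets):
    """Flatten synsets into word senses that carry the synset's properties."""
    senses = []
    for synset in synsets.values():
        for lemma in synset["lemmas"]:
            sense = {k: v for k, v in synset.items() if k != "lemmas"}
            sense["lemma"] = lemma
            senses.append(sense)
    return senses


def reconstruct(senses, keys):
    """Pool senses whose values agree on every property named in `keys`."""
    groups = defaultdict(set)
    for sense in senses:
        signature = tuple(sense[k] for k in keys)
        groups[signature].add(sense["lemma"])
    return {frozenset(group) for group in groups.values()}


def recovery_rate(synsets, keys):
    """Fraction of original synsets rebuilt exactly from the chosen properties."""
    original = {frozenset(s["lemmas"]) for s in synsets.values()}
    rebuilt = reconstruct(deconstruct(synsets), keys)
    return len(original & rebuilt) / len(original)


if __name__ == "__main__":
    print(recovery_rate(SYNSETS, ("pos", "hypernym")))
    print(recovery_rate(SYNSETS, ("pos", "hypernym", "gloss")))
```

On this toy data, grouping by `pos` and `hypernym` alone merges the car and truck synsets, so only one of three synsets is recovered; adding `gloss` separates them and all three are recovered. The recovery rate is one simple way to express how far a set of properties determines the original synsets.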
Original language | English |
---|---|
Title | 15th International Semantic Web Conference (ISWC 2016) : the 11th International Workshop on Ontology Matching |
Editors | Pavel Shvaiko, Jérôme Euzenat, Ernesto Jiménez-Ruiz, Michelle Cheatham, Oktie Hassanzadeh, Ryutaro Ichise |
Number of pages | 2 |
Volume | 1766 |
Publication date | 22 Oct 2016 |
Pages | 241-242 |
Status | Published - 22 Oct 2016 |
MoE publication type | A4 Article in conference proceedings |
Event | International Semantic Web Conference - Kobe, Japan; Duration: 17 Oct 2016 → 21 Oct 2016; Conference number: 15 |
Publication series
Name | CEUR workshop proceedings |
---|---|
ISSN (electronic) | 1613-0073 |
Additional information
Proceeding volume: 1766
Fields of science
- 113 Computer and information sciences
- 6160 Other humanities