Abstract
The growth of Web-accessible dictionaries and term data has led to a proliferation of platforms distributing the same lexical resources in different combinations and packagings. Finding the right word or translation is like finding a needle in a haystack, and the quantity of the data is undercut by the doubtful quality of the resources. Our aim is to cut down the quantity and raise the quality by matching and aggregating entries within and across dictionaries. In this exploratory paper, our goal is to see how far we can get by using information extracted from multiple dictionaries themselves. Our hypothesis is that the more limited quantity of data in dictionaries is compensated for by their richer structure and more concentrated information content. We hope to take advantage of the structure of dictionaries by basing quality criteria and measures on linguistic and terminological considerations. The plan is to derive quality criteria for recognizing well-constructed dictionary entries from a model dictionary, and then to convert those criteria into language-independent frequency-based measures. As the model dictionary we use the Princeton WordNet; the measures derived from it are tested against data extracted from BabelNet.
Original language | English |
---|---|
Title | 15th International Semantic Web Conference (ISWC 2016) : the Fourth International Workshop on Linked Data for Information Extraction (LD4IE 2016) |
Editors | Anna Lisa Gentile, Claudia d'Amato, Ziqi Zhang, Heiko Paulheim |
Number of pages | 12 |
Volume | 1699 |
Place of publication | Kobe |
Publisher | CEUR-WS.org |
Publication date | 4 Oct 2016 |
Pages | 51-62 |
Status | Published - 4 Oct 2016 |
Ministry of Education publication type | A4 Article in conference proceedings |
Event | International Semantic Web Conference - Kobe, Japan. Duration: 17 Oct 2016 → 21 Oct 2016. Conference number: 15. https://iswc2016.semanticweb.org/ |
Publication series
Name | CEUR Workshop Proceedings |
---|---|
ISSN (electronic) | 1613-0073 |
Additional information
Proceedings volume: 1699
Fields of science
- 113 Computer and information sciences
- 6160 Other humanities