Is It Possible to Create a Very Large WordNet in 100 days? -- an Evaluation

Research output: Contribution to journal, Article, Scientific, peer-reviewed

Abstract

Wordnets are large-scale lexical databases of related words and concepts, useful for language-aware software applications. They have recently been built for many languages by using various approaches.

The Finnish wordnet, FinnWordNet (FiWN), was created by translating the more than 200,000 word senses in the English Princeton WordNet (PWN) 3.0 in 100 days. To ensure quality, the word senses were translated by professional translators. The direct translation approach was based on the assumption that most synsets in PWN represent language-independent real-world concepts. The semantic relations between synsets were therefore also assumed to be mostly language-independent, so the structure of PWN could be reused as well. This approach allowed the creation of an extensive Finnish wordnet directly aligned with PWN and also provided us with a translation relation and thus a bilingual wordnet usable as a dictionary.

In this single paper, we address several concerns raised about our approach, many of them for the first time. We evaluate the craftsmanship of the translators by checking the spelling and translation quality, the viability of the approach by assessing the synonym quality on both the lexeme and concept levels, and the usefulness of the resulting lexical resource both for humans and in a language-technological task. We discovered no new problems compared with those already known in PWN. As a whole, the paper contributes to the scientific discourse on what it takes to create a very large wordnet.

As a side effect of the evaluation, we extended FiWN to contain 208,645 word senses in 120,449 synsets, effectively making version 2.0 of FiWN currently the largest wordnet in the world by these statistics.

Original language: English
Journal: Language Resources and Evaluation
Volume: 48
Issue: 2
Pages: 191-201
Number of pages: 10
ISSN: 1574-020X
Status: Published - 2013
OKM publication type: A1 Original article in a scientific journal, peer-reviewed

Fields of Science

  • 6121 Languages
