Wiktextract: A utility for extracting data from Wiktionary

Research output: Non-textual form > Software > Scientific

Abstract

This tool extracts glosses, parts of speech, declension and conjugation information (when available), translations for all languages (when available), pronunciations (including audio file links), qualifiers including usage notes, word forms, and links between words, including hypernyms, hyponyms, holonyms, meronyms, related words, derived terms, compounds, and alternative forms. For many classes of words, a word sense is annotated with specific information, such as which word it is a form of, the RGB value of the color it represents, the numeric value of the number, or the SI unit it represents.
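
As an illustration of how such extracted data might be consumed, the following is a minimal sketch. It assumes the output is stored as one JSON object per line; that format, the sample records, and the field names ("word", "pos", "senses", "glosses", "translations", "form_of") are assumptions made for this example and are not stated in the abstract. The helper functions iter_entries and summarize are hypothetical.

```python
import json

# Hypothetical sample records. The one-object-per-line layout and all field
# names are assumptions for illustration, not taken from the abstract.
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "word": "red",
        "pos": "adj",
        "senses": [{"glosses": ["Having the colour red."]}],
        "translations": [{"lang": "Finnish", "word": "punainen"}],
    }),
    json.dumps({
        "word": "cats",
        "pos": "noun",
        "senses": [{"glosses": ["plural of cat"], "form_of": [{"word": "cat"}]}],
    }),
])


def iter_entries(text):
    """Yield one dictionary per non-empty line of JSON text."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(entry):
    """Collect the glosses and translations of one entry into a flat summary."""
    glosses = [
        gloss
        for sense in entry.get("senses", [])
        for gloss in sense.get("glosses", [])
    ]
    translations = {
        t["lang"]: t["word"]
        for t in entry.get("translations", [])
        if "lang" in t and "word" in t
    }
    return {
        "word": entry.get("word"),
        "pos": entry.get("pos"),
        "glosses": glosses,
        "translations": translations,
    }


summaries = [summarize(entry) for entry in iter_entries(SAMPLE_JSONL)]
for summary in summaries:
    print(summary["word"], summary["pos"], summary["glosses"])
```

Missing fields are handled with default values because, as the abstract notes, much of the information is present only when available.
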
Original language: English
Place of Publication: GitHub
Publisher: Tatu Ylonen
Publication status: Published - 1 Jan 2019
MoE publication type: I2 ICT software

Fields of Science

  • 6121 Languages
  • lexicon
