Synchronized Mediawiki based analyzer dictionary development

Jack Rueter, Mika Hämäläinen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin and other languages in the Helsinki CSC infrastructure, home of the Finnish Kielipankki 'Language Bank' and Termipankki 'Term Bank'. The proximity of minority-language corpora in need of annotation and the multiple uses of controlled Wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. Open-source finite-state transducer (FST) development for Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reusing FSTs, further augmented by open-source work in OmorFi, Apertium and Universal Dependencies <http://universaldependencies.org/#language-urj>. The initial idea is to allow synchronized editing of Giellatekno XML and CSC wiki structures via GitHub. In addition to allowing simple exports of lexc lines of the form LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ;, the parallel dictionaries will document derivation, morpho-syntactic information on valency and government, semantics and etymology.
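As a rough illustration of the lexc line export mentioned in the abstract, the Python sketch below turns XML dictionary entries into lines of the form LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ;. The entry schema is a simplified, hypothetical stand-in loosely modelled on Giellatekno-style XML (e/lg/l for the lemma, mg/tg/t for translations). The stem and contlex attributes, the continuation lexicon names N_KALA and V_LUKEA, and the helper names entry_to_lexc and export_lexc are all assumptions made for illustration; they are not the project's actual format or tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry format (NOT the real Giellatekno schema):
# the "stem" and "contlex" attributes and the lexicon names are assumptions.
SAMPLE = """<r>
  <e>
    <lg><l pos="N" stem="kala" contlex="N_KALA">kala</l></lg>
    <mg><tg><t>fish</t><t>fishes</t></tg></mg>
  </e>
  <e>
    <lg><l pos="V" stem="luke" contlex="V_LUKEA">lukea</l></lg>
    <mg><tg><t>read</t></tg></mg>
  </e>
</r>"""


def entry_to_lexc(entry: ET.Element) -> str:
    """Render one <e> entry as: LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ;"""
    lemma_el = entry.find("./lg/l")
    if lemma_el is None or lemma_el.text is None:
        raise ValueError("entry has no lemma")
    lemma = lemma_el.text.strip()
    # Assumed attribute; fall back to the lemma itself if no stem is given.
    stem = lemma_el.get("stem", lemma)
    contlex = lemma_el.get("contlex")
    if contlex is None:
        raise ValueError(f"entry {lemma!r} has no continuation lexicon")
    # Join all translations into the quoted string; strip double quotes
    # so a translation cannot terminate the lexc string early.
    glosses = [t.text.strip() for t in entry.iter("t") if t.text]
    gloss = ", ".join(glosses).replace('"', "")
    return f'{lemma}:{stem} {contlex} "{gloss}" ;'


def export_lexc(xml_text: str, lexicon: str = "Root") -> str:
    """Export every <e> entry under a single LEXICON header."""
    root = ET.fromstring(xml_text)
    lines = [f"LEXICON {lexicon}"]
    lines += [entry_to_lexc(e) for e in root.iter("e")]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(export_lexc(SAMPLE), end="")
```

Run on the sample above, the sketch prints:

    LEXICON Root
    kala:kala N_KALA "fish, fishes" ;
    lukea:luke V_LUKEA "read" ;

Keeping the export a pure function of the XML source means that the same entries edited through a wiki front end or committed to a repository would regenerate identical lexc lines, which is the kind of synchronization the abstract describes.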
Original language: English
Title of host publication: 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017) : St. Petersburg, Russia 23 – 24 January 2017
Editors: Francis M. Tyers, Michael Rießler, Tommi A. Pirinen, Trond Trosterud
Number of pages: 7
Place of Publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 2017
Pages: 1-7
Article number: 2
ISBN (Print): 978-1-5108-3665-5
Publication status: Published - 2017
MoE publication type: A4 Article in conference proceedings
Event: International Workshop for Computational Linguistics of Uralic Languages - St. Petersburg, Russian Federation
Duration: 23 Jan 2017 - 24 Jan 2017
Conference number: 3

Fields of Science

  • 6121 Languages
  • Open-source
  • Analyzer dictionary development
  • Wiki-based dictionary
  • Synchronized dictionary editing
  • Uralic Languages
  • Semantics
  • Morphology
  • Morpho-syntactic data
  • Etymology

Activities

  • Harmonization of Saami Language Infrastructure 2019

    Trond Trosterud (Speaker: Chair), Jack Rueter (Speaker: Presenter), Joshua Wilbur (Speaker: Presenter) & Lene Antonsen (Speaker: Presenter)

    4 Apr 2019 - 9 Apr 2019

    Activity: Participating in or organising an event › Organisation and participation in conferences, workshops, courses, seminars

  • Mari FST and Corpus work

    Jack Rueter (Consultant), Trond Trosterud (Consultant) & Jeremy Bradley (Consultant)

    6 Jan 2019 - 8 Jan 2019

    Activity: Consultancy › Consultancy

  • Research Data and Humanities 2019

    Jack Rueter (Speaker: Presenter), Mika Hämäläinen (Speaker: Presenter) & Khalid Alnajjar (Speaker: Presenter)

    14 Aug 2019 - 16 Aug 2019

    Activity: Participating in or organising an event › Organisation and participation in conferences, workshops, courses, seminars
