Synchronized Mediawiki based analyzer dictionary development

Jack Rueter, Mika Hämäläinen

Research output: Chapter in book/report/conference proceeding; Conference contribution; Scientific; Peer-reviewed


Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure; home of the Finnish Kielipankki ’Language Bank’ and Termipankki ’Term Bank’. The proximity of minority-language corpora in need of annotation and the multiple usage of controlled wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. The open-source FST development of Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reusage of FST-s, only augmented by open-source work in OmorFi, Apertium and Universal Dependency <http://univer->. The initial idea is to allow synchronized editing of Giellatekno xml and CSC wiki structures via github. In addition to allowing for simple lexc LEMMA:STEM CONTINUATION_LEXICON ”TRANSLATION” ; line exports, the parallel dictionaries will provide for documentation of derivation, morpho-syntactic information on valency and government, semantics and etymology.
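The lexc line export mentioned in the abstract (LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ;) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the `Entry` fields, the helper names, the sample entry, the continuation lexicon name `N_KALA`, and the set of characters escaped with `%` (which is not exhaustive).

```python
from dataclasses import dataclass

# Characters escaped with "%" in this sketch (assumed, non-exhaustive set).
LEXC_SPECIAL = set('%:;!<>#" 0')


def escape_lexc(text: str) -> str:
    """Prefix each special character with "%" so it is read literally."""
    return "".join("%" + ch if ch in LEXC_SPECIAL else ch for ch in text)


@dataclass
class Entry:
    """Hypothetical dictionary record; field names are illustrative only."""
    lemma: str
    stem: str
    continuation: str
    translation: str


def to_lexc_line(entry: Entry) -> str:
    """Render one entry as LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ;"""
    # Swap double quotes in the gloss so the quoted string stays closed.
    gloss = entry.translation.replace('"', "'")
    return (
        f"{escape_lexc(entry.lemma)}:{escape_lexc(entry.stem)} "
        f'{entry.continuation} "{gloss}" ;'
    )


# Hypothetical sample entry, not taken from the paper.
example = Entry(lemma="kala", stem="kala", continuation="N_KALA", translation="fish")
print(to_lexc_line(example))  # prints: kala:kala N_KALA "fish" ;
```

Keeping the record separate from its rendering reflects the idea in the abstract that one dictionary source can feed several outputs; additional fields such as derivation or etymology would simply be ignored by this particular export.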
Title of host publication: 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017) : St. Petersburg, Russia 23 – 24 January 2017
Editors: Francis M. Tyers, Michael Rießler, Tommi A. Pirinen, Trond Trosterud
Number of pages: 7
Publisher: The Association for Computational Linguistics
ISBN (print): 978-1-5108-3665-5
Status: Published - 2017
MoE publication type: A4 Article in conference proceedings
Event: International Workshop for Computational Linguistics of Uralic Languages - St. Petersburg, Russia
Duration: 23 Jan 2017 – 24 Jan 2017
Conference number: 3


  • 6121 Languages
