Synchronized Mediawiki based analyzer dictionary development

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Open-source analyzer dictionary development is being implemented for Skolt Sami, Ingrian, Moksha-Mordvin, etc. in the Helsinki CSC infrastructure, home of the Finnish Kielipankki ’Language Bank’ and Termipankki ’Term Bank’. The proximity of minority-language corpora in need of annotation and the multiple usage of controlled wikimedia-type dictionaries make CSC an attractive site for synchronized transducer dictionary development. The open-source FST development of Uralic and other minority languages at Giellatekno-Divvun in Tromsø demonstrates a vast potential for reuse of FSTs, only augmented by open-source work in OmorFi, Apertium and Universal Dependencies <http://universaldependencies.org/#language-urj>. The initial idea is to allow synchronized editing of Giellatekno XML and CSC wiki structures via GitHub. In addition to allowing for simple lexc LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ; line exports, the parallel dictionaries will provide for documentation of derivation, morpho-syntactic information on valency and government, semantics and etymology.
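The lexc line export mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Entry` class, the `escape_lexc` and `to_lexc_line` helpers, the sample entry, and the continuation lexicon name `N_KALA` are all hypothetical, and the set of escaped characters is a deliberately small, non-exhaustive assumption about lexc's `%` escaping.

```python
from dataclasses import dataclass

# Assumption: a non-exhaustive subset of characters that lexc treats
# specially and that are escaped with a preceding percent sign.
LEXC_SPECIAL = set("% :;!")


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary record; field names are illustrative only."""
    lemma: str
    stem: str
    continuation: str  # name of the continuation lexicon
    translation: str


def escape_lexc(text: str) -> str:
    """Prefix each special character with '%' so lexc reads it literally."""
    return "".join("%" + ch if ch in LEXC_SPECIAL else ch for ch in text)


def to_lexc_line(entry: Entry) -> str:
    """Render one entry as: LEMMA:STEM CONTINUATION_LEXICON "TRANSLATION" ;"""
    # Assumption: double quotes inside the gloss are swapped for single
    # quotes so that the quoted string stays well formed.
    gloss = entry.translation.replace('"', "'")
    return (
        f"{escape_lexc(entry.lemma)}:{escape_lexc(entry.stem)} "
        f'{entry.continuation} "{gloss}" ;'
    )


if __name__ == "__main__":
    # Made-up sample entry, for illustration only.
    print(to_lexc_line(Entry("kala", "kal", "N_KALA", "fish")))
```

Keeping the export as a pure function of one record means the same entry could, in principle, be rendered from either the XML side or the wiki side of a synchronized dictionary.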
Original language: English
Title of host publication: 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017): St. Petersburg, Russia 23 – 24 January 2017
Editors: Francis M. Tyers, Michael Rießler, Tommi A. Pirinen, Trond Trosterud
Number of pages: 7
Place of Publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 2017
Pages: 1-7
Article number: 2
ISBN (Print): 978-1-5108-3665-5
Publication status: Published - 2017
MoE publication type: A4 Article in conference proceedings
Event: International Workshop for Computational Linguistics of Uralic Languages - St. Petersburg, Russian Federation
Duration: 23 Jan 2017 – 24 Jan 2017
Conference number: 3

Fields of Science

  • 6121 Languages
  • Open-source
  • Analyzer dictionary development
  • Wiki-based dictionary
  • Synchronized dictionary editing
  • Uralic Languages
  • Semantics
  • Morphology
  • Morpho-syntactic data
  • Etymology

Cite this

Rueter, J., & Hämäläinen, M. (2017). Synchronized Mediawiki based analyzer dictionary development. In F. M. Tyers, M. Rießler, T. A. Pirinen, & T. Trosterud (Eds.), 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017): St. Petersburg, Russia 23 – 24 January 2017 (pp. 1-7). [2] Stroudsburg: The Association for Computational Linguistics.