Utilizing language technology in the documentation of endangered Uralic languages

Ciprian Gerstenberger, Niko Partanen, Michael Rießler, Joshua Wilbur

Research output: Contribution to journal › Article › Scientific › peer-review

Description

The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi language documentation projects, all of which record new spoken language data, digitize available recordings and annotate these multimedia data in order to provide comprehensive language corpora as databases for future research on and for endangered – and under-described – Uralic speech communities. Applying language technology in language documentation helps us to create more systematically annotated corpora, rather than eclectic data collections. Specifically, we describe a script providing interactivity between different morphosyntactic analysis modules implemented as Finite State Transducers and ELAN, a Graphical User Interface tool for annotating and presenting multimodal corpora. Ultimately, the spoken corpora created in our projects will be useful for scientifically significant quantitative investigations on these languages in the future.
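The abstract mentions a script that links FST-based morphosyntactic analyzers with ELAN. The following is a minimal Python sketch of that kind of interaction under stated assumptions. It is not the authors' script. It reads word annotations from an ELAN (.eaf) XML tier and writes one analysis per word into a new dependent tier. Several parts are hypothetical: the tier names `word@S1` and `morph@S1`, the linguistic type name `analysisT`, the ID numbering, and the two-entry Komi lexicon that stands in for a real FST lookup (including its tag strings). A real .eaf file also needs a HEADER, a declared LINGUISTIC_TYPE for the new tier, and globally unique annotation IDs.

```python
import xml.etree.ElementTree as ET

# Minimal ELAN-like fragment: a word tier whose annotations refer to a parent
# utterance annotation "a0" (the parent tier is omitted here for brevity).
EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="word@S1" LINGUISTIC_TYPE_REF="wordT">
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a1" ANNOTATION_REF="a0">
      <ANNOTATION_VALUE>керка</ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a0">
      <ANNOTATION_VALUE>керкаын</ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

# Hypothetical stand-in for an FST lookup: maps a word form to its analyses.
TOY_LEXICON = {
    "керка": ["керка+N+Sg+Nom"],
    "керкаын": ["керка+N+Sg+Ine"],
}


def analyze(form):
    """Return all analyses for a word form, or a '+?' marker if it is unknown."""
    return TOY_LEXICON.get(form.lower(), [form + "+?"])


def add_analysis_tier(root, word_tier_id, new_tier_id, start_id=1000):
    """Append a child tier holding one analysis annotation per word annotation."""
    word_tier = next(t for t in root.iter("TIER") if t.get("TIER_ID") == word_tier_id)
    new_tier = ET.SubElement(root, "TIER", {
        "TIER_ID": new_tier_id,
        "LINGUISTIC_TYPE_REF": "analysisT",  # hypothetical type name
        "PARENT_REF": word_tier_id,
    })
    next_id = start_id
    for ref in word_tier.iter("REF_ANNOTATION"):
        form = ref.findtext("ANNOTATION_VALUE", default="")
        ann = ET.SubElement(new_tier, "ANNOTATION")
        child = ET.SubElement(ann, "REF_ANNOTATION", {
            "ANNOTATION_ID": f"a{next_id}",
            # Point each analysis at the word annotation it describes.
            "ANNOTATION_REF": ref.get("ANNOTATION_ID"),
        })
        # Ambiguous forms keep all readings, separated by " | ".
        ET.SubElement(child, "ANNOTATION_VALUE").text = " | ".join(analyze(form))
        next_id += 1
    return new_tier


if __name__ == "__main__":
    root = ET.fromstring(EAF)
    tier = add_analysis_tier(root, "word@S1", "morph@S1")
    for r in tier.iter("REF_ANNOTATION"):
        print(r.get("ANNOTATION_REF"), r.findtext("ANNOTATION_VALUE"))
```

Keeping the analyses in a separate tier that refers to the word tier leaves the transcription untouched, so the analyzer can be rerun after lexicon updates without re-annotating by hand.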
Original language: English
Journal: Northern European Journal of Language Technology
Volume: 4
Pages: 29-47
Number of pages: 19
ISSN: 2000-1533
DOI: 10.3384/nejlt.2000-1533.1643
Status: Published - 2016
Externally published: Yes
Ministry of Education and Culture (OKM) publication type: A1 Original article in a scientific journal, peer-reviewed

Fields of science

  • 6121 Languages

Cite this

@article{50622b167ae948a394d59a1696871396,
title = "Utilizing language technology in the documentation of endangered Uralic languages",
abstract = "The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi language documentation projects, all of which record new spoken language data, digitize available recordings and annotate these multimedia data in order to provide comprehensive language corpora as databases for future research on and for endangered – and under-described – Uralic speech communities. Applying language technology in language documentation helps us to create more systematically annotated corpora, rather than eclectic data collections. Specifically, we describe a script providing interactivity between different morphosyntactic analysis modules implemented as Finite State Transducers and ELAN, a Graphical User Interface tool for annotating and presenting multimodal corpora. Ultimately, the spoken corpora created in our projects will be useful for scientifically significant quantitative investigations on these languages in the future.",
keywords = "6121 Languages",
author = "Ciprian Gerstenberger and Niko Partanen and Michael Rie{\ss}ler and Joshua Wilbur",
year = "2016",
doi = "10.3384/nejlt.2000-1533.1643",
language = "English",
volume = "4",
pages = "29--47",
journal = "Northern European Journal of Language Technology",
issn = "2000-1533",
publisher = "Link{\"o}ping University Electronic Press (LiU E-Press)",

}

Utilizing language technology in the documentation of endangered Uralic languages. / Gerstenberger, Ciprian; Partanen, Niko; Rießler, Michael; Wilbur, Joshua.

In: Northern European Journal of Language Technology, Vol. 4, 2016, p. 29-47.


TY  - JOUR
T1  - Utilizing language technology in the documentation of endangered Uralic languages
AU  - Gerstenberger, Ciprian
AU  - Partanen, Niko
AU  - Rießler, Michael
AU  - Wilbur, Joshua
PY  - 2016
Y1  - 2016
N2  - The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi language documentation projects, all of which record new spoken language data, digitize available recordings and annotate these multimedia data in order to provide comprehensive language corpora as databases for future research on and for endangered – and under-described – Uralic speech communities. Applying language technology in language documentation helps us to create more systematically annotated corpora, rather than eclectic data collections. Specifically, we describe a script providing interactivity between different morphosyntactic analysis modules implemented as Finite State Transducers and ELAN, a Graphical User Interface tool for annotating and presenting multimodal corpora. Ultimately, the spoken corpora created in our projects will be useful for scientifically significant quantitative investigations on these languages in the future.
AB  - The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi language documentation projects, all of which record new spoken language data, digitize available recordings and annotate these multimedia data in order to provide comprehensive language corpora as databases for future research on and for endangered – and under-described – Uralic speech communities. Applying language technology in language documentation helps us to create more systematically annotated corpora, rather than eclectic data collections. Specifically, we describe a script providing interactivity between different morphosyntactic analysis modules implemented as Finite State Transducers and ELAN, a Graphical User Interface tool for annotating and presenting multimodal corpora. Ultimately, the spoken corpora created in our projects will be useful for scientifically significant quantitative investigations on these languages in the future.
KW  - 6121 Languages
U2  - 10.3384/nejlt.2000-1533.1643
DO  - 10.3384/nejlt.2000-1533.1643
M3  - Article
VL  - 4
SP  - 29
EP  - 47
JO  - Northern European Journal of Language Technology
JF  - Northern European Journal of Language Technology
SN  - 2000-1533
ER  - 