Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks

Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Ragheb AlGhezi, Mietta Lennes, Tamás Grósz, Krister Lindén, Mikko Kurimo

Research output: Contribution to journal, Article, Scientific, Peer-reviewed

Abstract

In 2020-2021, the Donate Speech campaign gathered approximately 3600 h of ordinary, colloquial Finnish speech for the Lahjoita puhetta (Donate Speech) corpus, which includes over twenty thousand speakers from all regions of Finland and from all age brackets. The goal of the collection was to create a representative, large-scale resource of spontaneous spoken Finnish to accelerate the development of language technology and speech-based services.
Original language: English
Journal: Language Resources and Evaluation
Volume: 57
Pages (from-to): 1295–1327
Number of pages: 33
ISSN: 1574-020X
DOI
Status: Published - 9 Aug 2022
MoE publication type: A1 Journal article-refereed

Bibliographical note

Publisher Copyright:
© 2022, The Author(s).

Fields of science

  • 6121 Languages
