Keyword spotting for audiovisual archival search in Uralic languages

Nils Hjortnæs, Niko Partanen, Francis M. Tyers

Research output: Article in book/report/conference proceedings, Conference article, Scientific, Peer-reviewed

Abstract

In this study we investigate the potential of using Automatic Speech Recognition (ASR) for keyword spotting in four Uralic languages: Finnish, Hungarian, Estonian and Komi. These languages also represent different points on the continuum from high-resource to low-resource. Although the accuracy of the ASR systems shows that there is a long way to go, we show that they still have the potential to be useful for downstream tasks such as keyword spotting. By using a simple text search after running ASR, we are already able to achieve an F1 score of between 0.15 and 0.33, a precision of nearly 0.90 for Estonian and Hungarian, and a precision of 0.76 for Komi.
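
The abstract describes keyword spotting as a plain text search over ASR output, scored with precision and F1. The sketch below illustrates that idea only; this record does not give the authors' matching rule or evaluation protocol, so the whole-token matching, the `Utterance` type and the `score_keyword_search` function are hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    reference: str   # human transcript, treated as ground truth
    hypothesis: str  # ASR output for the same audio


def contains(text: str, keyword: str) -> bool:
    # Assumption: case-insensitive, whitespace-tokenized, whole-token match.
    return keyword.lower() in text.lower().split()


def score_keyword_search(utterances, keywords):
    """Return (precision, recall, f1) over all (utterance, keyword) pairs."""
    tp = fp = fn = 0
    for utt in utterances:
        for kw in keywords:
            truth = contains(utt.reference, kw)   # keyword really spoken
            found = contains(utt.hypothesis, kw)  # keyword found in ASR text
            if found and truth:
                tp += 1
            elif found:
                fp += 1
            elif truth:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Under this scoring, a high precision combined with a low F1, as reported in the abstract, would correspond to a search that rarely returns wrong hits but misses many true occurrences of the keywords.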

Original language: English
Title of host publication: IWCLUL 2021 - 7th International Workshop on Computational Linguistics of Uralic Languages, Proceedings
Number of pages: 7
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2021
Pages: 20-26
ISBN (electronic): 9781954085824
Status: Published - 2021
OKM publication type: A4 Article in conference proceedings
Event: 7th International Workshop on Computational Linguistics of Uralic Languages, IWCLUL 2021 - Virtual, Syktyvkar, Russia
Duration: 23 Sep 2021 to 24 Sep 2021

Publication series

Name: IWCLUL 2021 - 7th International Workshop on Computational Linguistics of Uralic Languages, Proceedings

Additional information

Publisher Copyright:
© 2021 IWCLUL 2021 - 7th International Workshop on Computational Linguistics of Uralic Languages, Proceedings. All rights reserved.

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
