Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations

Research output: Chapter in book/report/conference proceeding › Conference contribution › Scientific › Peer-reviewed

Abstract

In this paper we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text. To our knowledge this will be the largest publicly available dataset with prosodic labels. We describe the dataset construction and the resulting benchmark dataset in detail and train a number of different models ranging from feature-based classifiers to neural network systems for the prediction of discretized prosodic prominence. We show that pre-trained contextualized word representations from BERT outperform the other models even with less than 10% of the training data. Finally we discuss the dataset in light of the results and point to future research and plans for further improving both the dataset and methods of predicting prosodic prominence from text. The dataset and the code for the models are publicly available.
Original language: English
Title of host publication: 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference
Editors: Mareike Hartmann, Barbara Plank
Number of pages: 10
Place of publication: Linköping
Publisher: Linköping University Electronic Press
Publication date: 30 Sep 2019
Pages: 281–290
ISBN (electronic): 978-91-7929-995-8
Status: Published - 30 Sep 2019
MoE publication type: A4 Article in conference proceedings
Event: Nodalida - Nordic Conference on Computational Linguistics - Turku, Finland
Duration: 30 Sep 2019 to 2 Oct 2019
Conference number: 22
https://nodalida2019.org/

Publication series

Name: Linköping Electronic Conference Proceedings
Publisher: Linköping University Electronic Press
Number: 167
ISSN (print): 1650-3686
ISSN (electronic): 1650-3740
Name: NEALT Proceedings Series
Number: 42

Fields of science

  • 113 Computer and information sciences
  • 6121 Languages
