XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann

Research output: Chapter in book/report/conference proceeding > Conference contribution > Scientific > Peer-reviewed

Abstract

We introduce XED, a multilingual fine-grained human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 43 additional languages, providing new resources for many low-resource languages. We use Plutchik’s core emotions, with the addition of neutral, to annotate the dataset. The dataset is carefully evaluated using language-specific BERT to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
Original language: English
Title of host publication: Proceedings of the 28th International Conference on Computational Linguistics
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Number of pages: 11
Publisher: International Committee on Computational Linguistics
Publication date: 2020
Pages: 6542–6552
ISBN (electronic): 978-1-952148-27-9
Status: Published - 2020
MoE publication type: A4 Article in conference proceedings
Event: International Conference on Computational Linguistics - [Online event]
Duration: 8 Dec 2020 to 13 Dec 2020
Conference number: 28

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
