XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

We introduce XED, a multilingual fine-grained human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 43 additional languages, providing new resources to many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral. The dataset is carefully evaluated using language-specific BERT to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
Original language: English
Title of host publication: Proceedings of the 28th International Conference on Computational Linguistics
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Number of pages: 11
Publisher: International Committee on Computational Linguistics
Publication date: 2020
Pages: 6542–6552
ISBN (Electronic): 978-1-952148-27-9
Publication status: Published - 2020
MoE publication type: A4 Article in conference proceedings
Event: International Conference on Computational Linguistics - [Online event]
Duration: 8 Dec 2020 – 13 Dec 2020
Conference number: 28

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
