XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

Abstract

We introduce XED, a multilingual, fine-grained, human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 43 additional languages, providing new resources for many low-resource languages. We annotate the dataset with Plutchik’s core emotions plus a neutral class. The dataset is carefully evaluated using language-specific BERT models to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
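As a minimal sketch of the label scheme described above, the snippet below encodes Plutchik's eight core emotions plus neutral as a fixed-length vector. The label order, the integer ids, the helper names `encode` and `decode`, and the choice to allow several emotions per sentence are assumptions made for illustration; the abstract does not specify the dataset's actual file format or label ids.

```python
# Plutchik's eight core emotions, plus the "neutral" class mentioned in the abstract.
# The ordering and integer ids are assumptions for this sketch, not XED's own ids.
LABELS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
    "neutral",
]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


def encode(labels):
    """Return a multi-hot vector with a 1 at the position of each given label."""
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    vector = [0] * len(LABELS)
    for name in labels:
        vector[LABEL_TO_ID[name]] = 1
    return vector


def decode(vector):
    """Return the label names whose positions are set in a multi-hot vector."""
    return [name for name, flag in zip(LABELS, vector) if flag]


if __name__ == "__main__":
    # Hypothetical annotation for a single sentence.
    vec = encode(["joy", "trust"])
    print(vec)
    print(decode(vec))
```

A vector of this shape is a common target format for a classifier with one output per label; whether it matches the setup used in the paper's evaluation is not stated here.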
Original language: English
Title of host publication: The 28th International Conference on Computational Linguistics: COLING 2020
Publication status: Accepted/In press - 2020
MoE publication type: A4 Article in conference proceedings
Event: COLING 2020 - Online
Duration: 8 Dec 2020 to 13 Dec 2020
https://coling2020.org/
