Abstract
We introduce XED, a multilingual, fine-grained, human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 43 additional languages, providing new resources for many low-resource languages. We annotate the dataset with Plutchik's eight core emotions, plus a neutral category. The dataset is carefully evaluated using language-specific BERT models to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
Original language | English |
---|---|
Title of host publication | Proceedings of the 28th International Conference on Computational Linguistics |
Editors | Donia Scott, Nuria Bel, Chengqing Zong |
Number of pages | 11 |
Publisher | International Committee on Computational Linguistics |
Publication date | 2020 |
Pages | 6542–6552 |
ISBN (Electronic) | 978-1-952148-27-9 |
Publication status | Published - 2020 |
MoE publication type | A4 Article in conference proceedings |
Event | International Conference on Computational Linguistics (online event); Duration: 8 Dec 2020 → 13 Dec 2020; Conference number: 28 |
Fields of Science
- 6121 Languages
- 113 Computer and information sciences
Datasets
- XED (Dataset)
  Kajava, K. (Contributor) & Öhman, E. (Creator), GitHub, 2020
  https://github.com/Helsinki-NLP/XED