Abstract
We introduce XED, a multilingual fine-grained human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 43 additional languages, providing new resources to many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral. The dataset is carefully evaluated using language-specific BERT to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
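The annotation scheme described above (Plutchik's core emotions plus neutral, with a sentence possibly carrying more than one emotion) can be sketched as a small label-handling helper. This is a minimal illustration, not the dataset's documented format: the integer ids, the `sentence<TAB>id,id,...` line layout, and the names `EMOTIONS`, `parse_line`, and `to_multi_hot` are all assumptions made for the example.

```python
# Plutchik's eight core emotions, plus the neutral class mentioned in the abstract.
# ASSUMPTION: the integer ids (0 for neutral, then alphabetical order) are
# illustrative only and are not confirmed by this record.
EMOTIONS = [
    "neutral", "anger", "anticipation", "disgust",
    "fear", "joy", "sadness", "surprise", "trust",
]


def parse_line(line):
    """Split a hypothetical 'sentence<TAB>id,id,...' line into text and label names."""
    text, raw_ids = line.rstrip("\n").split("\t")
    ids = sorted({int(x) for x in raw_ids.split(",")})
    return text, [EMOTIONS[i] for i in ids]


def to_multi_hot(labels):
    """Encode a list of emotion names as a 0/1 vector aligned with EMOTIONS."""
    return [1 if name in labels else 0 for name in EMOTIONS]


if __name__ == "__main__":
    text, labels = parse_line("I can't wait to see you!\t2,5\n")
    print(text, labels, to_multi_hot(labels))
```

A multi-hot vector of this kind is a common target format when fine-tuning a BERT-style classifier for multi-label emotion detection; the actual file layout should be checked against the dataset repository before use.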
Original language | English |
---|---|
Title of host publication | Proceedings of the 28th International Conference on Computational Linguistics |
Editors | Donia Scott, Nuria Bel, Chengqing Zong |
Number of pages | 11 |
Publisher | International Committee on Computational Linguistics |
Publication date | 2020 |
Pages | 6542–6552 |
ISBN (electronic) | 978-1-952148-27-9 |
DOI | |
Status | Published - 2020 |
MoE publication type | A4 Article in conference proceedings |
Event | International Conference on Computational Linguistics - [Online event] Duration: 8 Dec 2020 → 13 Dec 2020 Conference number: 28 |
Fields of science
- 6121 Languages
- 113 Computer and information sciences
Research datasets
- XED
Kajava, K. (Contributor) & Öhman, E. (Creator), GitHub, 2020
https://github.com/Helsinki-NLP/XED
Dataset