Abstract
This work is motivated by the need to track discourse dynamics in historical corpora. However, in many real use cases ground truth is not available, and annotating discourses at the corpus level is hardly feasible. We propose a novel procedure for generating synthetic datasets for this task, a novel evaluation framework, and a set of benchmarking models. Finally, we run large-scale experiments using these synthetic datasets and demonstrate that a model trained on such a dataset can obtain meaningful results when applied to a real dataset, without any adjustments to the model.
Original language | English |
---|---|
Title of host publication | Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021) |
Editors | Yasunobu Sumikawa, Ryohei Ikejiri, Antoine Doucet, Eva Pfanzelter, Mohammed Hasanuzzaman, Gaël Dias, Ian Milligan, Adam Jatowt |
Number of pages | 12 |
Place of publication | Aachen |
Publisher | CEUR-WS.org |
Publication date | Sep 2021 |
Status | Published - Sep 2021 |
MoE publication type | A4 Article in conference proceedings |
Event | International Workshop on Computational History - Duration: 30 Sep 2021 → 1 Oct 2021, Conference number: 6, https://sites.google.com/view/histoinformatics2021workshop/home |
Publication series
Name | CEUR workshop proceedings |
---|---|
Publisher | CEUR-WS.org |
Volume | 2981 |
ISSN (electronic) | 1613-0073 |
Fields of science
- 113 Computer and information sciences