Abstract
This work is motivated by the need to track discourse dynamics in historical corpora. In many real use cases, however, ground truth is not available, and annotating discourses at the corpus level is hardly feasible. We propose a novel procedure for generating synthetic datasets for this task, a novel evaluation framework, and a set of benchmarking models. Finally, we run large-scale experiments using these synthetic datasets and demonstrate that a model trained on such a dataset can obtain meaningful results when applied to a real dataset, without any adjustments to the model.
Original language | English |
---|---|
Title of host publication | Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021) |
Editors | Yasunobu Sumikawa, Ryohei Ikejiri, Antoine Doucet, Eva Pfanzelter, Mohammed Hasanuzzaman, Gaël Dias, Ian Milligan, Adam Jatowt |
Number of pages | 12 |
Place of Publication | Aachen |
Publisher | CEUR-WS.org |
Publication date | Sept 2021 |
Publication status | Published - Sept 2021 |
MoE publication type | A4 Article in conference proceedings |
Event | International Workshop on Computational History. Duration: 30 Sept 2021 → 1 Oct 2021. Conference number: 6. https://sites.google.com/view/histoinformatics2021workshop/home |
Publication series
Name | CEUR workshop proceedings |
---|---|
Publisher | CEUR-WS.org |
Volume | 2981 |
ISSN (Electronic) | 1613-0073 |
Fields of Science
- 113 Computer and information sciences