Benchmarks for Unsupervised Discourse Change Detection

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

This work is motivated by the need to track discourse dynamics in historical corpora. In many real use cases, however, ground truth is not available, and annotating discourses at the corpus level is hardly feasible. We propose a novel procedure for generating synthetic datasets for this task, a novel evaluation framework, and a set of benchmarking models. Finally, we run large-scale experiments using these synthetic datasets and demonstrate that a model trained on such a dataset can obtain meaningful results when applied to a real dataset, without any adjustment of the model.
Original language: English
Title of host publication: Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021)
Editors: Yasunobu Sumikawa, Ryohei Ikejiri, Antoine Doucet, Eva Pfanzelter, Mohammed Hasanuzzaman, Gaël Dias, Ian Milligan, Adam Jatowt
Number of pages: 12
Place of Publication: Aachen
Publisher: CEUR-WS.org
Publication date: Sept 2021
Publication status: Published - Sept 2021
MoE publication type: A4 Article in conference proceedings
Event: International Workshop on Computational History
Duration: 30 Sept 2021 to 1 Oct 2021
Conference number: 6
https://sites.google.com/view/histoinformatics2021workshop/home

Publication series

Name: CEUR workshop proceedings
Publisher: CEUR-WS.org
Volume: 2981
ISSN (Electronic): 1613-0073

Fields of Science

  • 113 Computer and information sciences
