Detecting Sequential Genre Change in Eighteenth-Century Texts

Jinbin Zhang, Yann Ciarán Ryan, Iiro Rastas, Filip Ginter, Mikko Tolonen, Rohit Babbar

Research output: Chapter in book/report/conference proceeding; Conference article; Scientific; Peer-reviewed

Abstract

Machine classification of historical books into genres is a common task for NLP-based classifiers and has a number of applications, from literary analysis to information retrieval. However, it is not a straightforward task: genre labels can be ambiguous and subject to temporal change, and many books consist of mixed or miscellaneous genres. In this paper we describe a work-in-progress method by which genre predictions can be used to determine longer sequences of genre change within books, which we test with visualisations of some hand-picked texts. We apply state-of-the-art methods to the task, including a BERT-based transformer and a character-level Perceiver model, both pre-trained on a large collection of eighteenth-century works (ECCO), using a new set of hand-annotated documents created to reflect historical divisions. Results show that both models perform significantly better than a linear baseline, particularly when ECCO-BERT is combined with TF-IDF features, though for this task the character-level model provides no obvious advantage. Initial evaluation of the genre sequence method shows that it may in the future be useful in determining and dividing the multiple genres of miscellaneous and hybrid historical texts.
Original language: English
Title of host publication: Proceedings of the Computational Humanities Research Conference 2022
Editors: Folgert Karsdorp, Alie Lassche, Kristoffer Nielbo
Number of pages: 13
Place of publication: Aachen
Publisher: CEUR-WS.org
Publication date: 12 Dec 2022
Pages: 243-255
Status: Published - 12 Dec 2022
MoEC publication type: A4 Article in conference proceedings
Event: Computational Humanities Research Conference - Antwerp, Belgium
Duration: 12 Dec 2022 to 14 Dec 2022
Conference number: 3

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS.org
Volume: 3290
ISSN (electronic): 1613-0073

Fields of science

  • 615 History and Archaeology
