Word Clustering for Historical Newspapers Analysis

Research output: Chapter in book/report/conference proceedings, Conference contribution, Scientific, Peer-reviewed

Abstract

This paper is part of a collaboration between computer scientists and historians aimed at developing novel methods for the analysis of historical newspapers. We present a case study of ideological terms ending with the suffix -ism in nineteenth-century Finnish newspapers. We propose a two-step procedure to trace differences in word usage over time: training diachronic embeddings on several time slices, and then clustering the embeddings of selected words together with their neighbours to obtain historical context. The obtained clusters turn out to be useful for historical studies. The paper also discusses specific difficulties related to the development of historian-oriented tools.
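The second step of the procedure described above (expanding selected words with their nearest neighbours and clustering the result) can be illustrated with a minimal sketch. It assumes the embeddings for one time slice are already trained and available as a plain dictionary from word to vector; training the diachronic embeddings is not shown. The neighbour count `k`, the greedy average-linkage agglomerative clustering, the 0.5 similarity threshold, and all function names are hypothetical choices for illustration, not necessarily the clustering method used in the paper. The toy vectors and Finnish words are invented and are not data from the study.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical container: one time slice's embeddings, word -> vector.
Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbours(word: str, emb: Embeddings, k: int) -> List[str]:
    """Return the k words most similar to `word` (ties broken alphabetically)."""
    scored = [(cosine(emb[word], vec), other) for other, vec in emb.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def expand_with_neighbours(targets: List[str], emb: Embeddings, k: int) -> List[str]:
    """Collect each target word plus its k nearest neighbours, without duplicates."""
    words: List[str] = []
    for target in targets:
        if target not in emb:  # word absent from this time slice
            continue
        for w in [target] + nearest_neighbours(target, emb, k):
            if w not in words:
                words.append(w)
    return words


def average_linkage_clusters(words: List[str], emb: Embeddings, threshold: float) -> List[List[str]]:
    """Greedy average-linkage agglomerative clustering on cosine similarity.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity and stops when that similarity drops below `threshold`.
    """
    clusters = [[w] for w in words]

    def mean_sim(a: List[str], b: List[str]) -> float:
        return sum(cosine(emb[x], emb[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best_sim, best_i, best_j = -2.0, 0, 1  # cosine is never below -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = mean_sim(clusters[i], clusters[j])
                if s > best_sim:
                    best_sim, best_i, best_j = s, i, j
        if best_sim < threshold:
            break
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]  # best_j > best_i, so best_i's index is unaffected
    return clusters


# Invented 3-dimensional toy vectors for one hypothetical time slice.
emb_toy: Embeddings = {
    "liberalismi": [1.0, 0.1, 0.0],
    "vapaus": [0.9, 0.2, 0.0],
    "sosialismi": [0.0, 1.0, 0.1],
    "työväki": [0.1, 0.9, 0.0],
    "kirkko": [0.0, 0.0, 1.0],
}

words = expand_with_neighbours(["liberalismi", "sosialismi"], emb_toy, k=1)
result = average_linkage_clusters(words, emb_toy, threshold=0.5)
print(result)  # [['liberalismi', 'vapaus'], ['sosialismi', 'työväki']]
```

Running the same expansion and clustering separately on each time slice, and then comparing which neighbours end up in the same cluster as an -ism term, gives one simple way to see how its context shifts over time.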
Original language: English
Host publication title: Workshop on Language Technology for Digital Historical Archives : with a Special Focus on Central-, (South-)Eastern Europe, Middle East and North Africa (LT-DHA2019)
Number of pages: 8
Place of publication: Red Hook, NY
Publisher: Curran Associates Inc.
Publication date: 2019
Pages: 3-10
ISBN (print): 978-1-7138-0298-3
Status: Published - 2019
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Language Technology for Digital Historical Archives : with a Special Focus on Central-, (South-)Eastern Europe, Middle East and North Africa (LT-DHA2019) - Varna, Bulgaria
Duration: 5 Aug 2019 → 5 Sep 2019
https://www.inf.uni-hamburg.de/inst/dmp/hercore/publications/ltdha.html

Fields of science

  • 113 Computer and information sciences
