Word Clustering for Historical Newspapers Analysis

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

Abstract

This paper is part of a collaboration between computer scientists and historians aimed at developing novel methods for the analysis of historical newspapers. We present a case study of ideological terms ending with the -ism suffix in nineteenth-century Finnish newspapers. We propose a two-step procedure to trace differences in word usage over time: training diachronic embeddings on several time slices, and then clustering the embeddings of selected words together with their neighbours to obtain historical context. The obtained clusters turn out to be useful for historical studies. The paper also discusses specific difficulties related to the development of historian-oriented tools.
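
The second step of the procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the embeddings are taken as already trained and comparable across time slices, the vectors and the tokens x_ism, ctx_a and ctx_b are hypothetical placeholders rather than data from the paper, and the greedy cosine-threshold grouping is a simple stand-in, not the clustering method the authors used.

```python
import math

# Hypothetical toy embeddings, one dictionary per time slice.
# The values are invented placeholders, not results from the paper.
SLICES = {
    "t1": {"x_ism": [1.0, 0.0], "ctx_a": [0.9, 0.1], "ctx_b": [0.0, 1.0]},
    "t2": {"x_ism": [0.1, 1.0], "ctx_a": [1.0, 0.1], "ctx_b": [0.1, 0.9]},
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbours(word, vectors, k):
    """Return the k words closest to `word` within one time slice."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return others[:k]


def build_items(slices, targets, k):
    """Collect each target word and its neighbours, labelled by time slice."""
    items = []
    for period, vectors in slices.items():
        for target in targets:
            for word in [target] + nearest_neighbours(target, vectors, k):
                items.append((f"{word}@{period}", vectors[word]))
    return items


def cluster(items, threshold):
    """Greedy grouping: an item joins the first cluster that already holds
    a member with cosine similarity >= threshold, else starts a new one."""
    clusters = []
    for label, vec in items:
        for members in clusters:
            if any(cosine(vec, other) >= threshold for _, other in members):
                members.append((label, vec))
                break
        else:
            clusters.append([(label, vec)])
    return [[label for label, _ in members] for members in clusters]


clusters = cluster(build_items(SLICES, ["x_ism"], k=1), threshold=0.9)
for members in clusters:
    print(members)
```

In this toy setting the target word lands in a different cluster in each time slice, alongside a different neighbour, which is the kind of usage shift the clusters are meant to surface for a historian to inspect.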
Original language: English
Title of host publication: Workshop on Language Technology for Digital Historical Archives : with a Special Focus on Central-, (South-)Eastern Europe, Middle East and North Africa (LT-DHA2019)
Number of pages: 8
Place of Publication: Red Hook, NY
Publisher: Curran Associates Inc.
Publication date: 2019
Pages: 3-10
ISBN (Print): 978-1-7138-0298-3
Publication status: Published - 2019
MoE publication type: A4 Article in conference proceedings
Event: Workshop on Language Technology for Digital Historical Archives : with a Special Focus on Central-, (South-)Eastern Europe, Middle East and North Africa (LT-DHA2019) - Varna, Bulgaria
Duration: 5 Aug 2019 to 5 Sep 2019
https://www.inf.uni-hamburg.de/inst/dmp/hercore/publications/ltdha.html

Fields of Science

  • 113 Computer and information sciences
