High Performance Computing for the Detection and Analysis of Historical Discourses

Project: Research project


Description (abstract)

This project will use high-performance computing (HPC) to detect discourses in large historical corpora of the eighteenth century (e.g., books, pamphlets, newspapers) and to study how the detected discourses interconnect and evolve. The approach is to analyse historical corpora in a nuanced and thorough fashion. It is nuanced because we analyse the available corpora at several levels of conceptual granularity, starting from the raw documents as the basic elements and then progressively discovering intermediate linguistic elements (keywords, topics, genres) and higher-level notions (concepts such as “the economy” or “the state”, and the discourses about them). It is thorough in the sense that the analysis is performed jointly over entire corpora (billions of words, comprising a large fraction of all existing literature from the period). This approach contrasts with traditional historical scholarship, which often takes a single element as its starting point (e.g., a passage attributed to a single well-known historical figure) and then aims to generalize from it, typically using a limited number of documents as corroborating sources. It also contrasts with modern historical scholarship that uses “big data” but performs the analysis at a highly aggregate level. Compared to previous scholarship, our approach has the potential to discover previously unknown and richer insights from historical corpora that traditional approaches have missed.
Effective start/end date: 01/01/2022 to 31/12/2024


  • Academy of Finland: 1 085 855,00 €
  • Suomen Akatemia Projektilaskutus: 209 351,00 €