• PL 24 (Unioninkatu 40)

    00014

    Finland

Research output per year, 2002–2025

Personal profile

Information on research and teaching

Mikko Tolonen is professor of Digital Humanities at the Faculty of Arts at the University of Helsinki. He has a PhD in intellectual history (2010) and he is the PI and founder of the Helsinki Computational History Group (COMHIS) at the Department of Digital Humanities.

Tolonen's main research focus is on the Enlightenment Era and on integrated interdisciplinary studies of public discourse, knowledge production, and book and intellectual history that combine metadata from library catalogues with full-text and image libraries of books, newspapers and periodicals in early modern Europe. He is one of the editors of Hume's History of England for Oxford University Press. In 2016, he was awarded an Open Science and Research Award by the Finnish Ministry of Education. In 2025 he gave the annual Voltaire Foundation lecture on Digital Enlightenment Studies at the University of Oxford.

In digital humanities education, he believes in project-based teaching, exemplified by the annual award-winning Helsinki Digital Humanities Hackathon, which he founded in 2015. Tolonen is also the local PI at UH within two Marie Curie Training Networks (CASCADE and MECANO).

Tolonen is active in the development of infrastructures and collaborative networks. He has served on the executive board of the European Association for Digital Humanities (EADH) and as the chair of Digital Humanities in the Nordic and Baltic Countries (DHNB). He has also contributed to building humanities research infrastructure in Finland (FIN-CLARIAH and DARIAH-FI).

Tolonen supervises work across COMHIS’s wide research spectrum and welcomes contact from talented early-career scholars keen to collaborate with his group. Former international postdocs from the group have since secured lecturer positions at leading European universities.

For interviews, see the CSC interview (in Finnish) and 375 humanists (in English and other languages). Tolonen is also on Bluesky (@tolonen.bsky.social) and Twitter (@mikko_tolonen).

Tolonen's work

Mikko Tolonen’s work spans digital humanities, book history and intellectual history, focusing on early modern print culture and computational analysis. The COMHIS approach towards books as cultural artefacts, data and vehicles of meaning can be described as holistic (in other words, books are not treated merely as texts).

Tolonen's research has been published in central digital humanities forums (Journal of Cultural Analytics, Digital Scholarship in the Humanities, Historical Methods, Digital Enlightenment Studies, Journal of Open Humanities Data) and in major NLP-related venues such as ACL. At the same time, he has also published in well-established traditional journals, such as Historical Journal, Eighteenth-Century Studies, Huntington Library Quarterly and Explorations in Economic History, and in books by Oxford University Press and Cambridge University Press. This versatility is crucial for computational history.

Publishing in the best traditional forums functions as the litmus test for the relevance of computation in historical research. These results reflect COMHIS’s long-term strategy: most publications stem from integrated interdisciplinary work and are co-authored by group members, at times together with external partners.

Below, some of Tolonen's publications are grouped into thematic categories, with each category highlighting the evolution of computational methods (from early text-reuse detection to advanced machine learning) applied to those topics. Each category lists selected publications (with title, year and a link to the full publication details, including access to the article itself).

Book History and Bibliographic Data Science

Tolonen studies early modern book history and print culture using large-scale bibliographic data. His work integrates library catalogues and addresses challenges of data quality and completeness in national bibliographies. Early studies introduced “bibliographic data science” to map publishing trends (e.g. book formats, vernacularization), while later research employs quantitative analysis (statistical and data-driven models) to uncover patterns in publication language, canon formation and economic aspects of the book trade. 

  • Book Printing in Latin and Vernacular Languages in Northern Europe, 1500–1800 (2025)
  • Quantifying the Presence of Ancient Greek and Latin Classics in Early Modern Britain (2025)
  • The Anatomy of Eighteenth Century Collections Online (ECCO) (2022)
  • Print Culture and Economic Constraints: A Quantitative Analysis of Book Prices in Eighteenth-Century Britain (2024)
  • Examining the Early Modern Canon: The English Short Title Catalogue and Large-Scale Patterns of Cultural Production (2021)
  • Bibliographic Data Science and the History of the Book (c. 1500–1800) (2019)

Publishing Networks and Knowledge Dissemination

This category covers intellectual and publishing networks of the Enlightenment and beyond. Tolonen applies social network analysis and bibliometric methods to understand how ideas and publications spread through networks of authors, publishers and communities. Using large-scale bibliographic datasets, his studies reveal structural patterns. For example, mapping the Scottish Enlightenment print ecosystem uncovered distinct publisher roles in Edinburgh vs. London and emphasized the relational careers of key figures. Early work in this area introduced data-driven network analysis of Enlightenment publishing, and recent publications in leading journals contextualize major and marginal players in the knowledge distribution networks over time.

  • Networks of Influence in Scottish Enlightenment Publishing (2024)
  • The Evolution of Scottish Enlightenment Publishing (2024)
  • Communication and Idea Transmission across Historical Communities: A Quantitative Analysis of Early Modern Nonconformist Networks (2023)
  • Distinguishing Discourses: A Data-Driven Analysis of Works and Publishing Networks of the Scottish Enlightenment (2022)

Historical Newspapers and Public Discourse

Tolonen’s research also explores newspaper archives and the emergence of a public sphere. He employs computational text analysis to study changing vocabularies, discourse dynamics, and the reach of periodicals across time and geography. Earlier studies analyzed basic features (language, location, publication frequency) of Finnish newspapers to delineate a national public sphere. Subsequently, more sophisticated natural language processing techniques were introduced. For example, one study used dependency parsing and neural word embeddings to trace how the concept of “nation” evolved semantically across newspaper corpora in four languages. Such work illustrates a shift from manual analysis to data-driven methods that detect long-term conceptual changes and cross-lingual trends in public discourse.

  • A Data-Driven Approach to Studying Changing Vocabularies in Historical Newspaper Collections (2021)
  • Topic Modelling Discourse Dynamics in Historical Newspapers (2021)
  • A National Public Sphere? Analysing the Language, Location and Form of Newspapers in Finland, 1771–1917 (2019)

Text Reuse and Reception Studies

A significant thread in Tolonen’s work is the study of text reuse and intertextuality, shedding light on how ideas and texts were circulated and re-purposed in the early modern period. Recognizing that matching shared passages across documents can reveal the spread and evolution of ideas, he helped develop computational tools for large-scale text reuse detection. The Reception Reader web tool, for instance, enables scholars to visually explore the reuse of texts in Early English Books Online and ECCO, revealing previously hidden patterns of reception across time. Tolonen’s publications trace a methodological progression from targeted case studies (e.g. comparing text similarity in one author’s works) to building optimized, big-data systems. A recent contribution reports on handling “billions of text reuse instances” from nearly all 18th-century printed texts, detailing how a hybrid data pipeline (SQL databases combined with Spark big-data processing) was optimized to support interactive humanities research.

  • Reception Reader: Exploring Text Reuse in Early Modern British Publications (2023)
  • Text Reuse in Large Historical Corpora: Insights from the Optimization of a Data Science System (2025)
  • The Reception of David Hume's Essays in Eighteenth-Century Britain (2025)
  • A Comparative Text Similarity Analysis of the Works of Bernard Mandeville (2023)

Historical Language and Semantic Change

Tolonen’s interdisciplinary work extends to language change and stylistic analysis in historical texts. Using corpora like ECCO and newspapers, these studies quantify how language and genres evolved over the Enlightenment and modern era. Earlier efforts applied statistical techniques to detect shifts within texts (for example, identifying genre changes or register shifts in multi-genre works). More recently, Tolonen’s collaborations have leveraged modern language models. One project trained a BERT-based model on 18th-century text data to predict publication dates. Other works similarly harness transformer models and distributional semantics to classify text registers and track domain-specific vocabulary change (e.g. the rise of economic terminology) over time. This trajectory illustrates the field’s shift from rule-based or manual analysis to explainable AI approaches in studying historical language.

  • Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model (2022)
  • Detecting Sequential Genre Change in Eighteenth-Century Texts (2022)
  • Towards Automatic Register Classification in Unrestricted Databases of Historical English (2024)
  • Dimensions of Incoming Economic Vocabulary in Eighteenth-Century Britain (2023)
  • Measuring the Distribution of Hume’s Scotticisms in the ECCO Collection (2023)

Computational Methods and Infrastructure for Humanities

Across all topics, Tolonen contributes to developing computational methods, tools and data workflows that advance digital humanities research. These publications focus on the infrastructure and methodological frameworks enabling large-scale analysis of historical data. In the bibliographic domain, Tolonen has helped formalize best practices for data curation and integration – from automatically determining edition groupings via metadata, to defining open workflows for multilingual library data (through DARIAH’s working groups on bibliographical data). He also explored statistical approaches to uncertainty in historical datasets, using probabilistic programming to model biases and gaps in sources. Collectively, these works show a progression toward more robust, scalable and transparent computational pipelines for humanities, often blending computer science techniques with domain-specific knowledge.

  • Semi-Supervised Contrastive Training for Similar Image Identification in a Large Collection of Historical Books (2025)
  • Document Layout Error Rate (DLER) Metric to Evaluate Image Segmentation Methods (2024)
  • Open Bibliographical Data Workflows and the Multilinguality Challenge (2024)
  • Integrated Interdisciplinary Workflows for Research on Historical Newspapers: Perspectives from Humanities Scholars, Computer Scientists, and Librarians (2022)
  • Quantifying Bias and Uncertainty in Historical Data Collections with Probabilistic Programming (2020)
  • Analytical Determination of Editions from Bibliographic Metadata (2019)

Enlightenment Studies

While focusing on the use of computation in Digital Enlightenment Studies, Tolonen has also continued to publish more traditional pieces, especially on the Scottish Enlightenment. He remains active in archival work while considering the future possibilities of digital and computational methods.

  • Bernard Mandeville (in Stanford Encyclopedia of Philosophy) (2024)
  • Berkeley and Mandeville, Oxford Handbook of Berkeley (2022)
  • Kielellinen kontekstualismi ja aatehistoria (2022)
  • Pierre Nicole and amour-propre (2020)
  • Talous ja moraali (2016)
  • Hume in and out of the Scottish Enlightenment (2015)
  • The gothic origin of modern civility (2013)
  • Mandeville and Hume: anatomists of civil society (2013)

Fields of science

  • 615 History and archaeology