Automated Extraction and Analysis of Sentences under Production: A Theoretical Framework and Its Evaluation

Malgorzata Anna Ulasik, Aleksandra Miletić

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Sentences are generally understood to be essential communicative units in writing that are built to express thoughts and meanings. Studying sentence production provides a valuable opportunity to shed new light on the writing process itself and on the underlying cognitive processes. Nevertheless, research on the production of sentences in writing remains scarce. We propose a theoretical framework and an open-source implementation that aim to facilitate the study of sentence production based on keystroke logs. We centre our approach around the notion of sentence history: all the versions of a given sentence during the production of a text. The implementation takes keystroke logs as input, extracts sentence versions, aggregates them into sentence histories and evaluates the sentencehood of each sentence version. We provide a detailed evaluation of the implementation based on a manually annotated corpus of texts in French, German and English. The implementation yields strong results on all three processing aspects: sentence version extraction, aggregation into sentence histories and sentencehood evaluation.

Original language: English
Article number: 71
Journal: Languages
Volume: 9
Issue number: 3
Number of pages: 33
ISSN: 2226-471X
Publication status: Published - 22 Feb 2024
MoE publication type: A1 Journal article-refereed

Bibliographical note

Publisher Copyright:
© 2024 by the authors.

Fields of Science

  • 6121 Languages
  • 6162 Cognitive science
  • keystroke logging
  • linguistic modelling
  • sentence history
  • sentence production
  • text history
  • writing process
