Subregisters on Reddit: Functional variation across text lengths

Research output: Thesis, Doctoral thesis, Collection of articles


This thesis comprises four studies which focus on register variation—the way language is used differently in different situational contexts and for different communicative purposes—within the social media platform Reddit. In particular, the focus of the present work is on variation in communicative function across Reddit comments of different lengths. Even though text length is often considered a confounding factor in corpus-linguistic studies, its role in various types of linguistic variation, including register variation, has received remarkably little study.

In order to study register variation across Reddit, the present work makes use of large-scale datasets of Reddit comments. First, I implement a multi-dimensional register analysis (Biber, 1988), and extract three dimensions of register variation from comment threads from a group of thirty-seven subreddits. This study acts as a proof-of-concept pilot study to confirm that register analysis is a meaningful approach to Reddit data.

In the three following studies, I propose and develop the idea of lengthwise methods, which make use of the fact that texts which are different in length can be difficult to compare with each other, but texts of the exact same length can be compared trivially. I then make use of such methods and a large-scale one-month dataset of Reddit comments to investigate the relationship between situationally and communicatively motivated linguistic choices, i.e. register variation, and the length of Reddit comments.
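The lengthwise idea can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes whitespace tokenization, a hypothetical feature set (first- and second-person pronouns), and a hypothetical helper named `feature_rate_by_length`. Comments are grouped by exact token count, so each feature rate is computed only over texts of the same length.

```python
from collections import defaultdict


def feature_rate_by_length(comments, feature_words):
    """Group comments by exact token count and return, for each length,
    the share of tokens that belong to feature_words."""
    hits = defaultdict(int)    # feature tokens seen at each length
    totals = defaultdict(int)  # all tokens seen at each length
    for text in comments:
        tokens = text.lower().split()  # assumption: whitespace tokenization
        n = len(tokens)
        if n == 0:
            continue  # skip empty comments
        hits[n] += sum(1 for t in tokens if t in feature_words)
        totals[n] += n
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy data, invented for illustration only
comments = ["thank you", "i went home", "you are right", "then she left early"]
first_second_person = {"i", "you"}
rates = feature_rate_by_length(comments, first_second_person)
for length, rate in rates.items():
    print(length, round(rate, 3))
```

Because every rate is computed within a single length group, no length normalization across groups is needed; how a rate changes from one length to the next is what the analysis then examines.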

The results show that comment length and communicative function are linked. Looking at Reddit as a whole, there are clear tendencies in feature distributions which suggest that, for example, narrative content tends to favor longer comments, whereas interpersonal content tends to favor shorter ones. However, further analysis that splits the data into subcorpora by subreddit (the thematic subforums of Reddit) shows that, in many cases, the functional associations of comments of a given length may differ greatly from one subreddit to another. In other words, no single communicative function is fulfilled by comments of a specific length. The functions nonetheless follow interpretable patterns, although the exact patterns depend on the register. These results highlight the importance of taking an often overlooked variable, text length, into consideration in many linguistic analyses.
  • Hiltunen, Turo, Supervisor
  • Nevalainen, Terttu, Supervisor
Print ISBN: 978-951-51-8457-3
Electronic ISBN: 978-951-51-8458-0
Status: Published - Sep. 2022
MoE publication type: G5 Doctoral dissertation (article)


  • 6121 Languages
