Human perception relies on chunking an incoming information stream into smaller units to make sense of it. Evidence of chunking has been found across different domains, including visual events, music, and dance movement. It is largely uncontested that language processing must also proceed in smaller chunks of some kind. What these online chunks consist of is much less well understood. In this paper, we propose that cognitively relevant chunks can be identified by crowdsourcing listener perceptions of chunk boundaries in real-time speech, even if the listeners are non-native speakers of the language. We present a paradigm in which experiment participants listen to short extracts of authentic speech and simultaneously mark chunk boundaries using a custom-built tablet application. We then test the internal validity of the method by measuring the extent to which fluent L2 listeners agree on chunk boundaries. To do this, we use three datasets collected within the paradigm and a suite of different statistical methods. The external validity of the method is studied in a separate paper and is briefly discussed at the end.
- 6121 Languages