Occitan in Wikipedia Discussions: Initial Findings

Activity: Talk or presentation types: Oral presentation


Occitan is a regional language spoken in southern France and in parts of Italy and Spain. Like many such languages, it has only recently started to enter the digital era. Basic digital tools and resources (text databases, electronic dictionaries, text-to-speech tools) have been created, and Occitan Wikipedia is also being developed.

We present OcWikiDisc, a 500,000-word corpus extracted from Occitan Wikipedia’s discussion pages. It contains direct user-to-user interactions on various topics. As a first attempt to model how Occitan is used in this medium, we analyze the dialects and spelling norms found in a sample of the corpus.
Period: 6 Oct 2022
Event title: 8th Estonian Digital Humanities Conference: Shifts in language and culture: computational approaches to variation and change
Event type: Conference
Location: Tallinn, Estonia
Degree of Recognition: International