Occitan in Wikipedia Discussions: Initial Findings

Activity: Talk or presentation types: Oral presentation

Description

Occitan is a regional language spoken in southern France and in parts of Italy and Spain. Like many such languages, it has only recently started to enter the digital era. Basic digital tools and resources (text databases, electronic dictionaries, text-to-speech tools) have been created, and Occitan Wikipedia is also being developed.

We present OcWikiDisc, a 500,000-word corpus extracted from Occitan Wikipedia’s discussion pages. It contains direct user-to-user interactions on various topics. As a first attempt to model how Occitan is used in this medium, we analyze dialects and spelling norms in a sample of the corpus.
Period: 6 Oct 2022
Event title: 8th Estonian Digital Humanities Conference: Shifts in language and culture: computational approaches to variation and change
Event type: Conference
Location: Tallinn, Estonia
Degree of recognition: International