Description
Occitan is a regional language spoken in southern France and in parts of Italy and Spain. Like many such languages, it has only recently started to enter the digital era. Basic digital tools and resources (text databases, electronic dictionaries, text-to-speech tools) have been created, and Occitan Wikipedia is also being developed. We present OcWikiDisc, a 500,000-word corpus extracted from Occitan Wikipedia's discussion pages. It contains direct user-to-user interactions on various topics. We analyze Occitan dialects and spelling norms in a sample of the corpus as a first attempt to model the use of Occitan in this medium.
Period | 6 Oct 2022 |
---|---|
Event title | 8th Estonian Digital Humanities Conference: Shifts in language and culture: computational approaches to variation and change |
Event type | Conference |
Location | Tallinn, Estonia |
Degree of Recognition | International |
Related content
Projects
- Corpus-based computational dialectology: exploiting machine translation techniques to extract, visualize and interpret dialectal patterns
  Project: Research Council of Finland: Academy Project