Abstract
This article proposes a method that makes possible the linguistic study of textually difficult hand-written materials which are imperfectly preserved. These materials include medieval manuscripts, letters, and legal as well as private documents. With these, the normal treebanking procedure is not sufficient. We present the case of medieval Latin charter texts, i.e., private documents, that 1) are partly fragmentary and 2) exhibit massive use of abbreviations, e.g., chartul for chartulam ‘charter’. In addition, 3) charter texts are highly formulaic and display passages that differ from each other in their language use. It is not possible to ascertain the inflexional endings of most of the fragmentary and abbreviated words, so a method of excluding them from morphological (but not from syntactic) analysis is needed. Moreover, due to the varying degree of formulaicity in certain parts of charter texts, the language of these parts must be studied separately. Therefore, a method of merging two XML layers is introduced. One layer that contains lemmatic, morphological, and syntactic analysis according to the Perseus Latin Dependency Treebank standard is aligned with the other layer that contains textual information (abbreviations, fragmentary words, diplomatic segmentation).
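The merging of the two layers described in the abstract can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation. The `sentence`/`word` elements and their `form`, `lemma`, `postag`, `head`, and `relation` attributes follow the Perseus treebank format. The textual layer's `token` element, its attributes (`abbreviated`, `fragmentary`, `part`), the alignment by sentence and word number, the function name `merge_layers`, and the two-word sample are all hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Layer 1: lemmatic, morphological, and syntactic annotation.
# Attribute names follow the Perseus treebank format; the two-word
# sample itself is invented for illustration.
TREEBANK_XML = """
<treebank>
  <sentence id="1">
    <word id="1" form="vendo" lemma="vendo" postag="v1spia---" head="0" relation="PRED"/>
    <word id="2" form="chartul" lemma="chartula" postag="n-s---fa-" head="1" relation="OBJ"/>
  </sentence>
</treebank>
"""

# Layer 2: textual information. This schema is an assumption made for
# the sketch, not the format used in the article.
TEXTUAL_XML = """
<text>
  <token sentence="1" word="1" abbreviated="no" fragmentary="no" part="dispositio"/>
  <token sentence="1" word="2" abbreviated="yes" fragmentary="no" part="dispositio"/>
</text>
"""


def merge_layers(treebank_xml, textual_xml):
    """Align the two layers by (sentence id, word id) and return merged records."""
    # Index the textual layer by its alignment key.
    textual = {}
    for tok in ET.fromstring(textual_xml.strip()).iter("token"):
        textual[(tok.get("sentence"), tok.get("word"))] = tok.attrib

    merged = []
    for sent in ET.fromstring(treebank_xml.strip()).iter("sentence"):
        for word in sent.iter("word"):
            info = textual.get((sent.get("id"), word.get("id")), {})
            # Abbreviated or fragmentary words keep their syntactic
            # annotation but are excluded from morphological analysis.
            unreliable = (
                info.get("abbreviated") == "yes"
                or info.get("fragmentary") == "yes"
            )
            merged.append({
                "sentence": sent.get("id"),
                "id": word.get("id"),
                "form": word.get("form"),
                "lemma": word.get("lemma"),
                "head": word.get("head"),
                "relation": word.get("relation"),
                "postag": None if unreliable else word.get("postag"),
                "part": info.get("part"),  # diplomatic segment, if known
            })
    return merged


merged = merge_layers(TREEBANK_XML, TEXTUAL_XML)
for record in merged:
    print(record["form"], record["relation"], record["postag"], record["part"])
```

In this sketch the abbreviated form chartul keeps its head and relation but loses its morphological tag, and the `part` value would allow words from differently formulaic parts of a charter to be studied separately.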
Original language | English |
---|---|
Title of host publication | Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3) |
Editors | Francesco Mambrini, Marco Passarotti, Caroline Sporleder |
Number of pages | 12 |
Place of Publication | Sofia, Bulgaria |
Publisher | Institute of Information and Communication Technologies, Bulgarian Academy of Sciences |
Publication date | Dec 2013 |
Pages | 61-72 |
ISBN (Print) | 978-954-91700-5-4 |
Publication status | Published - Dec 2013 |
MoE publication type | A4 Article in conference proceedings |
Event | The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3), Sofia, Bulgaria; Duration: 12 Dec 2013 → 12 Dec 2013; Conference number: 3 |
Fields of Science
- 6121 Languages
- Latin linguistics
- 113 Computer and information sciences
- digital humanities
- 615 History and Archaeology
- medieval studies
- palaeography
- diplomatics