Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material

Timo Korkiakangas, Matti Lassila

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review


    This article proposes a method that enables the linguistic study of textually difficult, imperfectly preserved handwritten materials, such as medieval manuscripts, letters, and legal and private documents, for which the normal treebanking procedure is not sufficient. We present the case of medieval Latin charter texts, i.e., private documents, that 1) are partly fragmentary and 2) make massive use of abbreviations, e.g., chartul for chartulam ‘charter’. In addition, 3) charter texts are highly formulaic, and their passages differ from each other in language use. Because the inflectional endings of most fragmentary and abbreviated words cannot be ascertained, a method is needed that excludes these words from morphological (but not from syntactic) analysis. Moreover, because the degree of formulaicity varies between parts of a charter, the language of these parts must be studied separately. We therefore introduce a method of merging two XML layers: one layer, which contains lemmatic, morphological, and syntactic analysis according to the Perseus Latin Dependency Treebank standard, is aligned with a second layer that contains textual information (abbreviations, fragmentary words, diplomatic segmentation).
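The two-layer merge described in the abstract might look roughly like the following Python sketch. It is an illustration only, not the authors' implementation. The treebank layer uses the word attributes of the Perseus treebank format (id, form, lemma, postag, head, relation). The textual layer's element and attribute names (seg, part, w, n, abbr, frag), the invented two-word sample, and the function name merge_layers are all assumptions made for this example. As a simplification, every abbreviated or fragmentary word is excluded from morphological analysis, whereas the abstract says only that most such endings cannot be ascertained.

```python
import xml.etree.ElementTree as ET

# Layer 1: lemmatic, morphological, and syntactic analysis.
# Attribute names follow the Perseus treebank word format; the sentence is invented.
TREEBANK_XML = """
<sentence id="1">
  <word id="1" form="chartul" lemma="chartula" postag="n-s---fa-" head="2" relation="OBJ"/>
  <word id="2" form="scripsi" lemma="scribo" postag="v1sria---" head="0" relation="PRED"/>
</sentence>
"""

# Layer 2: textual information. HYPOTHETICAL schema, assumed for illustration:
# <seg part="..."> marks a diplomatic segment, <w n="..."> a token with flags.
TEXTUAL_XML = """
<text>
  <seg part="eschatocol">
    <w n="1" abbr="true" frag="false">chartul</w>
    <w n="2" abbr="false" frag="false">scripsi</w>
  </seg>
</text>
"""


def merge_layers(treebank_xml, textual_xml):
    """Align the two layers by token number and return one record per word."""
    sentence = ET.fromstring(treebank_xml.strip())
    text = ET.fromstring(textual_xml.strip())

    # Index the textual layer by token number, remembering the enclosing segment.
    textual = {}
    for seg in text.iter("seg"):
        for w in seg.iter("w"):
            textual[w.get("n")] = {
                "part": seg.get("part"),
                "abbr": w.get("abbr") == "true",
                "frag": w.get("frag") == "true",
            }

    merged = []
    for word in sentence.iter("word"):
        info = textual[word.get("id")]
        uncertain = info["abbr"] or info["frag"]
        merged.append({
            "id": word.get("id"),
            "form": word.get("form"),
            "lemma": word.get("lemma"),
            # Drop the morphological tag when the ending cannot be trusted...
            "postag": None if uncertain else word.get("postag"),
            # ...but keep the syntactic analysis in every case.
            "head": word.get("head"),
            "relation": word.get("relation"),
            "part": info["part"],
        })
    return merged


merged = merge_layers(TREEBANK_XML, TEXTUAL_XML)
for record in merged:
    print(record)
```

Keeping the segment label on every merged record means that passages with different degrees of formulaicity can later be filtered and studied separately.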
    Original language: English
    Title of host publication: Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3)
    Editors: Francesco Mambrini, Marco Passarotti, Caroline Sporleder
    Number of pages: 12
    Place of Publication: Sofia, Bulgaria
    Publisher: Institute of Information and Communication Technologies, Bulgarian Academy of Sciences
    Publication date: Dec 2013
    ISBN (Print): 978-954-91700-5-4
    Publication status: Published - Dec 2013
    MoE publication type: A4 Article in conference proceedings
    Event: The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3) - Sofia, Bulgaria
    Duration: 12 Dec 2013 → 12 Dec 2013
    Conference number: 3

    Fields of Science

    • 6121 Languages
    • Latin linguistics
    • 113 Computer and information sciences
    • digital humanities
    • 615 History and Archaeology
    • medieval studies
    • palaeography
    • diplomatics
