Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material

Timo Korkiakangas, Matti Lassila

    Research output: Chapter in book/report/conference proceedings; conference contribution; scientific; peer-reviewed

    Abstract

    This article proposes a method that makes possible the linguistic study of textually difficult hand-written materials which are imperfectly preserved. These materials include medieval manuscripts, letters, and legal as well as private documents. With these, the normal treebanking procedure is not sufficient. We present the case of medieval Latin charter texts, i.e., private documents, that 1) are partly fragmentary and 2) exhibit massive use of abbreviations, e.g., chartul for chartulam ‘charter’. In addition, 3) charter texts are highly formulaic and display passages that differ from each other in their language use. It is not possible to ascertain the inflexional endings of most of the fragmentary and abbreviated words, so a method of excluding them from morphological (but not from syntactic) analysis is needed. Moreover, due to the varying degree of formulaicity in certain parts of charter texts, the language of these parts must be studied separately. Therefore, a method of merging two XML layers is introduced. One layer that contains lemmatic, morphological, and syntactic analysis according to the Perseus Latin Dependency Treebank standard is aligned with the other layer that contains textual information (abbreviations, fragmentary words, diplomatic segmentation).
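    The layer merging described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The `word` attributes (`id`, `form`, `lemma`, `postag`, `head`, `relation`) follow the general shape of Perseus-style treebank XML, but the sample values, the whole textual layer (`w`, `abbreviated`, `fragmentary`, `part`), the `morph` flag, and the function name `merge_layers` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Layer 1: lemmatic, morphological and syntactic annotation.
# The sample values are illustrative, not taken from the corpus.
TREEBANK_XML = """
<sentence id="1">
  <word id="1" form="chartul" lemma="chartula1" postag="n-s---fa-" head="2" relation="OBJ"/>
  <word id="2" form="scripsi" lemma="scribo1" postag="v1sria---" head="0" relation="PRED"/>
</sentence>
"""

# Layer 2: textual information. This schema is hypothetical.
TEXTUAL_XML = """
<text>
  <w id="1" abbreviated="true" fragmentary="false" part="formulaic"/>
  <w id="2" abbreviated="false" fragmentary="false" part="free"/>
</text>
"""


def merge_layers(treebank_xml, textual_xml):
    """Copy textual attributes onto the matching treebank words (aligned by id)."""
    treebank = ET.fromstring(treebank_xml.strip())
    textual = ET.fromstring(textual_xml.strip())
    info = {w.get("id"): w.attrib for w in textual.iter("w")}

    for word in treebank.iter("word"):
        extra = info.get(word.get("id"), {})
        for key in ("abbreviated", "fragmentary", "part"):
            if key in extra:
                word.set(key, extra[key])
        # Abbreviated or fragmentary words have uncertain endings, so they are
        # flagged as excluded from morphological analysis only.
        uncertain = "true" in (extra.get("abbreviated"), extra.get("fragmentary"))
        word.set("morph", "excluded" if uncertain else "included")
    return treebank


merged = merge_layers(TREEBANK_XML, TEXTUAL_XML)

# Every word stays available for syntactic analysis.
syntax_words = [w.get("form") for w in merged.iter("word")]
# Only words with ascertainable endings feed morphological analysis.
morph_words = [w.get("form") for w in merged.iter("word") if w.get("morph") == "included"]
# The diplomatic segment lets formulaic passages be studied separately.
formulaic_words = [w.get("form") for w in merged.iter("word") if w.get("part") == "formulaic"]
```

    In this sketch the abbreviated form chartul keeps its syntactic relation but is left out of the morphological word list, while the `part` attribute allows formulaic and freer passages to be queried separately.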
    Original language: English
    Title of host publication: Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3)
    Editors: Francesco Mambrini, Marco Passarotti, Caroline Sporleder
    Number of pages: 12
    Place of publication: Sofia, Bulgaria
    Publisher: Institute of Information and Communication Technologies, Bulgarian Academy of Sciences
    Publication date: Dec 2013
    Pages: 61-72
    ISBN (print): 978-954-91700-5-4
    Status: Published - Dec 2013
    MoE publication type: A4 Article in conference proceedings
    Event: The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3) - Sofia, Bulgaria
    Duration: 12 Dec 2013 to 12 Dec 2013
    Conference number: 3

    Fields of science

    • 6121 Languages
    • 113 Computer and information sciences
    • 615 History and archaeology
