Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material

Timo Korkiakangas, Matti Lassila

Research output: Article in book/report/conference proceedings; Conference article; Scientific; Peer-reviewed

Abstract

This article proposes a method that makes possible the linguistic study of textually difficult hand-written materials which are imperfectly preserved. These materials include medieval manuscripts, letters, and legal as well as private documents. With these, the normal treebanking procedure is not sufficient. We present the case of medieval Latin charter texts, i.e., private documents, that 1) are partly fragmentary and 2) exhibit massive use of abbreviations, e.g., chartul for chartulam ‘charter’. In addition, 3) charter texts are highly formulaic and display passages that differ from each other in their language use. It is not possible to ascertain the inflexional endings of most of the fragmentary and abbreviated words, so a method of excluding them from morphological (but not from syntactic) analysis is needed. Moreover, due to the varying degree of formulaicity in certain parts of charter texts, the language of these parts must be studied separately. Therefore, a method of merging two XML layers is introduced. One layer that contains lemmatic, morphological, and syntactic analysis according to the Perseus Latin Dependency Treebank standard is aligned with the other layer that contains textual information (abbreviations, fragmentary words, diplomatic segmentation).
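
The layer merging described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `<word>` elements are only loosely modelled on Perseus-style treebank XML, while the `<token>` element, its attributes (`abbreviated`, `fragmentary`, `segment`), the `morph_excluded` flag, the function name `merge_layers`, and the sample sentence are all hypothetical and invented for illustration. The sketch assumes the two layers share token identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical treebank layer: lemma, morphology (postag) and syntax (head, relation).
# The two-word sample is invented; only the abbreviation "chartul" comes from the abstract.
TREEBANK_XML = """
<sentence id="1">
  <word id="1" form="vendo" lemma="vendo" postag="v1spia---" head="0" relation="PRED"/>
  <word id="2" form="chartul" lemma="chartula" postag="n-s---fa-" head="1" relation="OBJ"/>
</sentence>
"""

# Hypothetical textual layer: abbreviation, fragmentariness, diplomatic segment.
TEXTUAL_XML = """
<text>
  <token id="1" abbreviated="false" fragmentary="false" segment="dispositio"/>
  <token id="2" abbreviated="true" fragmentary="false" segment="dispositio"/>
</text>
"""


def merge_layers(treebank_xml, textual_xml):
    """Copy textual attributes onto treebank words, matching on a shared id."""
    sentence = ET.fromstring(treebank_xml)
    tokens = {t.get("id"): t for t in ET.fromstring(textual_xml).iter("token")}
    for word in sentence.iter("word"):
        token = tokens.get(word.get("id"))
        if token is None:
            continue  # no textual information recorded for this word
        for name in ("abbreviated", "fragmentary", "segment"):
            word.set(name, token.get(name, ""))
        # Abbreviated or fragmentary words are flagged so that a later
        # morphological query can skip them; head and relation stay intact,
        # so the words remain available for syntactic analysis.
        uncertain = "true" in (token.get("abbreviated"), token.get("fragmentary"))
        word.set("morph_excluded", "true" if uncertain else "false")
    return sentence


merged = merge_layers(TREEBANK_XML, TEXTUAL_XML)
for word in merged.iter("word"):
    print(word.get("form"), word.get("relation"),
          word.get("segment"), word.get("morph_excluded"))
```

Under these assumptions, filtering on `morph_excluded` leaves abbreviated forms such as chartul out of morphological counts, and grouping on `segment` would allow formulaic and less formulaic parts of a charter to be studied separately.
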
Original language: English
Title: Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3)
Editors: Francesco Mambrini, Marco Passarotti, Caroline Sporleder
Number of pages: 12
Place of publication: Sofia, Bulgaria
Publisher: Institute of Information and Communication Technologies, Bulgarian Academy of Sciences
Publication date: December 2013
Pages: 61-72
ISBN (print): 978-954-91700-5-4
Status: Published - December 2013
OKM publication type: A4 Article in conference proceedings
Event: The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3) - Sofia, Bulgaria
Duration: 12 December 2013 - 12 December 2013
Conference number: 3

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
  • 615 History and archaeology
