Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material

Timo Korkiakangas, Matti Lassila

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

This article proposes a method that makes possible the linguistic study of textually difficult hand-written materials which are imperfectly preserved. These materials include medieval manuscripts, letters, and legal as well as private documents. With these, the normal treebanking procedure is not sufficient. We present the case of medieval Latin charter texts, i.e., private documents, that 1) are partly fragmentary and 2) exhibit massive use of abbreviations, e.g., chartul for chartulam ‘charter’. In addition, 3) charter texts are highly formulaic and display passages that differ from each other in their language use. It is not possible to ascertain the inflexional endings of most of the fragmentary and abbreviated words, so a method of excluding them from morphological (but not from syntactic) analysis is needed. Moreover, due to the varying degree of formulaicity in certain parts of charter texts, the language of these parts must be studied separately. Therefore, a method of merging two XML layers is introduced. One layer that contains lemmatic, morphological, and syntactic analysis according to the Perseus Latin Dependency Treebank standard is aligned with the other layer that contains textual information (abbreviations, fragmentary words, diplomatic segmentation).
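The merging step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the schema of the textual layer, so the `seg` and `w` elements and the `part`, `abbreviated`, and `fragmentary` attributes are hypothetical, as are the added `morph` flag and the sample tokens. Only the `word` element with `form`, `lemma`, `postag`, `head`, and `relation` attributes follows the Perseus treebank style. Alignment by token position is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Layer 1: Perseus-style dependency annotation (sample data, invented for illustration).
TREEBANK = """<treebank>
  <sentence id="1">
    <word id="1" form="chartul" lemma="chartula1" postag="n-s---fa-" head="2" relation="OBJ"/>
    <word id="2" form="scripsi" lemma="scribo1" postag="v1sria---" head="0" relation="PRED"/>
  </sentence>
</treebank>"""

# Layer 2: textual information. Element and attribute names are hypothetical.
TEXTUAL = """<text>
  <seg part="eschatocol">
    <w abbreviated="true" fragmentary="false">chartul</w>
    <w abbreviated="false" fragmentary="false">scripsi</w>
  </seg>
</text>"""


def merge_layers(treebank_xml, textual_xml):
    """Copy textual attributes onto treebank words, aligned by token order."""
    treebank = ET.fromstring(treebank_xml)
    textual = ET.fromstring(textual_xml)

    # Flatten the textual layer into (token element, diplomatic part) pairs.
    tokens = [(w, seg.get("part"))
              for seg in textual.iter("seg")
              for w in seg.iter("w")]
    words = list(treebank.iter("word"))
    if len(words) != len(tokens):
        raise ValueError("layers have different token counts")

    for word, (token, part) in zip(words, tokens):
        if word.get("form") != (token.text or "").strip():
            raise ValueError("token mismatch at word id " + str(word.get("id")))
        abbreviated = token.get("abbreviated") == "true"
        fragmentary = token.get("fragmentary") == "true"
        word.set("abbreviated", str(abbreviated).lower())
        word.set("fragmentary", str(fragmentary).lower())
        word.set("part", part or "")
        # Words with unascertainable endings are flagged for exclusion from
        # morphological analysis; head and relation stay usable for syntax.
        word.set("morph", "excluded" if abbreviated or fragmentary else "included")
    return treebank


merged = merge_layers(TREEBANK, TEXTUAL)
print(ET.tostring(merged, encoding="unicode"))
```

Under these assumptions, a morphological query would filter on the `morph` attribute, a syntactic query would ignore it, and the `part` attribute would allow the formulaic sections of a charter to be studied separately.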
Original language: English
Title of host publication: Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3)
Editors: Francesco Mambrini, Marco Passarotti, Caroline Sporleder
Number of pages: 12
Place of Publication: Sofia, Bulgaria
Publisher: Institute of Information and Communication Technologies, Bulgarian Academy of Sciences
Publication date: Dec 2013
Pages: 61-72
ISBN (Print): 978-954-91700-5-4
Publication status: Published - Dec 2013
MoE publication type: A4 Article in conference proceedings
Event: The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3) - Sofia, Bulgaria
Duration: 12 Dec 2013 → 12 Dec 2013
Conference number: 3

Fields of Science

  • 6121 Languages
  • Latin linguistics
  • 113 Computer and information sciences
  • digital humanities
  • 615 History and Archaeology
  • medieval studies
  • palaeography
  • diplomatics

Cite this

Korkiakangas, T., & Lassila, M. (2013). Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material. In F. Mambrini, M. Passarotti, & C. Sporleder (Eds.), Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3) (pp. 61-72). Sofia, Bulgaria: Institute of Information and Communication Technologies, Bulgarian Academy of Sciences.
Korkiakangas, Timo ; Lassila, Matti. / Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material. Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3). editor / Francesco Mambrini ; Marco Passarotti ; Caroline Sporleder. Sofia, Bulgaria : Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, 2013. pp. 61-72
@inproceedings{3a09ed589b2346f1a07ef5b6f8f1e222,
title = "Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material",
abstract = "This article proposes a method that makes possible the linguistic study of textually difficult hand-written materials which are imperfectly preserved. These materials include medieval manuscripts, letters, and legal as well as private documents. With these, the normal treebanking procedure is not sufficient. We present the case of medieval Latin charter texts, i.e., private documents, that 1) are partly fragmentary and 2) exhibit massive use of abbreviations, e.g., chartul for chartulam ‘charter’. In addition, 3) charter texts are highly formulaic and display passages that differ from each other in their language use. It is not possible to ascertain the inflexional endings of most of the fragmentary and abbreviated words, so a method of excluding them from morphological (but not from syntactic) analysis is needed. Moreover, due to the varying degree of formulaicity in certain parts of charter texts, the language of these parts must be studied separately. Therefore, a method of merging two XML layers is introduced. One layer that contains lemmatic, morphological, and syntactic analysis according to the Perseus Latin Dependency Treebank standard is aligned with the other layer that contains textual information (abbreviations, fragmentary words, diplomatic segmentation).",
keywords = "6121 Languages, Latin linguistics, 113 Computer and information sciences, digital humanities, 615 History and Archaeology, medieval studies, palaeography, diplomatics",
author = "Timo Korkiakangas and Matti Lassila",
year = "2013",
month = "12",
language = "English",
isbn = "978-954-91700-5-4",
pages = "61--72",
editor = "Francesco Mambrini and Marco Passarotti and Caroline Sporleder",
booktitle = "Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3)",
publisher = "Institute of Information and Communication Technologies, Bulgarian Academy of Sciences",
address = "Sofia, Bulgaria",

}

Korkiakangas, T & Lassila, M 2013, Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material. in F Mambrini, M Passarotti & C Sporleder (eds), Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3). Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria, pp. 61-72, The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3), Sofia, Bulgaria, 12/12/2013.

Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material. / Korkiakangas, Timo; Lassila, Matti.

Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3). ed. / Francesco Mambrini; Marco Passarotti; Caroline Sporleder. Sofia, Bulgaria : Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, 2013. p. 61-72.

TY - GEN

T1 - Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material

AU - Korkiakangas, Timo

AU - Lassila, Matti

PY - 2013/12

Y1 - 2013/12

AB - This article proposes a method that makes possible the linguistic study of textually difficult hand-written materials which are imperfectly preserved. These materials include medieval manuscripts, letters, and legal as well as private documents. With these, the normal treebanking procedure is not sufficient. We present the case of medieval Latin charter texts, i.e., private documents, that 1) are partly fragmentary and 2) exhibit massive use of abbreviations, e.g., chartul for chartulam ‘charter’. In addition, 3) charter texts are highly formulaic and display passages that differ from each other in their language use. It is not possible to ascertain the inflexional endings of most of the fragmentary and abbreviated words, so a method of excluding them from morphological (but not from syntactic) analysis is needed. Moreover, due to the varying degree of formulaicity in certain parts of charter texts, the language of these parts must be studied separately. Therefore, a method of merging two XML layers is introduced. One layer that contains lemmatic, morphological, and syntactic analysis according to the Perseus Latin Dependency Treebank standard is aligned with the other layer that contains textual information (abbreviations, fragmentary words, diplomatic segmentation).

KW - 6121 Languages

KW - Latin linguistics

KW - 113 Computer and information sciences

KW - digital humanities

KW - 615 History and Archaeology

KW - medieval studies

KW - palaeography

KW - diplomatics

M3 - Conference contribution

SN - 978-954-91700-5-4

SP - 61

EP - 72

BT - Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3)

A2 - Mambrini, Francesco

A2 - Passarotti, Marco

A2 - Sporleder, Caroline

PB - Institute of Information and Communication Technologies, Bulgarian Academy of Sciences

CY - Sofia, Bulgaria

ER -

Korkiakangas T, Lassila M. Abbreviations, fragmentary words, formulaic language: treebanking mediaeval charter material. In Mambrini F, Passarotti M, Sporleder C, editors, Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3). Sofia, Bulgaria: Institute of Information and Communication Technologies, Bulgarian Academy of Sciences. 2013. p. 61-72