Late Latin Charter Treebank: contents and annotation

Forskningsoutput: TidskriftsbidragArtikelVetenskapligPeer review

Sammanfattning

This paper describes the construction and annotation of the Late Latin Charter Treebank, a set of three dependency treebanks (LLCT1, LLCT2 and LLCT3) which together contain 1,261 Early Medieval Latin documentary texts (i.e., original charters) written in Italy between AD 714 and 1000 (about 594,000 tokens). The paper focusses on matters which a linguistically or philologically inclined user of LLCT needs to know: the criteria on which the charters were selected, the special characteristics of the annotation types utilised, and the geographical and chronological distribution of the data. In addition to normal queries on forms, lemmas, morphology and syntax, complex philological research settings are enabled by the textual annotation layer of LLCT, which indicates abbreviated and damaged words, as well as the formulaic and non-formulaic passages of each charter.

Originalspråkengelska
TidskriftCorpora
Volym16
Nummer2
Sidor (från-till)191-203
Antal sidor13
ISSN1749-5032
DOI
StatusPublicerad - aug. 2021
MoE-publikationstypA1 Tidskriftsartikel-refererad

Vetenskapsgrenar

  • 6121 Språkvetenskaper
  • 113 Data- och informationsvetenskap

Citera det här