Late Latin Charter Treebank: contents and annotation

Research output: Contribution to journal, Article, Scientific, peer-reviewed

Abstract

This paper describes the construction and annotation of the Late Latin Charter Treebank, a set of
three dependency treebanks (LLCT1, LLCT2, and LLCT3) which together contain 1,261 Early
Medieval Latin documentary texts (i.e., original charters) written in Italy between AD 714 and 1000
(c. 594,000 tokens). The paper focuses on issues which a linguistically or philologically inclined
user of LLCT needs to know: the criteria on which the charters were selected, the special
characteristics of the annotation types utilized and the geographical and chronological distribution
of the data. In addition to normal queries on forms, lemmas, morphology and syntax, complex
philological research settings are enabled by the textual annotation layer of LLCT, which indicates
abbreviated and damaged words, as well as the formulaic and non-formulaic passages of each
charter.
Original language: English
Journal: Corpora
Volume: 2021
Issue number: 16
ISSN: 1749-5032
Status: Accepted/In press - 2021
Ministry of Education and Culture (OKM) publication type: A1 Original article in a scientific journal, peer-reviewed

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
