Theoretical and pragmatic considerations on the lemmatization of non-standard Early Medieval Latin charters

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

This paper discusses the theoretical bases as well as the pragmatic implementation of the lemmatization of the Late Latin Charter Treebanks (LLCT). LLCT is a set of three dependency treebanks (LLCT1, LLCT2, LLCT3) of Early Medieval Latin documentary texts (charters) written in Italy between AD 714 and 1000 (c. 594,000 tokens). The original model for the lemmatization of LLCT was the Latin Dependency Treebank (LDT), which is mainly Classical standard Latin and based on the entries of Lewis and Short’s Latin Dictionary. Since LLCT reflects later linguistic developments of Latin and contains a plethora of non-standard proper names, particular attention is paid to how non-standard lexemes are lemmatized systematically to make the lemmatization maximally usable. The theoretical underpinnings to manage the lemmatization boil down to two principles: the evolutionary principle and the parsimony principle.
Original language: English
Journal: Studi e Saggi Linguistici
Volume: 58
Issue: 1
Pages: 67-94
Number of pages: 28
ISSN: 0085-6827
Status: Published - 2020
MEC publication type: A1 Original article in a scientific journal, peer-reviewed

Fields of science

  • 6121 Languages
