Theoretical and pragmatic considerations on the lemmatization of non-standard Early Medieval Latin charters

Research output: Contribution to journal, Article, Peer-reviewed

Abstract

This paper discusses the theoretical bases as well as the pragmatic implementation of the lemmatization of the Late Latin Charter Treebanks (LLCT). LLCT is a set of three dependency treebanks (LLCT1, LLCT2, LLCT3) of Early Medieval Latin documentary texts (charters) written in Italy between AD 714 and 1000 (c. 594,000 tokens). The original model for the lemmatization of LLCT was the Latin Dependency Treebank (LDT), which consists mainly of Classical standard Latin and whose lemmatization is based on the entries of Lewis and Short’s Latin Dictionary. Since LLCT reflects later linguistic developments of Latin and contains a plethora of non-standard proper names, particular attention is paid to how non-standard lexemes are lemmatized systematically to make the lemmatization maximally usable. The theoretical underpinnings of the lemmatization boil down to two principles: the evolutionary principle and the parsimony principle.
Original language: English
Journal: Studi e Saggi Linguistici
Volume: 58
Issue: 1
Pages (from-to): 67-94
Number of pages: 28
ISSN: 0085-6827
Status: Published - 2020
MoE publication type: A1 Journal article, refereed

Fields of science

  • 6121 Languages
