Theoretical and pragmatic considerations on the lemmatization of non-standard Early Medieval Latin charters

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

This paper discusses the theoretical bases as well as the pragmatic implementation of the lemmatization of the Late Latin Charter Treebanks (LLCT). LLCT is a set of three dependency treebanks (LLCT1, LLCT2, LLCT3) of Early Medieval Latin documentary texts (charters) written in Italy between AD 714 and 1000 (c. 594,000 tokens). The original model for the lemmatization of LLCT was the Latin Dependency Treebank (LDT), which is mainly Classical standard Latin and based on the entries of Lewis and Short’s Latin Dictionary. Since LLCT reflects later linguistic developments of Latin and contains a plethora of non-standard proper names, particular attention is paid to how non-standard lexemes are lemmatized systematically to make the lemmatization maximally usable. The theoretical underpinnings of the lemmatization boil down to two principles: the evolutionary principle and the parsimony principle.
Original language: English
Journal: Studi e Saggi Linguistici
Volume: 58
Issue number: 1
Pages (from-to): 67-94
Number of pages: 28
ISSN: 0085-6827
Publication status: Published - 2020
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 6121 Languages
