PapyGreek Treebanks: A Dataset of Linguistically Annotated Greek Documentary Papyri

Marja Vierros, Erik Henriksson

Research output: Contribution to journal › Article › Scientific › peer-review


The PapyGreek Treebanks dataset contains documentary texts written in Postclassical Greek (ca. 300 BCE–700 CE), morphosyntactically annotated according to Dependency Grammar. The source of the texts is the Duke Databank of Documentary Papyri (DDbDP), which preserves the modern editorial treatment of the documents in TEI EpiDoc XML encoding. Aiming to expose linguistic variation in the DDbDP, we have annotated two versions of a selection of documents: the plain transcription and an editorially corrected version. The dataset also comprises metadata about the documents’ dating and provenance, text type, and the persons involved. Furthermore, it facilitates linguistic research on these texts.
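As a rough illustration of how two parallel annotated layers can surface linguistic variation, the Python sketch below compares token forms between a plain and a corrected version of one sentence. It assumes an AGDT-style XML layout (a `<sentence>` containing `<word>` elements with `id`, `form`, `lemma`, `head` and `relation` attributes) and that tokens in both layers share the same ids; neither assumption is stated in this record. The Greek sentence is made up, and the helpers `tokens` and `variants` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Made-up sentence in two layers (illustrative only, not from the dataset).
# Assumed AGDT-style layout: each token is a <word> with id/form/lemma/head/relation.
PLAIN = """<sentence id="1">
  <word id="1" form="ἐπιδή" lemma="ἐπειδή" head="0" relation="AuxC"/>
  <word id="2" form="ἔγραψα" lemma="γράφω" head="1" relation="PRED"/>
</sentence>"""

CORRECTED = """<sentence id="1">
  <word id="1" form="ἐπειδή" lemma="ἐπειδή" head="0" relation="AuxC"/>
  <word id="2" form="ἔγραψα" lemma="γράφω" head="1" relation="PRED"/>
</sentence>"""


def tokens(xml_text):
    """Map token id -> (form, lemma, head, relation) for one sentence."""
    root = ET.fromstring(xml_text)
    return {
        w.get("id"): (w.get("form"), w.get("lemma"), w.get("head"), w.get("relation"))
        for w in root.iter("word")
    }


def variants(plain_xml, corrected_xml):
    """Return (id, plain form, corrected form) for tokens whose spelling differs."""
    plain, corrected = tokens(plain_xml), tokens(corrected_xml)
    out = []
    # Compare only ids present in both layers, in numeric token order.
    for tid in sorted(plain.keys() & corrected.keys(), key=int):
        if plain[tid][0] != corrected[tid][0]:
            out.append((tid, plain[tid][0], corrected[tid][0]))
    return out


if __name__ == "__main__":
    for tid, p, c in variants(PLAIN, CORRECTED):
        print(f"token {tid}: {p} -> {c}")
```

Run on the sample, this reports a single difference for token 1 (`ἐπιδή` against `ἐπειδή`). Aligning by shared token id is the simplest design choice; real documents whose editorial corrections add or remove tokens would need a proper alignment step instead.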
Original language: English
Article number: 55
Journal: Journal of Open Humanities Data
Issue number: 7
Number of pages: 6
Publication status: Published - 5 Nov 2021
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 6121 Languages
  • linguistic analysis
  • Language change
  • language variation
  • linguistic annotation
  • Greek language
  • Greek papyri
