Description
The Ramses Transliteration Corpus, published by Serge Rosmorduc in 2021, contains 71,263 lines of hieroglyphic sentences encoded in Manuel de Codage (MdC), together with their transliterations. The encoded sentences, in which word boundaries are marked, and the corresponding transliterations are in the same order but in separate files. In addition, either form of a sentence may contain editorial insertions that are not reflected in the other form.
To align the encoded and transliterated versions of each sentence, I used the algorithm that S. B. Needleman and C. D. Wunsch developed for comparing the amino acid sequences of two proteins. The method also works on natural language text, provided that the words appear in the same order in both sequences, and it identifies the words that are present in only one of them. It compares each token in one sequence to all tokens in the other. To compare MdC words with transliterated words, my implementation uses a dictionary extracted from the lines where the MdC and the transliteration contain the same number of tokens, that is, where there are no editorial insertions.
The algorithm works well for aligning the MdC with the transliteration. However, since the corpus has been compiled by various scholars over several years, the data naturally contain some inconsistencies. The aim was to extract a language model, that is, all word forms with their frequencies, from the aligned sentences. Additional rules specific to these texts were therefore needed to extract as many word form attestations as possible.
In this paper, I will present the corpus and the alignment method. I will also discuss the additional steps needed to extract word forms from the sentences. The language models will be published openly for others to use in their research, while the alignment method can be used to align other hieroglyphic texts as well.
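The alignment step described above can be sketched as follows. This is a minimal illustration and not the implementation presented in the talk: the function names, the scoring values (`match`, `mismatch`, `gap`) and the toy tokens are all assumptions made for the example, and the corpus-specific rules mentioned in the abstract are not modelled. The sketch builds a dictionary from pairs with equal token counts, runs a Needleman-Wunsch global alignment that scores a pair as a match when the dictionary has seen it, and counts the aligned word forms.

```python
from collections import Counter, defaultdict

GAP = None  # marks a token that has no counterpart in the other sequence


def build_dictionary(pairs):
    """Collect MdC -> transliteration pairs from lines with equal token counts."""
    lexicon = defaultdict(set)
    for mdc_tokens, translit_tokens in pairs:
        if len(mdc_tokens) == len(translit_tokens):
            for mdc, translit in zip(mdc_tokens, translit_tokens):
                lexicon[mdc].add(translit)
    return lexicon


def align(mdc_tokens, translit_tokens, lexicon, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two token sequences.

    The scoring values are illustrative assumptions, not tuned parameters.
    """
    n, m = len(mdc_tokens), len(translit_tokens)

    def score(mdc, translit):
        # A pair counts as a match if the dictionary has seen it before.
        return match if translit in lexicon.get(mdc, ()) else mismatch

    # table[i][j]: best score for the first i MdC tokens and the first j
    # transliteration tokens.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + score(mdc_tokens[i - 1], translit_tokens[j - 1]),
                table[i - 1][j] + gap,  # MdC token without a counterpart
                table[i][j - 1] + gap,  # transliteration token without a counterpart
            )

    # Trace back from the bottom-right cell to recover the aligned pairs.
    aligned = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + score(
            mdc_tokens[i - 1], translit_tokens[j - 1]
        ):
            aligned.append((mdc_tokens[i - 1], translit_tokens[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            aligned.append((mdc_tokens[i - 1], GAP))
            i -= 1
        else:
            aligned.append((GAP, translit_tokens[j - 1]))
            j -= 1
    aligned.reverse()
    return aligned


def count_word_forms(alignments):
    """Count (MdC, transliteration) pairs, skipping tokens aligned to a gap."""
    counts = Counter()
    for aligned in alignments:
        counts.update(pair for pair in aligned if GAP not in pair)
    return counts


# Invented placeholder tokens, not real corpus data.
lexicon = build_dictionary([(["M1", "M2", "M3"], ["t1", "t2", "t3"])])
result = align(["M1", "M2", "M3"], ["t1", "tX", "t2", "t3"], lexicon)
print(result)
```

In this toy run, `tX` stands for an editorial insertion that occurs only in the transliteration, so the alignment pairs it with a gap and leaves the other three tokens matched. Passing the aligned sentences to `count_word_forms` then gives the frequency of each attested word form.
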
Period | 5 Jul 2023 |
---|---|
Event title | International Conference Ancient Egypt New Technology |
Event type | Conference |
Location | Napoli, Italy |
Degree of Recognition | International |
Related content
Activities
- International Conference Ancient Egypt New Technology
  Activity: Participating in or organising an event types › Organisation and participation in conferences, workshops, courses, seminars
Publications
- Transliteration Model for Egyptian Words
  Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Projects
- Centre of Excellence in Ancient Near Eastern Empires
  Project: Research Council of Finland: Centre of Excellence
- From Sherds of Pottery to Open Egyptological Data
  Project: Research project
- Machine Readable Texts for Egyptologists
  Project: Research project
- ANEE Team 1: Digital Humanities Approaches
  Project: Research project