Description
With digital methods, researchers can study large amounts of textual data and find patterns they might not otherwise see. For researchers of hieroglyphic texts, there are useful full-text-based digital dictionaries, such as Ramses Online and the Thesaurus Linguae Aegyptiae (TLA). However, one must have openly available machine-readable texts to use methods other than those offered by the online services. Despite the release of the TLA texts, the scarcity of machine-readable data hinders the digital study of hieroglyphic texts and the development of new methods. This leaves Egyptology at a disadvantage when competing for funding and students.
There is, however, already a tradition of making machine-readable hieroglyphic texts. When using hieroglyphic text editors to produce properly positioned hieroglyphs for print, Egyptologists encode the texts with codes derived from Gardiner's Sign List. For example, the sentence iw=i m pr=i “I am in my house” would be encoded as ‘M17 Z7 A1 G17 O1 Z1 A1’. The encoded hieroglyphic texts can be produced in various, often platform-specific, software. Most hieroglyphic text editors use the so-called Manuel de Codage encoding. Such encoded texts are machine-readable and contain the information needed to study many aspects of ancient Egyptian language and culture and to develop new digital methods. Unfortunately, the binary encoding files are usually not published; instead, they are discarded once a picture of the hieroglyphs has been produced.
Some hieroglyphic text editors, such as JSesh and WinGlyph, save the encodings in mutually compatible files with the extension .gly. These binary files are meant to be handled with a hieroglyphic text editor and cannot be opened with a regular text editor. From JSesh, it is also possible to copy out the encoding of the text manually, but this requires many steps and is therefore slow. Furthermore, in the copied encoding all codes are linked to their neighbors with underscores and hyphens, so that the previous example sentence appears as ‘M17_-Z7_-A1_-G17_-O1_-Z1_-A1’. To make the process faster, I have built a tool, Gly2mdc, for extracting and cleaning the encoded texts. The Java-based tool takes the binary .gly file as input and produces the encoded text without the noise in the encoding.
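The cleaning step described above can be illustrated with a minimal Python sketch. It is not the Gly2mdc implementation (which is Java-based and reads the binary file directly); it only assumes that the copied encoding is a simple linear sequence in which underscores, hyphens, and stray spaces are noise between Gardiner codes. The function name `clean_copied_encoding` is hypothetical.

```python
import re


def clean_copied_encoding(raw: str) -> str:
    """Turn a copied encoding such as 'M17_-Z7_-A1' into 'M17 Z7 A1'.

    Assumption: underscores, hyphens and whitespace only act as
    separators between sign codes. Other Manuel de Codage operators
    (for example ':' or '*') are left inside their tokens untouched.
    """
    # Split on any run of underscores, hyphens or whitespace.
    tokens = re.split(r"[_\-\s]+", raw.strip())
    # Drop empty tokens produced by leading or trailing separators.
    return " ".join(token for token in tokens if token)


if __name__ == "__main__":
    print(clean_copied_encoding("M17_-Z7_-A1_-G17_-O1_-Z1_-A1"))
```

Applied to the example sentence, the sketch yields the same space-separated sequence of codes as given earlier for iw=i m pr=i.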
The encoded text can be saved as a text file or in JSON format. The choice of either Manuel de Codage encoding or Unicode characters is offered for the plain text file. In the JSON format, the signs are annotated with their Unicode characters when available. These machine-readable files can be used with digital methods, and they are readable by humans as well.
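As an illustration of what such a JSON annotation might look like, the following Python sketch pairs each Gardiner code with its Unicode character when one exists. The field names (`code`, `unicode`) and the function names are assumptions made for this example and do not reflect the actual output format of Gly2mdc. The lookup relies on the Unicode character names of the Egyptian Hieroglyphs block, which take the form `EGYPTIAN HIEROGLYPH A001`.

```python
import json
import re
import unicodedata
from typing import Optional

# A Gardiner code: category letters, a number, an optional variant letter.
CODE_PATTERN = re.compile(r"^([A-Za-z]{1,2})(\d{1,3})([A-Za-z]?)$")


def gardiner_to_unicode(code: str) -> Optional[str]:
    """Return the Unicode hieroglyph for a Gardiner code, or None."""
    match = CODE_PATTERN.match(code)
    if match is None:
        return None
    category, number, variant = match.groups()
    # Unicode names pad the number to three digits, e.g. M17 -> M017.
    name = f"EGYPTIAN HIEROGLYPH {category.upper()}{int(number):03d}{variant.upper()}"
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # The sign has no character in the Unicode database.
        return None


def annotate(encoded_text: str) -> str:
    """Build a JSON string annotating each code (hypothetical layout)."""
    signs = [
        {"code": code, "unicode": gardiner_to_unicode(code)}
        for code in encoded_text.split()
    ]
    return json.dumps({"signs": signs}, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(annotate("M17 Z7 A1 G17 O1 Z1 A1"))
```

Signs without a Unicode character are kept with a null value, so the code sequence stays complete even where the annotation is unavailable.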
Period | 14 Jun 2023 |
---|---|
Event title | Tenth European Conference of Egyptologists: Egypt 2023: Perspectives of Research |
Event type | Conference |
Location | Warszawa, Poland |
Degree of recognition | International |
Related content
-
Projects
-
Centre of Excellence in Ancient Near Eastern Empires
Project: Academy of Finland: Centre of Excellence
-
From Sherds of Pottery to Open Egyptological Data
Project: Research project
-
Machine Readable Texts for Egyptologists
Project: Research project
-
ANEE Team 1: Digital Humanities
Project: Research project
-
Activities
-
Tenth European Conference of Egyptologists
Activity: Types of event participation and organisation › Participating in and organising conferences, courses and seminars