Gly2mdc: The Importance of Being Readable

Activity: Talk or presentation types: Oral presentation

Description

With digital methods, researchers can study large amounts of textual data and find patterns they might not otherwise see. For researchers of hieroglyphic texts, there are useful full-text-based digital dictionaries, such as Ramses Online and the Thesaurus Linguae Aegyptiae (TLA). However, to use methods other than those offered by these online services, one must have openly available machine-readable texts. Despite the release of the TLA texts, the scarcity of machine-readable data hinders the digital study of hieroglyphic texts and the development of new methods. This leaves Egyptology at a disadvantage when competing for funding and students.
There is, however, already a tradition of making machine-readable hieroglyphic texts. When using hieroglyphic text editors to produce properly positioned hieroglyphs for print, Egyptologists encode the texts with codes derived from Gardiner's Sign List. For example, the sentence iw=i m pr=i “I am in my house” would be encoded as ‘M17 Z7 A1 G17 O1 Z1 A1’. The encoded hieroglyphic texts can be produced with various, often platform-specific, programs. Most hieroglyphic text editors use the so-called Manuel de Codage encoding. Such encoded texts are machine-readable and contain the information needed to study many aspects of ancient Egyptian language and culture and to develop new digital methods. Unfortunately, the binary encoding files are usually not published but are instead discarded once a picture of the hieroglyphs has been produced.
Some hieroglyphic text editors, such as JSesh and WinGlyph, save the encodings in mutually compatible files with the extension .gly. These binary files are meant to be handled with a hieroglyphic text editor and cannot be opened with a regular text editor. From JSesh, it is also possible to copy out the encoding of the text manually, but this requires many steps and is therefore slow. Furthermore, in the copied encoding, all codes are linked to their neighbors with underscores and hyphens, so that the previous example sentence is presented as ‘M17_-Z7_-A1_-G17_-O1_-Z1_-A1’. To make the process faster, I have built a tool, Gly2mdc, for extracting and cleaning the encoded texts. The Java-based tool takes the binary .gly file as input and produces the encoded text without the noise in the encoding.
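The cleaning step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Java code of Gly2mdc; the function name clean_copied_encoding is hypothetical. It assumes the input is a copied string in which codes are joined only by underscores and hyphens, as in the example sentence, and it does not read the binary .gly format or handle other Manuel de Codage operators.

```python
import re


def clean_copied_encoding(raw: str) -> str:
    """Return the sign codes separated by single spaces.

    Assumption: the codes are joined only by underscores, hyphens,
    and stray whitespace, as in the example sentence in the text.
    """
    # Split on any run of joiner characters and drop empty pieces.
    codes = [c for c in re.split(r"[_\-\s]+", raw.strip()) if c]
    return " ".join(codes)


print(clean_copied_encoding("M17_-Z7_-A1_-G17_-O1_-Z1_-A1"))
# prints: M17 Z7 A1 G17 O1 Z1 A1
```

Splitting on runs of joiner characters rather than on the exact sequence "_-" keeps the sketch tolerant of stray spaces introduced when the encoding is copied by hand.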
The encoded text can be saved as a text file or in JSON format. The choice of either Manuel de Codage encoding or Unicode characters is offered for the plain text file. In the JSON format, the signs are annotated with their Unicode characters when available. These machine-readable files can be used with digital methods, and they are readable by humans as well.
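As a rough illustration of what annotating signs with Unicode characters might involve, the sketch below maps Gardiner-style codes to characters through the Unicode character names (for example, A1 corresponds to "EGYPTIAN HIEROGLYPH A001"). It is written in Python rather than Java, and the JSON field names "code" and "unicode" as well as the function names are assumptions made for this example; the actual output schema of Gly2mdc is not described here.

```python
import json
import re
import unicodedata

# Gardiner-style code: category letters, a number, and an optional
# variant letter, e.g. "M17", "Aa1", "Z2A".
_CODE = re.compile(r"^([A-Za-z]+?)(\d+)([A-Za-z]?)$")


def to_unicode(code: str):
    """Return the Unicode hieroglyph for a code, or None if there is none."""
    match = _CODE.match(code)
    if match is None:
        return None
    category, number, variant = match.groups()
    # Unicode names zero-pad the sign number to three digits.
    name = f"EGYPTIAN HIEROGLYPH {category.upper()}{int(number):03d}{variant.upper()}"
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


def annotate(encoded: str):
    """Pair every space-separated code with its character when available."""
    return [{"code": c, "unicode": to_unicode(c)} for c in encoded.split()]


signs = annotate("M17 Z7 A1 G17 O1 Z1 A1")
print(json.dumps(signs, ensure_ascii=False, indent=2))
```

Codes without a counterpart in the Unicode block are kept with a null value, so the JSON output stays aligned sign by sign with the encoded text.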
Period: 14 Jun 2023
Event title: Tenth European Conference of Egyptologists: Egypt 2023: Perspectives of Research
Event type: Conference
Location: Warszawa, Poland
Degree of recognition: International