Abstract
In Technical Report 602, I described the process of converting printed text into machine-readable form. This report extends that work: here I describe and demonstrate in more detail the capabilities of a search system based on analysed text. All the material on Msimulizi (years 1888-1896) that is available on the SOAS web page was processed into machine-readable form, including manual editing of the whole text. A second round of editing was done on the basis of computational analysis, which points out the remaining scanning mistakes. The clean text was then converted into an analysed format, which is optimal for information retrieval. The report focuses on search tasks that are hardly possible with conventional string search because of the complex word structure of Swahili.
Original language | English |
---|---|
Place of Publication | Helsinki |
Publisher | University of Helsinki, Institute for Asian and African Studies |
Number of pages | 20 |
Publication status | Published - Oct 2020 |
MoE publication type | D4 Published development or research report or study |
Fields of Science
- 6121 Languages
- 113 Computer and information sciences