Msimulizi as corpus for accurate search

Research output: Working paper, Scientific


In Technical Report 602, I described the process of converting printed text into machine-readable form. This report extends that work and describes and demonstrates in more detail the capabilities of a search system based on analysed text. All the Msimulizi material (years 1888-1896) available on the SOAS web page was processed into machine-readable form, including manual editing of the whole text. A second round of editing was based on computational analysis, which pointed out the remaining scanning mistakes. The clean text was then converted into an analysed format, which is optimal for information retrieval. The report demonstrates in particular search tasks that are hardly possible with conventional string search because of the complex word structure of Swahili.
Publisher: University of Helsinki, Institute for Asian and African Studies
Number of pages: 20
Status: Published - Oct 2020
MoE publication type: D4 Published development or research report or study


  • 6121 Languages
  • 113 Computer and information sciences
