Msimulizi as corpus for accurate search

Research output: Working paper (Scientific)


In Technical Report 602, I described the process of converting printed text into machine-readable form. This report extends it: here I describe and demonstrate in more detail the capabilities of a search system based on analysed text. All the material on Msimulizi (years 1888-1896) that is available on the SOAS web page was processed into machine-readable form, including manual editing of the whole text. A second round of editing was done on the basis of computational analysis, which points out the remaining scanning mistakes. The clean text was then converted into an analysed format, which is optimal for information retrieval. The report focuses on search tasks that are hardly possible with conventional string search because of the complex word structure of Swahili.
Publisher: University of Helsinki, Institute for Asian and African Studies
Status: Published - Oct 2020
Ministry of Education and Culture (OKM) publication type: D4 Published development or research report or study


  • 6121 Languages
  • 113 Computer and information sciences
