Improving performance quality and user experience in the PULS News Mining system

Mian Du

Research output: Thesis (Master's thesis)

Abstract

The Pattern-based Understanding and Learning System (PULS) is a key component of a large distributed news surveillance system. It consists of the following three parts:
\begin{enumerate}
\item an Information Extraction (IE) system running on the back end, which receives news articles as plain text in RSS feeds that arrive continuously in real time from several partner systems, extracts information from these feeds, and stores it in the database;
\item a Web-based decision support (DS) system running on the front end, which visualizes the information for decision-making and user evaluation;
\item a central database shared by both systems, which stores the structured information extracted by the IE system and visualized by the DS system.
\end{enumerate}

In the IE system, there is a growing need to extend extraction beyond English-language articles in the medical and business domains, so that the system can also handle articles in other languages, such as French and Russian, and in other domains. In the DS system, users require several new forms of Information Visualization and new user evaluation interfaces to obtain better decision support.

To deliver these new features, a number of approaches were investigated and adopted, including Information Extraction, machine learning, Information Visualization, the evolutionary delivery model, requirements elicitation, modelling, database design, and a variety of evaluation methods. Several programming languages were used, including Lisp, Java, Python, and JavaScript/jQuery. More importantly, an appropriate development process was followed. This thesis reports on the whole process followed to achieve the required improvements to PULS.
Original language: English
Place of Publication: Helsinki
Publication status: Published - 2012
MoE publication type: G2 Master's thesis, polytechnic Master's thesis

Fields of Science

  • 113 Computer and information sciences
  • Information Extraction
  • Information Visualization
  • Decision Support
  • Machine learning
