Natural language processing system for business intelligence

Mian Du

Research output: Thesis › Doctoral Thesis › Collection of Articles


The ongoing information explosion particularly affects business, including corporate strategy and business decision-making. Business intelligence tools aim to help users understand market trends, which is critical for their day-to-day operations. For example, a typical business intelligence task is to obtain accurate and relevant information about the activities of competitors in the same industry sector. This thesis presents research on a natural language processing (NLP) system that aims to address the problem of information overload in the business domain. The system uses document filtering, information extraction, and supervised and semi-supervised learning. Its input consists of news documents from on-line news websites and company press pages. We first demonstrate that a combination of NLP techniques and frequent sequential pattern mining can be used to find patterns in unstructured natural-language text, i.e., news articles. The patterns relate to a specific news domain. Evaluation results show that scenario-based summarization can filter out irrelevant documents and also extract important sentences from relevant documents as summaries for pre-defined scenarios in a specific domain. For document-level filtering, this method achieves very high precision while maintaining fairly high recall in our study. Next, we present experiments with supervised learning for labelling business news documents with multiple industry sectors. The main contribution is showing that combining a named-entity-based rote classifier with balanced statistical classifiers yields better results than either classifier alone. This method also improves on the best previously reported score, while using the same amount of training data for the rote classifier and considerably less for the statistical classifiers. We then explore the interplay between company news, social media visibility, and stock prices.
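The abstract does not spell out the mining algorithm. As a rough illustration only, the following is a minimal sketch of a generic level-wise (Apriori/GSP-style) frequent sequential pattern miner over token sequences, assuming the sentences have already been tokenised and lemmatised and that company names have been replaced by an entity tag. The `COMPANY` tag, the toy sentences, the `min_support` threshold and all function names are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def contains(seq: List[str], pattern: Pattern) -> bool:
    """True if `pattern` occurs in `seq` in order, with gaps allowed."""
    it = iter(seq)
    # `item in it` advances the iterator past each match, preserving order
    return all(item in it for item in pattern)


def support(db: List[List[str]], pattern: Pattern) -> int:
    """Number of sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if contains(seq, pattern))


def frequent_sequential_patterns(
    db: List[List[str]], min_support: int, max_len: int = 3
) -> Dict[Pattern, int]:
    # Level 1: frequent single items, counted once per sequence
    counts = Counter(item for seq in db for item in set(seq))
    items = sorted(i for i, c in counts.items() if c >= min_support)
    frequent: Dict[Pattern, int] = {(i,): counts[i] for i in items}
    level: List[Pattern] = list(frequent)
    # Grow only frequent prefixes: every prefix of a frequent pattern is
    # itself frequent, so no frequent pattern is missed (Apriori property)
    while level and len(level[0]) < max_len:
        next_level: List[Pattern] = []
        for prefix in level:
            for item in items:
                candidate = prefix + (item,)
                s = support(db, candidate)
                if s >= min_support:
                    frequent[candidate] = s
                    next_level.append(candidate)
        level = next_level
    return frequent


# Hypothetical toy sentences; company names replaced by a COMPANY tag
db = [
    ["COMPANY", "acquire", "COMPANY", "for", "MONEY"],
    ["COMPANY", "agree", "acquire", "COMPANY"],
    ["COMPANY", "acquire", "stake", "in", "COMPANY"],
    ["COMPANY", "report", "profit"],
]
patterns = frequent_sequential_patterns(db, min_support=3)
longest = max(patterns, key=len)
print(longest, patterns[longest])  # ('COMPANY', 'acquire', 'COMPANY') 3
```

Allowing gaps in `contains` is what lets "COMPANY agree acquire COMPANY" support the same pattern as "COMPANY acquire COMPANY"; a real system would add further generalisation (e.g. more entity types) that this sketch leaves out.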
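The abstract states that combining a named-entity-based rote classifier with statistical classifiers works better than either alone, but it does not say here how the two are combined. The following is a minimal sketch under loudly stated assumptions: the rote classifier simply memorises the sector labels seen with each company name in training, the per-sector statistical classifiers are replaced by a hypothetical linear keyword score, and the combination rule is a plain union of the two label sets. The company names, keyword weights, threshold and function names are all hypothetical, and the training (including class balancing) of the statistical classifiers is not shown.

```python
from typing import Dict, Iterable, List, Set, Tuple


def train_rote(examples: Iterable[Tuple[List[str], Set[str]]]) -> Dict[str, Set[str]]:
    """Memorise the sector labels seen with each company name in training."""
    table: Dict[str, Set[str]] = {}
    for companies, sectors in examples:
        for name in companies:
            table.setdefault(name, set()).update(sectors)
    return table


def rote_predict(table: Dict[str, Set[str]], companies: List[str]) -> Set[str]:
    """Labels of every known company mentioned; empty for unseen companies."""
    labels: Set[str] = set()
    for name in companies:
        labels |= table.get(name, set())
    return labels


def statistical_predict(
    tokens: List[str], weights: Dict[str, Dict[str, float]], threshold: float = 1.0
) -> Set[str]:
    """Stand-in for per-sector binary classifiers: one linear keyword score each."""
    return {
        sector
        for sector, w in weights.items()
        if sum(w.get(t, 0.0) for t in tokens) >= threshold
    }


def combined_predict(
    table: Dict[str, Set[str]],
    weights: Dict[str, Dict[str, float]],
    companies: List[str],
    tokens: List[str],
) -> Set[str]:
    # Assumed combination rule (not from the thesis): union of both label sets
    return rote_predict(table, companies) | statistical_predict(tokens, weights)


# Hypothetical training data and keyword weights
table = train_rote([
    (["CompanyA"], {"Telecommunications"}),
    (["CompanyB"], {"Energy"}),
])
weights = {
    "Energy": {"oil": 0.6, "refinery": 0.6, "power": 0.5},
    "Telecommunications": {"5G": 0.7, "network": 0.4, "operator": 0.5},
}
doc_tokens = ["CompanyA", "opens", "oil", "refinery"]
print(sorted(combined_predict(table, weights, ["CompanyA"], doc_tokens)))
# ['Energy', 'Telecommunications']
```

The toy document shows why the two sources can complement each other: the rote table contributes the sector it has memorised for a known company, while the text-based scorer can still assign a sector when the company is unseen or the article is about a different line of business.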
Information extracted from on-line news by means of deep linguistic analysis is used to construct queries to various social media platforms. The main results presented in the thesis demonstrate interesting correlations between the mentions of a company in the news and the number of views of its Wikipedia page. Building on these research topics, the thesis also presents the design and architecture of a complete decision-support system, which illustrates how the above research results can be used to extract, analyze, and organize information from plain-text news.
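The abstract reports correlations between news mentions of a company and views of its Wikipedia page without naming the statistic or time alignment used. As an illustration only, the following sketch computes a Pearson correlation between two daily series, optionally with views shifted by a lag of some days. It assumes the daily mention counts and page-view counts have already been collected and aligned by date; the series below are made up, not data from the thesis, and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(
    mentions: Sequence[float], views: Sequence[float], lag: int
) -> float:
    """Correlate news mentions on day t with page views on day t + lag."""
    if lag < 0 or lag >= len(mentions) - 1:
        raise ValueError("lag out of range")
    # Drop the last `lag` mention days and the first `lag` view days
    return pearson(mentions[: len(mentions) - lag], views[lag:])


# Made-up daily series for one hypothetical company
mentions = [0, 1, 5, 2, 0, 0, 3, 1]
views = [100, 110, 300, 180, 120, 105, 220, 130]
for lag in range(3):
    print(lag, round(lagged_correlation(mentions, views, lag), 3))
```

Scanning a few lags is one simple way to see whether page views move on the same day as news coverage or trail it; correlation alone does not establish which one drives the other.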
Original language: English
Awarding Institution
  • University of Helsinki
Supervisors/Advisors
  • Tarkoma, Sasu, Supervisor
  • Yangarber, Roman, Supervisor
Award date: 29 Nov 2017
Place of Publication: Helsinki
Print ISBNs: 978-951-51-3900-9
Electronic ISBNs: 978-951-51-3901-6
Publication status: Published - 29 Nov 2017
MoE publication type: G5 Doctoral dissertation (article)

Fields of Science

  • 113 Computer and information sciences
