Supervisor, PhD Thesis

Timo Honkela (Supervisor)

Activity: Examination types: Supervisor or co-supervisor of doctoral thesis

Description

Aalto University publication series
DOCTORAL DISSERTATIONS 137/2012

Language- and domain- independent text mining
Mari-Sanna Paukkeri
Doctoral dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Aalto University School of Science, for public examination and debate in Auditorium AS1 of the school on 9th November 2012 at 12 noon.

Aalto University
School of Science
Department of Information and Computer Science

Supervising professor
Prof. Erkki Oja

Thesis advisors [term used in Aalto University]
Doc. Timo Honkela
Dr. Mathias Creutz

Preliminary examiners
Dr. Reinhard Rapp, Johannes Gutenberg University Mainz, Germany
Dr. Roman Yangarber, University of Helsinki, Finland

Opponent
Doc. Jussi Karlgren, Gavagai AB, Sweden

© Mari-Sanna Paukkeri
ISBN 978-952-60-4833-8 (printed)
ISBN 978-952-60-4834-5 (pdf)
ISSN-L 1799-4934
ISSN 1799-4934 (printed)
ISSN 1799-4942 (pdf)
http://urn.fi/URN:ISBN:978-952-60-4834-5

Abstract

The field of natural language processing (NLP) has developed enormously during the last decades. The availability of a constantly increasing amount of textual data in electronic form has also accelerated the development of statistical methods for NLP, in which characteristics of natural languages are learned from large corpora. Statistical methods have shown their applicability in information retrieval, in which documents of various languages and domains are returned according to user queries; in statistical machine translation, which is easily applicable to new languages; in document clustering, which groups semantically similar documents; and in many information extraction tasks, including keyphrase extraction, document summarization and discovering linguistic features. However, the majority of NLP research, including many statistical methods, concentrates on the English language and uses various language-specific tools and resources, such as part-of-speech taggers and ontologies, which are not directly applicable to other languages. ...

Title of thesis: Language- and domain- independent text mining
Period: 9 Nov 2012
Examinee: Mari-Sanna Paukkeri
Examination held at: Aalto University