Using regression makes extraction of shared variation in multiple datasets easy

Jussi Korpela, Andreas Henelius, Lauri Ahonen, Arto Klami, Kai Puolamäki

Research output: Contribution to journal, Article, Scientific, peer-review

Abstract

In many data analysis tasks it is important to understand the relationships between different datasets. Several methods exist for this task but many of them are limited to two datasets and linear relationships. In this paper, we propose a new efficient algorithm, termed cocoreg, for the extraction of variation common to all datasets in a given collection of arbitrary size. cocoreg extends redundancy analysis to more than two datasets, utilizing chains of regression functions to extract the shared variation in the original data space. The algorithm can be used with any linear or non-linear regression function, which makes it robust, straightforward, fast, and easy to implement and use. We empirically demonstrate the efficacy of shared variation extraction using the cocoreg algorithm on five artificial and three real datasets.
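
The idea of pushing data through chains of regression functions and reading off what survives can be illustrated with a small sketch. This is a loose illustration under stated assumptions, not the published cocoreg procedure: the use of ordinary least squares, the averaging over every ordering of the other datasets, the toy data, and all function names (`fit_linear`, `chained_shared_variation`, `make_toy_data`) are assumptions made here for illustration only.

```python
import itertools
import numpy as np


def fit_linear(X, Y):
    """Least-squares weights W minimising ||X @ W - Y|| (no intercept)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def chained_shared_variation(datasets):
    """Hypothetical sketch: for each dataset, average the outputs of all
    regression chains that visit every other dataset and end in its space."""
    k = len(datasets)
    # One regression per ordered pair (source -> target).
    weights = {
        (i, j): fit_linear(datasets[i], datasets[j])
        for i in range(k) for j in range(k) if i != j
    }
    estimates = []
    for target in range(k):
        others = [i for i in range(k) if i != target]
        predictions = []
        for order in itertools.permutations(others):
            path = list(order) + [target]
            Z = datasets[path[0]]
            # Map the starting dataset through each regression in the chain.
            for src, dst in zip(path[:-1], path[1:]):
                Z = Z @ weights[(src, dst)]
            predictions.append(Z)
        # Averaging the chain outputs damps variation private to one dataset.
        estimates.append(np.mean(predictions, axis=0))
    return estimates


def make_toy_data(n=2000, seed=0):
    """Three toy datasets: one shared latent signal plus private noise."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((n, 1))
    loadings = [
        np.array([[1.0, -1.0, 0.5]]),
        np.array([[0.5, 1.0, 1.0]]),
        np.array([[-1.0, 0.5, 1.0]]),
    ]
    shared = [s @ a for a in loadings]
    datasets = [sh + rng.standard_normal(sh.shape) for sh in shared]
    # Centre the columns because the regressions have no intercept term.
    datasets = [d - d.mean(axis=0) for d in datasets]
    return datasets, shared


if __name__ == "__main__":
    datasets, shared = make_toy_data()
    estimates = chained_shared_variation(datasets)
    for d, e, sh in zip(datasets, estimates, shared):
        raw = np.corrcoef(d.ravel(), sh.ravel())[0, 1]
        est = np.corrcoef(e.ravel(), sh.ravel())[0, 1]
        print(f"correlation with shared part: raw={raw:.2f}, chained={est:.2f}")
```

Each estimate lives in the original data space of its target dataset, which matches the abstract's description of extracting shared variation in the original data space. `fit_linear` could be swapped for any non-linear regressor with a fit/predict interface, at the cost of replacing the matrix products with prediction calls.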
Original language: English
Journal: Data Mining and Knowledge Discovery
Volume: 30
Issue number: 5
Pages (from-to): 1112-1133
Number of pages: 22
ISSN: 1384-5810
Publication status: Published - Sep 2016
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 113 Computer and information sciences
