On the inconsistency of ℓ1-penalised sparse precision matrix estimation

Research output: Contribution to journal > Article > Scientific > peer-review

Abstract

Background: Various ℓ1-penalised estimation methods such as graphical lasso and CLIME are widely used for sparse precision matrix estimation and learning of undirected network structure from data. Many of these methods have been shown to be consistent under various quantitative assumptions about the underlying true covariance matrix. Intuitively, these conditions are related to situations where the penalty term will dominate the optimisation.

Results: We explore the consistency of ℓ1-based methods for a class of bipartite graphs motivated by the structure of models commonly used for gene regulatory networks. We show that all ℓ1-based methods fail dramatically for models with nearly linear dependencies between the variables. We also study the consistency on models derived from real gene expression data and note that the assumptions needed for consistency never hold even for modest sized gene networks and ℓ1-based methods also become unreliable in practice for larger networks.

Conclusions: Our results demonstrate that ℓ1-penalised undirected network structure learning methods are unable to reliably learn many sparse bipartite graph structures, which arise often in gene expression data. Users of such methods should be aware of the consistency criteria of the methods and check if they are likely to be met in their application of interest.
Original language: English
Journal: BMC Bioinformatics
Volume: 17
Issue number: Suppl 16
Pages (from-to): 99-107
Number of pages: 9
ISSN: 1471-2105
DOI: 10.1186/s12859-016-1309-x
Publication status: Published - 13 Dec 2016
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 112 Statistics and probability
  • 113 Computer and information sciences

Cite this

@article{b0db9ae7fd384ae5bfce72e8c7e44edf,
title = "On the inconsistency of ℓ1-penalised sparse precision matrix estimation",
abstract = "Background: Various ℓ1-penalised estimation methods such as graphical lasso and CLIME are widely used for sparse precision matrix estimation and learning of undirected network structure from data. Many of these methods have been shown to be consistent under various quantitative assumptions about the underlying true covariance matrix. Intuitively, these conditions are related to situations where the penalty term will dominate the optimisation. Results: We explore the consistency of ℓ1-based methods for a class of bipartite graphs motivated by the structure of models commonly used for gene regulatory networks. We show that all ℓ1-based methods fail dramatically for models with nearly linear dependencies between the variables. We also study the consistency on models derived from real gene expression data and note that the assumptions needed for consistency never hold even for modest sized gene networks and ℓ1-based methods also become unreliable in practice for larger networks. Conclusions: Our results demonstrate that ℓ1-penalised undirected network structure learning methods are unable to reliably learn many sparse bipartite graph structures, which arise often in gene expression data. Users of such methods should be aware of the consistency criteria of the methods and check if they are likely to be met in their application of interest.",
keywords = "112 Statistics and probability, 113 Computer and information sciences",
author = "Otte Hein{\"a}vaara and Janne Lepp{\"a}-Aho and Jukka Corander and Antti Honkela",
year = "2016",
month = "12",
day = "13",
doi = "10.1186/s12859-016-1309-x",
language = "English",
volume = "17",
pages = "99--107",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BMC",
number = "Suppl 16",

}

On the inconsistency of ℓ1-penalised sparse precision matrix estimation. / Heinävaara, Otte; Leppä-Aho, Janne; Corander, Jukka; Honkela, Antti.

In: BMC Bioinformatics, Vol. 17, No. Suppl 16, 13.12.2016, p. 99-107.

TY - JOUR

T1 - On the inconsistency of ℓ1-penalised sparse precision matrix estimation

AU - Heinävaara, Otte

AU - Leppä-Aho, Janne

AU - Corander, Jukka

AU - Honkela, Antti

PY - 2016/12/13

Y1 - 2016/12/13

AB - Background: Various ℓ1-penalised estimation methods such as graphical lasso and CLIME are widely used for sparse precision matrix estimation and learning of undirected network structure from data. Many of these methods have been shown to be consistent under various quantitative assumptions about the underlying true covariance matrix. Intuitively, these conditions are related to situations where the penalty term will dominate the optimisation. Results: We explore the consistency of ℓ1-based methods for a class of bipartite graphs motivated by the structure of models commonly used for gene regulatory networks. We show that all ℓ1-based methods fail dramatically for models with nearly linear dependencies between the variables. We also study the consistency on models derived from real gene expression data and note that the assumptions needed for consistency never hold even for modest sized gene networks and ℓ1-based methods also become unreliable in practice for larger networks. Conclusions: Our results demonstrate that ℓ1-penalised undirected network structure learning methods are unable to reliably learn many sparse bipartite graph structures, which arise often in gene expression data. Users of such methods should be aware of the consistency criteria of the methods and check if they are likely to be met in their application of interest.

KW - 112 Statistics and probability

KW - 113 Computer and information sciences

U2 - 10.1186/s12859-016-1309-x

DO - 10.1186/s12859-016-1309-x

M3 - Article

VL - 17

SP - 99

EP - 107

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - Suppl 16

ER -