Bayesian cluster analysis with applications to pathogen population genomics

Research output: Thesis › Doctoral Thesis › Collection of Articles

Abstract

Identifying similarity patterns in heterogeneous observations is a common problem in many branches of science. When the similarities and dissimilarities are encoded by a group structure, the task of dividing the observed sample into an unknown number of homogeneous groups is known as cluster analysis. It is one of the most widely applied types of statistical data analysis. In evolutionary biology, for example, population structure plays an important role: groups arise naturally as the result of evolutionary processes and, depending on the resolution of the study, clusters might represent similar molecules, organisms, or even species. With the large amount of genetic data now freely available in online databases, cluster analysis is a valuable technique for better understanding the evolution of organisms. In this dissertation we focus on Bayesian approaches to model-based clustering. We review the mathematical formalization of the two most common methods, finite mixture models and product partition models, together with the algorithms needed to draw inferences. We then introduce a novel Bayesian model specifically designed to partition categorical data matrices. Finally, we show that cluster analysis is an effective method for understanding the evolution of pathogens, and how this information is relevant to public health.
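The abstract names product partition models as one of the two standard formulations and mentions partitioning categorical data matrices. The Python sketch below is only an illustration under our own assumptions, not the novel model introduced in the dissertation. It scores a candidate partition of aligned sequences (rows of a categorical matrix) by adding an Ewens prior, which is a product partition prior with cohesion θ·Γ(|S|) for each cluster S, to a Dirichlet-categorical likelihood in which every column of every cluster has its own category frequencies, integrated out analytically. The function names, the symmetric hyperparameters `alpha` and `theta`, and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def log_marginal_likelihood(data: Sequence[str], labels: Sequence[Hashable],
                            alpha: float = 1.0) -> float:
    """Dirichlet-categorical marginal likelihood of a partition (assumed model).

    Each column within each cluster has its own category probabilities with a
    symmetric Dirichlet(alpha) prior, which is integrated out in closed form.
    """
    n_cols = len(data[0])
    # State space of column j: every symbol observed anywhere in that column.
    states = [sorted({row[j] for row in data}) for j in range(n_cols)]
    total = 0.0
    for cluster in set(labels):
        members = [row for row, lab in zip(data, labels) if lab == cluster]
        n = len(members)
        for j in range(n_cols):
            counts = Counter(row[j] for row in members)  # missing states -> 0
            k = len(states[j])
            # log Gamma(k*alpha) - log Gamma(k*alpha + n)
            total += math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
            # + sum over states s of log Gamma(alpha + n_s) - log Gamma(alpha)
            for s in states[j]:
                total += math.lgamma(alpha + counts[s]) - math.lgamma(alpha)
    return total


def log_ewens_prior(labels: Sequence[Hashable], theta: float = 1.0) -> float:
    """Ewens (Chinese restaurant process) prior on partitions.

    Product-partition form: Gamma(theta) / Gamma(theta + n) times a product of
    cohesions theta * Gamma(|S|), one per cluster S.
    """
    n = len(labels)
    sizes = list(Counter(labels).values())
    return (len(sizes) * math.log(theta) + math.lgamma(theta)
            - math.lgamma(theta + n) + sum(math.lgamma(m) for m in sizes))


def log_posterior(data: Sequence[str], labels: Sequence[Hashable],
                  alpha: float = 1.0, theta: float = 1.0) -> float:
    """Unnormalized log posterior of a partition: log prior + log likelihood."""
    return (log_marginal_likelihood(data, labels, alpha)
            + log_ewens_prior(labels, theta))


if __name__ == "__main__":
    # Hypothetical toy alignment: two visibly distinct groups of sequences.
    toy = ["AAAA", "AAAT", "AAAA", "GGCC", "GGCC", "GGCT"]
    candidates = [("two groups", [0, 0, 0, 1, 1, 1]),
                  ("interleaved", [0, 1, 0, 1, 0, 1]),
                  ("one group", [0] * 6)]
    for name, labels in candidates:
        print(f"{name:12s} {log_posterior(toy, labels):.2f}")
```

On the toy alignment the two-group partition receives the highest unnormalized log posterior (about -20.31, versus -24.45 for a single group and -26.90 for the interleaved split). The sketch only evaluates given partitions; in practice a search or sampling algorithm over partitions, such as MCMC, is needed to draw inferences.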
Original language: English
Awarding Institution
  • University of Helsinki
Supervisors/Advisors
  • Corander, Jukka, Supervisor
Award date: 27 October 2017
Place of Publication: Helsinki
Publisher: University of Helsinki
Print ISBN: 978-951-51-3675-6
Electronic ISBN: 978-951-51-3676-3
Publication status: Published - 27 October 2017
MoE publication type: G5 Doctoral dissertation (article)

Fields of Science

  • 112 Statistics and probability

Cite this

Pessia, Alberto. / Bayesian cluster analysis with applications to pathogen population genomics. Helsinki : University of Helsinki, 2017. 107 p.
@phdthesis{bd89f8e6f7194293bbd8f9f9316af97f,
title = "Bayesian cluster analysis with applications to pathogen population genomics",
keywords = "112 Statistics and probability",
author = "Alberto Pessia",
year = "2017",
month = "10",
day = "27",
language = "English",
isbn = "978-951-51-3675-6",
publisher = "University of Helsinki",
address = "Finland",
school = "University of Helsinki",

}