Seriation in paleontological data using Markov Chain Monte Carlo methods

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. As an example, for the sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.
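To make the idea of MCMC seriation concrete, here is a minimal Python sketch. It is not the authors' implementation and omits most of their model: it is a hypothetical Metropolis sampler over site orderings only. It assumes that (1) each taxon occupies a contiguous block of sites in the true ordering, (2) that block is taken to be the span between the taxon's first and last recorded occurrence in the candidate ordering, instead of sampling origination and extinction times, and (3) the false-negative probability `c` and false-positive probability `d` are fixed. The function names, parameter values, and synthetic data are all hypothetical.

```python
import math
import random


def taxon_loglik(column, c, d):
    """Profile log-likelihood of one taxon's 0/1 column, already arranged in a
    candidate site order (hypothetical simplification, not the paper's model).

    The taxon's range is taken to be the span from its first to its last
    recorded occurrence. Zeros inside the span are false negatives (prob. c);
    zeros outside the span are correctly recorded absences (prob. 1 - d)."""
    positions = [i for i, x in enumerate(column) if x]
    if not positions:
        return len(column) * math.log(1 - d)
    span = positions[-1] - positions[0] + 1
    ones = len(positions)
    gaps = span - ones
    outside = len(column) - span
    return ones * math.log(1 - c) + gaps * math.log(c) + outside * math.log(1 - d)


def log_score(data, order, c, d):
    """Sum of per-taxon log-likelihoods with sites (rows) taken in `order`."""
    return sum(
        taxon_loglik([data[site][j] for site in order], c, d)
        for j in range(len(data[0]))
    )


def metropolis_seriation(data, n_iter=20000, c=0.1, d=0.01, seed=0):
    """Metropolis sampler over site orderings with a uniform prior and a
    symmetric 'swap two sites' proposal. Returns the best ordering visited."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    current = log_score(data, order, c, d)
    best_order, best_score = order[:], current
    for _ in range(n_iter):
        i, j = rng.sample(range(len(order)), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        new = log_score(data, proposal, c, d)
        # Symmetric proposal and uniform prior: accept with prob. min(1, exp(new - current)).
        if new >= current or rng.random() < math.exp(new - current):
            order, current = proposal, new
            if current > best_score:
                best_order, best_score = order[:], current
    return best_order, best_score


# Synthetic example: 6 sites in true age order 0..5, 7 taxa with known ranges.
ranges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 2), (3, 5)]
true_data = [[1 if lo <= s <= hi else 0 for lo, hi in ranges] for s in range(6)]
perm = [3, 0, 5, 1, 4, 2]            # row k of `data` is true site perm[k]
data = [true_data[t] for t in perm]  # hide the true order

best_order, best_score = metropolis_seriation(data)
recovered = [perm[k] for k in best_order]  # true age rank at each inferred position

# Spearman rank correlation between inferred position and true rank (no ties).
n = len(recovered)
rho = 1 - 6 * sum((k - r) ** 2 for k, r in enumerate(recovered)) / (n * (n * n - 1))
print(recovered, round(rho, 3))
```

In this simplified score an ordering and its reverse are indistinguishable, so only the magnitude of the correlation is meaningful. The full model described above would additionally sample the origination and extinction times and the error probabilities, and would summarize the posterior distribution rather than report a single best ordering.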
Alkuperäiskielienglanti
LehtiPLoS Computational Biology
Vuosikerta2
Numero2
Sivut1-9
Sivumäärä9
ISSN1553-734X
DOI - pysyväislinkit
TilaJulkaistu - 2006
OKM-julkaisutyyppiA1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä, vertaisarvioitu

Fields of science

  • 113 Computer and information sciences

Cite this

@article{f321ae504ff846a2930ad1aeda7b01ee,
title = "Seriation in paleontological data using Markov Chain Monte Carlo methods",
abstract = "Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. As an example, for the sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.",
keywords = "113 Computer and information sciences",
author = "Kai Puolam{\"a}ki and Mikael Fortelius and Heikki Mannila",
year = "2006",
doi = "10.1371/journal.pcbi.0020006",
language = "English",
volume = "2",
pages = "1--9",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "2",

}

Seriation in paleontological data using Markov Chain Monte Carlo methods. / Puolamäki, Kai; Fortelius, Mikael; Mannila, Heikki.

In: PLoS Computational Biology, Vol. 2, No. 2, 2006, p. 1-9.

TY  - JOUR
T1  - Seriation in paleontological data using Markov Chain Monte Carlo methods
AU  - Puolamäki, Kai
AU  - Fortelius, Mikael
AU  - Mannila, Heikki
PY  - 2006
Y1  - 2006
N2  - Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. As an example, for the sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.
AB  - Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. As an example, for the sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.
KW  - 113 Computer and information sciences
U2  - 10.1371/journal.pcbi.0020006
DO  - 10.1371/journal.pcbi.0020006
M3  - Article
VL  - 2
SP  - 1
EP  - 9
JO  - PLoS Computational Biology
JF  - PLoS Computational Biology
SN  - 1553-734X
IS  - 2
ER  - 