Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm

    Research output: Contribution to journal > Article > Scientific > peer-review

    Abstract

    Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) a maximum likelihood estimation under the multinomial model and (ii) a deterministic greedy algorithm, on both simulated and real data sets (HapMap data). Our results suggest that the PHASE algorithm is a method of choice also on pooled DNA data. The main reason for improvement over the other approaches is assumed to be the same as with individually genotyped data: the biologically motivated model of PHASE takes into account correlated genealogical histories of the haplotypes by modelling mutations and recombinations. The important questions of efficiency of DNA pooling as well as influence of the pool size on the accuracy of the estimates are also considered. Our results are in line with the earlier findings in that the pool size should be relatively small, only 2-5 individuals in our examples, in order to provide reliable estimates of population haplotype frequencies.
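
As a rough illustration of why small pools matter, the sketch below enumerates by brute force the unordered haplotype configurations consistent with one pool's data. It is not the paper's modified PHASE, nor either of the methods it is compared with. It assumes an idealised, error-free setting in which a pool is observed only as per-SNP counts of one allele. The function names `pooled_counts` and `compatible_configurations` and the toy 3-SNP setup are hypothetical, introduced here for illustration only.

```python
from itertools import combinations_with_replacement, product


def pooled_counts(haplotypes):
    """Per-SNP count of allele 1 summed over all haplotypes in one pool."""
    return tuple(sum(column) for column in zip(*haplotypes))


def compatible_configurations(counts, n_haplotypes):
    """List unordered haplotype multisets that reproduce the pooled counts.

    Brute force over all 2**n_snps binary haplotypes, so only feasible
    for a handful of SNPs and small pools.
    """
    n_snps = len(counts)
    all_haplotypes = list(product((0, 1), repeat=n_snps))
    return [
        combo
        for combo in combinations_with_replacement(all_haplotypes, n_haplotypes)
        if pooled_counts(combo) == tuple(counts)
    ]


# Toy assumption: 3 SNPs, allele 1 carried by half of the chromosomes at each SNP.
for n_individuals in (1, 2, 3):
    n_haplotypes = 2 * n_individuals          # diploid individuals
    counts = (n_individuals,) * 3
    n_configs = len(compatible_configurations(counts, n_haplotypes))
    print(f"pool of {n_individuals}: {n_configs} compatible configurations")
```

With a single individual heterozygous at all three SNPs there are four compatible haplotype pairs. Pooling more individuals enlarges the set of configurations that the same kind of count data cannot distinguish. This is one informal way to see why accuracy can degrade as pool size grows.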
    Original language: English
    Journal: Genetics Research
    Volume: 90
    Pages (from-to): 509-524
    Number of pages: 16
    ISSN: 0016-6723
    DOI: 10.1017/S0016672308009877
    Publication status: Published - 2008
    MoE publication type: A1 Journal article-refereed

    Fields of Science

    • 111 Mathematics

    Cite this

    @article{3be633390f5c4feead6a1f1f1632c8a1,
    title = "Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm",
    abstract = "Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) a maximum likelihood estimation under the multinomial model and (ii) a deterministic greedy algorithm, on both simulated and real data sets (HapMap data). Our results suggest that the PHASE algorithm is a method of choice also on pooled DNA data. The main reason for improvement over the other approaches is assumed to be the same as with individually genotyped data: the biologically motivated model of PHASE takes into account correlated genealogical histories of the haplotypes by modelling mutations and recombinations. The important questions of efficiency of DNA pooling as well as influence of the pool size on the accuracy of the estimates are also considered. Our results are in line with the earlier findings in that the pool size should be relatively small, only 2-5 individuals in our examples, in order to provide reliable estimates of population haplotype frequencies.",
    keywords = "111 Mathematics",
    author = "Matti Pirinen and Sangita Kulathinal and Dario Gasbarra and Sillanp{\"a}{\"a}, {Mikko J}",
    year = "2008",
    doi = "10.1017/S0016672308009877",
    language = "English",
    volume = "90",
    pages = "509--524",
    journal = "Genetics Research",
    issn = "0016-6723",
    publisher = "Cambridge University Press",

    }

    Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm. / Pirinen, Matti; Kulathinal, Sangita; Gasbarra, Dario; Sillanpää, Mikko J.

    In: Genetics Research, Vol. 90, 2008, p. 509-524.

