QUICK REVIEW

[Paper Review] Haplotype-based variant detection from short-read sequencing

Erik Garrison, Gábor Marth | arXiv (Cornell University) | Jul 17, 2012

Gene expression and cancer classification | References: 23 | Cited by: 4,047

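ONE-SENTENCE SUMMARY

The paper develops a Bayesian framework for detecting haplotypes from short-read sequencing data and implements it in FreeBayes to handle multiallelic loci and non-uniform copy number.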

ABSTRACT

The direct detection of haplotypes from short-read DNA sequencing data requires changes to existing small-variant detection methods. Here, we develop a Bayesian statistical framework which is capable of modeling multiallelic loci in sets of individuals with non-uniform copy number. We then describe our implementation of this framework in a haplotype-based variant detector, FreeBayes.

RESEARCH MOTIVATION AND OBJECTIVES

Motivate haplotype-based variant detection to utilize short-range phasing information from sequencing traces.
Generalize variant detection to multiallelic loci and non-uniform copy number across samples.
Develop a Bayesian model to compute P(G1,...,Gn|R1,...,Rn) incorporating data likelihood and priors.
Implement a haplotype-based detector (FreeBayes) and provide posterior quality metrics.
Enable direct detection of longer haplotypes and improve genotyping accuracy via local imputation concepts.
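
As a sketch of the target quantity named in the objectives, Bayes' rule for the joint genotypes can be written as below. It assumes that each sample's reads Ri depend only on that sample's genotype Gi, so the data likelihood factorizes across samples. The sum in the denominator runs over all possible genotype combinations.

```latex
P(G_1,\ldots,G_n \mid R_1,\ldots,R_n)
  = \frac{P(G_1,\ldots,G_n)\,\prod_{i=1}^{n} P(R_i \mid G_i)}
         {\sum_{\forall G_1,\ldots,G_n} P(G_1,\ldots,G_n)\,\prod_{i=1}^{n} P(R_i \mid G_i)}
```

The numerator combines the prior over genotype combinations with the per-sample data likelihoods. The denominator normalizes over a space that grows quickly with n, which is one reason a search for a maximum a posteriori solution is used instead of full enumeration.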

PROPOSED METHOD

Define n samples, where sample i has copy number mi and M is the total number of copies across all samples, and K alleles with frequencies f1,...,fk at a locus.
Extend Bayes’ rule to P(G1,...,Gn|R1,...,Rn) with data likelihood P(Ri|Gi) and priors based on population allele frequencies using Ewens’ sampling formula.
Compute P(Ri|Gi) by accounting for observed alleles from reads, using multinomial sampling adjusted for base qualities and mapping qualities.
Decompose priors into P(G1,...,Gn|f1,...,fk) and P(f1,...,fk), adjusting for unphased genotypes and using the multinomial coefficient with allele frequencies.
Apply Ewens’ sampling formula to approximate P(f1,...,fk) under a neutral mutation-drift model with parameter θ.
Assemble haplotype observations within dynamically determined windows, anchored by reference sequence, and compute P(G1,...,Gn|R1,...,Rn) via gradient ascent to a maximum a posteriori solution.
Provide outputs including the locus polymorphism probability P(K>1|R1,...,Rn) and marginal genotype likelihoods P(Gj|R1,...,Rn).
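
The prior decomposition in the steps above can be illustrated with a minimal Python sketch. It assumes the standard form of Ewens' sampling formula and a multinomial-coefficient ratio for P(G1,...,Gn|f1,...,fk), with allele frequencies treated as integer copy counts. The function names and the toy genotypes are hypothetical and are not taken from FreeBayes.

```python
from collections import Counter
from math import factorial, prod


def ewens_probability(allele_counts, theta):
    """Ewens' sampling formula for a sample given per-allele copy counts."""
    counts = [c for c in allele_counts if c > 0]
    total = sum(counts)  # M, the total number of sampled copies
    # a[j] = number of distinct alleles observed exactly j times
    a = Counter(counts)
    # theta * (theta + 1) * ... * (theta + M - 1)
    rising = prod(theta + z for z in range(total))
    result = factorial(total) / rising
    for j, a_j in a.items():
        result *= theta ** a_j / (j ** a_j * factorial(a_j))
    return result


def multinomial_coefficient(counts):
    """Number of orderings of a multiset with the given counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


def genotype_sampling_prior(genotypes):
    """P(G_1..G_n | f_1..f_k) for unphased genotypes given as allele tuples."""
    pooled = Counter(allele for g in genotypes for allele in g)
    ways = 1
    for g in genotypes:
        ways *= multinomial_coefficient(list(Counter(g).values()))
    return ways / multinomial_coefficient(list(pooled.values()))


def genotype_prior(genotypes, theta):
    """Joint prior: sampling term times the Ewens allele-frequency term."""
    pooled = Counter(allele for g in genotypes for allele in g)
    return genotype_sampling_prior(genotypes) * ewens_probability(
        pooled.values(), theta
    )


# Toy example: two diploid samples, one homozygous and one heterozygous.
toy_genotypes = [("A", "A"), ("A", "T")]
toy_prior = genotype_prior(toy_genotypes, theta=0.001)
```

Because genotypes are passed as tuples of alleles, samples with different copy numbers (for example a haploid and a diploid sample) can be mixed in one call without changing the code.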

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can multiallelic loci and non-uniform copy number be modeled within a Bayesian haplotype framework for variant detection?
RQ2: Does incorporating population-level priors and phasing information improve haplotype-based variant detection from short reads?
RQ3: Can longer haplotypes be detected directly from short-read data by assembling local haplotype observations?
RQ4: How effective is the method at distinguishing true haplotypes from sequencing errors using base and mapping qualities?
RQ5: What are the quality outputs (polymorphism probability and marginal genotype likelihoods) produced by the method?

KEY FINDINGS

A Bayesian framework is developed to model multiallelic loci and non-uniform copy number for haplotype-based variant detection.
The approach generalizes prior and likelihood computations to handle unphased genotypes and uses Ewens’ sampling formula to estimate allele frequency priors.
A haplotype detector (FreeBayes) assembles haplotype observations in dynamic windows and uses gradient ascent to find a maximum a posteriori multi-sample genotype.
The method yields a posterior probability of polymorphism at a locus, P(K>1|R1,...,Rn), and provides marginal genotype likelihoods for individuals.
Incorporating local imputation-like refinement improves raw genotype quality over purely maximum-likelihood approaches.
The framework enables direct detection of longer haplotypes from short-read sequencing data by modeling multiallelic haplotypes within a unified Bayesian context.
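
The two posterior outputs named in the findings can be illustrated with a small brute-force sketch. It assumes the joint posterior weights over genotype combinations are already available and enumerates them directly, rather than using the gradient-ascent search described above. The function name and the numeric weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def summarize_posterior(joint_weights):
    """Return P(K>1 | R) and per-sample marginals from unnormalized weights.

    joint_weights maps a tuple of per-sample genotypes (each a tuple of
    alleles) to an unnormalized posterior weight.
    """
    total = sum(joint_weights.values())
    posterior = {combo: w / total for combo, w in joint_weights.items()}

    # A combination is monomorphic when all samples share one single allele.
    p_monomorphic = sum(
        p
        for combo, p in posterior.items()
        if len({allele for g in combo for allele in g}) == 1
    )

    # Marginal for sample j: sum the joint posterior over all other samples.
    n_samples = len(next(iter(posterior)))
    marginals = [defaultdict(float) for _ in range(n_samples)]
    for combo, p in posterior.items():
        for j, genotype in enumerate(combo):
            marginals[j][genotype] += p

    return 1.0 - p_monomorphic, [dict(m) for m in marginals]


# Hypothetical weights for two diploid samples at one locus.
weights = {
    (("A", "A"), ("A", "A")): 6.0,
    (("A", "A"), ("A", "T")): 3.0,
    (("A", "T"), ("A", "T")): 1.0,
}
p_poly, marginals = summarize_posterior(weights)
```

Full enumeration is only practical for a handful of samples, since the number of genotype combinations grows exponentially with n.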

This review was generated by AI and reviewed by human editors.