QUICK REVIEW

[Paper Review] Haplotype-based variant detection from short-read sequencing

Erik Garrison, Gábor Marth|arXiv (Cornell University)|Jul 17, 2012
Gene expression and cancer classification|23 references|4,047 citations
TL;DR

The paper develops a Bayesian framework for detecting haplotypes from short-read sequencing and implements it in FreeBayes to handle multiallelic loci and non-uniform copy number.

ABSTRACT

The direct detection of haplotypes from short-read DNA sequencing data requires changes to existing small-variant detection methods. Here, we develop a Bayesian statistical framework which is capable of modeling multiallelic loci in sets of individuals with non-uniform copy number. We then describe our implementation of this framework in a haplotype-based variant detector, FreeBayes.

Motivation & Objective

  • Motivate haplotype-based variant detection to utilize short-range phasing information from sequencing traces.
  • Generalize variant detection to multiallelic loci and non-uniform copy number across samples.
  • Develop a Bayesian model to compute P(G1,...,Gn|R1,...,Rn) incorporating data likelihood and priors.
  • Implement a haplotype-based detector (FreeBayes) and provide posterior quality metrics.
  • Enable direct detection of longer haplotypes and improve genotyping accuracy via local imputation concepts.

Proposed method

  • Define n samples, where sample i has copy number mi, giving M = m1 + ... + mn total copies of the locus, and K alleles with frequencies fi at that locus.
  • Extend Bayes’ rule to P(G1,...,Gn|R1,...,Rn) with data likelihood P(Ri|Gi) and priors based on population allele frequencies using Ewens’ sampling formula.
  • Compute P(Ri|Gi) by accounting for observed alleles from reads, using multinomial sampling adjusted for base qualities and mapping qualities.
  • Decompose priors into P(G1,...,Gn|f1,...,fk) and P(f1,...,fk), adjusting for unphased genotypes and using the multinomial coefficient with allele frequencies.
  • Apply Ewens’ sampling formula to approximate P(f1,...,fk) under a neutral mutation-drift model with parameter θ.
  • Assemble haplotype observations within dynamically determined windows, anchored by reference sequence, and compute P(G1,...,Gn|R1,...,Rn) via gradient ascent to a maximum a posteriori solution.
  • Provide outputs including the locus polymorphism probability P(K>1|R1,...,Rn) and marginal genotype likelihoods P(Gj|R1,...,Rn).
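
As a rough illustration of the prior decomposition in the list above, the following Python sketch evaluates Ewens' sampling formula for a set of allele counts and the multinomial term that accounts for unphased genotypes. The function names, the dictionary representation of genotypes, and the value of θ are assumptions made for this example; none of them are taken from the FreeBayes source code.

```python
from collections import Counter
from math import factorial, prod


def ewens_probability(allele_counts, theta):
    """Ewens' sampling formula for the partition given by allele_counts.

    allele_counts: copies observed per allele, e.g. [3, 1] for M = 4.
    theta: neutral mutation-drift parameter (assumed value, see usage).
    """
    counts = [c for c in allele_counts if c > 0]
    m_total = sum(counts)
    # a[j] = number of distinct alleles that appear exactly j times.
    a = Counter(counts)
    rising = prod(theta + i for i in range(m_total))
    partition_term = prod(
        theta ** a_j / (j ** a_j * factorial(a_j)) for j, a_j in a.items()
    )
    return factorial(m_total) / rising * partition_term


def multinomial(counts):
    """Multinomial coefficient (sum(counts))! / prod(c!)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


def genotype_sampling_probability(genotypes):
    """P(G1,...,Gn | allele counts) for unphased genotypes.

    genotypes: one dict per sample mapping allele -> copy count.
    Each sample contributes the number of phasings of its genotype;
    the denominator counts all orderings of the pooled alleles.
    """
    totals = Counter()
    for genotype in genotypes:
        totals.update(genotype)
    phasings = prod(multinomial(list(g.values())) for g in genotypes)
    return phasings / multinomial(list(totals.values()))


def genotype_prior(genotypes, theta):
    """Joint prior: sampling term times the Ewens allele-frequency term."""
    totals = Counter()
    for genotype in genotypes:
        totals.update(genotype)
    return genotype_sampling_probability(genotypes) * ewens_probability(
        list(totals.values()), theta
    )


# Example: two diploid samples, both heterozygous A/B (theta is illustrative).
genotypes = [{"A": 1, "B": 1}, {"A": 1, "B": 1}]
print(genotype_prior(genotypes, theta=0.001))
```

Because genotypes are stored as allele-to-count dictionaries, samples with different copy numbers can be mixed in the same call, which mirrors the non-uniform copy number setting the review describes.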

Experimental results

Research questions

  • RQ1: Can multiallelic loci and non-uniform copy number be modeled within a Bayesian haplotype framework for variant detection?
  • RQ2: Does incorporating population-level priors and phasing information improve haplotype-based variant detection from short reads?
  • RQ3: Can longer haplotypes be detected directly from short-read data by assembling local haplotype observations?
  • RQ4: How effective is the method at distinguishing true haplotypes from sequencing errors using base and mapping qualities?
  • RQ5: What are the quality outputs (polymorphism probability and marginal genotype likelihoods) produced by the method?

Key findings

  • A Bayesian framework is developed to model multiallelic loci and non-uniform copy number for haplotype-based variant detection.
  • The approach generalizes prior and likelihood computations to handle unphased genotypes and uses Ewens’ sampling formula to estimate allele frequency priors.
  • A haplotype detector (FreeBayes) assembles haplotype observations in dynamic windows and uses gradient ascent to find a maximum a posteriori multi-sample genotype.
  • The method yields a posterior probability of polymorphism at a locus, P(K>1|R1,...,Rn), and provides marginal genotype likelihoods for individuals.
  • Incorporating local imputation-like refinement improves raw genotype quality over purely maximum-likelihood approaches.
  • The framework enables direct detection of longer haplotypes from short-read sequencing data by modeling multiallelic haplotypes within a unified Bayesian context.
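
To make the two reported outputs concrete, here is a toy Python sketch that computes a posterior over multi-sample genotypes, the polymorphism probability P(K>1|R1,...,Rn), and per-sample marginals. It rests on several assumptions that are not from the paper: a two-allele locus, a simplified per-read error model standing in for the quality-adjusted likelihood, a flat placeholder prior by default, and exhaustive enumeration in place of the search for a maximum a posteriori solution. All names are hypothetical.

```python
from itertools import combinations_with_replacement, product
from math import prod

ALLELES = ("A", "B")  # hypothetical two-allele locus


def read_likelihood(observations, genotype):
    """Simplified P(Ri | Gi) for one sample (assumed error model).

    observations: list of (allele, error_probability) pairs, one per read.
    genotype: tuple of alleles, one per chromosome copy, e.g. ("A", "B").
    """
    likelihood = 1.0
    for observed, error in observations:
        # Each read comes from one copy chosen uniformly at random; a
        # miscall lands on any other allele with equal probability.
        per_copy = [
            (1.0 - error) if copy == observed else error / (len(ALLELES) - 1)
            for copy in genotype
        ]
        likelihood *= sum(per_copy) / len(genotype)
    return likelihood


def posterior_table(reads_per_sample, ploidies, prior=lambda combo: 1.0):
    """Posterior over all genotype combinations by brute-force enumeration.

    prior: callable on a genotype combination; the default is a flat
    placeholder, not the Ewens-based prior the review describes.
    """
    choices = [
        list(combinations_with_replacement(ALLELES, m)) for m in ploidies
    ]
    weights = {}
    for combo in product(*choices):
        data = prod(
            read_likelihood(r, g) for r, g in zip(reads_per_sample, combo)
        )
        weights[combo] = data * prior(combo)
    total = sum(weights.values())
    return {combo: w / total for combo, w in weights.items()}


def polymorphism_probability(posterior):
    """P(K>1 | R): one minus the mass on single-allele combinations."""
    monomorphic = sum(
        p
        for combo, p in posterior.items()
        if len({allele for g in combo for allele in g}) == 1
    )
    return 1.0 - monomorphic


def marginal_genotype(posterior, sample_index):
    """Marginal posterior over one sample's genotype."""
    marginal = {}
    for combo, p in posterior.items():
        g = combo[sample_index]
        marginal[g] = marginal.get(g, 0.0) + p
    return marginal


# Example: sample 0 shows both alleles, sample 1 shows only A.
reads = [
    [("A", 0.01)] * 5 + [("B", 0.01)] * 5,
    [("A", 0.01)] * 10,
]
post = posterior_table(reads, ploidies=[2, 2])
print(max(post, key=post.get))
print(polymorphism_probability(post))
print(marginal_genotype(post, 1))
```

Enumeration grows exponentially with the number of samples, so it is only practical for tiny examples like this one; the ploidies argument accepts different values per sample to mimic non-uniform copy number.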

This review was created by AI and reviewed by human editors.