Likelihood Approaches to Structural Variant Discovery: From Integrating Signals in Individual Genomes Towards Simultaneous Variant Discovery in Populations

  • Suzanne Sindi
  • A Genomics@JHU Seminar
  • When: April 26, 2016, 10:30
  • Where: Bloomberg School of Public Health
    9th Floor Room E6519
    615 Wolfe St.
    Baltimore, MD 21205
  • Light refreshments served at 10:00 am


Structural variants (SVs) – such as deletions, insertions, copy-number gains and inversions – are rearrangements of a region of DNA relative to a reference. Until relatively recently, SVs were thought to be rare in the genomes of healthy individuals, especially mammals. However, advances in high-throughput DNA sequencing, combined with the availability of high-quality reference genomes, have shown SVs to be common even in healthy individuals. Decreasing DNA sequencing costs have allowed researchers to sequence many individual genomes in a species, but SVs are still typically analyzed in a single individual and combined across individuals only as a secondary step.

The dominant approach for SV prediction consists of paired-end (PE) sequencing of a single target genome and mapping the resulting sequences to a reference genome. SVs are then predicted by analyzing the configuration of the mapped reads. Early computational approaches to SV discovery focused on one of several PE signals consistent with an SV: discordant PEs, read depth and split-read mappings. However, as we demonstrated with our computational approach GASV-Pro (Sindi; 2012), combining discordant PEs with read depth in a likelihood framework substantially reduces false positive predictions. Indeed, our likelihood approach allows for integration of data from multiple sequencing technologies (Ritz; 2014).

Today, I will give an overview of computational methods for SV discovery and discuss two novel likelihood-based approaches under development. The first employs a Hidden Markov Model (HMM) for split-read alignment, allowing for a likelihood model that incorporates all three common signals for SV prediction in a single individual. The second addresses simultaneous prediction of SVs in populations, including related individuals, by framing SV prediction in populations as a constrained optimization problem. We solve this problem with matrix-free quasi-Newton interior-point methods, using relatedness between individuals to constrain the predictions and employing the l1 norm to enforce sparsity.

Genomics @ JHU Seminar Series
