Diffusion-based Interactions in Noisy Single Cell Data

  • Smita Krishnaswamy
  • A Genomics@JHU Seminar
  • When: November 14, 2016, 13:30
  • Where: Charles Commons conference center
    10 E 33rd Street
    Baltimore, MD 21218
  • Light refreshments served at 1:00 PM


Recently, there have been significant advances in single-cell genomic and proteomic technologies that can measure the expression of thousands of mRNA transcripts and dozens of proteins. However, this data suffers from sparsity and noise. Furthermore, its high dimensionality makes interpreting the data difficult for biologists. Our aim is to facilitate interpretation by providing a set of tools and novel algorithms that allow biologists to extract meaningful and predictive information from the data, and which yield clear and concise visualizations. In my lab we utilize a diffusion framework to learn the manifold geometry of the data. This framework models the local affinity between high-dimensional data points using a kernel function and then utilizes graph diffusion to form long-range connections and paths through the data. Our framework has been used to impute and correct noisy single-cell RNA-sequencing data via a method called MAGIC, which utilizes the Markov diffusion operator from the framework that models cellular neighborhoods. We then extend this method to a family of novel transformations and algorithms performed on the Markov diffusion operator in order to emphasize several types of patterns in the data. First, we perform high-dimensional interaction analysis between genes, pathways and modules using mutual information derived from a high-dimensional density estimate obtained via the steady-state eigenvector of the diffusion operator. Second, we derive a new embedding, which we call PHATE, to map progressions in the data via a transformation and re-embedding of the diffusion operator. In our re-embedding, paths of data progression are emphasized to reveal differentiation and cell-state trajectories, as well as relationships between genes. Third, we propose a data condensation process by which we continually change the diffusion process such that cluster structures at all scales are revealed.
We show our algorithms on several biological systems. However, due to the generic nature of our methods they can be used in any system to enable the efficient and widespread analysis of single-cell data by revealing several types of structures in single-cell data.
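
For readers unfamiliar with diffusion operators, the general idea described in the abstract (kernel affinities, a Markov operator, powering the operator to diffuse, and a steady-state vector) can be sketched as follows. This is a minimal illustration only, not the MAGIC or PHATE implementation: the fixed-bandwidth Gaussian kernel, the function names, and the parameters `sigma` and `t` are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Pairwise Gaussian affinities between rows (cells) of X.
    A fixed bandwidth sigma is an assumption of this sketch."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at zero to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def markov_operator(K):
    """Row-normalize affinities so each row is a probability distribution."""
    return K / K.sum(axis=1, keepdims=True)

def diffuse(X, t=3, sigma=1.0):
    """Smooth the data by applying t steps of the Markov operator.
    Each output row is a weighted average of the input rows."""
    M = markov_operator(gaussian_kernel(X, sigma))
    return np.linalg.matrix_power(M, t) @ X

def stationary_distribution(K):
    """Steady-state vector of the random walk on a symmetric kernel:
    proportional to each point's total affinity (a rough density proxy)."""
    degrees = K.sum(axis=1)
    return degrees / degrees.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 50 "cells" by 4 "genes" with added noise.
    clean = np.repeat(rng.normal(size=(2, 4)), 25, axis=0)
    noisy = clean + 0.3 * rng.normal(size=clean.shape)
    smoothed = diffuse(noisy, t=3, sigma=1.0)
    print(smoothed.shape)
```

Raising the operator to a higher power `t` lets information travel along longer paths through the neighborhood graph, which is the sense in which diffusion forms long-range connections; too large a `t` would eventually average away real structure along with the noise.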


Smita Krishnaswamy was trained as a computer scientist with a Ph.D. from the University of Michigan’s EECS department, where her research focused on algorithms for automated synthesis and verification of nanoscale logic circuits that exhibit probabilistic effects. During her Ph.D., she received a best paper award at DATE 2005 (a top conference in the field of design automation), and an outstanding dissertation award. She published numerous first-author papers on probabilistic network models and algorithms for VLSI CAD. In addition, her dissertation was published as a book by Springer in 2013. Following her Ph.D., she joined IBM’s TJ Watson Research Center as a scientist in the systems division, where she focused on formal methods for automated error detection. Her Deltasyn algorithm was eventually utilized in IBM’s p and z series high-performance chips. She then switched her research efforts to biology. Her postdoctoral training was completed at Columbia University in the systems biology department, where she focused on learning computational models of cellular signaling from single-cell mass cytometry data.
Although technologies such as mass cytometry and single-cell RNA sequencing are able to generate high-dimensional, high-throughput single-cell data, the computational, modeling and visualization techniques needed to analyze and make sense of this data are still lacking. Smita’s research addresses this challenge by developing scalable computational methods for analyzing and learning predictive network models from massive biological datasets. Her methods for characterizing interactions in cellular signaling networks, published in a recent Science paper, reveal the computation performed by cells as they process signals in terms of stochastic response functions. Smita, along with experimental collaborators, has applied these methods to T cell signaling and has found that signaling response functions are reconfigured through differentiation and disease. For example, Smita and her collaborators found that subtle alterations in receptor-proximal signaling in non-obese diabetic (NOD) mice are amplified through signaling cascades, leading to larger defects in downstream signals responsible for damping immune response. Her ongoing work involves creating more sophisticated and accurate models of transformational biological processes by combining both single-cell signaling and genomic data. At Yale, she is creating a forward-looking and interdisciplinary research group that is focused on developing computational techniques to solve today’s challenging biological and medical problems. (biography cited from Yale School of Medicine)

Genomics @ JHU Seminar Series
