Approaching Genomics Data from Hundreds of Dimensions Simultaneously (This is not a Faster Horse)

Sarah J. Wheelan, MD, PhD, Assistant Professor, Oncology Bioinformatics, Johns Hopkins University

Alignment to The Cancer Genome Atlas Project Raw Sequencing Reads: With skyrocketing numbers of whole genome sequence and phenotype data available from individuals’ germline and diseased cells, we need a new framework for understanding genomics data. Using the Data-Scope (a data-intensive supercomputer, funded by the NSF), we aim to detect sets of nucleotide-level variations that best classify given phenotypes. Next, we can find covarying or spatially correlated genomic variations across the entire dataset or within phenotypes. Our final goal, and the most powerful application of these data and algorithms, is to use unsupervised methods to delineate genomic variants that discriminate subsets of the data, without regard to phenotypes.