Genomics Data Explosion Will Require a Proactive Data Management Strategy

IDIES affiliate Michael Schatz extrapolates current genomics data growth rates and argues that genomicists must develop strategies now to avoid being inundated with data.

Genomics, a science that didn’t exist 15 years ago, is on course to join astronomy, Twitter, and YouTube as a frontrunner of Big Data. In the July 7, 2015 issue of PLOS Biology, IDIES affiliate Michael Schatz and his co-authors document how genomics data, and the demands of working with it, have grown at an astounding rate. According to Professor Schatz and his colleagues, recognizing this unprecedented pace now will enable genomicists to develop strategies to capture, store, process, and interpret genomics data.

Genomics may soon generate more bytes of data per year than any other field. Even conservative estimates predict that more than one billion billion base pairs (1 exabase) will be sequenced by the end of the decade, and given the global trend toward sequencing large populations of genomes and continuing advances in DNA sequencing technology, the true figure is likely to be higher. Looking further ahead, the authors, including Professor Schatz, who joined Johns Hopkins in January 2016 as an Associate Research Professor of Computer Science, estimate that genomics data may grow by four to five orders of magnitude by 2025, making genomics one of the fastest-growing Big Data sciences.

PLOS Article