Materials in Extreme Dynamic Environments (MEDE) Data Science Integration: MEDE Data Science Cloud, version 1

David C. Elbert*,1, Nicholas S. Carey2, Aavik Pakrasi, and Tamás Budavári3,2,4 **
1. Department of Earth and Planetary Sciences, 2. Department of Computer Science, 3. Department of Applied Mathematics and Statistics, 4. Hopkins Extreme Materials Institute. Johns Hopkins University, Baltimore, MD 21218

Poster

The first five years of the Materials Genome Initiative (MGI) have been marked by significant advances in access to research data across the materials domain. In addition, there is broad recognition that materials science and engineering is confronting issues of rapidly expanding data scale and scope. Such Big Data come from advances on many fronts, including instrumentation capable of acquisition with shorter time scales, higher resolution, and higher dimensionality; computing power and approaches facilitating larger, faster simulations; and processing control allowing tighter regulation and opportunities to produce more refined materials at lower costs.

At Hopkins we’ve developed the Materials in Extreme Dynamic Environments Data Science Cloud (MEDE-DSC) to address the need for robust, sustainable data-science tools in the materials domain. The MEDE-DSC combines computing infrastructure with collaborative integration into the materials design loop. The focus of the project aligns with three MGI strategic goals: to facilitate access to materials data; to build data science skills in the materials domain; and to create tools that help materials scientists link experiments, computation, and theory. This focus guides the project's commitment to bring data science tools to materials researchers, whose domain knowledge and expertise guide meaningful materials research.

MEDE-DSC infrastructure is built on the SciServer platform. SciServer, an NSF Data Infrastructure Building Block (DIBB) center, combines core components for Big Data storage and computation to bring the computation to the data. In our implementation we focus on delivering materials science tools in a simple, robust package. The computing environment uses preloaded Docker containers built on SciServer's Linux virtual machine architecture. Materials scientists and engineers access computing tools and data through a versatile, expandable Jupyter Notebook architecture. The combination of containers and notebooks brings power, consistency, and clarity while moving towards reproducible, narrated computation. Ultimately, our hope is that MEDE-DSC’s Big Data tools give materials scientists the opportunity to design a new class of research that fully utilizes modern instrumentation and simulation capabilities.

* Contact info: elbert@jhu.edu

**We want to acknowledge the SciServer team at Johns Hopkins, especially Gerard Lemson, Mitya Medvedev, Manu Popp, Mike Rippin, and Alex Szalay for their help and responsiveness. MEDE-DSC development is sponsored by the Army Research Laboratory Cooperative Agreement Number W911NF-12-2-0022.