David Elbert* (1), Nick Carey (2), Tamas Budavari (3), Brian Schuster (4)
(1) Hopkins Extreme Materials Institute; (2) Computer Sciences, Johns Hopkins University; (3) Applied Mathematics and Statistics, Johns Hopkins University; (4) Army Research Laboratory
Materials science and engineering faces Big Data challenges driven by advances in instrumentation and computational modeling. In two projects, we use SciServer as cloud-based, materials-specific infrastructure that provides data curation, visualization, and analysis to diverse materials-domain workgroups.
In the MEDE Data Science Cloud (MEDE-DSC), SciServer provides data-centric computing infrastructure that is integrated collaboratively into the materials design loop. Shared data are accessible from local, containerized computational tools through a web-based Jupyter front end. Version-controlled containers and notebooks bring power, consistency, and transparency while moving toward reproducible, narrated computation. RESTful APIs provide integration with other Materials Genome Initiative (MGI) resources. Globus integration, now under development, is an important step toward managing streaming data ingress from diverse sources.
In the PARADIM Data Collective (PDC), SciServer provides a data-driven, collaborative discovery platform with data browsing, federation, and analysis. PDC infrastructure development focuses on event-triggered data ingress, metadata harvesting, and processing. The PDC will provide data browsing, visualization, and compute tools in a JupyterLab environment. The PDC also integrates machine learning into the data environment to optimize opportunities across experiments.
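The event-triggered ingress and metadata harvesting described above can be illustrated with a minimal sketch. It is not the PDC implementation: the function names (`harvest_metadata`, `poll_once`, `run_demo`), the file name `scan_001.dat`, and the choice of metadata fields are all hypothetical, and simple directory polling stands in for whatever event mechanism the real infrastructure uses.

```python
import hashlib
import tempfile
from pathlib import Path


def harvest_metadata(path: Path) -> dict:
    """Collect basic file-level metadata for one newly arrived file.

    The fields chosen here are an assumption made for illustration only.
    """
    stat = path.stat()
    return {
        "name": path.name,
        "size_bytes": stat.st_size,
        "modified": stat.st_mtime,
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def poll_once(watch_dir: Path, seen: set, on_new_file) -> list:
    """Treat each file not seen before as an ingest event and call the handler."""
    records = []
    for path in sorted(watch_dir.iterdir()):
        if path.is_file() and path.name not in seen:
            seen.add(path.name)
            records.append(on_new_file(path))
    return records


def run_demo() -> list:
    """Drop one file into a temporary directory and poll it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        watch_dir = Path(tmp)
        seen = set()
        (watch_dir / "scan_001.dat").write_bytes(b"example detector output")
        first = poll_once(watch_dir, seen, harvest_metadata)   # new file found
        second = poll_once(watch_dir, seen, harvest_metadata)  # nothing new
        return [first, second]


if __name__ == "__main__":
    for batch in run_demo():
        print(batch)
```

In this sketch the handler runs once per new file, so a second poll over an unchanged directory yields no records. A production system would more likely react to instrument or file-system notifications and write the harvested records to a catalog rather than return them.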