SciServer Compute: Bringing Analysis Close to the Data

Mike Rippin, PhD, Institute for Data Intensive Engineering and Science, Johns Hopkins University

SciServer Compute is a recent addition to SciServer, a Big Data infrastructure project developed at Johns Hopkins University that provides a common environment for shareable, computationally intensive research. SciServer Compute runs Jupyter notebooks in Docker containers to bring advanced analysis capabilities close to terabyte-scale relational databases and petabyte-scale file storage systems. In addition to real-time analysis in Jupyter notebooks with Python, R, and MATLAB, SciServer Compute delivers an API for running asynchronous tasks in persistent Docker containers. Compute adds new libraries for CasJobs, an asynchronous free-form database querying tool, as well as libraries for accessing data on hosted and local file storage systems. SciServer’s MyScratch provides terabytes of temporary storage space, while SciDrive offers a Dropbox-like interface for long-term storage of scientific results. All of these components are accessible through a single sign-on Login Portal. SciServer supports many scientific disciplines, incorporating large databases and file collections from astronomy, cosmology, turbulence, genomics, oceanography, and materials science. SciServer’s strength stems from efficient data flow and tight integration among its components (file storage, databases, and analysis tools), applied to a variety of large scientific datasets.