Seminars

IDIES Colloquium Series

IDIES occasionally organizes colloquia by experts in the field on recent advances in data-intensive science. This page announces upcoming colloquia and provides a record of previous ones.


Spring 2010

Tuesday, May 4, 2010
1:30-2:30 PM, Bloomberg 462
Roger Barga (Microsoft Research)
Cloud Computing and Research Applications

Abstract

Cloud computing uses data centers to provide on-demand access to services such as data storage and hosted applications that provide scalable web services and large-scale scientific data analysis. While the architecture of a data center is similar to that of a conventional supercomputer, the two are designed with very different goals. This talk will cover the basic cloud computing system architectures and the application programming models, beginning with general concepts of data center architecture, including the use of virtualization and the role of low power. We next examine cloud computing and storage models with a detailed look at the Microsoft Azure cloud computing platform. The talk concludes with an overview of our ongoing efforts to leverage the power of the Microsoft Azure cloud computing platform to address some of the most challenging problems in data-intensive research.

Speaker Biography

Roger Barga is currently senior architect of the Cloud Computing Futures (CCF) group in Microsoft Research (MSR), where he leads a technical engagements team that works with external researchers interested in carrying out large-scale computing, scientific research, and data analytics on the Windows Azure cloud platform. Previously, Roger led the Advanced Research Tools and Services (ARTS) team in MSR, which built innovative services and tools for data-intensive research. Roger joined Microsoft in 1997 as a Researcher in the Database Group of Microsoft Research, where he participated in both systems research and product development efforts in database, workflow, and stream processing systems. Contact him at barga@microsoft.com or at his web site.


Wednesday, April 14, 2010
2:30-3:30 PM, Bloomberg 475
Ciprian M. Crainiceanu (Department of Biostatistics, JHU)
Statistical analysis of populations of images

Abstract

Images, often stored in 2- and 3-dimensional arrays, are fast becoming ubiquitous in medical and public health research. Analyzing populations of images is a statistical problem that raises a host of daunting challenges. The most severe challenge is that data sets incorporating images recorded for hundreds or thousands of subjects at multiple visits are massive. We introduce the population value decomposition (PVD), a general method for simultaneous dimensionality reduction of large populations of massive images. We show how PVD can be seamlessly incorporated into statistical modeling and lead to a new, transparent, and fast inferential framework. Our methodology was motivated by and applied to the Sleep Heart Health Study, the largest community-based cohort study of sleep, containing more than 85 billion observations on thousands of subjects at two visits. We also show an application to one of the largest observational studies incorporating fMRI measurements on hundreds of subjects.


Spring 2009

Tuesday, May 19, 2009
10:45-11:45 AM, CSEB B-17
José Blakeley (Database Systems Group, Microsoft)
Data Management for High-Throughput Genomics

Abstract

Today's sequencing technology allows sequencing an individual genome within a few weeks for a fraction of the cost of the original Human Genome Project. Genomics labs are faced with dozens of TB of data per week that have to be automatically processed and made available to scientists for further analysis. This talk explores the potential and the limitations of using relational database systems as the data processing platform for high-throughput genomics. In particular, we are interested in storage management for high-throughput sequence data and in leveraging SQL and user-defined functions for data analysis inside a database system. We give an overview of a database design for high-throughput genomics, describe how we used a SQL Server database in some unconventional ways to prototype this scenario, and discuss some initial findings about the scalability and performance of such a database-centric approach.

Bio

José Blakeley is a Software Architect in the Database Systems Group in the SQL Server Division. José's current interest is in data management for large-scale DW and OLTP workloads. He manages a small advanced development team exploring the impact of new hardware (large memory, multi-core, FLASH/NVRAM) on DBMS architectures. In the past, José has worked on many aspects of DBMS technology, including client and server programmability, database engine extensibility, query processing, object-relational mapping, and scientific database applications. Some of his contributions at Microsoft include the creation of the OLE DB data access interfaces, the integration of the CLR in SQL Server 2005, the extensibility mechanisms in SQL Server, and the creation of the ADO.NET Entity Framework. José has authored many conference papers, book chapters, and journal articles on DBMS technology and holds over 20 patents. He received a B.Eng. from ITESM, Monterrey, Mexico, and a Ph.D. in computer science from the University of Waterloo, Canada.


Thursday, April 23, 2009
1:30-2:30 PM, Bloomberg 475
David Luebke (NVIDIA)
Graphics, Hardware, and GPU Computing: Past, Present, and Future

Abstract

Modern GPUs have emerged as the world's most successful parallel architecture. GPUs provide a level of massively parallel computation that was once the preserve of supercomputers like the MasPar and Connection Machine. For example, NVIDIA's GeForce GTX 280 is a fully programmable, massively multithreaded chip with up to 240 cores and 30,720 threads, capable of performing up to a trillion operations per second. The raw computational horsepower of these chips has expanded their reach well beyond graphics. Today's GPUs not only render video game frames, they also accelerate physics computations, video transcoding, image processing, astrophysics, protein folding, seismic exploration, computational finance, and radio astronomy; the list goes on and on. Enabled by platforms like the CUDA architecture, which provides a scalable programming model, researchers across science and engineering are accelerating applications in their disciplines by up to two orders of magnitude. These success stories, and the tremendous scientific and market opportunities they open up, imply a new and diverse set of workloads that in turn carry implications for the evolution of future GPU architectures.

In this talk, I will discuss the evolution of GPUs from fixed-function graphics accelerators to general-purpose massively parallel processors. I will briefly motivate GPU computing and explore the transition it represents in massively parallel computing: from the domain of supercomputers to that of commodity "manycore" hardware available to all. I will discuss the goals, implications, and key abstractions of the CUDA architecture. Finally, I will close with a discussion of future workloads in games, high-performance computing, and consumer applications, and their implications for future GPU architectures.

Bio

David Luebke helped found NVIDIA Research in 2006 after eight years on the faculty of the University of Virginia. Luebke received his Ph.D. under Fred Brooks at the University of North Carolina in 1998. His principal research interests are GPU computing and real-time computer graphics. Luebke's honors include the NVIDIA Distinguished Inventor award, the NSF CAREER and DOE Early Career PI awards, and the ACM Symposium on Interactive 3D Graphics "Test of Time Award". Dr. Luebke has co-authored a book, a SIGGRAPH Electronic Theater piece, a major museum exhibit visited by over 110,000 people, and dozens of papers, articles, chapters, and patents.


Fall 2008

Thursday, September 18, 2008
Joint Colloquium Physics & Astronomy and Dept. of Computer Science
Gordon Bell (Microsoft Research Silicon Valley)
Realizing Memex ... Digital Capture, Storage, and Utilization of All Personal Information
Schafler Auditorium, Bloomberg Hall, 3:00 - 4:00 PM

Friday, October 3, 2008
CEAFM seminar
John Clyne (National Center for Atmospheric Research)
VAPOR: A desktop environment for interactive exploration of large scale CFD simulation data
110 Maryland Hall, 11:00 AM - Noon