Seminars
IDIES Colloquium Series
IDIES occasionally organizes colloquia by experts in the field on recent advances in data-intensive science.
This page announces upcoming colloquia and provides a record of previous colloquia.
Spring 2010
Tuesday, May 4, 2010
1:30-2:30 PM, Bloomberg 462
Roger Barga (Microsoft Research)
Cloud Computing and Research Applications
Abstract
Cloud computing uses data centers to provide on-demand access to services such as data storage and
hosted applications that provide scalable web services and large-scale scientific data analysis. Although the
architecture of a data center resembles that of a conventional supercomputer, the two are designed with very
different goals. This talk will cover the basic cloud computing system architectures and the application
programming models, beginning with general concepts of data center architecture including the use of
virtualization and the role of low power. We next examine cloud computing and storage models with a
detailed look at the Microsoft Azure cloud computing platform. The talk concludes with an overview of our
ongoing efforts to leverage the power of the Microsoft Azure cloud computing platform to address some of
the most challenging problems in data-intensive research.
Speaker Biography
Roger Barga is currently senior architect of the Cloud Computing Futures (CCF) group in Microsoft Research (MSR),
where he leads a technical engagements team that works with external researchers interested in carrying out
large-scale computing, scientific research, and data analytics on the Windows Azure cloud platform.
Previously, Roger led the Advanced Research Tools and Services (ARTS) team in MSR, which built innovative
services and tools for data-intensive research. Roger joined Microsoft in 1997 as a Researcher in the
Database Group of Microsoft Research, where he participated in both systems research and product development
efforts in database, workflow and stream processing systems. Contact him at
barga@microsoft.com or at
his web site.
Wednesday, April 14, 2010
2:30-3:30 PM, Bloomberg 475
Ciprian M. Crainiceanu (Department of Biostatistics, JHU)
Statistical analysis of populations of images
Abstract
Images, often stored in 2- and 3-dimensional arrays, are fast becoming
ubiquitous in medical and public health research. Analyzing populations of images is a
statistical problem that raises a host of daunting challenges. The most severe
challenge is that data sets incorporating images recorded for hundreds or thousands
of subjects at multiple visits are massive. We introduce the population
value decomposition (PVD), a general method for simultaneous dimensionality
reduction of large populations of massive images. We show how PVD can seamlessly
be incorporated into statistical modeling and lead to a new, transparent
and fast inferential framework. Our methodology was motivated by and applied
to the Sleep Heart Health Study, the largest community-based cohort study of
sleep, containing more than 85 billion observations on thousands of subjects at two
visits. We also show an application to one of the largest observational studies
incorporating fMRI measurements on hundreds of subjects.
Spring 2009
Tuesday, May 19, 2009
10:45-11:45 AM, CSEB B-17
José Blakeley (Database Systems Group, Microsoft)
Data Management for High-Throughput Genomics
Abstract
Today's sequencing technology allows sequencing an individual
genome within a few weeks for a fraction of the cost of the original
Human Genome Project. Genomics labs are faced with dozens of TB of data
per week that have to be automatically processed and made available to
scientists for further analysis. This talk explores the potential and
the limitations of using relational database systems as the data
processing platform for high-throughput genomics. In particular, we are
interested in the storage management for high-throughput sequence data
and in leveraging SQL and user-defined functions for data analysis
inside a database system. We give an overview of a database design for
high-throughput genomics, describe how we used a SQL Server database in
some unconventional ways to prototype this scenario, and discuss some
initial findings about the scalability and performance of such a
database-centric approach.
Bio
José Blakeley is a Software Architect in the Database Systems Group in
the SQL Server Division. José's current interest is in data management
for large-scale DW and OLTP workloads. José manages a small advanced
development team exploring the impact of new hardware (large memory,
multi-core, FLASH/NVRAM) on DBMS architectures. In the past, José has
worked on many aspects of DBMS technology including client and server
programmability, database engine extensibility, query processing,
object-relational mapping, and scientific database applications. Some
of his contributions at Microsoft include the creation of the OLE DB
data access interfaces, the integration of the CLR in SQL Server 2005,
the extensibility mechanisms in SQL Server, and the creation of the
ADO.NET Entity Framework. José has authored many conference papers, book
chapters and journal articles on DBMS technology. José holds over 20
patents. He received a B.Eng. from ITESM, Monterrey, Mexico, and a Ph.D.
in computer science from University of Waterloo, Canada.
Thursday, April 23, 2009
1:30-2:30 PM, Bloomberg 475
David Luebke (NVIDIA)
Graphics, Hardware, and GPU Computing: Past, Present, and Future
Abstract
Modern GPUs have emerged as the world's most successful parallel architecture.
GPUs provide a level of massively parallel computation that was once the preserve
of supercomputers like the MasPar and Connection Machine. For example, NVIDIA's
GeForce GTX 280 is a fully programmable, massively multithreaded chip with up to
240 cores and 30,720 threads, capable of performing up to a trillion operations
per second. The raw computational horsepower of these chips has expanded their
reach well beyond graphics. Today's GPUs not only render video game frames, they
also accelerate physics computations, video transcoding, image processing,
astrophysics, protein folding, seismic exploration, computational finance,
radioastronomy - the list goes on and on. Enabled by platforms like the CUDA
architecture, which provides a scalable programming model, researchers across
science and engineering are accelerating applications in their discipline by up to
two orders of magnitude. These success stories, and the tremendous scientific and
market opportunities they open up, imply a new and diverse set of workloads that
in turn carry implications for the evolution of future GPU architectures.
In this talk I will discuss the evolution of GPUs from fixed-function graphics
accelerators to general-purpose massively parallel processors. I will briefly
motivate GPU computing and explore the transition it represents in massively
parallel computing: from the domain of supercomputers to that of commodity
"manycore" hardware available to all. I will discuss the goals, implications, and
key abstractions of the CUDA architecture. Finally, I will close with a discussion
of future workloads in games, high-performance computing, and consumer
applications, and their implications for future GPU architectures.
Bio
David Luebke helped found NVIDIA Research in 2006 after eight years on the faculty
of the University of Virginia. Luebke received his Ph.D. under Fred Brooks at the
University of North Carolina in 1998. His principal research interests are GPU
computing and real-time computer graphics. Luebke's honors include the NVIDIA
Distinguished Inventor award, the NSF CAREER and DOE Early Career PI awards, and
the ACM Symposium on Interactive 3D Graphics "Test of Time Award". Dr. Luebke has
co-authored a book, a SIGGRAPH Electronic Theater piece, a major museum exhibit
visited by over 110,000 people, and dozens of papers, articles, chapters, and
patents.
Fall 2008
Thursday, September 18, 2008
Joint Colloquium Physics & Astronomy and Dept. of Computer Science
Gordon Bell (Microsoft Research Silicon Valley)
Realizing Memex ... Digital Capture,
Storage, and Utilization of All Personal Information
Schafler Auditorium, Bloomberg Hall, 3:00 - 4:00 PM
Friday, October 3, 2008
CEAFM seminar
John Clyne (National Center for Atmospheric Research)
VAPOR: A desktop environment for
interactive exploration of large scale CFD simulation data
110 Maryland Hall, 11:00 AM - Noon