Working with Public Datasets Workshop
Friday, April 30, 2021, 1:00-4:00pm
GERARD LEMSON, IDIES Director of Science
SAYEED CHOUDHURY, Associate Dean for Research Data Management
TINGLONG DAI, Associate Professor, Carey Business School
Welcome & Introductions (1:00 – 1:25 pm)
- Tinglong Dai, Associate Professor, Carey Business School: Welcome & Event Background
- Gerard Lemson, IDIES Director of Science, SciServer Lead: IDIES & SciServer, What We Can Offer
- Mara Blake, Data Services Manager, Sheridan Libraries: Overview of the Sheridan Data Services Offerings

Deep Dive Research Presentations (1:25 – 2:45 pm; Q&A follows each presentation)
- Andrew Ching, Professor, Carey Business School: "Consumption Responses to an Unpopular Policy: Evidence from a Short-lived Soda Tax"
- Curt Cronister, Senior Data Manager, Baltimore Education Research Consortium (BERC): "What (When) (Where) is a School? Using Public Data to Contextualize Schools in Educational Research"

Break (2:45 – 3:00 pm)

Panel Discussion (3:00 – 4:00 pm)
- Sayeed Choudhury, Associate Dean for Research Data Management; Hodson Director of the Digital Research and Curation Center
- Dr. Jim Kyung-Soo Liew, Associate Professor of Finance, Carey Business School
- Jose Arrieta, former Chief Information Officer of the United States Department of Health and Human Services; Adjunct Professor, Carey Business School
- Charles Meneveau, the Louis M. Sardella Professor of Mechanical Engineering; Professor, Department of Physics and Astronomy (joint appointment); Professor, Department of Environmental Health and Engineering (joint appointment); Associate Director, Institute for Data Intensive Engineering and Science
- Marc Stein, Associate Professor, School of Education; Managing Director of the Baltimore Education Research Consortium (BERC)
Have you ever wanted to use “public” datasets in your research, and just weren’t sure how or where to start?
Is the work involved in handling these datasets overwhelming your local computer resources? Does it require technical expertise that you do not have available? Do you want to collaborate with colleagues but have trouble efficiently sharing access to the data and processing pipelines? Do you want to connect your data with other, similar datasets, only to find that this multiplies the problems?
IDIES has created an afternoon workshop that will walk you through the possibilities of working with public datasets and discuss how IDIES and the Sheridan Libraries can assist in those efforts.
A dataset is considered public if it is made available, most often online, to the general public, though access need not be free or registration-free. Importantly, and in contrast to most datasets IDIES has published so far, we will focus on datasets that were not created for an explicit scientific project but for other purposes, for example governmental, public-service, or commercial ones. Because these datasets were not created with your specific research in mind, substantial work is generally required to put them into a form suitable for your analysis. This is even more true when different public datasets must be combined.
During this workshop, presenters will share real-world examples of how public datasets have been applied in past and ongoing research. A panel will then explore how IDIES could assist researchers in obtaining and analyzing such datasets. For example, could IDIES collect a variety of the most interesting public datasets so that they can be accessed and analyzed in a single place, using advanced storage capabilities, simple interfaces for accessing the data, and the compute resources IDIES offers through SciServer?
Possible types of datasets:
- Social Media
- Transportation and mobility
- Audio, images, and videos
- Disease transmission
Possible types of research that can utilize these datasets:
- Climate Change
- Diversity, equity, and inclusiveness
- Global Supply Chains
- How AI will shape the future
- Social Networks
- Access to Education
- Health policy