Scalable Framework for Statistical Inference on Big Multimodal Data via Sketching and Concentration

Vladimir Braverman, PhD, Assistant Professor, Department of Computer Science, Whiting School of Engineering, Johns Hopkins University

Our goal is to develop scalable frameworks for statistical inference on multimodal, high-dimensional, large, and complex data. For example, community detection on a large graph can often be carried out approximately on a sketched version of the graph, in which the inherent dimension is much smaller than the number of vertices. We propose to develop efficient algorithmic tools that we collectively call Sketching and Measure Concentration for Data Analysis (SMCDA). Informally, SMCDA provides solutions that scale polylogarithmically in the data size by discarding the vast majority of the data while approximately preserving the information of interest.
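
As a toy illustration of the general idea (not the SMCDA tools themselves), the following minimal sketch estimates the mean of a large data set from a small uniform sample, with the sample size chosen from Hoeffding's inequality, a standard measure-concentration bound. The function names, the synthetic data, and the parameters `eps` and `delta` are assumptions introduced here for illustration only. Values are assumed to lie in [0, 1].

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Number of samples m such that, for values in [0, 1], the sample mean
    is within eps of the true mean with probability at least 1 - delta.
    Follows from Hoeffding's bound: P(|error| >= eps) <= 2 * exp(-2 * m * eps**2).
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def sampled_mean(data, eps, delta, rng):
    """Estimate the mean of `data` from a uniform sample drawn with replacement."""
    m = hoeffding_sample_size(eps, delta)
    total = sum(rng.choice(data) for _ in range(m))
    return total / m, m


# Hypothetical synthetic data: 100,000 values in [0, 1].
rng = random.Random(0)
data = [rng.random() ** 2 for _ in range(100_000)]

true_mean = sum(data) / len(data)
estimate, m = sampled_mean(data, eps=0.05, delta=1e-6, rng=rng)

print(f"used {m} of {len(data)} values")
print(f"true mean = {true_mean:.4f}, estimate = {estimate:.4f}")
```

Note that the sample size depends only on the target accuracy `eps` and failure probability `delta`, not on the number of data points, which is the simplest instance of discarding most of the data while approximately preserving a quantity of interest. Sketches for richer tasks, such as the graph example above, are considerably more involved.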