Perceptual Science Series
Large-Scale Manifold Learning
Monday, October 27, 2008, 01:00pm - 02:00pm
Sanjiv Kumar, Research Scientist, Google Research, NY
Manifold learning provides a principled way of extracting low-dimensional nonlinear structure from high-dimensional data. Even though most manifold learning techniques need large amounts of data to faithfully capture the true underlying manifold, little work has been done on learning manifolds from very large numbers of data points. In this talk, I will address the problem of extracting low-dimensional manifold structure from millions of high-dimensional face images. Specifically, I will discuss the computational challenges in nonlinear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges.
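For readers unfamiliar with the techniques named above, a minimal sketch of Laplacian Eigenmaps (not the speaker's implementation; the neighborhood size, binary edge weights, and function name are illustrative choices) might look like this: build a k-nearest-neighbor graph over the data, form the normalized graph Laplacian, and take its bottom nontrivial eigenvectors as the embedding.

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_neighbors=10, n_components=2):
    """Embed the rows of X (n x d) into n_components dimensions
    via a basic Laplacian Eigenmaps construction."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Symmetric kNN adjacency with binary weights
    # (column 0 of the argsort is the point itself, so skip it).
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, idx[i]] = 1.0
    W = np.maximum(W, W.T)
    # Normalized graph Laplacian; its smallest eigenvectors give
    # the embedding (the very first is the trivial constant mode).
    L = csgraph.laplacian(W, normed=True)
    vals, vecs = eigh(L)
    return vecs[:, 1:n_components + 1]
```

The dense eigendecomposition in the last step is exactly the bottleneck the talk addresses: at 18 million points, neither the n x n matrix nor its spectral decomposition fits in time or memory.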
Most manifold learning techniques require spectral decomposition of dense matrices. The complexity of such a decomposition is O(n^3), where n is the number of data points. For n as large as 18 million, this computation becomes infeasible in both time and space. In this talk, I will analyze two approximate SVD techniques for large dense matrices (Nyström and column-sampling), providing the first theoretical and empirical comparison between them. The experiments reveal interesting, counter-intuitive behaviors of the two approximations. Next, I will show extensive experiments on learning low-dimensional embeddings for two large face datasets: CMU-PIE (35 thousand faces) and a web dataset (18 million faces).
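The two sampling-based approximations compared in the talk can be sketched as follows. This is a simplified illustration under common textbook formulations, not the speaker's code: both methods sample l columns of a symmetric positive semidefinite n x n matrix G; Nyström reconstructs G from the sampled columns C and the intersection block W, while column-sampling works from the SVD of C, rescaling its singular values by sqrt(n/l). The uniform sampling and function names here are assumptions.

```python
import numpy as np

def nystrom(G, idx):
    """Nyström approximation of a symmetric PSD matrix G:
    G ~= C W^+ C^T, where C = G[:, idx] and W = G[idx][:, idx]."""
    C = G[:, idx]
    W = C[idx, :]
    return C @ np.linalg.pinv(W) @ C.T

def column_sampling(G, idx):
    """Column-sampling approximation: take the left singular
    vectors of the sampled columns C, and rescale the singular
    values of C by sqrt(n/l) to approximate those of G."""
    n, l = G.shape[0], len(idx)
    C = G[:, idx]
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    return U @ np.diag(np.sqrt(n / l) * s) @ U.T
```

Both reduce the dominant cost from an O(n^3) decomposition of G to operations on an n x l matrix, which is what makes spectral embedding of an 18-million-node graph tractable; when all columns are sampled, both recover G exactly.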
Sanjiv Kumar received his PhD from The Robotics Institute, Carnegie Mellon
University in 2005. Since then, he has been working at Google Research, NY
as a Research Scientist. His research interests include statistical
learning, computer vision, graphical models and medical imaging.