Perceptual Science Series
Telling the Story of a Scene: from Humans to Computers
Fei Fei Li
From Monday, May 05, 2008 - 11:00am
To Sunday, May 04, 2008 - 12:00am
Princeton University, Computer Science Department
For both humans and machines, the ability to learn and recognize the semantically meaningful contents of the visual world is an essential capability. In this talk, we will examine natural scene understanding in human psychophysical and physiological experiments as well as in computer vision modeling. I will first present a series of recent human psychophysics studies on natural scene recognition. All of these experiments converge on one prominent phenomenon of the human visual system: humans are extremely efficient and rapid in capturing the semantic contents of real-world images. Inspired by these behavioral results, we report a recent fMRI experiment that classifies different types of natural scenes (e.g., beach vs. building vs. forest) based on distributed fMRI activity. This is achieved by using a number of pattern recognition algorithms to capture the multivariate nature of the complex fMRI data. In the second half of the talk, we introduce a series of our recent computer vision work aimed at a holistic and integrated understanding of natural scenes, namely telling the "what (event/activity), where (scene category) and who (objects) story" of an image. We focus on two main issues: model representation and learning. Our models are based on generative graphical models, where images are represented by local image patches. A Bayesian generative model is used to represent the hierarchy of visual concepts in the image. We will then show how these complex, semantic visual concepts (such as sport events in static photos) can be learned automatically by taking advantage of the noisy but abundant images on the Internet.