Computational and psychophysical investigations of mid-level vision

  • Representation of visual shape
  • Interpolation and extrapolation of contour and surface structure
  • Computation of layered surface representations under partial occlusion and transparency


The ease of our perceptual experience greatly belies the complexity of its underlying mechanisms. This complexity is, however, made evident by consideration of the computational task that the visual system faces:

  • Inputs to the visual brain consist of photoreceptor responses to the pattern of light projected onto the retinas. These retinal arrays simply encode how light or dark each retinal location is, at any given time. They are thus (a) two-dimensional, (b) unstructured (e.g., there is no 'grouping' of points based on whether or not they correspond to the same object or surface), and (c) always changing (e.g., every time the observer's eyes, head, or body moves, as different portions of objects become occluded, or as lighting conditions change).
  • By contrast, the 'output' of the visual system—in effect, the visual world we perceive and interact with—is (a) three-dimensional (solid objects in a 3D environment), (b) highly structured (scenes organized into objects and surfaces, objects organized into component parts and sub-surfaces), and (c) remarkably constant (objects tend to maintain their perceived shapes when viewed from different vantage points, or when they become partly occluded; surfaces tend to maintain their lightness and color when viewed under different lighting conditions).

The problem of visual perception is, in essence, to understand how biological visual systems are able to transform 2D, unstructured, and ever-changing arrays of light into structured, meaningful, and stable representations of solid objects in 3D environments—and, indeed, how any computational system could achieve such a seemingly impossible transformation.
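
To make this input/output contrast concrete, the following minimal sketch (in Python, purely illustrative) places the two kinds of representation side by side. The class names and fields are hypothetical placeholders introduced only for the example; they are not a claim about the format the visual system actually uses.

    import numpy as np
    from dataclasses import dataclass, field

    # Input side: an unstructured retinal array -- just a grid of luminance
    # values (here 128 x 128), with no grouping into objects or surfaces,
    # and with a new array arriving at every moment in time.
    retinal_array = np.random.rand(128, 128)   # how light or dark each location is

    # Output side: a structured, stable scene description (hypothetical types).
    @dataclass
    class Surface:
        depth: float                                   # distance from the observer
        lightness: float                               # perceived surface reflectance
        boundary: list = field(default_factory=list)   # 3D contour of the surface

    @dataclass
    class SceneObject:
        parts: list        # component parts / sub-surfaces
        pose: tuple        # 3D position and orientation

    # A scene is organized into objects, and objects into parts and surfaces;
    # this description stays relatively stable as viewpoint and lighting change.
    scene = [SceneObject(parts=[Surface(depth=2.0, lightness=0.6)],
                         pose=(0.0, 0.0, 2.0))]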

Specific problems involved in this transformation include:

  • segmentation of objects from the background
  • computation of surface structure under partial occlusion (this requires completing the hidden portions of objects and surfaces; a toy contour-interpolation sketch follows this list)
  • segmentation of objects and surfaces into smaller, semi-independent units – allowing for more efficient shape descriptions
  • segmentation of image intensity into multiple surfaces along a single line of sight (for example, when one surface is seen through another, partially transmissive surface; a toy transparency sketch follows this list)
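
The occlusion-completion problem can be given a rough flavor with a toy example: if the two visible ends of a partly hidden contour and their tangent directions are known, the hidden portion can be filled in by a smooth curve. The cubic Hermite interpolation below is only a generic stand-in for the much richer interpolation mechanisms studied in this research; the function name and parameters are invented for the illustration.

    import numpy as np

    def interpolate_hidden_contour(p0, t0, p1, t1, n=50):
        """Cubic Hermite curve joining two visible contour endpoints.

        p0, p1 : (x, y) points where the contour disappears behind the occluder
        t0, t1 : tangent directions of the contour at those endpoints
        Returns n points along a smooth curve through the occluded region.
        """
        p0, t0, p1, t1 = map(np.asarray, (p0, t0, p1, t1))
        s = np.linspace(0.0, 1.0, n)[:, None]
        # Standard cubic Hermite basis functions
        h00 = 2*s**3 - 3*s**2 + 1
        h10 = s**3 - 2*s**2 + s
        h01 = -2*s**3 + 3*s**2
        h11 = s**3 - s**2
        return h00*p0 + h10*t0 + h01*p1 + h11*t1

    # Example: a contour disappears at (0, 0) heading rightward and
    # reappears at (4, 1), still heading rightward.
    hidden = interpolate_hidden_contour((0, 0), (1, 0), (4, 1), (1, 0))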
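
Similarly, the transparency problem can be illustrated with the classic episcotister-style mixture model, in which the luminance seen through a partially transmissive layer is a weighted mixture of the underlying surface's luminance and the layer's own luminance. The sketch below simply inverts that mixture model for two regions; it is a textbook toy model rather than the specific computation investigated in this research, and the variable names are chosen only for the example.

    def infer_transparency(a, b, p, q):
        """Invert the simple mixture model  p = alpha*a + (1 - alpha)*t.

        a, b : luminances of two background regions seen in plain view
        p, q : luminances of the same regions seen through the transparent layer
        Returns (alpha, t): the layer's transmittance and its own luminance.
        """
        alpha = (p - q) / (a - b)          # transmittance of the layer
        t = (p - alpha * a) / (1 - alpha)  # luminance contributed by the layer itself
        return alpha, t

    # Example: a light (80) and a dark (20) region drop to 50 and 35
    # where they are seen through the filter.
    alpha, t = infer_transparency(80, 20, 50, 35)   # alpha = 0.25, t = 40.0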

These visual problems are often classified under the umbrella term of mid-level vision – indicating that the surface representations involved lie somewhere between the 'low-level' processing that operates on 2D image representations and the 'high-level' object representations that include semantic and lexical knowledge.