Over the past several years we have carried out empirical and theoretical research motivated in part by a theory of visual indexing (see Pylyshyn, 1988, 1989a, for an outline of the theory, and Pylyshyn et al., 1994, and Pylyshyn, 1994, for a summary of some recent results).
The theory (the so-called FINST Indexing Theory) consists of a set of hypotheses about a mechanism by which certain salient features or objects in a visual display are indexed (i.e., "FINSTed") so that they can be referred to by subsequent cognitive processes. It is best viewed as a fragment of a theory of the architecture of early vision (in the sense discussed in Pylyshyn, 1990, 1991), one which explains a source of information-processing limitation in visual information intake.
According to the theory, an early stage in visual perception involves a resource-limited mechanism for individuating a small number (4-6) of visual elements or objects. Individuating is more primitive than encoding either the properties of the tokens or their locations in the visual field: it merely entails that the tokens are selected or marked as distinct from one another and that their historical continuity is maintained, so that each keeps its individuality as the "same" token as it changes location or possibly other properties as well. Tokens are individuated by being indexed in the same sense that a data structure in a computer might be indexed: the index serves as a mechanism for accessing the token for subsequent operations. A small number of tokens are selected (presumably by virtue of possessing certain locally salient properties) and indexed so that: (1) subsequent stages of the visual system are able to reference the indexed tokens, say for purposes of determining their individual and relational properties (only indexed tokens can be bound to arguments of visual routines and consequently operated on); (2) an index remains attached to its token as the token changes its retinal location or other properties, allowing it to be tracked qua individual object; (3) indexed tokens can be interrogated (or "pulsed") without the necessity of first locating them through some form of search, and consequently the set of indexed tokens can be separated from the rest of the display without reference to their individual properties; and (4) only indexed tokens can be the targets of motor movements, including eye movements, since motor programs, like visual routines, also require that their arguments be bound to tokens. The general idea of the theory is shown schematically in Figure 1 below.
Figure 1. Sketch of the connection between visual indexes and their causal links
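The data-structure analogy used above can be made concrete with a toy sketch. It is only an illustration of the four indexing properties, not a model proposed in the cited papers. The names `Token`, `IndexPool`, `assign`, `interrogate`, `indexed_tokens` and `apply_routine` are all hypothetical, and the default capacity of 4 is an arbitrary choice within the 4-6 range mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality: a token is "the same" only as itself
class Token:
    """A visual element whose location and properties may change over time."""
    location: Tuple[float, float]
    properties: dict = field(default_factory=dict)


class IndexPool:
    """Hypothetical resource-limited pool of visual indexes (illustrative only)."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity              # assumed limit, within the 4-6 range
        self._bound: Dict[int, Token] = {}    # index number -> token reference

    def assign(self, token: Token) -> Optional[int]:
        """Bind a free index to a token; return None if no index is free."""
        for i in range(self.capacity):
            if i not in self._bound:
                self._bound[i] = token        # binds the individual, not its properties
                return i
        return None

    def release(self, index: int) -> None:
        """Free an index so that it can be bound to another token."""
        self._bound.pop(index, None)

    def interrogate(self, index: int) -> Token:
        """Access a token directly through its index, with no search of the display."""
        return self._bound[index]

    def indexed_tokens(self) -> List[Token]:
        """Return the indexed set without consulting any token properties."""
        return list(self._bound.values())

    def apply_routine(self, routine: Callable, *indexes: int):
        """Run a visual routine or motor program; arguments must be indexed tokens."""
        return routine(*(self._bound[i] for i in indexes))


def left_of(a: Token, b: Token) -> bool:
    """Example relational routine: compare horizontal positions."""
    return a.location[0] < b.location[0]


pool = IndexPool(capacity=4)
display = [Token((float(x), 0.0), {"color": "red"}) for x in range(6)]
indexes = [pool.assign(t) for t in display]   # only the first four get an index

display[0].location = (10.0, 0.0)             # the token moves...
display[0].properties["color"] = "green"      # ...and changes a property
still_same = pool.interrogate(0) is display[0]
relation = pool.apply_routine(left_of, 1, 0)
```

In this sketch the index stores a reference to the token itself rather than a description of it, so the binding survives changes of location and colour, and tokens that never received an index cannot be passed to a routine at all.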
The details of this view are discussed in two recent publications: Situating vision in the world (Trends in Cognitive Sciences) and Visual Indexes, Preconceptual Objects, and Situated Vision (Cognition); both are available in PDF format.