In the past several years we have been carrying
out empirical and theoretical research motivated in part by
a theory of visual indexing (see Pylyshyn, 1988, 1989a, for
an outline of the theory, and Pylyshyn et al., 1994, and Pylyshyn,
1994, for a summary of some recent results).
The theory (the so-called FINST Indexing Theory)
consists of a set of hypotheses about a mechanism by which
certain salient features or objects in a visual display are
indexed (i.e., "FINSTed") so that they can be referred
to by subsequent cognitive processes. It is best viewed as
a fragment of a theory of the architecture of early vision
(in the sense discussed in Pylyshyn, 1990, 1991), a fragment that explains
one source of information-processing limitation in visual
information intake.
According to the theory, an early stage in
visual perception involves a resource-limited mechanism for
the individuation of a small number (4-6) of visual elements
or objects. Individuating is more primitive than encoding
either the properties of the tokens or their locations in
the visual field: it merely entails that the tokens are selected
or marked as distinct from one another and that their historical
continuity is maintained, so that each keeps its individuality
as the "same" token as it changes location or possibly
other properties as well. Tokens are individuated by being
indexed in the same sense that a data structure in a computer
might be indexed: the index serves as a mechanism for accessing
the token for subsequent operations. A small number of tokens
are selected (presumably by virtue of possessing certain locally-salient
properties) and indexed so that: (1) subsequent stages of
the visual system are able to reference the indexed tokens
-- say, for purposes of determining their individual and relational
properties (only indexed tokens can be bound to arguments
of visual routines and consequently operated on); (2) an index
remains attached to its token as the token changes its retinal
location or other properties, allowing it to be tracked qua
individual object; (3) indexed tokens can be interrogated
(or "pulsed") without the necessity of first locating
them through some form of search -- consequently the set of
indexed tokens can be separated from the rest of the display
without reference to their individual properties; and (4)
only indexed tokens can be the targets of motor movements,
including eye-movements, since motor programs, like visual
routines, also require that their arguments be bound to
tokens. The general idea of the theory is shown schematically
in Figure 1 below.
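
The data-structure analogy drawn above can be made concrete with a small sketch. What follows is only an illustration of the four properties just listed, not an implementation of the theory or of any model we have run. The names Token, IndexPool, assign, pulse and bind are hypothetical, the capacity of 4 is simply one value from the 4-6 range, and the use of a single numeric salience score to decide which tokens get indexed is an assumption made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality: a token stays the "same" token as its properties change
class Token:
    x: float
    y: float
    salience: float  # hypothetical scalar standing in for "locally salient properties"


class IndexPool:
    """A resource-limited pool of indexes: references to tokens, not descriptions of them."""

    def __init__(self, capacity=4):  # assumed value within the 4-6 range
        self.capacity = capacity
        self.slots = []  # each slot holds a reference to one Token

    def assign(self, display):
        """Index the most salient tokens in the display, up to capacity."""
        ranked = sorted(display, key=lambda t: t.salience, reverse=True)
        self.slots = ranked[: self.capacity]

    def is_indexed(self, token):
        # Compared by identity, never by location or other properties.
        return any(t is token for t in self.slots)

    def pulse(self):
        """Return the indexed tokens directly, without searching the display."""
        return list(self.slots)

    def bind(self, routine, *tokens):
        """Apply a visual routine or motor program; every argument must be an indexed token."""
        for t in tokens:
            if not self.is_indexed(t):
                raise ValueError("argument is not bound to an indexed token")
        return routine(*tokens)


def distance(a, b):
    """A toy 'visual routine' computing a relational property of two tokens."""
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5


if __name__ == "__main__":
    display = [Token(float(i), 0.0, salience=float(i)) for i in range(6)]
    pool = IndexPool(capacity=4)
    pool.assign(display)

    target = display[5]
    target.x, target.y = 5.0, 4.0      # the token moves ...
    print(pool.is_indexed(target))     # ... and its index still refers to it
    print(pool.bind(distance, display[2], target))
```

The design choice worth noting is that the pool stores references rather than coordinates or feature lists: the index continues to pick out its token after the token moves, and the indexed subset can be retrieved as a set without consulting any property of its members.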