We have argued that an indexing mechanism is useful in accounting for a range of phenomena and may even be the key to cross-modal spatial coordination. The question that then arises is whether it can be realized computationally or neurally in a way that meets at least some of the prima facie constraints on neural realizability -- e.g., that it can be computed rapidly using spatially local information. Partly because of this concern, we have developed a preliminary design and implementation of a parallel network for indexing and tracking. It is similar to that of Koch & Ullman (1984, 1985), but modified to allow more than one visual token to be indexed and separately tracked. (The Koch & Ullman model uses a single hierarchical Winner-Take-All Hopfield net and so provides only one index, which the authors view as corresponding to focal attention.)
Our modified network (described in a thesis by Brian Acton, 1993, in Pylyshyn & Eagleson, 1994, and in Pylyshyn, 2000) not only provides a well-motivated architecture for visual indexing, but also makes clear exactly what the index mechanism does. As shown here, the network receives inputs from a retinotopic map (perhaps from individual retinotopic feature-detector units, or perhaps from the sort of Feature Maps described by Treisman et al.) and outputs a 1 if one or more input lines exceed some threshold of activity and a 0 otherwise. When such an output occurs, no information is provided on the output side regarding which unit is the active one or what properties that unit has. However, the network then allows a "query" line to be pulsed, and it routes the pulse to precisely the input line that was in fact the most active one. This simple facility allows the system to query a salient visual token without "knowing" (in the sense of having access to the coordinates of) the location or any other properties of that salient token.
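The input-output behavior just described can be illustrated with a small sketch. It is a toy serial simulation, not the parallel network itself, and all names in it (`IndexNetwork`, `present`, `query`) are hypothetical, introduced only for illustration. It assumes that activity levels are plain numbers and that the most active line is found by a simple maximum rather than by an iterative winner-take-all process.

```python
from typing import List


class IndexNetwork:
    """Toy stand-in for the indexing network (illustrative assumption only)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self._activity: List[float] = []

    def present(self, activity: List[float]) -> int:
        """Load the retinotopic input; output 1 if any line exceeds threshold, else 0.

        The output carries no information about which line was active.
        """
        self._activity = list(activity)
        return 1 if any(a > self.threshold for a in self._activity) else 0

    def query(self) -> List[int]:
        """Pulse the query line.

        Returns the pulse as it arrives on the input side: 1 on the most
        active supra-threshold line, 0 on every other line.
        """
        pulse = [0] * len(self._activity)
        if self._activity and max(self._activity) > self.threshold:
            winner = max(range(len(self._activity)),
                         key=self._activity.__getitem__)
            pulse[winner] = 1
        return pulse


net = IndexNetwork(threshold=0.5)
signal = net.present([0.1, 0.9, 0.3, 0.7])  # 1: something salient is present
pulse = net.query()                          # pulse lands on the second line
```

Note that the caller of `present` sees only a single bit. The pulse returned by `query` stands for a signal delivered to the input units themselves, not for coordinates made available to the rest of the system.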
For example, if the system sends a query signal through this network and at the same time sends a signal to global detectors of a certain type (say the global "redness" detector), an output from any of this set of detectors (with thresholds set at 2 inputs) will occur only if the feature detector at the "active" location is of the specified type. Thus this mechanism allows one to ask of a salient object whether it has a certain property, even though there may be no way to explicitly determine where that object is or what other properties it has -- at least not without applying additional operations. This capability can be extended to sets of multiple salient tokens either directly, by allowing sets of such tokens to be queried in parallel, or by allowing them to be queried in rapid sequence. Both approaches are being pursued, since they make slightly different empirical predictions. One of them uses a short-term inhibitory mechanism to prevent several salient tokens from capturing the same output line. A side effect of this approach is a proposal for the source of the Inhibition-Of-Return phenomenon, whose close relationship to indexing has already been explored in a recent thesis in our laboratory (Sears, 1995).
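The property query and one way of handling multiple tokens can be sketched as follows. This is a minimal illustration under stated assumptions, not the implemented network. The function names (`route_query`, `has_property`, `query_in_sequence`) are hypothetical, feature maps are assumed to be binary lists aligned with the activity list, and the short-term inhibition is attached to the sequential variant purely for illustration, since the text does not say which of the two approaches uses it.

```python
from typing import List


def route_query(activity: List[float], threshold: float) -> List[int]:
    """Deliver a query pulse to the most active supra-threshold input line."""
    pulse = [0] * len(activity)
    if activity and max(activity) > threshold:
        winner = max(range(len(activity)), key=activity.__getitem__)
        pulse[winner] = 1
    return pulse


def has_property(activity: List[float], feature_map: List[int],
                 threshold: float) -> bool:
    """Ask whether the most salient token has a given property.

    feature_map[i] == 1 means a detector of the queried type sits at
    location i (an assumption of this sketch). Each such detector counts
    two possible inputs, the routed query pulse and the global type
    signal, and fires only when both arrive (threshold of 2 inputs).
    """
    pulse = route_query(activity, threshold)
    type_signal = 1  # broadcast to every detector of the queried type
    return any(p + type_signal >= 2
               for p, f in zip(pulse, feature_map) if f == 1)


def query_in_sequence(activity: List[float], threshold: float,
                      n_tokens: int) -> List[List[int]]:
    """Query up to n_tokens salient tokens one after another.

    After each query the winning line is suppressed, so the next pulse
    goes to a different token (a stand-in for short-term inhibition).
    """
    remaining = list(activity)
    pulses: List[List[int]] = []
    for _ in range(n_tokens):
        pulse = route_query(remaining, threshold)
        if 1 not in pulse:
            break  # no supra-threshold token is left
        pulses.append(pulse)
        remaining[pulse.index(1)] = float("-inf")  # inhibit the winner
    return pulses


activity = [0.2, 0.9, 0.8, 0.1]
red = [0, 1, 0, 0]    # a "red" detector at the second location
green = [0, 0, 1, 0]  # a "green" detector at the third location
is_red = has_property(activity, red, threshold=0.5)
is_green = has_property(activity, green, threshold=0.5)
sequence = query_in_sequence(activity, threshold=0.5, n_tokens=3)
```

In this sketch the answer to "is the salient token red?" is a single bit, obtained without the caller ever reading off a location. The sequential variant visits the two supra-threshold tokens in order of decreasing activity and then stops.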