Discussion on the Two Stage tracking model, error recovery, and perceptual grouping

The result that has perhaps attracted the greatest interest is our finding (summarized above) that subjects are able to track up to 5 identical, independently-moving targets in parallel. However, we have also shown that tracking performance is a function of the number and type of nontarget elements, thus calling into question the assumption that indexed tracking is a totally preattentive process. This has led to the development of a two-stage view of the tracking task (McKeever, 1991; Sears, 1991; Sears & Pylyshyn, 1995). The first stage involves the automatic, data-driven assignment of indexes to primitive features. If, as we suppose, what happens at this stage is related to what Muller & Findlay (1988) call the automatic component of attention, then it is likely short-lasting (see also Nakayama & Mackeben, 1989) and perhaps has a large capacity, though the latter has yet to be demonstrated. The second stage is more limited, involves deliberate attentive processing, and is probably serial in nature. The hypothesis is that in the tracking task the selected elements are visited serially and their relative locations are encoded in an internal model that is updated periodically, providing an approximate "shadow" model of the dynamic display. Because the elements encoded and updated in this way are also indexed, no spatial scanning is required for tracking; attention can visit the elements directly using the indexes. However, if an error occurs or the system detects the loss of an index, the shadow model can be used as an aid to error recovery.
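The two stages can be made concrete with a small computational sketch. All of the names and parameters below (the capacity constant, the update and recovery routines) are our own illustrative assumptions, not part of the published model; the sketch is intended only to show the division of labor between automatic index assignment, serial shadow-model updating, and index-based access.

```python
import math

N_INDEXES = 5  # approximate index capacity suggested by the tracking studies

def assign_indexes(targets):
    """Stage 1 (sketch): automatic, data-driven binding of indexes to
    primitive features -- here, simply to the designated targets, up to
    the assumed capacity limit."""
    return {i: t for i, t in enumerate(targets[:N_INDEXES])}

def update_shadow_model(indexes, positions):
    """Stage 2 (sketch): serially visit each indexed element -- directly,
    via its index, with no spatial scanning -- and record its current
    location, yielding an approximate 'shadow' model of the display."""
    shadow = {}
    for idx, elem in indexes.items():   # serial visitation
        shadow[idx] = positions[elem]
    return shadow

def recover_lost_index(lost_idx, shadow, positions):
    """Error recovery (sketch): when an index is lost, use the last
    encoded location in the shadow model to re-bind the index to the
    nearest currently visible element."""
    last_pos = shadow[lost_idx]
    return min(positions, key=lambda e: math.dist(positions[e], last_pos))
```

Note that in normal operation the shadow model is not consulted at all; it enters only when an index is lost, which is what lets the model reconcile parallel tracking with effects of display organization.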

This two-stage model can account for the effect of organizational factors (such as those studied by Yantis, 1992) by attributing the effect to the error-recovery process, which takes place in the second stage and uses relative-location information. Both McKeever (1992) and Sears & Pylyshyn (1995) have also used the two-stage model, together with the assumption that subjects can detect when an index is lost and thereupon initiate a recovery process, to account for otherwise anomalous data. For example, Sears has used such a model to explain why tracking performance (RT to events occurring on the targets) deteriorates when more non-targets are introduced -- contrary to what one would expect from a pure index-tracking view. Sears used a two-response procedure in which Ss are asked both to make a discriminative response to a figural change and at the same time to indicate whether the figural change occurred on a tracked or non-tracked object. When those trials in which Ss had mistaken a non-target for a target are eliminated from the analysis, no deterioration in performance with increasing non-target set size is found. Our proposal is that when Ss detect the loss of an index they attempt to recover by adopting a substitute target -- either a similar one when target and non-target properties differed (as they did in some studies) or a nearby one when they did not. With a higher density of neighbors the probability of mistaking a non-target for a target increases, and such items are thereafter tracked as though they were targets. We found that performance in detecting changes on mistakenly tracked items and on correctly tracked items was identical and independent of the number of non-tracked tokens in the display.
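The density argument can be illustrated with a toy Monte Carlo sketch. The drift magnitude, display geometry, and nearest-neighbor substitution rule below are illustrative assumptions of our own, not values fitted to the Sears & Pylyshyn data; the point is only that under nearest-element recovery, the rate of mistaken-target substitutions must rise with the number of non-targets.

```python
import math
import random

def mistaken_target_rate(n_nontargets, trials=2000, seed=0):
    """Estimate how often nearest-element error recovery re-binds a lost
    index to a non-target rather than to the true target, as a function
    of non-target density (toy simulation; all parameters assumed)."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(trials):
        # Last-known location of the lost index, from the shadow model.
        last = (rng.random(), rng.random())
        # The true target has drifted since the last shadow-model update.
        true_target = (last[0] + rng.gauss(0, 0.15),
                       last[1] + rng.gauss(0, 0.15))
        # Non-targets scattered uniformly over a unit display.
        nontargets = [(rng.random(), rng.random())
                      for _ in range(n_nontargets)]
        # Recovery rule: re-bind to whichever element is nearest the
        # last-known location.
        nearest = min([true_target] + nontargets,
                      key=lambda p: math.dist(p, last))
        if nearest != true_target:
            mistakes += 1
    return mistakes / trials
```

On this sketch, denser displays yield more substitutions; but once a non-target has been adopted it is tracked exactly as a target would be, which is consistent with our finding that change detection on mistakenly and correctly tracked items was identical.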

Although provocative, the two-stage tracking model leaves many questions unanswered, which we are investigating experimentally. The conditions under which indexes are attached and lost are of considerable interest. We have developed general-purpose software for running the tracking studies and have ported it to the Macintosh platform using the VisionShell software. We have begun studies examining a set of issues relating to the following questions:

As it is currently formulated, the two-stage model of tracking with error recovery assumes that Ss can detect when an index is lost or transferred. This assumption is itself non-obvious and worth examining empirically. One way to test it is to adopt the methodology we used most recently for assessing tracking ability: requiring Ss to indicate the locations of all targets at the end of a trial (the moving elements end up at the 12 locations of points on a clock face). In this way we can assess whether a newly introduced sudden-onset element, or a non-target that comes too close to a target, is more likely to "capture" an index.

There is already evidence that certain kinds of perceptual grouping of targets facilitate tracking (e.g., Yantis, 1992), although it is not clear whether the advantage arises at the index-maintenance stage or the error-recovery stage. We can now ask whether groupings involving non-indexed (non-target) items also make a difference. For example, when targets come into accidental collinear alignment with non-targets, or in any other way become perceptually grouped with them -- forming spatial "chunks" such as those proposed by Mahoney & Ullman (1988) -- does this affect tracking performance by attracting indexes away from other targets? This bears on the question of whether the index-assignment process is fully pre-attentive and data-driven in a spatially local manner, or whether perceptual phenomena of a more general nature can affect that stage.

Additional explorations of "visual routines"

There has recently been research on certain visual detection tasks that appear to be extremely simple for the human visual system, such as detection of the insidedness relation or the "on the same contour" property studied extensively by Jolicoeur (1988). Index theory makes specific predictions about how reaction time should increase as the number of points that need to be indexed increases: it predicts that if the features are ones that are preattentively computed, RT should increase very little as the number of points increases, up to the limit of the number of available indexes. Thus it predicts that the time to judge, say, the collinearity of a set of points should not increase as the number of points increases (at least up to about 6). Pilot studies suggest that this is indeed the case. However, we have so far tested this only for linearity, as opposed to other shape properties, and only in displays with no non-targets. We have also studied detection of the inside-outside relation, examining the effect of the order of presentation of items (Wright & Dawson, 1987). This line of investigation should help us sort out whether multiple-argument visual predicates can be evaluated in parallel once the arguments are all bound to indexes -- a central idea in indexing theory.
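The notion of a multiple-argument visual predicate can be illustrated concretely. The sketch below is our own; the theoretical claim is about RT, not about any particular algorithm. The point of the illustration is simply that COLLINEAR(p1, ..., pn) is a single predicate evaluated over the full set of bound arguments, so that once all points are indexed (up to the ~5-index limit) no serial scan over display locations is implied by the predicate itself.

```python
def collinear(points, tol=1e-9):
    """Evaluate the multi-argument predicate COLLINEAR over a set of
    indexed points: True if all points lie on a single line, tested via
    the cross product of each point against the line through the first
    two points."""
    if len(points) < 3:
        return True  # two or fewer points are trivially collinear
    (x0, y0), (x1, y1) = points[0], points[1]
    for (x, y) in points[2:]:
        # Cross product of (p1 - p0) with (p - p0); zero iff collinear.
        if abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) > tol:
            return False
    return True
```

On the index-theory view, the empirical question is whether human RT for such a judgment is flat in the number of argument points (within index capacity), as it would be if the bound arguments were delivered to the predicate in parallel, rather than growing as a serial evaluation would predict.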