Visual attention, tracking, and indexing

The research we do in this laboratory seeks to understand the basic processes that are involved when people focus their attention on some visual scene, and particularly when they must focus attention on several things at once. The methods we have developed over the past decade include a novel experimental technique, called multiple object tracking (MOT), that has revealed some unexpected findings concerning the processes involved when people track moving things. In MOT, people are shown about 8 identical elements moving randomly on a video screen. About 4 of them are briefly made distinct, usually by flashing them on and off. The task is then to keep track of these 4 items, which are now identical to the other items on the screen, as they move along random, intertwined paths. After a short period of time the items stop moving and you have to select which ones had been the designated “targets”. Although this task sounds very difficult, very few students fail to do it: most find it easy. The question is: How do they do it? Given what is known about what is called “focal attention”, how is it possible to keep track of 4 (and maybe 5) such moving items? Most people believe that we can only really attend to one thing at a time.
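For readers who find a procedure easier to follow in code, the trial structure just described can be sketched roughly as follows. This is only an illustrative sketch, not the software used in the laboratory: the function names (run_mot_trial, score_response), the unit-square display, the step count, the speed, and the simple jittering motion are all assumptions made for the example.

```python
import random


def run_mot_trial(n_items=8, n_targets=4, n_steps=200, speed=0.01, seed=None):
    """Simulate the stimulus side of one hypothetical MOT trial.

    Returns the final item positions and the set of target indices.
    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Place identical items at random positions in a unit-square display.
    positions = [[rng.random(), rng.random()] for _ in range(n_items)]

    # Designate some items as targets (in the real task they are flashed).
    targets = set(rng.sample(range(n_items), n_targets))

    # Move every item along a random path, keeping it inside the display.
    for _ in range(n_steps):
        for pos in positions:
            for axis in (0, 1):
                pos[axis] += rng.uniform(-speed, speed)
                pos[axis] = min(1.0, max(0.0, pos[axis]))

    return positions, targets


def score_response(selected, targets):
    """Return the proportion of designated targets the observer selected."""
    return len(set(selected) & set(targets)) / len(targets)


if __name__ == "__main__":
    positions, targets = run_mot_trial(seed=1)
    # An observer who tracked perfectly selects exactly the targets.
    print(score_response(targets, targets))
```

An observer who simply guessed, picking 4 of the 8 items at random, would on average select 2 of the 4 targets, so a score near 0.5 is the chance baseline against which tracking performance in a setup like this would be compared.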

The question this raises has turned out to have deep and far-reaching implications because we do not know exactly how it is done and because there is a lot about the task that is surprising. For example, people keep right on tracking even when the targets disappear behind a visual “post” and come out the other side. And if they come out the other side with a different shape or color, people don’t even notice and it has no bearing on how well they do the tracking task. Even as they track the four items, they soon lose track of which one is which (if the items had labels on them at the beginning, these are soon forgotten). It also seems that, notwithstanding how hard this task feels, you can easily do a second task at the same time (so long as you don’t have to make a response while tracking). As you might expect, you find it easier to see a small dot that is flashed on a target than one flashed on one of the “nontargets” that you are ignoring. But if the dot is flashed somewhere between two targets, there is no such advantage. And the moving items that you are ignoring are not entirely ignored, because you find it harder to see a small dot appear on those items than elsewhere (we say they are “inhibited”). But there are even more surprises: You can track “things” even if the things are superimposed and not moving around in space, but just changing their appearance in a continuous manner. So if you look at a pattern of transparent bars superimposed on another pattern of colored bars, and if both are changing smoothly in such things as the size of the bars and their angle and color, you can track one of these transparent “objects” as distinct from the other and tell which one was which at the end of the trial. You can track things that move in “property space” as opposed to real space!

This work (and other kinds of experiments not mentioned) has led to a theory called the Visual Index Theory (known affectionately to its friends as FINST theory, for historical reasons that are not worth mentioning). FINST theory has some pretty radical things to say about how we connect our visual perceptions with things in the world, but it would take a long time to draw this out, and it has been done better in publications, including a fairly accessible one in the journal Trends in Cognitive Sciences. The article can be accessed at: TICS-reprint.pdf.

But what makes these discoveries even more fascinating is the way they mesh with some fairly old and deep philosophical questions concerning what an individual, a single “thing” or “object”, is and how we can pick out and “refer” to individual things even without knowing anything about them (including where they are located). This form of reference is called “deictic” or “indexical” or “demonstrative”. The last term comes from the name of certain pronouns (called “demonstratives”) such as “this” or “that”, which are also of interest to linguists. Even people working on robots are interested in this kind of reference, since a robot should be able to function without knowing all about the environment it is in: it should be able to think such thoughts as “‘this’ thing is straight ahead of me so I need to go around it” (where the demonstrative “this” carries no information about what the thing is or even where it is, except that it is “in front of me”, which also carries no information outside of the context where the thought occurred)!

In that respect this research program is an ideal illustration of the cross-disciplinary nature of research in cognitive science.