Pylyshyn’s reply to commentators

Stalking the elusive mental image screen*

Zenon W. Pylyshyn

Rutgers Center for Cognitive Science

Rutgers University, New Brunswick, NJ

Abstract

1.     Introduction

2.     Is imagery primarily a problem for neuroscience to solve?

2.1  The involvement of the visual brain in mental imagery

3.     Depiction and the “Tootell Display”

4.     The “null hypothesis” and its detractors

4.1  Why should we assume a symbolic system as the “null hypothesis”?

4.2  How does a symbol system deal with various particular phenomena?

4.3  Do we need analogs as well as symbol structures?

4.4  Formats and representations in different modalities

4.5  Are there other options besides pictures and symbols?

5.     Where does the spatiality of images come from, if not the display?

5.1  Space, motion and motor control in visual imagery

5.2  Superimposing images on the visual world: Attention and inhibition

5.3  Visual neglect and spatial orientation in imagery

6.     Tacit knowledge and cognitive penetrability again

7.     Second-order and “structural” isomorphism

8.     Other topics: Visual expectations, thoughts and phenomenology

8.1  Visual expectations or visual images?

8.2  Distinguishing images from thoughts

8.3  The role of phenomenology

9.     Conclusion

References

Abstract

After 30 years of the current “imagery debate,” the issue appears far from resolved, even though there seems to be a growing acceptance that a cortical display cannot be identified directly with the experienced mental image, nor can it account for the experimental findings on imagery, at least not without additional ad hoc assumptions.  The commentaries on the target article range from the annoyed to the supportive, with a surprising number of the latter.  In this response I attempt to correct some misreadings of the target article and discuss some ideas and evidence introduced by commentators, much of which I found helpful, even though they do not alter my basic thesis.  I also further develop the idea that the spatial character of images may come from the way they are connected (by attention or by visual indexes) to our immediate or immediately-recalled environment, towards which we may orient while we are imaging, thus leaving the alleged spatial properties of images outside the head and freeing image-representations from having to be displayed on any surface.

1.     Introduction

Nearly thirty years after the first round of the current “imagery debate” (Pylyshyn, 1973) it appears that we may have made a small amount of progress.  Although the views expressed by commentators are as varied as those of the research communities from which they come, the large number of at least partially supporting commentaries, from many disciplines, came as something of a surprise since the picture theory is still the dominant view.  I had assumed that the target article would be widely reviled since it asks that one put aside one’s intuitions about what is in the mind/brain.  This appears to be an exercise that few theorists care to take up, for reasons that Slezak and Gold clearly lay out: It is often called the “intentional fallacy” and scientific arguments over it go back to the beginning of the Renaissance.

Because Kosslyn’s views about mental imagery are widely accepted, and because his theory is the most explicit statement of the view I am criticizing, I will devote disproportionate space to the long commentary by Kosslyn, Thompson & Ganis (hereafter KTG).  I appreciate the well-organized set of responses by KTG that clarify their stand on a number of the issues on which Kosslyn and I have disagreed over the years.  I especially appreciate the clear statement in the latter part of the introductory paragraph (beginning “The issue is not ...”) that seems to represent at least a softening of Kosslyn’s earlier position.  The position they outline in their commentary also diverges in a number of other respects from the canonical view presented in Kosslyn (1994).  For example, KTG no longer appear to claim that what we experience in mental imagery corresponds to the content of the depictive display.  They now maintain (item 4 of their section “Imagery and perception”) that the experience of imagery does not originate from the depictive representation (which they claim is in V1), but from the activation of memory representations in the inferior temporal lobe.  This is a puzzling revision: if what we experience is not what is in the depictive display but is elsewhere in memory, why do they appeal to properties of the display to explain what happens when subjects do certain things to the image that they experience (e.g., examine or scan it)?  In spite of this and other apparent revisions of the Kosslyn position, I will argue that no amount of tuning will save the depiction theory.

2.     Is imagery primarily a problem for neuroscience to solve?

I said in the target article that disagreements about the nature of mental imagery are more than a question of different interpretations of data.  As Slezak’s historical perspective serves to highlight (and as Dennett, 1991, and Thomas, in press, have also documented), the disagreements rest on much deeper preconceptions and illusions – which is why they arouse such passionate reactions, and why the long "imagery debate" does not appear to have resolved the basic disagreements.  Slezak points out that even though many people recognize that a representation of a cat does not have to be furry, nonetheless “it is telling that such views must be repeatedly refuted throughout the history of speculation about the mind”.

Kosslyn, Thompson & Ganis (as well as Polimeni & Schwartz) feel that the reason the “debate” was not settled earlier is that behavioral data are incapable of resolving the question of the nature of representations underlying imagery, so we have had to turn to the findings of neuroscience.  KTG cite Anderson's (1978) indeterminism thesis to support this view – although it is well known that data, no matter how much of it there is nor what form it takes, always underdetermine theory.[1]  However interesting and important the neuroscience findings are, it still remains the case that the problem most of us are trying to solve is not a neurophysiological one but a psychological one:  anyone who studies mental imagery wants to understand the nature of a particular phenomenon that arises in behavior and in experience – and that means understanding its formal and information-processing characteristics as well as its instantiation in the brain.  The search for neural correlates or for neural mechanisms takes it for granted that we know what they are correlates of, or what functions the mechanisms are computing, and this depends on our understanding of the phenomenon and on our having at least some idea – preferably one that is not obviously wrong – of how it works.  It also depends on certain assumptions about how function maps onto structure, assumptions that are very often questionable, as they are in the study of vision and imagery (see the cautionary note in Young, 2000).  The idea that only neuroscience can provide the answers we seek is sheer prejudice, although a widely shared prejudice that finds expression in the commentaries by Kosslyn, Thompson & Ganis, Polimeni & Schwartz, de Haan & Aleman, and Toth.

Several commentators seem to feel that even raising the issue of the nature of mental images at this time, given how much has been written about it, is somehow in poor taste.  For example, de Haan & Aleman claim that we have “gone beyond” the debate and that “recent research allows us to formulate new theoretical ideas concerning how we are able to mentally image the outside world.”  But they don’t say what these new theoretical ideas are, and I very much suspect that they are the very same old theoretical ideas that were criticized in the target article.  Similarly, Toth takes the occasion of his commentary to offer a variety of opinions, but provides no arguments about the topic at hand, assuming instead that the issue has been resolved by neuroscience findings, which, he says, “symbolic theories seem to eschew, rather than embrace”.  It would have been useful to illuminate this claim with a few examples of such findings so readers could see for themselves whether they were germane, as I tried to do in the target article.  Progress on conceptually difficult problems, like mental imagery, is unlikely to be furthered by offering parochial and dogmatic assertions about how science ought to proceed (and even about what individual scientists’ motives might be).

2.1  The involvement of the visual brain in mental imagery

I tried to make it clear in the target article that the disagreements are about what, if anything, is special about mental images that distinguishes them from other forms of cognitive representation.  The argument is not about whether certain parts of the brain are involved in both vision and mental imagery.  The notion that some of the same neural circuits are activated in both cases is raised by a number of commentators, including Grossberg, Burgess, Bartolomeo & Chokron, Chatterjee, de Haan & Aleman, Olivetti Belardinelli & Di Matteo, Pani, and Polimeni & Schwartz.  But the assumption that the vision-like experience of imagery derives from the deployment of some of the same neural structures as are activated in corresponding episodes of visual perception is not in dispute.  What is in dispute is the further assumption that both vision and imagery use a special form of representation, one that has become known as “depictive.”  But even if one does not make that assumption, the overlap between vision and imagery is a platitude unless one has at least the beginnings of a model of what they have in common other than the untenable assumption that they both examine some inner display.  For this reason I welcome attempts, such as Grossberg’s, to develop such a model, in which top-down and bottom-up information play off to produce visual experiences on the one hand, and hallucinations or visual images on the other, and which also accounts for some of their major differences.
On the other hand, it must be admitted that this is only a very small step towards understanding either the experience of imagery or its information-processing properties, and is moreover no help at all in explaining the vast array of behavioral imagery phenomena such as those documented in Kosslyn (1980) (although, as I suggest in sections 3 and 4 of the target article and in section 6 of this response, a great many of these phenomena are unlikely to be explained in terms of any imagery-specific mechanisms, but rather in terms of general mechanisms such as inference from tacit knowledge).

3.     Depiction and the “Tootell Display”

We see in the kind of neuroscience evidence that is sought, and in the way it is interpreted, that investigators like Kosslyn, Thompson & Ganis (KTG) are seeking evidence for an old intuitively-appealing idea of what goes on when we “see” with our “mind’s eye”, an idea I called the “picture theory,” that goes back to Descartes and was revived in modern times by Kosslyn, Shepard, Paivio and others.  This is clear from the fact that KTG take as their central paradigm the finding of Tootell, Silverman, Switkes, & de Valois (1982) and similar more recent findings cited in their commentary; viz., that in visual perception and mental imagery there is a literal 2D layout of neural activity in visual cortex that “resembles” (Kosslyn’s term) what it represents.  Moreover, even more critically, both vision and imagery exploit this spatial layout and rely on the fact that a particular structure-preserving mapping defines the content of the image or percept, in a way that would not be true of a symbolic form of representation (even though the latter would likely also be encoded spatially in the brain).  This is a strong thesis with considerable intuitive appeal.  It does not do justice to Kosslyn’s highly focused research program to back off from this thesis each time a counterexample is raised, as they do when they admit that images are not exactly like any possible picture (e.g., they contain “predigested information” – again Kosslyn’s term), that they are laid out in a “functional space,” that the process of inspecting images is not really like the process of visual perception, or that even though the depictive display explains imagery phenomena its contents are not what we experience when we image.
The way that images are supposed to be like pictures is quite clear, not only in the Kosslyn text I quoted in the target article, but also in the points made in the KTG commentary: Images and pictures are both laid out in space in a way that “preserves metrical properties” such as “large vs small,” “near vs far” as well as other geometrical properties (e.g., “square” vs “circular”).  And it had better be the case that the space they are laid out in is literal space in the brain, and not some softened “functional” version; otherwise none of the predictions that involve size, distance, relative location or shape would follow from intrinsic properties of the image (and if they do not follow from intrinsic properties, then the depictive theory does not differ from the “null hypothesis”).  

Many empirical arguments have been cited in support of the view that images are special insofar as they are “depictive,” or picture-like.  But as I argued in the target article (and documented with quotations in Pylyshyn, 1981), the appeal to “depictive representations” in the literature has largely been a shell game in which picture theorists get all the predictive value out of the literal 2D assumption and then immediately repudiate that assumption and retreat to a less specific version of the proposal (“it’s how the process accesses it that matters”) which, alas, no longer explains what it was supposed to explain, or at least has no advantage over the null hypothesis.[2]  To avoid misunderstandings about how pictorial the entity hypothesized by KTG and others really is (and whether it is colored or 3D), let us call the sort of neural activity map found by Tootell et al. a “Tootell Display” or “Tootell Screen” instead of a “picture”.  (I have not attempted a mathematical definition of “picture” because the picture-theorists do not have anything that definite in mind.  Polimeni & Schwartz’s definition as a “locally regular feature map” will do, since it suggests that the display might be a wildly, but continuously, distorted version of the proximal stimulus.  A more technical way of putting it is that a picture can be any continuous mapping of the proximal stimulus that preserves local topology – such as a homeomorphism or locally affine transformation.  This definition of “picture” allows for the “cortical magnification factor” that KTG raise in their commentary.)
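A minimal sketch may make the parenthetical definition concrete. It assumes, purely for illustration, a complex-logarithm map of the kind often used as an idealization of cortical magnification; the function names and the offset parameter `a` are hypothetical and are not taken from any of the commentaries. The sketch shows a mapping that is continuous and keeps neighboring points adjacent while distorting distances, which is all that the liberal definition of “picture” requires.

```python
import cmath
import math


def cortical_map(x, y, a=1.0):
    """Map a retinal point (x, y), with x >= 0, to a hypothetical display point.

    Uses w = log(z + a) with z = x + iy. The offset `a` is an illustrative
    parameter that keeps the map finite at the fovea (z = 0).
    """
    w = cmath.log(complex(x, y) + a)
    return (w.real, w.imag)


def separation(p, q):
    """Euclidean distance between two points on the hypothetical display."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


if __name__ == "__main__":
    # Two pairs of retinal points, each one unit apart on the horizontal meridian.
    near_fovea = separation(cortical_map(0, 0), cortical_map(1, 0))
    periphery = separation(cortical_map(10, 0), cortical_map(11, 0))
    print("display separation near fovea:", near_fovea)
    print("display separation in periphery:", periphery)
```

Under these assumptions the order of points along a meridian is preserved, but equal retinal separations come out unequal on the display: the pair near the fovea lands farther apart than the peripheral pair. A map of this sort is topology-preserving without being distance-preserving.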

While there is neuroscience evidence for a Tootell Display, such a display cannot explain the facts of imagery that are driving the imagery research program (e.g., mental scanning, the effects of different image "size," mental rotation, and the other lines of research I discussed in the target article) for at least the following two reasons.

(1) In a large number of cases the experimental phenomena are not attributable to properties of the mental image at all, but to people's ability to simulate what they believe they would see if they were to witness certain events happen (e.g., if they were to scan their attention from place to place on a map).  In other words, the facts in question do not constrain brain structures at all:  any structures that could recall past episodes, draw inferences, generate sequences of thoughts, and estimate time intervals such as time-to-collision, would do.  The purpose of introducing the "null hypothesis" is not primarily to claim that images must be symbolic, but that many phenomena that imagery theorists are concerned with can be accounted for by any theory adequate to carry out reasoning, including a symbolic one that implements a "language of thought" (the reason I take the latter to be the “default” form is treated briefly in section 4.1 of this reply).  One reason that proposals such as Gottschling’s or Pani’s, that images are analog, would not work in general is just that the phenomena are very often not due to any properties of the image.  This is not to say that there are no analog representations of particular magnitudes in the brain – see section 4.3 of this reply.

(2) The Tootell Display proposal is not compatible with what we know about either mental imagery or vision.  This was the point of the eight reasons I gave in section 7.2 of the target article: they are just a sampling of reasons why even if a detailed 2D pattern of activity mirroring the experienced image were found in the cortex during imagery, this could not be what is responsible for the empirical phenomena of mental imagery (by which I mean the kind of phenomena that have been extensively documented in Kosslyn, 1980; 1994).  If activity corresponding to the Tootell Display were found during episodes of mental imagery this would indeed be a very interesting finding, but contrary to general belief we would not be the least bit closer to explaining the many phenomena of mental imagery than we were before:  the Tootell Display is likely to have as little a role in theories of mental imagery as it has in fact had in theories of visual processing.  In his commentary, Dennett has some characteristically picturesque examples for putting the same point.  Dennett’s point is that, when viewed in a certain way, many interesting patterns might be observed in the brain without those patterns being exploited by the brain to compute certain particular behavioral phenomena.  The question of whether the patterns are exploited is also independent of the question of whether certain brain regions are involved (causally or incidentally) in the activity of imaging, so the efficacy of rTMS is not directly relevant to this question (pace KTG and de Haan & Aleman).  It is also independent of exactly how the spatial distribution of the display is implemented in detail in the brain: Whether this is done on a cell-by-cell basis or whether it uses vector encoding, as suggested by Sokolov, the conceptual issue remains the same.

Ingle underscores some of the inadequacies of the Tootell Display that I had mentioned and provides some additional suggestions.  He reminds us that the contents of the Tootell Display change several times each second so its contents do not actually correspond to how we see the world.  The important point that is often ignored is that what we experience, in both vision and imagery, is a spatial configuration that was never present on the retina or on the Tootell Display, so such a display could not correspond to the contents of either vision or imagery.  The obvious way to deal with this devastating criticism is to assume that mental imagery (and visual experience) is associated with some other stable panoramic display, as opposed to the Tootell Display, onto which the contents of the Tootell Display are transferred in registration with eye movements.  But as Ingle and others have pointed out, there is no evidence for that sort of display in the brain (and there is plenty of evidence that such a display is never constructed in the course of saccadic integration, see O'Regan, 1992; O'Regan & Noë, 2002).  In fact, as Ingle also points out, visual information that forms the basis of object recognition is carried by a distinct pathway (the ventral system) and converges with information about location, likely in the coordinates of motor control, that is carried by another pathway (the dorsal system), and these two systems do not merge (as they would have to if they corresponded to the contents of visual or imagery experiences) until perhaps the prefrontal cortex, where there is no evidence for any sort of topographical organization.  In their commentary KTG wisely disavow such a stable panoramic display (see point 6 in their section “Imagery and Perception”).  Interestingly, Kosslyn (1994) clearly embraced it (see chapters 4-7, especially pp. 85-94) and argued that since such a display is used by vision it is natural to assume it is also used by imagery.

Van der Velde & Kamps have also focussed on the merging of the identity and location information from the “two visual systems” and provide a sketch, using a “blackboard architecture,” of how these two types of information might be coordinated – a major problem in models of visual perception.  While the term “blackboard” might suggest a spatial layout, none of the appeals to such an architecture in AI or vision have had that implication.  The Van der Velde & Kamps proposal itself only claims that retinotopically organized information is transformed continuously into either the identity of each object or into “location representations related to movements of different body parts.”  So far this is compatible with the null hypothesis.  But they also suggest that the blackboard may be located in visual cortex (which might be taken as support for the depictive view).  Yet, as Ingle points out, the earliest place where ventral and dorsal visual systems converge is roughly in prefrontal cortex, where there is no evidence of topographical organization.

4.     The “null hypothesis” and its detractors

4.1  Why should we assume a symbolic system as the “null hypothesis”?

The purpose behind my Null Hypothesis proposal seems to have been widely misunderstood, leading to some arguments that are not germane.  Thomas questions why I consider the symbolic alternative to be the “null” or default case, and perhaps I did not provide enough discussion of this point.  Millar says that the null hypothesis is a “formal description of problem solving tasks” that “makes no predictions about how people actually go about solving different types of problems.”  The null hypothesis is not a formal description of anything; it is a proposal, laid on the table largely as a foil, about what form our representations might take when they are experienced as visual images.  The reason why I treat this particular option as the “default” is that we know something about it (since it includes all the various formal languages and symbolic calculi for which we have a formal semantics to tell us how the meanings of complex structures are composed from the meanings of their parts) and because it meets certain minimal requirements that must be met by any system adequate for the representation of knowledge and for reasoning.  In particular we know that a recursive system of symbols has properties such as productivity, compositionality, and systematicity and that these are essential for reasoning and knowledge representation (for a detailed argument on this point see Fodor & Pylyshyn, 1988).  Even though the format used by a circumscribed part of the system need not meet all these requirements, it will still have to face the problem of providing a seamless interface between its form and the form used in reasoning, since both vision and imagery do play a role in reasoning (and translating between a picture and beliefs is what the entire brain does, hence the ever-present danger of a homunculus regress that KTG like to pooh-pooh).[3]  I take considerations such as these to present a serious challenge to those who advocate some proposal other than the symbolic one.
Saying that this proposal “is odd in the context of evolutionary biology” (as Millar does) is a strange way to meet this challenge.  It buys into the impression fostered by some connectionists (and also discussed in Fodor & Pylyshyn, 1988) that if it does not look like a nervous system it must be biologically implausible.
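The properties appealed to in this section (a recursive symbol system whose complex expressions get their meanings from the meanings of their parts) can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the class names, the `holds` function, and the toy facts about a cup and a plate are hypothetical and are not drawn from the target article or from Fodor & Pylyshyn (1988).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    """A predicate applied to named individuals, e.g. left_of(cup, plate)."""
    pred: str
    args: tuple


@dataclass(frozen=True)
class Not:
    body: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, And]


def holds(f, facts):
    """Evaluate a formula against a set of ground facts.

    The value of a complex formula is computed only from the values of its
    parts (compositionality); formulas can nest without bound (productivity).
    """
    if isinstance(f, Atom):
        return (f.pred, f.args) in facts
    if isinstance(f, Not):
        return not holds(f.body, facts)
    if isinstance(f, And):
        return holds(f.left, facts) and holds(f.right, facts)
    raise TypeError(f"not a formula: {f!r}")


if __name__ == "__main__":
    facts = {("left_of", ("cup", "plate"))}
    a = Atom("left_of", ("cup", "plate"))
    # Systematicity: the same vocabulary also expresses the reversed relation.
    b = Atom("left_of", ("plate", "cup"))
    print(holds(And(a, Not(b)), facts))
```

Nothing in the sketch is specific to vision or imagery; the same machinery that represents one spatial relation can represent its converse and can embed either in larger structures.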

4.2  How does a symbol system deal with various particular phenomena?

Many commentators presented interesting imagery findings that they claim would be difficult for a symbolic system to accommodate (e.g., Arterberry, Craver-Lemley and Reeves, Bartolomeo & Chokron, Burgess, Chatterjee, and to a lesser extent Jüttner & Rentschler).  But why do they think that?  Is it because some of the findings imply the processing/storage of metrical information?  Is it because the operations on the representations are globally “spatial”, such as rotating something or viewing it from another viewpoint?  These examples are often presented in the tone of voice “What do you have to say about that?”  Well, in most cases I have nothing to say about them except that they are not grounds for favoring the alternative: No picture theory could even begin to address them without a collection of ad hoc stipulations.  And with the benefit of such stipulations any theory, including a symbolic calculus, could do equally well.  It is worth reiterating that what makes certain operations (e.g., rotation) appear natural for a pictorial representation is that it so readily invites the intentional fallacy that Gold discusses: It is easy to forget that what is natural to rotate is the (solid, rigid, physical) thing that is imagined, not the mental representation nor its brain encoding.  It is definitely not natural to rotate a Tootell Display (or its contents) or to examine it from a different perspective, an operation that Bartolomeo & Chokron felt was unnatural in a symbolic system.  Some analog operations do make sense (e.g., rotation of a dihedral vertex through a small angle in 3D, as postulated by Marr & Nishihara, 1976), and I look forward to the development of a theory that incorporates them, but it is unlikely that it will contain a depictive display (Marr & Nishihara’s SPASAR mechanism computes an analog operation on a symbolic representation).

4.3  Do we need analogs as well as symbol structures?

Despite the above reasons for putting forth a symbolic form as a default proposal, there may be good and sufficient reasons for rejecting this form of representation for images and instead adopting something more intuitive, like a depictive form.  The point that I argued in detail in the target article (for the devil is in the details) is that the known experimental findings do not provide such reasons.

This is not to suggest that there are no important unresolved issues in the symbolic option.  One that may be at the heart of many of the examples commentators have raised is this: discrete symbol systems do not seem to have a generally satisfactory way of dealing with the representation and manipulation of real-valued magnitudes.  They do, of course, have the venerable numeral systems – even systems with arbitrarily expandable precision, such as that used in the Dewey decimal system – but these may not be completely satisfactory for such purposes as, say, controlling biomotor systems, or perhaps even for accounting for such phenomena as the “symbolic distance effect” cited by Petrusic & Baranski, or various abilities that implicate a metrical form of representation (as, for example, in mathematical reasoning, see Dehaene, Dehaene-Lambertz & Cohen, 1998; Gallistel & Gelman, 2000).  Polimeni & Schwartz make the not unreasonable suggestion that a hybrid analog-symbolic system might be required to model vision and other cognitive functions, although whether such a system is in fact required remains an open empirical question.
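The remark about numeral systems with expandable precision can be illustrated with a short sketch using Python's standard `decimal` module. The helper name `sqrt2` and the choice of the square root of 2 as the magnitude are assumptions made only for this example; the point is that a discrete numeral code can approximate a real-valued magnitude to any requested number of digits, though always as a finite string of symbols.

```python
from decimal import Decimal, localcontext


def sqrt2(digits):
    """Return the square root of 2 as a numeral with `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits  # precision is a free parameter of the symbolic code
        return Decimal(2).sqrt()


if __name__ == "__main__":
    for digits in (5, 10, 30):
        print(digits, sqrt2(digits))
```

Each increase in precision is bought with a longer symbol string and more computation, which is one way of seeing why such codes may be less than ideal for purposes such as continuous motor control.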

Notwithstanding the various proposed roles for analogs in mental representations, there are many concerns with the proposal that image representations involve an analog component, not the least of which is that there is a question about the interpretation of much of the data that lead investigators to that assumption.  The considerations discussed in connection with the tacit knowledge proposal (sections 3 and 4 of the target article as well as section 6 of this response) are one sort of worry (viz., that these data may reflect something other than the form of the representation), and various problems with the empirical demonstrations, discussed by Petrusic & Baranski, are another.

Yet another reason given for favoring analogs comes from the way we experience images and percepts. For example, Pani worries about why our experience of both vision and imagery has the character of continuity and suggests that the imagery system may use “tokens borrowed from the neural mapping of the visual world…[so that] the visuospatial properties they do contain will lead people to report them as experiences in a mental world with an analog character.”   The assumption that some of the same neural structures are involved in imagery and vision does not entail that we need what Pani calls a “dense mapping” from the world to structures in the brain to account for the experience of dense vistas, any more than we need an analog system to talk about dense layouts (I just did so in this sentence!).  To ascribe density to these structures, on the grounds that we experience dense visual regions, is to make the same mistake as to ascribe color, shape or size to the neural structures because we experience color, shape and size in our images; it is the insidious intentional fallacy.

In spite of these concerns, I agree that for many purposes an analog representation system (in which a system of physical magnitudes in the represented domain maps to a different system of physical magnitudes in the representing domain) might be required.  Gottschling is right to say that despite the indefensible matrix-type functional space, I have not excluded some interpretation of “functional space” that depends on a real analog of space (but see Note 6 of the target article).  But caveat lector; there is more to the notion of an analog representation than meets the eye.  Take, for example, the Polimeni & Schwartz argument for the efficiency of representing time by time and space by space, as opposed to using “expensive” symbolic codes.  To reach this conclusion a particular cost-accounting scheme had to be assumed, which could turn out to be unjustified.  For instance, Minsky & Papert (1971) discuss an example of what looks at first glance like an efficient analog computing mechanism, but in which the efficiency turns out to be illusory when precision and physical constraints are taken into account.[4]  Furthermore, one should not be fooled into thinking that it is even clear what it is for a system to be analog – it is easy to get into the same difficulties in thinking about analogs as one does in thinking about “functional space”, and for similar reasons (see Block, 1972; Lewis, 1971; Pylyshyn, 1984).

4.4  Formats and representations in different modalities

Intons-Peterson challenges my null hypothesis on the grounds that it lacks criteria for when something is the “same form of representation” as that used for general reasoning.  And so it does; this was not intended as an operational definition, but as a challenge to theories.  When a worked-out proposal is available the criteria will become clear.  For example, if we were to take the predicate calculus as the proposed form of representation for general reasoning, and the “vivid representation” proposed by Levesque & Brachman (1985) as the form for mental imagery, the criterion for being “the same type of representation” would be met, since the latter is just a subset of the former (the vocabulary of nonlogical terms would, of course, be domain specific).  In general, sameness can only be assessed relative to a theory, so Intons-Peterson is correct in saying that this is an underspecified proposal.  She is also quite right to point out that there are forms of imagery other than visual and that they could provide further evidence for my thesis.  The work I am familiar with in other modalities does point to the importance of amodal space, as opposed to visual space, as an organizer of cognitive representation (a point that was also made by Jüttner & Rentschler in their commentary).  Clearly this is a direction that merits further research.

Olivetti Belardinelli & Di Matteo make a similar point regarding the advisability of extending imagery research into other modalities.  In their review of a rather sparse literature on imagery in different modalities, these commentators found evidence for both modality-specific and amodal components to imagery.  Moreover, they found that during imagery in the seven sensory modalities they surveyed, brain areas activated were generally specific to the modality in question, yet in no case were they the primary cortical areas associated with perception in that modality.  While they speculate that the amodal component might be propositional, they conclude that the modality-specific component is likely “concrete/analogical”.  But this is just the conclusion that I have been suggesting is unwarranted.  So long as we do not know how brain activity maps onto modal experience, and do not have evidence that information in different modalities is different in format, as opposed to content, the modal-amodal distinction does not show that we represent the two types of information differently; as I suggested, again in the spirit of a null hypothesis foil, they could be the same except for their content (what they are about).

The issue of different modalities raises the question of whether what we call spatial imagery, in the fullest sense of the word, arises in the blind.  Both Chatterjee and de Haan argue that totally cortically blind people have imagery in this strong sense, though Millar disputes this, at least for the congenitally blind, citing informal examples of congenitally blind people who only report sketchy images in modalities that (judging by her examples) do not implicate spatial imagery.  Of course without the experience of vision, blind people might not use color terms in exactly the same way as sighted people do.  But the preponderance of evidence seems to show that the blind (including congenitally blind) have an extremely well developed sense of space and shape which they claim to experience in the form of images, and they also show the same effects of scanning, mental rotation and other signature imagery phenomena as do sighted people (though Millar claims that they are not as efficient at it).  Moreover, blind children learn to use spatial terms to refer to space in very nearly the same way as sighted children (Landau & Gleitman, 1985).  This suggests that sight is not necessary for most of the spatial phenomena that have been studied by imagery researchers, and therefore neither is the vision-specific Tootell Display (unless we are to assume that audition, olfaction, and the tactile and gustatory sense are also represented in a two-dimensional depictive display).  Certainly Jüttner  & Rentschler’s work shows that a cross-modal representation of object shapes is the rule rather than the exception, and thus that images of objects are amodal (or at least not solely visual), thus casting further doubt on the inherently visual depictive theory of imagery.

Jüttner & Rentschler and others feel that the null hypothesis is also committed to an amodal conception of perceptual representation.  But one of the principal differences among modalities is that they are about different things and require different concepts.  For example, vision concerns visual properties like color or luminance, audition concerns auditory properties such as pitch or loudness, the tactile sense concerns properties such as smooth or sharp, and so on.  Each modality has its own set of modality-specific concepts.  Beyond those is a large array of modality-free concepts such as those referring to spatial properties, which apply to most modalities.  A symbol system can prima facie accommodate such facts the way that language does, by simply using different terms when dealing with different modalities.  The point is that modality is, at least to a first approximation, about content and not about format, so there is no need to hypothesize a different form of representation for each modality.  Whether this makes symbolic systems amodal is thus mostly a matter of terminology.

Knauff & Schlieder also object to my appeal to “general reasoning,” though not on the grounds of modality specificity, but on the reasonable assumption that different strategies are utilized in different contexts.  But Knauff & Schlieder also claim that all computational constraints are extrinsic since they reside in the interaction between data structures and processes.  This just shows how familiarity with one form of architecture (the fetch-execute cycle on a location-addressable storage) prevents one from seeing that architecture and process are distinguishable, even though different architectures can be emulated in a computer (see Pylyshyn, 1984, for an extended discussion of this very issue).  The distinction between intrinsic and extrinsic constraints is the distinction between architecture and process and this distinction is critical for information processing models of psychological processes.   The change in terminology involved in going from mental images to mental models inherits all the same questions and issues discussed in the target article:  Are the spatial relations among objects in the model sustained by the architecture or are they a consequence of extrinsic assumptions?  If the latter, are the assumptions made because the real intention is to appeal to some sort of literal spatial layout in the head, or are they made because they represent tacit knowledge of space that the reasoning agent possesses, which could change in an instant with new information?  While the language of mental models may seem less egregious than the language of pictures (since, as the authors note, it primarily concerns relative spatial location rather than a whole range of visual properties), and indeed mental models have sometimes avoided some of the pitfalls of mental imagery theorizing, they often get their explanatory power from the very same source; from reification of spatial location in thought.  
In some of their discussion, Knauff & Schlieder appear to accept my points, arguing that what is special about the use of mental models is that it involves reasoning about space, rather than using some spatial medium to carry it out. What a non question-begging theory of “mental models” would have to assume is an important area of investigation; for indeed there is something to the distinction between reasoning by rule and reasoning by model which is waiting to be spelled out (as I suggest in Section 8 of the target article). 

4.5  Are there other options besides pictures and symbols? 

Commentators have offered a number of ideas for a “third option”, although the options that have been proposed are either too sketchy to judge or else are problematic.   For example, an alternative that many people have favored, and that Chatterjee has proposed, is the schema.  For Chatterjee this is an abstraction that combines some of the features of both pictures and language.  It is, in fact, not too far from one of the ways Kosslyn, in his more conciliatory moods, has talked about the depictive displays when he says that they contain “predigested information” or are “annotated.”  One version of the schema proposal that has been popular in artificial intelligence is sometimes called a Frame (see, e.g., Minsky, 1975).  It is a structure that contains slots that are eventually bound to particulars as information becomes available.  It also contains default assignments that prevail if no information to the contrary is available and it contains procedures for transforming the structure or for determining what values go in the slots.  That sort of schema is clearly a symbolic form of representation.  What Chatterjee has in mind, however, and what many people would like to see, is a form of representation that not only has the properties of frames, but also has intrinsic spatial properties.  But such a system would still require a literal spatial display on which to locate the descriptions or annotations, so the locations of the slots or descriptors would have to follow Euclidean principles.   The only way out of this dilemma may be the one we have taken in our own theoretical work, in which we provide a mechanism called a visual index (or FINST) which provides a limited means for a descriptive representation to point to or reference objects in the world, thereby inheriting spatial properties from the world without actually having such properties itself (see below).
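The frame idea described above can be made concrete with a minimal sketch in Python.  It is offered only as an illustration under stated assumptions, not as Minsky's formulation or anyone's actual implementation: the class name `Frame`, its methods, the `room` example and the `floor_area` procedure are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Frame:
    """Hypothetical frame: named slots, default values, attached procedures."""
    name: str
    defaults: Dict[str, Any] = field(default_factory=dict)
    slots: Dict[str, Any] = field(default_factory=dict)
    procedures: Dict[str, Callable[["Frame"], Any]] = field(default_factory=dict)

    def bind(self, slot: str, value: Any) -> None:
        """Bind a slot to a particular once information becomes available."""
        self.slots[slot] = value

    def get(self, slot: str) -> Any:
        """Look up a slot: explicit binding, then procedure, then default."""
        if slot in self.slots:
            return self.slots[slot]
        if slot in self.procedures:
            value = self.procedures[slot](self)
            if value is not None:
                return value
        return self.defaults.get(slot)


def floor_area(frame: Frame) -> Optional[float]:
    """Illustrative procedure that fills a slot from other slots."""
    width, length = frame.get("width"), frame.get("length")
    if width is None or length is None:
        return None
    return width * length


room = Frame(name="room", defaults={"walls": 4}, procedures={"area": floor_area})
room.bind("width", 3)
room.bind("length", 4)
```

Here `room.get("walls")` falls back on the default until something is bound to that slot, and `room.get("area")` is computed by the attached procedure.  Nothing in the structure is laid out spatially: a slot such as `width` is a symbol paired with a value, which is the sense in which a schema of this sort is a symbolic form of representation.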

Thomas is one of the commentators who agrees that picture theory has all the problems I attribute to it, yet neither is he favorably disposed towards the null hypothesis.  Rather, he advocates another approach he calls Perceptual Activity Theory which holds that “imagery arises from vicarious exercise of … [mastery of relevant sensorimotor contingencies]: a sort of play-acting of perceptual exploration.”  The version of activity theory developed by O'Regan & Noë (2002), which Thomas cites with approval, has many clear advantages over picture theory, though it also has some problems of its own (many of which it inherits from J.J. Gibson’s direct realism, which minimizes the role of representations of any kind; see the commentary in Pylyshyn, 2002).

Having said that, I should point out that many of the ideas I have proposed, in this response and elsewhere (Pylyshyn, 2001b), are very much in the spirit of perceptual activity theory.  For example, in the next section I suggest that many of the apparent spatial and directional properties of images could derive from real space, providing we have a mechanism for associating features or objects in images with corresponding objects in real space.  This view has been developed in connection with a theory of visual indexes, which provides a mechanism for preconceptual links to objects in the world (the theory, which is alluded to only briefly in this response, is laid out in detail in Pylyshyn, 1998, 2000, 2001a, 2001b; Pylyshyn, forthcoming).  If we assume that the spatial quality of perceived space derives from the way we interact with, or are potentially able to interact with, objects in real space, we can explain why images appear to be spatial in the same way, so long as the spatial locations of objects in the images are bound to locations of objects in the world (as postulated in visual index theory).  So in fact my views do not diverge so radically from Thomas’ insofar as his views are sufficiently specified to allow comparison.  But I continue to view symbolic forms of representation as the default or “null” hypothesis for reasons sketched in section 4.1.

5.     Where does the spatiality of images come from, if not the display?

5.1  Space, motion and motor control in visual imagery 

Nijhawan & Khurana argue that perhaps the sort of static images I criticized are not strongly spatial because spatial maps are constructed from motion information and so are best accessed through motion stimuli.  The idea that motion may be important in revealing the spatial character of representations is an interesting one.  But in fact many of the experimental phenomena used to demonstrate the spatial nature of imagery do involve motion (e.g., mental scanning, mental rotation, mental paper folding, and other dynamic operations performed on images).  Nijhawan & Khurana ask whether I would agree that images are spatial if smooth motion in an image could be demonstrated.  But this question already assumes a reification of mental space.  What if imagined continuous motion did not literally involve movement through some space by the continuous change of position?  I have argued that imagining a greater distance does not involve a greater amount of some neural magnitude, such as cortical distance.  Imagining that something is moving and imagining that something is further away could be nothing more than entertaining the thought that some particular things are moving or are further away.  There is nothing strange about the idea of distinguishing motion from change in position in some space.  Patients with cerebral akinetopsia or motion blindness (Zeki, 1991) are able to see that objects change location over time without seeing them as moving.  Conversely, imagining something as moving could also occur without anything changing in real time (I once questioned why we assume that imagined time maps onto real time, but the question just seemed to puzzle people; Pylyshyn, 1979a).  In addition, of course, it could involve imagining something as being at a sequence of locations (as when observers simulate the experience of seeing something move by thinking of it as being at a series of locations).  But where are the locations, if not on an image display?  
My contention, sketched out in sections 4.5, 5.2, and 5.3 of this response, is that the locations in this case could be those occupied by certain objects in the world, perceived visually or in some other modality, or places that are recalled in terms of their relation to some currently perceived objects (in the target article I described an experiment that showed that we were better at imagining uniform motion if we had a series of visible places to think of the object as being while we imagined it to be moving).   The point is that in imagining motion, the space through which an object passes need not be in the head.

Raab & Boschker also emphasize the dynamic quality of the perceptual act, although from a different point of view; one that focuses on the active exploration of visual information (this approach is consistent with that advocated by Thomas, and is one that has gained favor in computer vision).  Whatever the merits of the approach, it has led researchers (e.g., O'Regan & Noë, 2002) to focus on the information available in the world instead of its projection on a cortical display, which is clearly a step in the right direction.  From this perspective Raab & Boschker recommend that we study imagery by first studying dynamic and motor imagery.  I am glad to endorse this recommendation, and indeed have already endorsed the idea that some parts of the proprioceptive/kinesthetic/motor system may be used to provide the spatial properties usually attributed to images themselves.  But the same intentional fallacy that pervades visual imagery theory also threatens theories of motor imagery; it is the illusion that external properties of the world are not only represented, but are duplicated in the brain.  For example, the idea that we plan the execution of motor actions by carrying them out imaginally and observing what happens, presupposes that when we imagine actions, their consequence is automatically made available by virtue of some property of the imagery mechanism, without the involvement of knowledge, inference and thought.  This is exactly like the case of visual imagery where it is typically assumed that when we imagine some process (like mental scanning) all we have to do is wait and “see” what happens without having to draw inferences – because the process by which the image unfolds is determined by the inherent properties of the depictive display. 

The role that motor mechanisms play in the transformation of visual images is even less clear, notwithstanding evidence of correlations between visual image transformations and activity in parts of the motor system (Cohen et al., 1996; Richter et al., 2000), or the influence of real motor actions on visual transformations (Wexler, Kosslyn & Berthoz, 1998).   Some of the cortical activity observed during both motor performance and the mental transformation of visual images may reflect the fact that these areas (e.g., posterior parietal cortex) compute higher-level functions required for extrapolating trajectories, for tracking, for planning, and for visuomotor coordination (Anderson, Snyder, Bradley, & Xing, 1997).   Since many of these functions also have to be computed in the course of anticipating movements visually, it is reasonable that the same areas might be active in both cases.  While studying the interaction of imagery and the motor system is clearly important, at the present time we are far from justified in concluding that dynamic visual imagery is carried out by means of the motor system (or that visual operations exploit motor control mechanisms).  This way of speaking suggests that our motor system can grasp and manipulate our images (a view that is quite compatible with the mental reification of the world that we find in much mental imagery theorizing).

5.2  Superimposing images on the visual world: Attention and inhibition

In the target article I suggested that certain things are special about the case where images are combined with visual perception (although the same thing probably applies when images are combined with other perceptual modalities that concern space).  The example discussed by Gosselin & Schyns falls into this category, and the comments in section 5.3 of the target article apply to it.  For over 30 years there has been evidence that observers are able to vary both their response bias (β) and sensitivity (d') at attended regions of a display (Bonnel, Possami & Schmitt, 1987; Downing, 1988; Farah, 1989; Mueller & Findlay, 1987; Segal & Fusella, 1969, 1970).  Focusing attention on a region with a certain simple shape (e.g., an S-shape) while looking at a display of uniform noise could thus lead to the perception of a figure in the shape of the region being attended: the noise in the attended region would simply be enhanced (either amplified in amplitude or raised by a pedestal function, depending on whether the effect was due to a change of criterion or of sensitivity), creating a real perceptual effect at that region.
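For readers less familiar with the two signal-detection measures mentioned above, the following is a minimal Python sketch of how sensitivity and response bias are conventionally estimated from a hit rate and a false-alarm rate under the standard equal-variance Gaussian model.  The function name `sdt_measures` is hypothetical and the model is an assumption of the sketch; it is not drawn from any of the cited studies.

```python
from math import exp
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float):
    """Return (d_prime, criterion, beta) for an equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1; otherwise the inverse
    normal CDF is undefined and NormalDist.inv_cdf raises an error.
    """
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa            # sensitivity: separation of the distributions
    criterion = -0.5 * (z_hit + z_fa) # placement of the decision criterion
    beta = exp(d_prime * criterion)   # likelihood ratio at the criterion
    return d_prime, criterion, beta


d_prime, criterion, beta = sdt_measures(0.8, 0.2)
```

With symmetric rates such as 0.8 and 0.2 the criterion sits midway between the two distributions, so the bias measure is neutral (β equal to 1) while d' is positive.  A shift of attention could in principle move either quantity, which is why the two are reported separately.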

Gosselin & Schyns claim that since there was no information in the signal, the relevant information must have come from “an internally generated signal – i.e., an image”.  I agree with the first part of that claim but see nothing to be gained by calling a distribution of attention an “image”.   Contrary to the way that Gosselin & Schyns put it, there is no reason to assume that observers had to have “knowledge of all pictorial characteristics of an ‘S’” and even less reason to claim that what they have is “functionally isomorphic to an actual image of an ‘S’”.  All they need is a description of the ‘S’ together with the ability to think the following sort of demonstrative thoughts while viewing certain regions in the display: “this region is where Ri would fall,” where Ri is the part of the overall representation of the shape that encodes a certain discrete part i of that shape (it might, for example, be a code for the top upward-concave semicircle of the ‘S’).   Such a mechanism would allow observers to allocate attention to the appropriate region, and in so doing would enhance the sensitivity in that region without having to image the figure.  Farah (1989) showed that instructions to attend to a region were even more effective in sensitizing a region than instructions to imagine the relevant-shaped region.   Grossberg’s LAMINART model even provides a neural mechanism for how this can occur (by shifting the balance between excitation and inhibition on relevant neural circuits).

This view of what is going on is radically different from the one proposed by Gosselin & Schyns.  For one thing, according to this view there is no such thing as space in the image (as required by the depictive theory), there is only the space where the observer is looking.  To put this another way, the Gosselin & Schyns finding (and their sophisticated power-spectrum analysis) is compatible with the representation of the ‘S’ involved in this situation being a symbolic encoding with no spatial display in the head at all.  For example, it might be encoded as symbols denoting different geons (Biederman, 1987), together with some symbolic form of location code used to direct visual attention to contiguous sets of pixels in the real display, thus leaving all spatial properties where they belong – outside the head.  It would be more interesting if it could be shown that observers “project” more than a selected region onto a surface: for example, if they could project other visual properties such as color, shading, texture, depth etc., which had observable visual consequences that could not plausibly be attributed to the effect of attention.
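The alternative just described can be illustrated with a toy Python sketch.  Everything in it is a hypothetical simplification introduced for illustration: the part labels, the coarse row and column codes, the 9 x 9 display and the multiplicative gain are assumptions of the sketch, not a model proposed by any of the authors discussed, and the parts listed do not trace a faithful ‘S’.

```python
import random

# Location codes are symbols; they are interpreted only against the external display.
ROWS = {"top": range(0, 3), "middle": range(3, 6), "bottom": range(6, 9)}
COLS = {"left": range(0, 3), "center": range(3, 6), "right": range(6, 9)}

# Hypothetical symbolic description: (part label, row code, column code).
# It contains no pixels and no spatial layout of its own.
S_PARTS = [
    ("upper_curve", "top", "left"),
    ("upper_curve", "top", "center"),
    ("middle_stroke", "middle", "center"),
    ("lower_curve", "bottom", "center"),
    ("lower_curve", "bottom", "right"),
]


def attended_pixels(description):
    """Resolve each symbolic location code to pixels of the real display."""
    pixels = set()
    for _label, row_code, col_code in description:
        for r in ROWS[row_code]:
            for c in COLS[col_code]:
                pixels.add((r, c))
    return pixels


def view_with_attention(display, description, gain=1.5):
    """Amplify the external signal at attended pixels; leave the rest unchanged."""
    attended = attended_pixels(description)
    return [[value * gain if (r, c) in attended else value
             for c, value in enumerate(row)]
            for r, row in enumerate(display)]


rng = random.Random(0)  # fixed seed so the noise field is reproducible
noise = [[rng.uniform(0.4, 0.6) for _ in range(9)] for _ in range(9)]
percept = view_with_attention(noise, S_PARTS)
```

In this sketch the only array with spatial extent is `noise`, which stands for the display in the world.  The description contributes a selection of where to attend and nothing more, yet the resulting `percept` carries a shaped region of enhanced signal.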

The Arterberry, Craver-Lemley and Reeves commentary also involves phenomena that arise when an image is “superimposed” over a viewed display.  The demonstration of dissimilarities between vision and imagery discussed by Arterberry et al. is very interesting.  Their account, in terms of the suppression of visual signals that compete with projected imagery, is particularly intriguing and has far-reaching ramifications for understanding the involvement of visual cortex in imagery.  Gottesmann also reports evidence that activity in primary visual cortex, as monitored by EEG, is suppressed during vivid dreaming.  Meyer points out that single-cell recordings also do not provide evidence for the activation of visual cortex in imagery, and Sokolov hypothesizes that visual information may be located in the temporoparietal area.  Such findings suggest that visual imagery experiences may not be associated with activity in the visual cortex (as Crick & Koch, 1995, had already recognized).  Indeed, Gottesmann’s finding that there is actual suppression of activity in visual cortex during certain visual experiences raises the intriguing possibility that the increased blood flow observed by neuroimaging techniques may not reflect increased information-processing activity (a possibility previously suggested by Fidelman, 1994) but might perhaps reflect the active suppression of visual information processing.  This would make sense in those experiments that involve examining an image during fMRI or PET scanning when there is some visual input, or even visual persistence of the sort that Ingle describes or that Ishai & Sagi (1995) report, which would have to be ignored in carrying out the primary task (and this could be true even if the eyes were closed).  Intriguing as it is, such speculation will clearly need to be submitted to careful empirical scrutiny.

5.3  Visual neglect and spatial orientation in imagery

One often hears that reports of left hemifield neglect in both vision and mental imagery support the view that both involve an image on a cortical display, since if one side of the Tootell Display is damaged, the same deficit might be expected in both vision and imagery.   But the idea that what is damaged in visual neglect is one side of a display seems too simplistic[5]; it does not account for the dissociation between visual and imaginal neglect (Coslett, 1997), for the amodal nature of neglect (the deficit shows up in audition as well as vision, Marshall, 2001; Pavani, Ladavas & Driver, 2002), for the fact that “neglected” stimuli typically provide some implicit information (Driver & Vuilleumier, 2001; McGlinchey-Berroth, Milberg, Verfaellie, & Grande, 1996; Schweinberger & Stief, 2001), for the characteristic response bias factors in neglect (Bisiach, Ricci, Lualdi, & Colombo, 1998; Vuilleumier & Rafal, 1999) and for the fact that higher-level strategic factors appear to play a central role in the neglect syndrome (Behrmann & Tipper, 1999b; Bisiach et al., 1998; Landis, 2000; Rode, Rossetti & Biosson, 2001).  The “damaged display” view also does not account for the large number of cases of object-centred neglect (Behrmann & Tipper, 1999a; Tipper & Behrmann, 1996).  Moreover, as Bartolomeo & Chokron (2002) have documented (and reiterate in their commentary), the primary deficit in neglect is best viewed as the failure of stimuli on the neglect side to attract attention.

I agree with Bartolomeo & Chokron, as well as with Burgess and with Chatterjee, that it would be odd for a symbolic encoding system by itself to have directional preferences, such as found in neglect, and I also agree that most cases of imaginal neglect are unlikely to be due to tacit knowledge.   Having granted that, one must then ask why we should expect the explanation for such directional properties to be found in the format of representations or in the medium of the Tootell Display.  Deficits such as neglect, whether in vision or in imagery, represent a failure to orient to one side or the other, and the direction may have more to do with direction in the world than direction in an image.  As in the case of the Gosselin & Schyns example discussed earlier, orienting is a world-directed response.  There is considerable merit in Bartolomeo & Chokron’s suggestion that perhaps “visual imagery involves some of the attentional-exploratory mechanisms that are employed in visual behavior … [so] the ‘perceptual’ aspects of visual mental images might thus result not from the construction of putative ‘quasi-perceptual’ representations, but from the engagement of attentional and intentional aspects of perception in imaginal activity.”  In other words, when attending to the left side of an image, patients are actually orienting towards the left side of the perceived world (or perhaps of their body).  Even with eyes closed we have accurate recall, at least for a short time, of the location of things in the world immediately around us (see the remarks about this by Ingle) and it may be in relation to these world-locations that attention orients.  As I speculated in section 5.3 of the target article, it may be generally the case that it is the physical space outside the head that gives imagery its putative spatial character and that it does so by virtue of how mental contents are associated with (or bound to) places in the perceived world.  
This interpretation is given further support by reports, mentioned by Bartolomeo & Chokron, that imaginal neglect can be modulated by peripheral manipulations, such as turning the head. 

Although the case is clearest when the spatial layout is visually perceived while imagining, since in that case aspects of what is imagined can be associated with places in the perceived layout through visual indexes, there is no reason why this should not also hold when real space is sensed through other modalities, such as proprioceptive or kinesthetic modalities (indeed motor-space analogs of visual indexes, called Anchors, were proposed when the visual index theory was first introduced in  Pylyshyn, 1989).  It is known that people are very good at orienting to stimuli that are not visually present (Attneave & Farrar, 1977).  The ability to bind objects of thought to the location of perceived (or recalled) external objects allows us to orient to them, thereby enhancing the illusion that things are laid out inside the head the way that the corresponding things are laid out outside the head, thus reinforcing the intentional fallacy.

6.     Tacit knowledge and cognitive penetrability again

In the target article and elsewhere I have been at pains to point out that tacit knowledge does not explain all imagery phenomena (it does not, for example, explain all aspects of mental rotation or of the crowding effect in mental scanning or the oblique effect).  But if it does explain some things (e.g., the scanning effect, the image size effect) then the extra apparatus of a depictive display is redundant because it plays no role in the explanation, however much it might give comfort to one's preconceived ideas.  It's not that the postulated structure is necessarily false, but it is simply irrelevant to the data at hand.  In these cases, if there are certain activity patterns on the Tootell Display they are not the reason that you get the scanning effect, the size effect, or even the phenomenology of mental imagery.  That is why it does not help to say that tacit knowledge may be encoded in the form of a depictive representation (as KTG suggest).  KTG are correct to note that I use tacit knowledge to talk about content rather than form, so that if the phenomena can be explained by appeal to tacit knowledge then assuming that such knowledge is encoded in a depictive manner is gratuitous.  It could be encoded in protein molecules so far as these data are concerned (indeed there is evidence in favor of such an idea as regards spatial memory, see Blum, Moore, Adams, & Dash, 1999), because these data do not in any way constrain the format of the representation. 

So why assume a depictive representation as a way of encoding tacit knowledge?  The answer is surely that KTG believe that it is the spatial format and not the content that explains the data, but in that they are simply wrong for those cases (such as scanning) where tacit knowledge provides a better explanation.  The demonstration that they are wrong consists in showing such things as that if observers believed that in viewing a map it would take them longer to switch from viewing point A to viewing point B when A and B were, say, on opposite sides of the river on the map, then they would take longer when they do it from their image. If you need convincing, just imagine a map and imagine that switching your point of regard takes longer when the two imagined places are across the river (you could do that by thinking of them as moving more slowly through water or as swimming across or just taking longer to get there without moving at all).  Go ahead, do the experiment: it is your image so it can have any property you want it to have!   I’m not saying that people do believe that it will take longer if the fixations are on either side of the river.  But if they did believe it, for whatever reason that may strike them, then it is obvious that this is the way it would happen in their image.  You might wonder whether this is only true of dynamical processes like scanning as opposed to basic geometrical properties.  Can your image, for example, be non-Euclidean?  This is where tacit knowledge is obviously relevant.  In order to imagine such a thing as a non-Euclidean (or four-dimensional) space, you would have to have certain relevant knowledge; you would have to know what moving through a non-Euclidean space would look like, in the sense that you would need to know, for example, how shapes would change as you moved through this space.  
(Contrary to KTG’s claim, I don’t say that you need to have seen things for you to be able to imagine them, but you need to know what aspects of them would look like, in the same sense that the patients that Goldenberg writes about in his commentary do not know what certain things would look like and consequently cannot image them).

The main point about cognitive penetrability is not just that you can influence your image (as KTG assume), but that your image has no properties other than those you take it to have – which is precisely Dalla Barba, Rosenthal and Visetti’s point about there being no surprises in your image.  Anyone who does not believe that their image will do more-or-less what they will it to do is allowing scientific ideology to override common sense.  Your image will even look like what you make it look like, usually how you believe that something does look (but not necessarily, since you can make your image of something look different from the way you believe it actually looks!).  Of course many factors determine what you do make it look like, whether or not you can express (in language or in drawings) what something looks like and the conditions under which you can recall what something looks like.  Studies showing that people cannot predict the results of imagery experiments (as in the Denis & Carfantan, 1985, study that KTG cite) are beside the point; all you need is some idea, however vague or implicit or ineffable, about what it would be like if you were to see the thing you are supposed to imagine.  Adopting the image mode of reasoning (i.e., focusing on the appearance of things, whatever that means in terms of a theory of imagery) may well alter the likelihood that you will think of or recall something or other, just as being in a certain place affects what you recall.  But no conclusion can be drawn about the nature of images from such properties of imagery, since these are properties of memory and thought in general.

Goldenberg appears to agree with most of what I claim about certain phenomena of imagery being due to knowledge, but he insists that, in contrast to the knowledge that functions in recognition, the knowledge involved in imagery is not tacit but explicit.  Here much depends on what you mean by “tacit”.  In very many cases the relevant knowledge is indeed explicit knowledge of how things look, inasmuch as that knowledge is available for answering questions.  But it also seems to me that the knowledge that determines such properties of imagery as “the visual angle of the mind’s eye” (Kosslyn, 1978) is not available if one simply asks the subject, which is why I called it tacit; yet it can be rationally altered by providing the right experience or information, and it can be revealed in a variety of ways, which is why I call it knowledge.  For Goldenberg the relevant distinction is between knowledge that can have a general effect in cognition (which he calls explicit) and knowledge that is part of the modular visual system and is only used in recognition (which he calls tacit).  While that distinction is clearly worth preserving, it is not the one I had in mind in appealing to tacit knowledge, so the apparent disagreement may well be merely terminological.

7.     Second-order and “structural” isomorphism

I do not claim, as Amiri & Marsolek suppose, that representations must be first-order isomorphic in order to be explanatory.  What I said is that picture-theorists claim that images are spatially isomorphic (or homeomorphic) to a picture of what they depict, but that in order for this sort of isomorphism to explain typical imagery phenomena, the representation would have to be literally spatial rather than “functionally spatial”.  But second-order isomorphism of the sort that Shepard studied, while it does not require pictures-in-the-head, is not sufficiently constraining: It is simply functional isomorphism and is compatible with a descriptivist position (see Pylyshyn, 1984, Chapter 9).  A system of representations that is second-order isomorphic to some domain is just a system of representations that allows a similarity measure among represented objects to be computed (as in the original similarity judgment study of Shepard & Chipman, 1970).  There is no reason why such similarity judgments could not be based on inferences drawn from symbolic representations.  Second-order isomorphism by itself places no constraints on the form of the representation – it could be depictive or symbolic or anything else.  One would need at least to know why the second-order isomorphism held (what mechanisms were responsible) in order to infer the form of the representation itself.
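
To make this last point concrete, here is a minimal sketch, in Python, of how graded similarity judgments could be inferred from purely symbolic descriptions.  The shape names, the predicates assigned to each shape, and the use of Jaccard overlap as the similarity measure are all illustrative assumptions introduced for this example; they are not a model proposed by Shepard & Chipman or by any of the commentators.

```python
from itertools import combinations

# Hypothetical symbolic descriptions: each shape is just a set of predicates.
# Nothing here is laid out spatially; these are illustrative assumptions.
DESCRIPTIONS = {
    "square": frozenset(
        {"closed", "straight-edged", "four-sided", "right-angled", "equilateral"}
    ),
    "rectangle": frozenset(
        {"closed", "straight-edged", "four-sided", "right-angled", "elongated"}
    ),
    "triangle": frozenset(
        {"closed", "straight-edged", "three-sided", "equilateral"}
    ),
    "circle": frozenset({"closed", "curved"}),
}


def similarity(a, b, descriptions=DESCRIPTIONS):
    """Jaccard overlap: shared predicates divided by all predicates used."""
    pa, pb = descriptions[a], descriptions[b]
    return len(pa & pb) / len(pa | pb)


def ranked_pairs(descriptions=DESCRIPTIONS):
    """Return all pairs of shapes, most similar first."""
    pairs = combinations(sorted(descriptions), 2)
    return sorted(
        pairs,
        key=lambda pair: similarity(pair[0], pair[1], descriptions),
        reverse=True,
    )


if __name__ == "__main__":
    for a, b in ranked_pairs():
        print(f"{a:10s}{b:10s}{similarity(a, b):.2f}")
```

The ordering of pairs produced by `ranked_pairs` is derived entirely from which predicates two descriptions share.  If that ordering happened to match the ordering of similarities among the shapes themselves, the system would count as second-order isomorphic to the domain, yet nothing about its format would be depictive.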

The paper by Edelman (1998) that Amiri & Marsolek cite presents a mathematical analysis of the requirements that should be met by an adequate system of representation, which inter alia include second-order isomorphism.  The further step that Amiri & Marsolek take of suggesting that it is the cognitive architecture rather than the content of representations (and inferences drawn from them) that must be responsible for the second-order isomorphism is an interesting and substantive proposal.  Given the modularity of vision, there are significant belief-independent constraints built into the early vision system, especially in computing an object’s appearance.  Nonetheless, second-order similarity of the sort that was demonstrated by Shepard & Chipman (1970) is in general cognitively penetrable (and readily altered by attentional strategies, as Shepard, 1964, showed) and therefore unlikely to be part of the architecture.

I’m not sure whether Wright has something like second-order isomorphism in mind when he says that visual experiences are only “structurally isomorphic” to sensory inputs.  But he makes a very different point when he goes on to claim that although there are no pictures in the brain there is “non-epistemic” storage, which he calls “inner registration,” of sensory events.  I assume that he equates these to uninterpreted sensory images.  There is a great deal of empirical data showing that sensory records are not kept, but even if they were, mental images are certainly not like records of such sensory events.  I argued in the target article that while some image reinterpretation may occur, this sort of reinterpretation is arguably not “visual,” nor is the record that is reinterpreted a record of “non-epistemic” visual sensations or of their “structurally isomorphic” internal responses.  While it is known that fairly rich sensory storage may be available for short times (Ishai & Sagi, 1995; Sperling, 1960), those sorts of iconic stores are very different from images constructed from memory.  Wright’s “bell-I-mud-dum” example involves reparsing a phonetic string in short-term memory (where it was arguably already “epistemic” inasmuch as it was likely encoded in terms of phones), but I am not aware of any evidence of such reparsings occurring from an image constructed from long-term memory.

8.     Other topics: Visual expectations, thoughts and phenomenology

8.1  Visual expectations or visual images? 

Zaidi & Griffiths provide some interesting demonstrations of visual illusions that appear when a mental rotation is carried out on a perceived figure.  From these they conclude, quite reasonably, that it is the assumptions that viewers make that result in their expecting the rotated figure to look different from what it actually does look like when it is physically rotated.  Zaidi & Griffiths conclude that “active visual imagery is an integral part of active visual perception”.  I would not have put it that way since visual expectations are hardly equivalent to the sorts of images that we experience or that picture-theorists postulate.  Visual expectations need not take the form of a projected picture, as opposed to some general prediction as to which elements should be where.  In discussing the Gosselin & Schyns commentary in section 5.2, I argue that a visual expectation, even one that involves detailed shapes and locations, does not need to be more than a spatial distribution of attention over a real scene, which is very different from a picture or a pattern of activity on a Tootell Display.

8.2  Distinguishing images from thoughts

Niall is right that if images are propositional, then their propositional content is insufficient to demonstrate a simple Euclidean theorem.  But who is so naïve as to think that the visual system “sees” the entailments of Euclid’s axioms in a figure (perhaps a person who believes we can think in pictures?).  As Niall’s example shows, we are deceived if we think we have represented all the diagram’s geometrical properties in our image.  But the moral to draw from this is not that mental images are “non-epistemic” (to use Wright’s term) or that they do not constitute knowledge.  Diagrams in the world and on the retina are non-epistemic, but mental representations of them are epistemic; they constitute beliefs about how things look, which is why we can think about them and why we can be mistaken about their true shape.  They are also too impoverished to permit proofs of Euclid’s First Proposition without additional non-diagrammatic representations (i.e., thoughts, which, contra empiricism, do not derive from sensations).

8.3  The role of phenomenology

Dalla Barba, Rosenthal and Visetti raise an interesting point concerning the role of phenomenology in the enterprise of understanding mental imagery.  They say that in studying imagery, phenomenology is of the essence and it does support a picture theory of imagery because that is how we experience imagery.  They assert that I am unfairly maligning phenomenology “for what it never pretended to be” and that “phenomenology has never aimed at causal explanation.”  This may well be the case, although my target was not phenomenology, but precisely the attribution of causal power to the experience itself (which is done implicitly and nearly universally).  I find myself agreeing with much of the Dalla Barba et al. commentary (e.g., concerning the content of images and its qualitative difference from vision), which suggests that phenomenological evidence, properly used, may be a useful tool.  Psychology has nonetheless been justifiably suspicious of introspection since the failure of the method to deliver scientifically useful results at the turn of the (last) century (although the introspective method is not the same as phenomenology, since it was intended to provide an analytical discipline for recognizing components of conscious states).  The issue of a phenomenological homunculus, raised by Dalla Barba et al., reduces to the intractable mind-body (or experience-experiencer) problem, which is beyond the scope of the present article, not to mention the present author (but see Dennett, 1991).  In any case the phenomenological homunculus (the experience of being a viewer of one’s image) is irrelevant to a causal theory, as the authors admit at the outset.

9.     Conclusion

What is so unappealing about the current direction in the study of mental imagery is that it cannot seem to avoid what Pessoa, Thompson & Noë (1998) call “analytical isomorphism” – the assumption that what one will find in the brain is what appears in one’s conscious experience.  I recommend the following heuristic:  If you feel yourself drawn by some body of data to the view that what is in your head is a smaller and perhaps less detailed version of what is in the world, then you had better stop and reconsider your underlying assumptions.  While many readers were not persuaded by what I called the null hypothesis, it does appear that there has been a move away from naïve picture theory in several areas of imagery research.  Many people are now objecting to the purely symbolic view by considering other options, rather than by insisting that it is obvious that imagery must exploit some sort of spatial display.  Others are concentrating on studying the parallel mechanisms of vision and imagery and rejecting the implication that this means there must be a picture-like object for vision to exploit.  This is a conceptually difficult problem and the arguments will no doubt continue (despite the belief held by many writers that the debate has already been resolved by evidence from neuroscience).  One can always hope that the next time around we may approach the question with a better appreciation of the general conditions that have to be met by an adequate theory.  On the other hand, as Slezak intimates, we may be condemned, like Sisyphus, to repeat the task of correcting the intentional fallacy without end, creating employment for future generations of cognitive scientists and philosophers.

References

Anderson, R. A., Snyder, L. H., Bradley, D. C., & Xing, J. (1997). Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience, 29, 303-330.

Attneave, F., & Farrar, P. (1977). The visual world behind the head. American Journal of Psychology, 90(4), 549-563.

Bartolomeo, P., & Chokron, S. (2002). Orienting of attention in left unilateral neglect. Neuroscience and Biobehavioral Reviews, 26(2), 217-234.

Behrmann, M., & Tipper, S. (1999a). Attention accesses multiple reference frames: Evidence from unilateral neglect. Journal of Experimental Psychology: Human Perception and Performance, 25, 83-101.

Behrmann, M., & Tipper, S. P. (1999b). Attention accesses multiple reference frames: Evidence from visual neglect. Journal of Experimental Psychology: Human Perception & Performance, 25(1), 83-101.

Biederman, I. (1987). Recognition-by-components:  A theory of human image interpretation. Psychological Review, 94, 115-148.

Bisiach, E., Ricci, R., Lualdi, M., & Colombo, M. R. (1998). Perceptual and response bias in unilateral neglect: Two modified versions of the Milner Landmark task. Brain & Cognition, 37(3), 369-386.

Block, N. J., and J. A. Fodor. (1972). Cognitivism and the Analog/Digital Distinction.

Blum, S., Moore, A. N., Adams, F., & Dash, P. K. (1999). A mitogen-activated protein kinase cascade in the CA1/CA2 subfield of the dorsal hippocampus is essential for long-term spatial memory. Journal of Neuroscience, 19(9), 3535-3544.

Bonnel, A. M., Possami, C. A., & Schmitt, M. (1987). Early modulation of visual input: A study of attentional strategies. The Quarterly Journal of Experimental Psychology, 39A(4), 757-776.

Cohen, M. S., Kosslyn, S. M., Breiter, H. C., DiGirolamo, G. J., Thompson, W. L., Anderson, A. K., Bookheimer, S. Y., Rosen, B. R., & Belliveau, J. W. (1996). Changes in cortical activity during mental rotation: A mapping study using functional MRI. Brain, 119.

Coslett, H. B. (1997). Neglect in vision and visual imagery: a double dissociation. Brain, 120, 1163-1171.

Crick, F., & Koch, C. (1995). Are we aware of neural activity in primary visual cortex? Nature, 375(11), 121-123.

Dehaene, S., Dehaene-Lambertz, G., & Cohen, L. (1998). Abstract representations of numbers in the animal and human brain. Trends in Neurosciences, 21(8), 355-361.

Denis, M., & Carfantan, M. (1985). People's knowledge about images. Cognition, 20(1), 49-60.

Dennett, D. C. (1978). Brainstorms. Cambridge, Mass.: MIT Press, a Bradford Book.

Dennett, D. C. (1991). Consciousness Explained. Boston: Little, Brown & Company.

Downing, C. J. (1988). Expectancy and visual-spatial attention: Effects on perceptual quality. Journal of Experimental Psychology: Human Perception and Performance, 14, 188-202.

Driver, J., & Vuilleumier, P. (2001). Perceptual awareness and its loss in unilateral neglect and extinction. Cognition, 79(1-2), 39-88.

Edelman, S. (1998). Representation is representation of similarities. Behavioral and Brain Sciences, 21, 449-498.

Farah, M. J. (1989). Mechanisms of imagery-perception interaction. Journal of Experimental Psychology: Human Perception and Performance, 15, 203-211.

Fidelman, U. (1994). A misleading implication of the metabolism scans of the brain. International Journal of Neuroscience, 74(1-4), 105-108.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical  analysis. Cognition, 28, 3-71.

Gallistel, C. R., & Gelman, R. (2000). Nonverbal numerical cognition: From reals to integers. Trends in Cognitive Sciences, 4, 59-65.

Ishai, A., & Sagi, D. (1995). Common mechanisms of visual imagery and perception. Science, 268(5218), 1772-1774.

Kosslyn, S. M. (1978). Measuring the visual angle of the mind's eye. Cognitive Psychology, 10, 356-389.

Kosslyn, S. M. (1980). Image and Mind. Cambridge, Mass.: Harvard Univ. Press.

Kosslyn, S. M. (1994). Image and Brain: The resolution of the imagery debate. Cambridge, MA: MIT Press.

Landau, B., & Gleitman, L. R. (1985). Language and experience: Evidence from the blind child. Cambridge, MA, USA: Harvard University Press.

Landis, T. (2000). Disruption of space perception due to cortical lesions. Spatial Vision, 13(2-3), 179-191.

Levesque, H. J., & Brachman, R. J. (1985). A fundamental tradeoff in knowledge representation and reasoning (revised version). In H. J. Levesque & R. J. Brachman (Eds.), Readings in Knowledge Representation (pp. 41-70). Los Altos, CA: Morgan Kaufmann Publishers.

Lewis, D. (1971). Analog and Digital. Nous, 321-327.

Marr, D., & Nishihara, H. K. (1976). Representation and recognition of spatial organization of three-dimensional shapes.

Marshall, J. C. (2001). Auditory neglect and right parietal cortex. Brain, 124(4), 645-646.

McGlinchey-Berroth, R., Milberg, W. P., Verfaellie, M., & Grande, L. (1996). Semantic processing and orthographic specificity in hemispatial neglect. Journal of Cognitive Neuroscience, 8(3), 291-304.

Minsky, M., & Papert, S. (1971). On some associative, Parallel and Analog computations. In E. L. Jacks (Ed.), Associative Information Techniques: Symposium at the General Motors Research Laboratories (pp. 27-47). New York: Elsevier.

Minsky, M. L. (1975). A Framework for Representing Knowledge. In P. H. Winston (Ed.), The Psychology of Computer Vision. New York: McGraw-Hill.

Mueller, H. J., & Findlay, J. M. (1987). Sensitivity and criterion effects in the spatial cuing of visual attention. Perception and Psychophysics, 42, 383-399.

O'Regan, J. K. (1992). Solving the "real" mysteries of visual perception: The world as an outside memory. Canadian Journal of Psychology, 46, 461-488.

O'Regan, J. K., & Noë, A. (2002). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24(5), xxx-xxx.

Pavani, F., Ladavas, E., & Driver, J. (2002). Selective deficit of auditory localisation in patients with visuospatial neglect. Neuropsychologia, 40(3), 291-301.

Pessoa, L., Thompson, E., & Noë, A. (1998). Finding out about filling in: A guide to perceptual completion for visual science and the philosophy of perception. Behavioral and Brain Sciences, 21(6), 723-802.

Pylyshyn, Z. W. (1973). What the Mind's Eye Tells the Mind's Brain:  A Critique of Mental Imagery. Psychological Bulletin, 80, 1-24.

Pylyshyn, Z. W. (1979a). Do Mental Events Have Durations? Behavioral and Brain Sciences, 2(2), 277-278.

Pylyshyn, Z. W. (1979b). Validating Computational Models:  A Critique of Anderson's Indeterminacy of Representation Claim. Psychological Review, 86(4), 383-394.

Pylyshyn, Z. W. (1984). Computation and cognition:  Toward a foundation for cognitive science. Cambridge, MA: MIT Press.

Pylyshyn, Z. W. (1989). The role of location indexes in spatial perception: A sketch of the FINST spatial-index model. Cognition, 32, 65-97.

Pylyshyn, Z. W. (1998). Visual indexes in spatial vision and imagery. In R. D. Wright (Ed.), Visual Attention (pp. 215-231). New York: Oxford University Press.

Pylyshyn, Z. W. (2000). Situating vision in the world. Trends in Cognitive Sciences, 4(5), 197-207.

Pylyshyn, Z. W. (2001a). Connecting vision and the world: Tracking the missing link. In J. Branquinho (Ed.), The Foundations of Cognitive Science. Oxford, UK: Clarendon Press.

Pylyshyn, Z. W. (2001b). Visual indexes, preconceptual objects, and situated vision. Cognition, 80(1/2), 127-158.

Pylyshyn, Z. W. (2002). Seeing, acting and knowing: commentary on O'Regan & Noë. Behavioral and Brain Sciences, 24(5), xxx-xxx.

Pylyshyn, Z. W. (forthcoming). Seeing and visualizing: It's not what you think. Cambridge, MA: MIT Press.

Richter, W., Somorjai, R., Summers, R., Jarmasz, M., Menon, R. S., Gati, J. S., Georgopoulos, A. P., Tegeler, C., Ugurbil, K., & Kim, S.-G. (2000). Motor area activity during mental rotation studied by time-resolved single-trial fMRI. Journal of Cognitive Neuroscience, 12(2), 310-320.

Rode, G., Rossetti, Y., & Boisson, D. (2001). Prism adaptation improves representational neglect. Neuropsychologia, 39(11), 1250-1254.

Schweinberger, S. R., & Stief, V. (2001). Implicit perception in patients with visual neglect: Lexical specificity in repetition priming. Neuropsychologia, 39(4), 420-429.

Segal, S. J., & Fusella, V. (1969). Effects of imaging on signal-to-noise ratio, with varying signal conditions. British Journal of Psychology, 60(4), 459-464.

Segal, S. J., & Fusella, V. (1970). Influence of imaged pictures and sounds on detection of visual and auditory signals. Journal of Experimental Psychology, 83(3), 458-464.

Shepard, R. N. (1964). Attention and the Metrical Structure of the Similarity Space. Journal of Mathematical Psychology, 1, 54-87.

Shepard, R. N., & Chipman, S. (1970). Second-Order Isomorphism of Internal Representations:  Shapes of States. Cognitive Psychology, 1, 1-17.

Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74 (whole No. 11).

Thomas, N. J. T. (in press). Mental Imagery, Philosophical Issues About. In L. Nadel (Ed.), Encyclopedia of Cognitive Science. London: Macmillan/Nature Publishing.

Tipper, S. P., & Behrmann, M. (1996). Object-centered not scene-based visual neglect. Journal of Experimental Psychology: Human Perception and Performance, 22(5), 1261-1278.

Tootell, R. B., Silverman, M. S., Switkes, E., & de Valois, R. L. (1982). Deoxyglucose analysis of retinotopic organization in primate striate cortex. Science, 218(4575), 902-904.

Vuilleumier, P., & Rafal, R. (1999). "Both" means more than "two": Localizing and counting in patients with visuospatial neglect. Nature Neuroscience, 2(9), 783-784.

Wexler, M., Kosslyn, S. M., & Berthoz, A. (1998). Motor processes in mental rotation. Cognition, 68(1), 77-94.

Young, M. P. (2000). The architecture of visual cortex and inferential processes in vision. Spatial Vision, 13(2-3), 137-146.

Zeki, S. (1991). Cerebral akinetopsia (visual motion blindness): A review. Brain, 114(Pt 2), 811-824.

 



* Work on this paper was supported by the National Institutes of Health Research Grant 1R01-MH60924.  Send reprint requests to the author at Rutgers Center for Cognitive Science, Psychology Bldg Addition, Busch Campus, Rutgers University, New Brunswick, Piscataway, NJ 08854-8020.  Author email: zenon@ruccs.rutgers.edu

[1] I responded to Anderson’s version of the indeterminism thesis in (Pylyshyn, 1979b), and have written extensively on the notion of “strong equivalence” in cognitive science (e.g., Pylyshyn, 1984), showing that mere input-output equivalence is not what cognitive scientists aim for, even without the benefit of neuroscience data.

[2] Kosslyn et al. claim, “The depictive theory … presents a coherent, internally consistent view of how mental images may be processed.”  But so long as the coherence and predictive power come not from intrinsic properties of the “depictive” form of representation itself, but from a variety of ancillary assumptions about how the representation must be used and what restrictions are placed on accessing information from it, the depictive theory is coherent only in that it fits one’s preconceptions, and its predictive power derives entirely from the independent constraints which any theory could adopt.  Think of the added epicycles of “annotations” or the “predigested information” or the requirement that to get from A to B one has to pass through places that are “in between,” even though they are so only by stipulation, or the appeal to the intuition that smaller images must be harder to “see,” and so on, all of which are assumed for no “internally consistent” reason except to fit the data at hand, however they turn out.  Was it really the depictive form of images that predicted the oblique effect?  Once laid bare, the depictive theory is no less a patchwork than any other theory for explaining the experimental phenomena of mental imagery – which are unlikely to have a single cause in any case.