To appear in Handbook of Epistemology ed. by Matti Sintonen, et al. (Dordrecht: Kluwer).

 Final Draft 11/10/99

Archived at HomePage for the Rutgers University Research Group on Evolution and Higher Cognition.


Reason and Rationality

Richard Samuels
Department of Philosophy
University of Pennsylvania
Philadelphia, PA 19104-6304
rsamuels@phil.upenn.edu

Stephen Stich
Department of Philosophy and Center for Cognitive Science
Rutgers University
New Brunswick, NJ 08901
stich@ruccs.rutgers.edu

and

Luc Faucher
Department of Philosophy
Rutgers University
New Brunswick, NJ 08901
lucfaucher@hotmail.com


 

 

1. Introduction: Three Projects in the Study of Reason

 

            Over the past few decades, reasoning and rationality have been the focus of enormous interdisciplinary attention, attracting interest from philosophers, psychologists, economists, statisticians and anthropologists, among others. The widespread interest in the topic reflects the central status of reasoning in human affairs.  But it also suggests that there are many different though related projects and tasks which need to be addressed if we are to attain a comprehensive understanding of reasoning.

 

Three projects that we think are particularly worthy of mention are what we call the descriptive, normative and evaluative projects. The descriptive project – which is typically pursued by psychologists, though anthropologists and computer scientists have also made important contributions – aims to characterize how people actually go about the business of reasoning and to discover the psychological mechanisms and processes that underlie the patterns of reasoning that are observed. By contrast, the normative project is concerned not so much with how people actually reason as with how they should reason.  The goal is to discover rules or principles that specify what it is to reason correctly or rationally – to specify standards against which the quality of human reasoning can be measured. Finally, the evaluative project aims to determine the extent to which human reasoning accords with appropriate normative standards. Given some criterion, often only a tacit one, of what counts as good reasoning, those who pursue the evaluative project aim to determine the extent to which human reasoning meets the assumed standard.

 

In the course of this paper we touch on each of these projects and consider some of the relationships among them.  Our point of departure, however, is an array of very unsettling experimental results which, many have believed, suggest a grim outcome to the evaluative project and support a deeply pessimistic view of human rationality. The results that have led to this evaluation started to emerge in the early 1970s when Amos Tversky, Daniel Kahneman and a number of other psychologists began reporting findings suggesting that under quite ordinary circumstances, people reason and make decisions in ways that systematically violate familiar canons of rationality on a broad array of problems. Those first surprising studies sparked the growth of an enormously influential research program – often called the heuristics and biases program – whose impact has been felt in a wide range of disciplines including psychology, economics, political theory and medicine. In section 2, we provide a brief overview of some of the more disquieting experimental findings in this area. 

 

            What precisely do these experimental results show? Though there is considerable debate over this question, one widely discussed interpretation that is often associated with the heuristics and biases tradition claims that they have “bleak implications” for the rationality of the man and woman in the street. What the studies indicate, according to this interpretation, is that ordinary people lack the underlying rational competence to handle a wide array of reasoning tasks, and thus that they must exploit a collection of simple heuristics which make them prone to seriously counter-normative patterns of reasoning or biases. In Section 3, we set out this pessimistic interpretation of the experimental results and explain the technical notion of competence that it invokes. We also briefly sketch the normative standard that advocates of the pessimistic interpretation typically employ when evaluating human reasoning.  This normative stance, sometimes called the Standard Picture, maintains that the appropriate norms for reasoning are derived from formal theories such as logic, probability theory and decision theory (Stein, 1996).

 

Though the pessimistic interpretation has received considerable support, it is not without its critics. Indeed much of the most exciting recent work on reasoning has been motivated, in part, by a desire to challenge the pessimistic account of human rationality. In the latter parts of this paper, our major objective will be to consider and evaluate some of the most recent and intriguing of these challenges. The first comes from the newly emerging field of evolutionary psychology. In section 4 we sketch the conception of the mind and its history advocated by evolutionary psychologists, and in section 5 we evaluate the plausibility of their claim that the evaluative project is likely to have a more positive outcome if these evolutionary psychological theories of cognition are correct. In section 6 we turn our attention to a rather different kind of challenge to the pessimistic interpretation – a cluster of objections that focus on the role of pragmatic, linguistic factors in experimental contexts. According to these objections, much of the data for putative reasoning errors is problematic because insufficient attention has been paid to the way in which people interpret the experimental tasks they are asked to perform. In section 7 we focus on a range of problems surrounding the interpretation and application of the principles of the Standard Picture of rationality. These objections maintain that the paired projects of deriving normative principles from formal systems, such as logic and probability theory, and determining when reasoners have violated these principles are far harder than advocates of the pessimistic interpretation are inclined to admit. Indeed, one might think that the difficulties that these tasks pose suggest that we ought to reject the Standard Picture as a normative benchmark against which to evaluate the quality of human reasoning. Finally, in section 8 we further scrutinize the normative assumptions made by advocates of the pessimistic interpretation and consider a number of arguments which appear to show that we ought to reject the Standard Picture in favor of some alternative conception of normative standards.

 

 

2. Some Disquieting Evidence about How Humans Reason

 

Our first order of business is to describe some of the experimental results that have been taken to support the claim that human beings frequently fail to satisfy appropriate normative standards of reasoning. The literature on these errors and biases has grown to epic proportions over the last few decades and we won’t attempt to provide a comprehensive review.[1] Instead, we focus on what we think are some of the most intriguing and disturbing studies. 

 

2.1.  The Selection Task

 

            In 1966, Peter Wason published a highly influential study of a cluster of reasoning problems that became known as the selection task.  As a recent textbook observes, this task has become “the most intensively researched single problem in the history of the psychology of reasoning.” (Evans, Newstead & Byrne, 1993, p. 99) Figure 1 illustrates a typical example of a selection task problem.


Figure 1


 

What Wason and numerous other investigators have found is that subjects typically perform very poorly on questions like this.  Most subjects respond correctly that the E card must be turned over, but many also judge that the 5 card must be turned over, despite the fact that the 5 card could not falsify the claim no matter what is on the other side.  Also, a majority of subjects judge that the 4 card need not be turned over, though without turning it over there is no way of knowing whether it has a vowel on the other side.  And, of course, if it does have a vowel on the other side then the claim is not true.  It is not the case that subjects do poorly on all selection task problems, however.  A wide range of variations on the basic pattern have been tried, and on some versions of the problem a much larger percentage of subjects answer correctly.  These results form a bewildering pattern, since there is no obvious feature or cluster of features that separates versions on which subjects do well from those on which they do poorly.  As we will see in Section 4, some evolutionary psychologists have argued that these results can be explained if we focus on the sorts of mental mechanisms that would have been crucial for reasoning about social exchange (or “reciprocal altruism”) in the environment of our hominid forebears.  The versions of the selection task we’re good at, these theorists maintain, are just the ones that those mechanisms would have been designed to handle.  But, as we will also see, this explanation is hardly uncontroversial.
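The logic behind the correct answer can be spelled out in a short sketch. Since Figure 1 is not reproduced here, the sketch assumes the standard version of the problem that the discussion above implies: four cards showing E, C, 5 and 4, each with a letter on one side and a number on the other, and the claim "if a card has a vowel on one side, then it has an odd number on the other side." The C card, the exact wording of the claim and all function names are assumptions made for illustration.

```python
# Assumed rule (not quoted from Figure 1): "If a card has a vowel on one
# side, then it has an odd number on the other side."
VOWELS = set("AEIOU")


def falsifies(letter: str, number: int) -> bool:
    """A card falsifies the rule only if it pairs a vowel with an even number."""
    return letter in VOWELS and number % 2 == 0


def must_turn_over(visible: str) -> bool:
    """True if some possible hidden face would make this card falsify the rule."""
    if visible.isdigit():
        # The hidden face is a letter: try one vowel and one consonant.
        return any(falsifies(letter, int(visible)) for letter in ("A", "K"))
    # The hidden face is a number: try one odd and one even number.
    return any(falsifies(visible, number) for number in (3, 4))


cards = ["E", "C", "5", "4"]  # "C" is an assumed fourth card
cards_to_turn = [card for card in cards if must_turn_over(card)]
print(cards_to_turn)  # ['E', '4']
```

On this reconstruction only the E and 4 cards can reveal a vowel paired with an even number, which is why the 5 card is irrelevant to the truth of the claim.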

 

2.2. The Conjunction Fallacy

 

            Much of the experimental literature on theoretical reasoning has focused on tasks that concern probabilistic judgment.  Among the best known experiments of this kind are those that involve so-called conjunction problems.  In one quite famous experiment,  Kahneman and Tversky (1982) presented subjects with the following task.

 

Linda is 31 years old, single, outspoken, and very bright.  She majored in philosophy.  As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. 

 

Please rank the following statements by their probability, using 1 for the most probable and 8 for the least probable.

 

(a) Linda is a teacher in elementary school.

(b) Linda works in a bookstore and takes Yoga classes.

(c) Linda is active in the feminist movement.

(d) Linda is a psychiatric social worker.

(e) Linda is a member of the League of Women Voters.

(f) Linda is a bank teller.

(g) Linda is an insurance sales person.

(h) Linda is a bank teller and is active in the feminist movement. 

 

In a group of naive subjects with no background in probability and statistics, 89% judged that statement (h) was more probable than statement (f) despite the obvious fact that one cannot be a feminist bank teller unless one is a bank teller.  When the same question was presented to statistically sophisticated subjects – graduate students in the decision science program of the Stanford Business School – 85% gave the same answer!  Results of this sort, in which subjects judge that a compound event or state of affairs is more probable than one of the components of the compound, have been found repeatedly since Kahneman and Tversky’s pioneering studies, and they are remarkably robust. This pattern of reasoning has been labeled the conjunction fallacy.
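The probabilistic point at issue can be illustrated with a small simulation. The sketch below is not drawn from the original studies: the population size and the two rates are arbitrary values chosen for illustration. It shows that, however the two attributes are distributed, the proportion of people who have both can never exceed the proportion who have one of them.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Each simulated person is a pair (is_bank_teller, is_feminist).
# The 5% and 60% rates are arbitrary illustrative values.
population = [
    (random.random() < 0.05, random.random() < 0.60)
    for _ in range(10_000)
]

p_teller = sum(teller for teller, _ in population) / len(population)
p_teller_and_feminist = sum(
    teller and feminist for teller, feminist in population
) / len(population)

# Feminist bank tellers are a subset of bank tellers, so this always holds.
assert p_teller_and_feminist <= p_teller
print(p_teller, p_teller_and_feminist)
```

Changing the rates or the seed changes the two proportions but never the direction of the inequality, since every person counted in the conjunction is also counted in the single category.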

 

2.3. Base Rate Neglect

 

Another well-known cluster of studies concerns the way in which people use base-rate information in making probabilistic judgments. According to the familiar Bayesian account, the probability of a hypothesis on a given body of evidence depends, in part, on the prior probability of the hypothesis.  However, in a series of elegant experiments, Kahneman and Tversky (1973) showed that subjects often seriously undervalue the importance of prior probabilities.  One of these experiments presented half of the subjects with the following “cover story.”

 

A panel of psychologists have interviewed and administered personality tests to 30 engineers and 70 lawyers, all successful in their respective fields.  On the basis of this information, thumbnail descriptions of the 30 engineers and 70 lawyers have been written.  You will find on your forms five descriptions, chosen at random from the 100 available descriptions.  For each description, please indicate your probability that the person described is an engineer, on a scale from 0 to 100.    

 

The other half of the subjects were presented with the same text, except the “base-rates” were reversed.  They were told that the personality tests had been administered to 70 engineers and 30 lawyers.  Some of the descriptions that were provided were designed to be compatible with the subjects’ stereotypes of engineers, though not with their stereotypes of lawyers.  Others were designed to fit the lawyer stereotype, but not the engineer stereotype.  And one was intended to be quite neutral, giving subjects no information at all that would be of use in making their decision.  Here are two examples, the first intended to sound like an engineer, the second intended to sound neutral:

 

Jack is a 45-year-old man.  He is married and has four children.  He is generally conservative, careful and ambitious.  He shows no interest in political and social issues and spends most of his free time on his many hobbies which include home carpentry, sailing, and mathematical puzzles.

 

Dick is a 30-year-old man.  He is married with no children.  A man of high ability and high motivation, he promises to be quite successful in his field.  He is well liked by his colleagues. 

 

As expected, subjects in both groups thought that the probability that Jack is an engineer is quite high.  Moreover, in what seems to be a clear violation of Bayesian principles, the difference in cover stories between the two groups of subjects had almost no effect at all.  The neglect of base-rate information was even more striking in the case of Dick.  That description was constructed to be totally uninformative with regard to Dick’s profession.  Thus, the only useful information that subjects had was the base-rate information provided in the cover story. But that information was entirely ignored.  The median probability estimate in both groups of subjects was 50%.  Kahneman and Tversky’s subjects were not, however, completely insensitive to base-rate information.  Following the five descriptions on their form, subjects found the following “null” description:

 

Suppose now that you are given no information whatsoever about an individual chosen at random from the sample.

The probability that this man is one of the 30 engineers [or, for the other group of subjects: one of the 70 engineers] in the sample of 100 is ____%.

 

In this case subjects relied entirely on the base-rate; the median estimate was 30% for the first group of subjects and 70% for the second.  In their discussion of these experiments, Nisbett and Ross offer this interpretation.

 

The implication of this contrast between the “no information” and “totally nondiagnostic information” conditions seems clear.  When no specific evidence about the target case is provided, prior probabilities are utilized appropriately; when worthless specific evidence is given, prior probabilities may be largely ignored, and people respond as if there were no basis for assuming differences in relative likelihoods.  People’s grasp of the relevance of base-rate information must be very weak if they could be distracted from using it by exposure to useless target case information. (Nisbett & Ross, 1980, pp. 145-6)
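What the Bayesian account requires in these cases can be sketched using the odds form of Bayes' rule. The function name is an assumption, and the likelihood ratio of 4 used for the Jack description is a made-up value, since the experiments do not supply one. Only the base rates of 30% and 70% come from the cover stories.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


for base_rate in (0.30, 0.70):
    neutral = posterior(base_rate, 1.0)     # "Dick": description carries no evidence
    diagnostic = posterior(base_rate, 4.0)  # "Jack": the ratio of 4 is a made-up value
    print(f"base rate {base_rate:.2f}: neutral {neutral:.2f}, diagnostic {diagnostic:.2f}")
```

With a likelihood ratio of 1 the posterior simply equals the base rate, so the normatively appropriate estimates for Dick are 30% and 70% rather than 50%. Even for a diagnostic description, the two groups should arrive at different answers (roughly 63% and 90% under the assumed ratio).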

 

            Before leaving the topic of base-rate neglect, we want to offer one further example illustrating the way in which the phenomenon might well have serious practical consequences.  Here is a problem that Casscells et al. (1978) presented to a group of faculty, staff and fourth-year students at Harvard Medical School.

 

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person’s symptoms or signs?  ____%

 

Under the most plausible interpretation of the problem, the correct Bayesian answer is 2%.  But only eighteen percent of the Harvard audience gave an answer close to 2%.  Forty-five percent of this distinguished group completely ignored the base-rate information and said that the answer was 95%. 
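The 2% figure can be checked with a few lines of arithmetic. The problem statement does not give the test's sensitivity, so the sketch below assumes, as the "most plausible interpretation" presumably does, that the test detects every true case. That assumption and the function name are ours.

```python
def positive_predictive_value(
    prevalence: float, false_positive_rate: float, sensitivity: float = 1.0
) -> float:
    """Probability of disease given a positive result, by Bayes' rule.

    sensitivity=1.0 is an assumption: the problem does not state it.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


chance = positive_predictive_value(prevalence=1 / 1000, false_positive_rate=0.05)
print(f"{chance:.1%}")  # 2.0%
```

Out of every 1000 people tested, about one has the disease while roughly fifty healthy people test positive, so a positive result picks out a sick person only about one time in fifty-one.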

 

2.4. Overconfidence

 

            One of the most extensively investigated and most worrisome clusters of phenomena explored by psychologists interested in reasoning and judgment involves the degree of confidence that people have in their responses to factual questions – questions like:

 

In each of the following pairs, which city has more inhabitants?

 

(a) Las Vegas                           (b) Miami

(a) Sydney                                (b) Melbourne

(a) Hyderabad                          (b) Islamabad

(a) Bonn                                   (b) Heidelberg

 

In each of the following pairs, which historical event happened first?

 

(a) Signing of the Magna Carta  (b) Birth of Mohammed

(a) Death of Napoleon              (b) Louisiana Purchase

(a)  Lincoln’s assassination        (b) Birth of Queen Victoria

 

After each answer subjects are also asked:

 

How confident are you that your answer is correct?

50%   60%   70%   80%   90%   100%

 

In an experiment using relatively hard questions it is typical to find that for the cases in which subjects say they are 100% confident, only about 80% of their answers are correct; for cases in which they say that they are 90% confident, only about 70% of their answers are correct; and for cases in which they say that they are 80% confident, only about 60% of their answers are correct.  This tendency toward overconfidence seems to be very robust.  Warning subjects that people are often overconfident has no significant effect, nor does offering them money (or bottles of French champagne) as a reward for accuracy.  Moreover, the phenomenon has been demonstrated in a wide variety of subject populations including undergraduates, graduate students, physicians and even CIA analysts.  (For a survey of the literature see Lichtenstein, Fischhoff & Phillips, 1982.)
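The comparison behind these figures, stated confidence against the proportion of correct answers, can be sketched as follows. The answer sheet is hypothetical: it contains ten answers per confidence level, with the numbers of correct answers chosen to match the typical rates just described. The function name is likewise an assumption.

```python
from collections import defaultdict


def calibration(responses):
    """Map each stated confidence level to the fraction of answers that were correct."""
    tallies = defaultdict(lambda: [0, 0])  # confidence -> [number correct, number answered]
    for confidence, correct in responses:
        tallies[confidence][0] += int(correct)
        tallies[confidence][1] += 1
    return {c: hits / total for c, (hits, total) in tallies.items()}


# Hypothetical answer sheet: ten answers per confidence level, with hit
# counts chosen to match the typical rates reported above.
responses = (
    [(100, True)] * 8 + [(100, False)] * 2
    + [(90, True)] * 7 + [(90, False)] * 3
    + [(80, True)] * 6 + [(80, False)] * 4
)

for confidence, accuracy in sorted(calibration(responses).items(), reverse=True):
    gap = confidence - 100 * accuracy
    print(f"stated {confidence}%, correct {accuracy:.0%}, overconfidence {gap:.0f} points")
```

A perfectly calibrated subject would show a gap of zero at every level. On the illustrative data the gap is twenty percentage points throughout.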

 

2.5. Anchoring

 

            In their classic paper, “Judgment under uncertainty,” Tversky and Kahneman (1974) showed that quantitative reasoning processes – most notably the production of estimates – can be strongly influenced by the values that are taken as a starting point. They called this phenomenon anchoring. In one experiment, subjects were asked to estimate quickly the products of numerical expressions. One group of subjects was given five seconds to estimate the product of

 

 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1

 

while a second group was given the same amount of time to estimate the product of

 

1 × 2 × 3 × 4 × 5 × 6 × 7 × 8.

 

Under these time constraints, most of the subjects can only do some steps of the computation and then have to extrapolate or adjust. Tversky and Kahneman predicted that because the adjustments are usually insufficient, the procedure should lead to underestimation. They also predicted that because the result of the first step of the descending sequence is higher than the ascending one, subjects would produce higher estimates in the first case than in the second. Both predictions were confirmed. The median estimate for the descending sequence was 2250 while that for the ascending one was only 512. Moreover, both groups systematically underestimated the value of the numerical expressions presented to them since the correct answer is 40,320.
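The difference between the two starting points is easy to see by computing the running products. The sketch below is purely illustrative and makes no claim about how many steps subjects actually completed in five seconds.

```python
from itertools import accumulate
from operator import mul

descending = [8, 7, 6, 5, 4, 3, 2, 1]
ascending = descending[::-1]

# Running products after each multiplication step, read left to right.
descending_steps = list(accumulate(descending, mul))
ascending_steps = list(accumulate(ascending, mul))

print(descending_steps[:4])  # [8, 56, 336, 1680]
print(ascending_steps[:4])   # [1, 2, 6, 24]
print(descending_steps[-1], ascending_steps[-1])  # 40320 40320
```

After the same number of steps the descending sequence yields a far larger partial result than the ascending one, which fits the prediction that it supplies a higher anchor, although both sequences have the same final product.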

           

It’s hard to see how the above experiment can provide grounds for serious concern about human rationality since it results from imposing serious constraints on the time that people are given to perform the task. Nevertheless, other examples of anchoring are genuinely bizarre and disquieting. In one experiment, for example, Tversky and Kahneman asked subjects to estimate the percentage of African countries in the United Nations. But before making these estimates, subjects were first shown an arbitrary number that was determined by spinning a ‘wheel of fortune’ in their presence. Some, for instance, were shown the number 65 while others the number 10.  They were then asked to say if the correct estimate was higher or lower than the number indicated on the wheel and to produce a real estimate of the percentage of African members in the UN. The median estimates were 45% for subjects whose “anchoring” number was 65 and 25% for subjects whose number was 10. The rather disturbing implication of this experiment is that people’s estimates can be affected quite substantially by a numerical “anchoring” value even when they must be fully aware that the anchoring number has been generated by a random process which they surely know to be entirely irrelevant to the task at hand![2]

 

3.  The Pessimistic Interpretation: Shortcomings in Reasoning Competence

 

            The experimental results we’ve been recounting and the many related results reported in the extensive literature in this area are, we think, intrinsically unsettling. They are even more alarming if, as has occasionally been demonstrated, the same patterns of reasoning and judgment are to be found outside the laboratory.  None of us want our illnesses to be diagnosed by physicians who ignore well-confirmed information about base-rates.  Nor do we want public officials to be advised by CIA analysts who are systematically overconfident. The experimental results themselves do not entail any conclusions about the nature or the normative status of the cognitive mechanisms that underlie people’s reasoning and judgment.  But a number of writers have urged that these results lend considerable support to a pessimistic hypothesis about those mechanisms, a hypothesis which may be even more disturbing than the results themselves. On this pessimistic view, the examples of problematic reasoning, judgments and decisions that we’ve sketched are not mere performance errors.  Rather, they indicate that most people’s underlying reasoning competence is irrational or at least normatively problematic. In order to explain this view more clearly, we first need to explain the distinction between competence and performance on which it is based and say something about the normative standards of reasoning that are being assumed by advocates of this pessimistic interpretation of the experimental results.

 

3.1. Competence and Performance

 

            The competence/performance distinction, as we will characterize it, was first introduced into cognitive science by Chomsky, who used it in his account of the explanatory strategy of theories in linguistics. (Chomsky, 1965, Ch. 1; 1975; 1980)  In testing linguistic theories, an important source of data is the “intuitions” or unreflective judgments that speakers of a language make about the grammaticality of sentences, and about various linguistic properties and relations. To explain these intuitions, and also to explain how speakers go about producing and understanding sentences of their language in ordinary discourse, Chomsky and his followers proposed that a speaker of a language has an internally represented grammar of that language – an integrated set of generative rules and principles that entail an infinite number of claims about the language.  For each of the infinite number of sentences in the speaker’s language, the internally represented grammar entails that it is grammatical; for each ambiguous sentence in the speaker’s language, the grammar entails that it is ambiguous, etc.  When speakers make the judgments that we call linguistic intuitions, the information in the internally represented grammar is typically accessed and relied upon, though neither the process nor the internally represented grammar is accessible to consciousness.  Since the internally represented grammar plays a central role in the production of linguistic intuitions, those intuitions can serve as an important source of data for linguists trying to specify what the rules and principles of the internally represented grammar are.

 

            A speaker’s intuitions are not, however, an infallible source of information about the grammar of the speaker’s language, because the grammar cannot produce linguistic intuitions by itself.  The production of intuitions is a complex process in which the internally represented grammar must interact with a variety of other cognitive mechanisms including those subserving perception, motivation, attention, short term memory and perhaps a host of others.  In certain circumstances, the activity of any one of these mechanisms may result in a person offering a judgment about a sentence which does not accord with what the grammar actually entails about that sentence.  This might happen when we are drunk or tired or in the grip of rage. But even under ordinary conditions when our cognitive mechanisms are not impaired in this way, we may still fail to recognize a sentence as grammatical due to limitations on attention or memory. For example, there is considerable evidence indicating that the short-term memory mechanism has difficulty handling center embedded structures.  Thus it may well be the case that our internally represented grammars entail that the following sentence is grammatical:

 

            What what what he wanted cost would buy in Germany was amazing.

 

even though our intuitions suggest, indeed shout, that it is not.

 

            Now in the jargon that Chomsky introduced, the rules and principles of a speaker’s internalized grammar constitute the speaker’s linguistic competence. By contrast, the judgments a speaker makes about sentences, along with the sentences the speaker actually produces, are part of the speaker’s linguistic performance.  Moreover, as we have just seen, some of the sentences a speaker produces and some of the judgments the speaker makes about sentences, will not accurately reflect the speaker’s linguistic competence.  In these cases, the speaker is making a performance error.

 

            There are some obvious analogies between the phenomena studied in linguistics and those studied by philosophers and cognitive scientists interested in reasoning.  In both cases there is spontaneous and largely unconscious processing of an open-ended class of inputs; people are able to understand endlessly many sentences, and to draw inferences from endlessly many premises.  Also, in both cases, people are able to make spontaneous intuitive judgments about an effectively infinite class of cases – judgments about grammaticality, ambiguity, etc. in the case of linguistics, and judgments about validity, probability, etc. in the case of reasoning.  Given these analogies, it is plausible to explore the idea that the mechanism underlying our ability to reason is similar to the mechanism underlying our capacity to process language.  And if Chomsky is right about language, then the analogous hypothesis about reasoning would claim that people have an internally represented, integrated set of rules and principles of reasoning – a “psycho-logic” as it has been called – which is usually accessed and relied upon when people draw inferences or make judgments about them.  As in the case of language, we would expect that neither the processes involved nor the principles of the internally represented psycho-logic are readily accessible to consciousness.  We should also expect that people’s inferences, judgments and decisions would not be an infallible guide to what the underlying psycho-logic actually entails about the validity or plausibility of a given inference.  For here, as in the case of language, the internally represented rules and principles must interact with lots of other cognitive mechanisms – including attention, motivation, short term memory and many others.  The activity of these mechanisms can give rise to performance errors – inferences, judgments or decisions that do not reflect the psycho-logic which constitutes a person’s reasoning competence. 

 

            There is, however, an important difference between reasoning and language, even if we assume that a Chomsky-style account of the underlying mechanism is correct in both cases.  For in the case of language, it makes no clear sense to offer a normative assessment of a normal person’s competence.  The rules and principles that constitute a French speaker’s linguistic competence are significantly different from the rules and principles that underlie language processing in a Chinese speaker.  But if we were asked which system was better or which one was correct, we would have no idea what was being asked.  Thus, on the language side of the analogy, there are performance errors, but there is no such thing as a competence error or a normatively problematic competence.  If two otherwise normal people have different linguistic competences, then they simply speak different languages or different dialects.  On the reasoning side of the analogy, however, things look very different.  It is not clear whether there are significant individual and group differences in the rules and principles underlying people’s performance on reasoning tasks, as there so clearly are in the rules and principles underlying people’s linguistic performance.[3]  But if there are significant interpersonal differences in reasoning competence, it surely appears to make sense to ask whether one system of rules and principles is better than another.[4]

 

3.2. The Standard Picture

 

Clearly, the claim that one system of rules is superior to another assumes – if only tacitly – some standard or metric against which to measure the relative merits of reasoning systems. And this raises the normative question of what standards we ought to adopt when evaluating human reasoning. Though advocates of the pessimistic interpretation rarely offer an explicit and general normative theory of rationality, perhaps the most plausible reading of their work is that they are assuming some version of what Edward Stein calls the Standard Picture:

 

According to this picture, to be rational is to reason in accordance with principles of reasoning that are based on rules of logic, probability theory and so forth.  If the standard picture of reasoning is right, principles of reasoning that are based on such rules are normative principles of reasoning, namely they are the principles we ought to reason in accordance with.   (Stein 1996, p. 4)

 

Thus the Standard Picture maintains that the appropriate criteria against which to evaluate human reasoning are rules derived from formal theories such as classical logic, probability theory and decision theory.[5]  So, for example, one might derive something like the following principle of reasoning from the conjunction rule of probability theory:

 

Conjunction Principle: One ought not to assign a lower degree of probability to the occurrence of event A than one does to the occurrence of A and some (distinct) event B (Stein 1996, p. 6).

 

If we assume this principle is correct, there is a clear answer to the question of why the patterns of inference discussed in section 2.2 (on the “conjunction fallacy”) are normatively problematic: they violate the conjunction principle. More generally, given principles of this kind, one can evaluate the specific judgments and decisions issued by human subjects and the psycho-logics that produce them. To the extent that a person’s judgments and decisions accord with the principles of the Standard Picture, they are rational and to the extent that they violate such principles, the judgments and decisions fail to be rational.  Similarly, to the extent that a reasoning competence produces judgments and decisions that accord with the principles of the Standard Picture, the competence is rational and to the extent that it fails to do so, it is not rational.
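How a principle of this kind can serve as an evaluative standard may be illustrated with a small checker that takes a subject's ranking and reports any conjunction judged more probable than one of its conjuncts. The function name, the data layout and the sample ranking are all hypothetical. The statement labels follow the Linda task of section 2.2, where a rank of 1 marks the most probable statement.

```python
def conjunction_violations(ranks, conjunctions):
    """Return the conjunctions ranked as more probable than one of their conjuncts.

    ranks: statement label -> rank, where 1 means most probable.
    conjunctions: conjunction label -> labels of its conjuncts.
    """
    return [
        (conj, part)
        for conj, parts in conjunctions.items()
        for part in parts
        if ranks[conj] < ranks[part]  # lower rank number = judged more probable
    ]


# Hypothetical ranking of three of the Linda statements by one subject.
ranks = {"c": 1, "h": 4, "f": 7}
conjunctions = {"h": ["f", "c"]}  # (h) is the conjunction of (f) and (c)

print(conjunction_violations(ranks, conjunctions))  # [('h', 'f')]
```

The sample subject ranks (h) above (f) and so violates the Conjunction Principle with respect to (f), but not with respect to (c), which is ranked above the conjunction.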

 

Sometimes, of course, it is far from clear how these formal theories are to be applied – a problem that we will return to in section 7. Moreover, as we’ll see in section 8, the Standard Picture is not without its critics. Nonetheless, it does have some notable virtues. First, it seems to provide reasonably precise standards against which to evaluate human reasoning. Second, it fits very neatly with the intuitively plausible idea that logic and probability theory bear an intimate relationship to issues about how we ought to reason. Finally, it captures an intuition about rationality that has long held a prominent position in philosophical discussions, namely that the norms of reason are “universal principles” – principles that apply to all actual and possible cognizers irrespective of who they are or where they are located in space and time. Since the principles of the Standard Picture are derived from formal/mathematical theories – theories that, if correct, are necessarily correct – they appear to be precisely the sort of principles that one needs to adopt in order to capture the intuition that norms of reasoning are universal principles.

 

3.3 The Pessimistic Interpretation

 

            We are now, finally, in a position to explain the pessimistic hypothesis that some authors have urged to account for the sorts of experimental results sketched in Section 2.  According to this hypothesis, the errors that subjects make in these experiments are very different from the sorts of reasoning errors that people make when their memory is overextended or when their attention wanders.  They are also different from the errors people make when they are tired, drunk or emotionally upset. These latter cases are all examples of performance errors – errors that people make when they infer in ways that are not sanctioned by their own psycho-logic.  But, according to the pessimistic interpretation, the sorts of errors described in Section 2 are competence errors.  In these cases people are reasoning, judging and making decisions in ways that accord with their psycho-logic. The subjects in these experiments do not use the right rules – those sanctioned by the Standard Picture – because they do not have access to them; they are not part of the subjects’ internally represented reasoning competence.  What they have instead is a collection of simpler rules or “heuristics” that may often get the right answer, though often they do not. So, according to this pessimistic hypothesis, the subjects make mistakes because their psycho-logic is normatively defective; their internalized rules of reasoning are less than fully rational.  It is not at all clear that Kahneman and Tversky would endorse this interpretation of the experimental results, though a number of other leading researchers clearly do.[6] According to Slovic, Fischhoff and Lichtenstein, for example, “It appears that people lack the correct programs for many important judgmental tasks….  We have not had the opportunity to evolve an intellect capable of dealing conceptually with uncertainty.” (1976, p. 174)

 

To sum up:  According to the pessimistic interpretation, what experimental results of the sort discussed in section 2 suggest is that our reasoning is subject to systematic competence errors. But is this view warranted? Is it really the most plausible response to what we've been calling the evaluative project, or is some  more optimistic view in order? In recent years, this has become one of the most hotly debated questions in cognitive science, and numerous challenges have been developed in order to show that the pessimistic interpretation is unwarranted. In the remaining sections of this paper we consider and evaluate some of the more prominent and plausible of these challenges.

 

4. The Challenge From Evolutionary Psychology

 

In recent years Gerd Gigerenzer, Leda Cosmides, John Tooby and other leading evolutionary psychologists have been among the most vocal critics of the pessimistic account of human reasoning, arguing that the evidence for human irrationality is far less compelling than advocates of the heuristics and biases tradition suggest. In this section, we will attempt to provide an overview of this recent and intriguing challenge. We start in section 4.1 by outlining the central theses of evolutionary psychology. Then in 4.2 and 4.3 we discuss how these core ideas have been applied to the study of human reasoning. Specifically, we’ll discuss two psychological hypotheses – the cheater detection hypothesis and the frequentist hypothesis – and evidence that’s been invoked in support of them. Though they are ostensibly descriptive psychological claims, a number of prominent evolutionary psychologists have suggested that these hypotheses and the experimental data that has been adduced in support of them provide us with grounds for rejecting the pessimistic interpretation of human reasoning. In section 5, we consider the plausibility of this claim.

 

4.1 The Central Tenets of Evolutionary Psychology

    

Though the interdisciplinary field of evolutionary psychology is too new to have developed any precise and widely agreed upon body of doctrine, there are two theses that are clearly central. First, evolutionary psychologists endorse an account of the structure of the human mind which is sometimes called the massive modularity hypothesis (Sperber, 1994; Samuels, 1998). Second, evolutionary psychologists commit themselves to a methodological claim about the manner in which research in psychology ought to proceed. Specifically, they endorse the claim that adaptationist considerations ought to play a pivotal role in the formation of psychological hypotheses.

 

4.1.1 The Massive Modularity Hypothesis

 

Roughly stated, the massive modularity hypothesis (MMH) is the claim that the human mind is largely or perhaps even entirely composed of highly specialized cognitive mechanisms or modules. Though there are different ways in which this rough claim can be spelled out, the version of MMH that evolutionary psychologists defend is heavily informed by the following three assumptions:

 

Computationalism. The human mind is an information processing device that can be described in computational terms – “a computer made out of organic compounds rather than silicon chips” (Barkow et al., 1992, p. 7). In expressing this view, evolutionary psychologists clearly see themselves as adopting the computationalism that is prevalent in much of cognitive science.

 

Nativism. Contrary to what has surely been the dominant view in psychology for most of the Twentieth Century, evolutionary psychologists maintain that much of the structure of the human mind is innate. Evolutionary psychologists thus reject the familiar empiricist proposal that the innate structure of the human mind consists of little more than a general-purpose learning mechanism. Instead they embrace the nativism associated with Chomsky and his followers (Pinker, 1997).

 

Adaptationism. Evolutionary psychologists invariably claim that our cognitive architecture is largely the product of natural selection. On this view, our minds are composed of adaptations that were “invented by natural selection during the species’ evolutionary history to produce adaptive ends in the species’ natural environment” (Tooby and Cosmides, 1995, p. xiii). Our minds, evolutionary psychologists maintain, are designed by natural selection in order to solve adaptive problems: “evolutionary recurrent problem[s] whose solution promoted reproduction, however long or indirect the chain by which it did so” (Cosmides and Tooby, 1994, p. 87).

 

Evolutionary psychologists conceive of modules as a type of computational mechanism – viz. computational devices that are domain-specific as opposed to domain-general.[7] Moreover, in keeping with their nativism and adaptationism, evolutionary psychologists also typically assume that modules are innate and that they are adaptations produced by natural selection. In what follows we will call cognitive mechanisms that possess these features Darwinian modules.[8] The version of MMH endorsed by evolutionary psychologists thus amounts to the claim that:

 

MMH. The human mind is largely or perhaps even entirely composed of a large number of Darwinian modules – innate, computational mechanisms that are domain-specific adaptations produced by natural selection.

 

This thesis is far more radical than earlier modular accounts of cognition, such as the one endorsed by Jerry Fodor (Fodor, 1983). According to Fodor, the modular structure of the human mind is restricted to input systems (those responsible for perception and language processing) and output systems (those responsible for producing actions).  Though evolutionary psychologists accept the Fodorian thesis that such peripheral systems are modular in character, they maintain, pace Fodor, that many or perhaps even all so-called central capacities, such as reasoning, belief fixation and planning, can also “be divided into domain-specific modules” (Jackendoff, 1992, p. 70). So, for example, it has been suggested by evolutionary psychologists that there are modular mechanisms for such central processes as ‘theory of mind’ inference (Leslie, 1994; Baron-Cohen, 1995), social reasoning (Cosmides and Tooby, 1992), biological categorization (Pinker, 1994) and probabilistic inference (Gigerenzer, 1994 and 1996).  On this view, then, “our cognitive architecture resembles a confederation of hundreds or thousands of functionally dedicated computers (often called modules) designed to solve adaptive problems endemic to our hunter-gatherer ancestors” (Tooby and Cosmides, 1995, p. xiv).

 

4.1.2 The Research Program of Evolutionary Psychology

 

            A central goal of evolutionary psychology is to construct and test hypotheses about the Darwinian modules which, MMH maintains, make up much of the human mind.  In pursuit of this goal, research may proceed in two quite different stages.  The first, which we’ll call evolutionary analysis, has as its goal the generation of plausible hypotheses about Darwinian modules.  An evolutionary analysis tries to determine as much as possible about the recurrent, information processing problems that our forebears would have confronted in what is often called the environment of evolutionary adaptation or the EEA – the environment in which our ancestors evolved.  The focus, of course, is on adaptive problems whose successful solution would have directly or indirectly contributed to reproductive success. In some cases these adaptive problems were posed by physical features of the EEA, in other cases they were posed by biological features, and in still other cases they were posed by the social environment in which our forebears were embedded.  Since so many factors are involved in determining the sorts of recurrent information processing problems that our ancestors confronted in the EEA, this sort of evolutionary analysis is a highly interdisciplinary exercise.  Clues can be found in many different sorts of investigations, from the study of the Pleistocene climate to the study of the social organization in the few remaining hunter-gatherer cultures. Once a recurrent adaptive problem has been characterized, the theorist may hypothesize that there is a module which would have done a good job at solving that problem in the EEA.

 

            An important part of the effort to characterize these recurrent information processing problems is the specification of the sorts of constraints that a mechanism solving the problem could take for granted.  If, for example, the important data needed to solve the problem was almost always presented in a specific format, then the mechanism need not be able to handle data presented in other ways.  It could “assume” that the data would be presented in the typical format.  Similarly, if it was important to be able to detect people or objects with a certain property that is not readily observable, and if, in the EEA, that property was highly correlated with some other property that is easier to detect, the system could simply assume that people or objects with the detectable property also had the one that was hard to observe.

 

            It is important to keep in mind that evolutionary analyses can only be used as a way of suggesting plausible hypotheses about mental modules. By themselves evolutionary analyses provide no assurance that these hypotheses are true. The fact that it would have enhanced our ancestors’ fitness if they had developed a module that solved a certain problem is no guarantee that they did develop such a module, since there are many reasons why natural selection and the other processes that drive evolution may fail to produce a mechanism that would enhance fitness (Stich, 1990, Ch. 3). 

 

Once an evolutionary analysis has succeeded in suggesting a plausible hypothesis, the next stage in the evolutionary psychology research strategy is to test the hypothesis by looking for evidence that contemporary humans actually have a module with the properties in question.  Here, as earlier, the project is highly interdisciplinary.  Evidence can come from experimental studies of reasoning in normal humans (Cosmides, 1989; Cosmides and Tooby, 1992, 1996;  Gigerenzer, 1991a;  Gigerenzer and Hug, 1992), from developmental studies focused on the emergence of cognitive skills (Carey and Spelke, 1994;  Leslie, 1994;  Gelman and Brenneman, 1994), or from the study of cognitive deficits in various abnormal populations (Baron-Cohen, 1995).  Important evidence can also be gleaned from studies in cognitive anthropology (Barkow, 1992;  Hutchins, 1980), history, and even from such surprising areas as the comparative study of legal traditions (Wilson and Daly, 1992).  When evidence from a number of these areas points in the same direction, an increasingly strong case can be made for the existence of a module suggested by evolutionary analysis.

 

In 4.2 and 4.3 we consider two applications of this two-stage research strategy to the study of human reasoning.  Though the interpretation of the studies we will sketch is the subject of considerable controversy, a number of authors have suggested that they show there is something deeply mistaken about the pessimistic hypothesis set out in Section 3.  That hypothesis claims that people lack normatively appropriate rules or principles for reasoning about problems like those set out in Section 2.  But when we look at variations on these problems that may make them closer to the sort of recurrent problems our forebears would have confronted in the EEA, performance improves dramatically.  And this, it is argued, is evidence for the existence of at least two normatively sophisticated Darwinian modules, one designed to deal with probabilistic reasoning when information is presented in a frequency format, the other designed to deal with reasoning about cheating in social exchange settings. 

 

4.2  The Frequentist Hypothesis

 

            The experiments reviewed in Sections 2.2 and 2.3 indicate that in many cases people are quite bad at reasoning about probabilities, and the pessimistic interpretation of these results claims that people use simple (“fast and dirty”) heuristics in dealing with these problems because their cognitive systems have no access to more appropriate principles for reasoning about probabilities.  But, in a series of recent and very provocative papers, Gigerenzer (1994; Gigerenzer and Hoffrage, 1995) and Cosmides and Tooby (1996) argue that from an evolutionary point of view this would be a surprising and paradoxical result. “As long as chance has been loose in the world,” Cosmides and Tooby note, “animals have had to make judgments under uncertainty.” (Cosmides and Tooby, 1996, p. 14; for the remainder of this section, all quotes are from Cosmides and Tooby, 1996, unless otherwise indicated.)  Thus making judgments when confronted with probabilistic information posed adaptive problems for all sorts of organisms, including our hominid ancestors, and “if an adaptive problem has endured for a long enough period and is important enough, then mechanisms of considerable complexity can evolve to solve it” (p. 14). But as we saw in the previous section, “one should expect a mesh between the design of our cognitive mechanisms, the structure of the adaptive problems they evolved to solve, and the typical environments that they were designed to operate in – that is, the ones that they evolved in” (p. 14). So in launching their evolutionary analysis Cosmides and Tooby’s first step is to ask: “what kinds of probabilistic information would have been available to any inductive reasoning mechanisms that we might have evolved?” (p. 15) 

 

            In the modern world we are confronted with statistical information presented in many ways: weather forecasts tell us the probability of rain tomorrow, sports pages list batting averages, and widely publicized studies tell us how much the risk of colon cancer is reduced in people over 50 if they have a diet high in fiber. But information about the probability of single events (like rain tomorrow) and information expressed in percentage terms would have been rare or unavailable in the EEA. 

 

What was available in the environment in which we evolved was the encountered frequencies of actual events – for example, that we were successful 5 times out of the last 20 times we hunted in the north canyon.  Our hominid ancestors were immersed in a rich flow of observable frequencies that could be used to improve decision-making, given procedures that could take advantage of them.  So if we have adaptations for inductive reasoning, they should take frequency information as input. (pp. 15-16)

 

            After a cognitive system has registered information about relative frequencies it might convert this information to some other format.  If, for example, the system has noted that 5 out of the last 20 north canyon hunts were successful, it might infer and store the conclusion that there is a .25 chance that a north canyon hunt will be successful.  However, Cosmides and Tooby argue, “there are advantages to storing and operating on frequentist representations because they preserve important information that would be lost by conversion to single-event probability.  For example, ... the number of events that the judgment was based on would be lost in conversion.  When the n disappears, the index of reliability of the information disappears as well.” (p. 16)   
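
The point about lost information can be made concrete with a minimal Python sketch. It is an illustration only, not a model that Cosmides and Tooby propose: the class name FrequencyRecord and the second record (1 success in 4 hunts) are assumptions introduced for the example. Both records convert to the same single-event probability, yet only the frequency format retains the sample size n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyRecord:
    """Hypothetical frequentist representation: successes out of n trials."""
    successes: int
    trials: int

    def as_probability(self) -> float:
        # Conversion to a single-event probability discards the sample size.
        return self.successes / self.trials


# The example from the text: 5 successful hunts out of the last 20.
north_canyon = FrequencyRecord(successes=5, trials=20)

# An assumed contrast case with far less evidence behind it.
thin_evidence = FrequencyRecord(successes=1, trials=4)

# Both reduce to the same probability ...
same_probability = north_canyon.as_probability() == thin_evidence.as_probability()

# ... but the frequency records themselves remain distinguishable.
same_record = north_canyon == thin_evidence

print(north_canyon.as_probability(), same_probability, same_record)
```

Once each record is reduced to .25, nothing distinguishes the estimate based on 20 hunts from the one based on 4, which is the sense in which the index of reliability disappears with the n.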

 

These and other considerations about the environment in which our cognitive systems evolved lead Cosmides and Tooby to hypothesize that our ancestors “evolved mechanisms that took frequencies as input, maintained such information as frequentist representations, and used these frequentist representations as a database for effective inductive reasoning.”[9]  Since evolutionary psychologists expect the mind to contain many specialized modules, Cosmides and Tooby are prepared to find other modules involved in inductive reasoning that work in other ways.

 

We are not hypothesizing that every cognitive mechanism involving statistical induction necessarily operates on frequentist principles, only that at least one of them does, and that this makes frequentist principles an important feature of  how humans intuitively engage the statistical dimension of the world. (p. 17)

 

But, while their evolutionary analysis does not preclude the existence of inductive mechanisms that are not focused on frequencies, it does suggest that when a mechanism that operates on frequentist principles is engaged, it will do a good job, and thus the probabilistic inferences it makes will generally be normatively appropriate ones.  This, of course, is in stark contrast to the pessimistic hypothesis, which claims that people simply do not have access to normatively appropriate strategies in this area.

 

            From their hypothesis, Cosmides and Tooby derive a number of predictions:

 

(1)  Inductive reasoning performance will differ depending on whether subjects are asked to judge a frequency or the probability of a single event.

 

(2)  Performance on frequentist versions of problems will be superior to non-frequentist versions.

 

(3)  The more subjects can be mobilized to form a frequentist representation, the better performance will be.

 

(4)  ... Performance on frequentist problems will satisfy some of the constraints that a calculus of probability specifies, such as Bayes’ rule.  This would occur because some inductive reasoning mechanisms in our cognitive architecture embody aspects of a calculus of probability. (p. 17)

 

            To test these predictions Cosmides and Tooby ran an array of experiments designed around the medical diagnosis problem which Casscells et al. used to demonstrate that even very sophisticated subjects ignore information about base rates.  In their first experiment Cosmides and Tooby replicated the results of Casscells et al. using exactly the same wording that we reported in section 2.3.  Of the 25 Stanford University undergraduates who were subjects in this experiment, only 3 (= 12%) gave the normatively appropriate Bayesian answer of “2%”, while 14 subjects (= 56%) answered “95%”.[10]

 

            In another experiment, Cosmides and Tooby gave 50 Stanford students a similar problem in which relative frequencies rather than percentages and single event probabilities were emphasized.  The “frequentist” version of the problem read as follows:

 

            1 out of every 1000 Americans has disease X.  A test has been developed to detect when a person has disease X.  Every time the test is given to a person who has the disease, the test comes out positive.  But sometimes the test also comes out positive when it is given to a person who is completely healthy.  Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.

            Imagine that we have assembled a random sample of 1000 Americans.  They were selected by lottery.  Those who conducted the lottery had no information about the health status of any of these people. 

Given the information above:

on average,

How many people who test positive for the disease will actually have the disease?  _____ out of _____.[11]  

 

On this problem the results were dramatically different.  38 of the 50 subjects (= 76%) gave the correct Bayesian answer.[12]
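
The arithmetic behind the normatively appropriate answer can be sketched as follows. This is a hedged illustration in Python that uses only the figures stated in the frequentist version of the problem (1 in 1000 has the disease, every person with the disease tests positive, and 50 in every 1000 healthy people test positive). The function and variable names are assumptions of the sketch, not part of the original study.

```python
from fractions import Fraction


def positive_test_breakdown(base_rate, hit_rate, false_positive_rate):
    """Return the population shares of true positives and false positives."""
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives, false_positives


# Figures taken from the wording of the frequentist problem.
base_rate = Fraction(1, 1000)             # 1 out of every 1000 has disease X
hit_rate = Fraction(1)                    # the test is always positive for the sick
false_positive_rate = Fraction(50, 1000)  # 50 out of every 1000 healthy people

true_pos, false_pos = positive_test_breakdown(base_rate, hit_rate, false_positive_rate)

# Bayes' rule: share of positive testers who actually have the disease.
posterior = true_pos / (true_pos + false_pos)

print(posterior)         # exact fraction: 20/1019
print(float(posterior))  # roughly 0.0196, i.e. about 2%
```

In frequency terms, a sample of 1000 people is expected to contain 1 true positive and about 50 false positives, so roughly 1 out of 51 people who test positive has the disease, which is about 2%.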