To appear in Handbook of Epistemology ed. by
Matti Sintonen, et al. (Dordrecht: Kluwer).
Final Draft 11/10/99
Archived at HomePage for the Rutgers University Research Group on Evolution and Higher Cognition.
Richard Samuels
Department of Philosophy
University of Pennsylvania
Philadelphia, PA 19104-6304
rsamuels@phil.upenn.edu
Stephen Stich
Department of Philosophy and Center for Cognitive Science
Rutgers University
New Brunswick, NJ 08901
stich@ruccs.rutgers.edu
and
Luc Faucher
Department of Philosophy
Rutgers University
New Brunswick, NJ 08901
lucfaucher@hotmail.com
1. Introduction: Three Projects in the Study of Reason
Over the past few decades, reasoning
and rationality have been the focus of enormous interdisciplinary attention,
attracting interest from philosophers, psychologists, economists, statisticians
and anthropologists, among others. The widespread interest in the topic
reflects the central status of reasoning in human affairs. But it also suggests that there are many
different though related projects and tasks which need to be addressed if we
are to attain a comprehensive understanding of reasoning.
Three projects that we think are particularly worthy
of mention are what we call the descriptive, normative and evaluative
projects. The descriptive project – which is typically pursued by
psychologists, though anthropologists and computer scientists have also made
important contributions – aims to characterize how people actually
go about the business of reasoning and to discover the psychological mechanisms
and processes that underlie the patterns of reasoning that are observed. By
contrast, the normative project is concerned not so much with how people
actually reason as with how they should reason. The goal is to discover rules or principles
that specify what it is to reason correctly or rationally – to
specify standards against which the quality of human reasoning can be measured.
Finally, the evaluative project aims to determine the extent to which
human reasoning accords with appropriate normative standards. Given some
criterion, often only a tacit one, of what counts as good reasoning, those who
pursue the evaluative project aim to determine the extent to which human
reasoning meets the assumed standard.
In the course of this paper we touch on each of
these projects and consider some of the relationships among them. Our point of departure, however, is an array
of very unsettling experimental results which, many have believed, suggest a
grim outcome to the evaluative project and support a deeply pessimistic view of
human rationality. The results that have led to this evaluation started to
emerge in the early 1970s when Amos Tversky, Daniel Kahneman and a number of
other psychologists began reporting findings suggesting that under quite
ordinary circumstances, people reason and make decisions in ways that
systematically violate familiar canons of rationality on a broad array of
problems. Those first surprising studies sparked the growth of an enormously
influential research program – often called the heuristics and biases
program – whose impact has been felt in a wide range of disciplines including
psychology, economics, political theory and medicine. In section 2, we provide
a brief overview of some of the more disquieting experimental findings in this
area.
What precisely do these experimental
results show? Though there is considerable debate over this question, one
widely discussed interpretation that is often associated with the heuristics
and biases tradition claims that they have “bleak implications” for the
rationality of the man and woman in the street. What the studies indicate,
according to this interpretation, is that ordinary people lack the underlying
rational competence to handle a wide array of reasoning tasks, and thus
that they must exploit a collection of simple heuristics which make them prone to seriously
counter-normative patterns of reasoning or biases. In Section 3, we set
out this pessimistic interpretation of the experimental results and explain the
technical notion of competence that it invokes. We also briefly sketch the
normative standard that advocates of the pessimistic interpretation typically
employ when evaluating human reasoning.
This normative stance, sometimes called the Standard Picture,
maintains that the appropriate norms for reasoning are derived from formal
theories such as logic, probability theory and decision theory (Stein, 1996).
Though the pessimistic interpretation has received
considerable support, it is not without its critics. Indeed much of the most
exciting recent work on reasoning has been motivated, in part, by a desire to
challenge the pessimistic account of human rationality. In the latter parts of
this paper, our major objective will be to consider and evaluate some of the
most recent and intriguing of these challenges. The first comes from the newly
emerging field of evolutionary psychology. In section 4 we sketch the
conception of the mind and its history advocated by evolutionary psychologists,
and in section 5 we evaluate the plausibility of their claim that the
evaluative project is likely to have a more positive outcome if these
evolutionary psychological theories of cognition are correct. In section 6 we
turn our attention to a rather different kind of challenge to the pessimistic
interpretation – a cluster of objections that focus on the role of pragmatic,
linguistic factors in experimental contexts. According to these objections,
much of the data for putative reasoning errors is problematic because
insufficient attention has been paid to the way in which people interpret the
experimental tasks they are asked to perform. In section 7 we focus on a range of problems surrounding the
interpretation and application of the principles of the Standard
Picture of rationality. These objections maintain that the paired projects of
deriving normative principles from formal systems, such as logic and
probability theory, and determining when reasoners have violated these
principles are far harder than advocates of the pessimistic interpretation are
inclined to admit. Indeed, one might think that the difficulties that these
tasks pose suggest that we ought to reject the Standard Picture as a normative
benchmark against which to evaluate the quality of human reasoning. Finally, in
section 8 we further scrutinize the normative assumptions made by advocates of
the pessimistic interpretation and consider a number of arguments which appear
to show that we ought to reject the Standard Picture in favor of some
alternative conception of normative standards.
2. Some Disquieting Evidence about How Humans Reason
Our first order of business is to describe some of the
experimental results that have been taken to support the claim that human
beings frequently fail to satisfy appropriate normative standards of reasoning.
The literature on these errors and biases has grown to epic proportions over
the last few decades and we won’t attempt to provide a comprehensive review.[1]
Instead, we focus on what we think are some of the most intriguing and
disturbing studies.
2.1. The Selection Task
In 1966, Peter Wason published a highly
influential study of a cluster of reasoning problems that became known as the selection
task. As a recent textbook observes, this task has become “the most
intensively researched single problem in the history of the psychology of
reasoning.” (Evans, Newstead & Byrne, 1993, p. 99) Figure 1 illustrates a
typical example of a selection task problem.
Figure 1
What
Wason and numerous other investigators have found is that subjects typically
perform very poorly on questions like this.
Most subjects respond correctly that the E card must be turned over, but
many also judge that the 5 card must be turned over, despite the fact that the
5 card could not falsify the claim no matter what is on the other side. Also, a majority of subjects judge that the
4 card need not be turned over, though without turning it over there is
no way of knowing whether it has a vowel on the other side. And, of course, if it does have a vowel on
the other side then the claim is not true.
It is not the case that subjects do poorly on all selection task
problems, however. A wide range of
variations on the basic pattern have been tried, and on some versions of the
problem a much larger percentage of subjects answer correctly. These results form a bewildering pattern,
since there is no obvious feature or cluster of features that separates versions
on which subjects do well from those on which they do poorly. As we will see in Section 4, some
evolutionary psychologists have argued that these results can be explained if we focus on the
sorts of mental mechanisms that would have been crucial for reasoning about
social exchange (or “reciprocal altruism”) in the environment of our hominid
forebears. The versions of the
selection task we’re good at, these theorists maintain, are just the ones that
those mechanisms would have been designed to handle. But, as we will also see, this explanation is hardly
uncontroversial.
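The logic behind the normatively correct answer can be spelled out in a short sketch. Figure 1 is not reproduced here, so the rule used below ("if a card has a vowel on one side, then it has an odd number on the other") and the fourth card (C) are assumptions chosen to fit the discussion of the E, 5 and 4 cards above. The hidden faces considered are likewise hypothetical. A card needs to be turned over just in case some possible hidden face would falsify the rule.

```python
# Sketch of the selection-task logic. The rule and the "C" card are
# assumptions made for illustration; only E, 5 and 4 are named in the text.
VOWELS = set("AEIOU")
LETTERS = ["A", "E", "C", "K"]   # hypothetical hidden letters
NUMBERS = [4, 5, 6, 7]           # hypothetical hidden numbers


def rule_holds(letter, number):
    """Assumed rule: if the letter is a vowel, the number is odd."""
    return (letter not in VOWELS) or (number % 2 == 1)


def must_turn(visible):
    """A card must be turned iff some hidden face would falsify the rule."""
    if isinstance(visible, str):
        return any(not rule_holds(visible, n) for n in NUMBERS)
    return any(not rule_holds(letter, visible) for letter in LETTERS)


cards = ["E", "C", 5, 4]
for card in cards:
    verdict = "must be turned" if must_turn(card) else "need not be turned"
    print(card, verdict)
```

On these assumptions only the E and 4 cards can reveal a counterexample, which matches the verdicts given in the text.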
2.2. The Conjunction Fallacy
Much of the experimental literature
on theoretical reasoning has focused on tasks that concern probabilistic
judgment. Among the best known
experiments of this kind are those that involve so-called conjunction
problems. In one quite famous
experiment, Kahneman and Tversky (1982)
presented subjects with the following task.
Linda is 31 years old,
single, outspoken, and very bright. She
majored in philosophy. As a student,
she was deeply concerned with issues of discrimination and social justice, and
also participated in anti-nuclear demonstrations.
Please rank the following
statements by their probability, using 1 for the most probable and 8 for the
least probable.
(a) Linda is a teacher in
elementary school.
(b) Linda works in a
bookstore and takes Yoga classes.
(c) Linda is active in the
feminist movement.
(d) Linda is a psychiatric
social worker.
(e) Linda is a member of the
League of Women Voters.
(f) Linda is a bank teller.
(g) Linda is an insurance
sales person.
(h) Linda is a bank teller
and is active in the feminist movement.
In
a group of naive subjects with no background in probability and statistics, 89%
judged that statement (h) was more probable than statement (f) despite the
obvious fact that one cannot be a feminist bank teller unless one is a bank
teller. When the same question was
presented to statistically sophisticated subjects – graduate students in the
decision science program of the Stanford Business School – 85% gave the same
answer! Results of this sort, in which
subjects judge that a compound event or state of affairs is more probable than
one of the components of the compound, have been found repeatedly since
Kahneman and Tversky’s pioneering studies, and they are remarkably robust. This
pattern of reasoning has been labeled the conjunction fallacy.
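Why a conjunction can never be more probable than one of its conjuncts is easy to see by counting. The following sketch builds a purely hypothetical population (the 5% and 60% rates are invented, and the two traits are sampled independently only for simplicity) and compares the proportion of bank tellers with the proportion of feminist bank tellers.

```python
import random

# Hypothetical population: each person is a pair of booleans
# (is_bank_teller, is_feminist). The rates are made up for illustration.
random.seed(0)
population = [(random.random() < 0.05, random.random() < 0.6)
              for _ in range(10_000)]


def proportion(predicate, people):
    """Fraction of people who satisfy the predicate."""
    return sum(1 for person in people if predicate(person)) / len(people)


p_teller = proportion(lambda p: p[0], population)
p_teller_and_feminist = proportion(lambda p: p[0] and p[1], population)

# Every feminist bank teller is also counted among the bank tellers,
# so the second proportion cannot exceed the first.
print(p_teller, p_teller_and_feminist)
```

Whatever rates are chosen, the second figure is bounded by the first, because the feminist bank tellers are a subset of the bank tellers.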
2.3. Base Rate Neglect
Another well-known cluster of studies concerns the
way in which people use base-rate information in making probabilistic
judgments. According to the familiar Bayesian account, the probability of a hypothesis on a given body of evidence
depends, in part, on the prior probability of the hypothesis. However, in a series of elegant experiments,
Kahneman and Tversky (1973) showed that subjects often seriously undervalue the importance
of prior probabilities. One of these
experiments presented half of the subjects with the following “cover story.”
A panel of psychologists have interviewed and
administered personality tests to 30 engineers and 70 lawyers, all successful
in their respective fields. On the
basis of this information, thumbnail descriptions of the 30 engineers and 70
lawyers have been written. You will
find on your forms five descriptions, chosen at random from the 100 available
descriptions. For each description,
please indicate your probability that the person described is an engineer, on a
scale from 0 to 100.
The
other half of the subjects were presented with the same text, except the
“base-rates” were reversed. They were
told that the personality tests had been administered to 70 engineers and 30
lawyers. Some of the descriptions that
were provided were designed to be compatible with the subjects’ stereotypes of
engineers, though not with their stereotypes of lawyers. Others were designed to fit the lawyer
stereotype, but not the engineer stereotype.
And one was intended to be quite neutral, giving subjects no information
at all that would be of use in making their decision. Here are two examples, the first intended to sound like an
engineer, the second intended to sound neutral:
Jack is a 45-year-old man. He
is married and has four children. He is
generally conservative, careful and ambitious.
He shows no interest in political and social issues and spends most of
his free time on his many hobbies which include home carpentry, sailing, and
mathematical puzzles.
Dick is a 30-year-old man. He is married with no children.
A man of high ability and high motivation, he promises to be quite
successful in his field. He is well
liked by his colleagues.
As
expected, subjects in both groups thought that the probability that Jack is an
engineer is quite high. Moreover, in
what seems to be a clear violation of Bayesian principles, the difference in cover stories between the two groups of
subjects had almost no effect at all.
The neglect of base-rate information was even more striking in the case
of Dick. That description was
constructed to be totally uninformative with regard to Dick’s profession. Thus, the only useful information that
subjects had was the base-rate information provided in the cover story. But
that information was entirely ignored.
The median probability estimate in both groups of subjects was 50%. Kahneman and Tversky’s subjects were not, however, completely insensitive to base-rate
information. Following the five
descriptions on their form, subjects found the following “null” description:
Suppose now that you are given no information
whatsoever about an individual chosen at random from the sample.
The probability that this man is one of the 30
engineers [or, for the other group of subjects: one of the 70 engineers] in the
sample of 100 is ____%.
In
this case subjects relied entirely on the base-rate; the median estimate was
30% for the first group of subjects and 70% for the second. In their discussion of these experiments,
Nisbett and Ross offer this interpretation.
The implication of this contrast between the “no
information” and “totally nondiagnostic information” conditions seems
clear. When no specific evidence
about the target case is provided, prior probabilities are utilized
appropriately; when worthless specific evidence is given, prior
probabilities may be largely ignored, and people respond as if there were no
basis for assuming differences in relative likelihoods. People’s grasp of the relevance of base-rate
information must be very weak if they could be distracted from using it by
exposure to useless target case information. (Nisbett & Ross, 1980, pp. 145-6)
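The Bayesian benchmark against which these responses are judged can be sketched in odds form: posterior odds equal prior odds multiplied by the likelihood ratio of the description. In the sketch below, a likelihood ratio of 1 models a description that is wholly uninformative, as Dick's was designed to be. The ratio of 4 assigned to Jack's description is an invented value used only for illustration; it is not a figure from the study.

```python
def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# likelihood_ratio = P(description | engineer) / P(description | lawyer).
# 1.0 models an uninformative description; 4.0 is a hypothetical value.
cases = [("Dick (uninformative)", 1.0), ("Jack (hypothetical ratio 4)", 4.0)]
for label, ratio in cases:
    for prior in (0.30, 0.70):
        result = posterior(prior, ratio)
        print(f"{label}, base rate {prior:.0%}: posterior {result:.0%}")
```

With an uninformative description the posterior simply equals the base rate (30% or 70%), not 50%. With the assumed ratio of 4, the posterior is roughly 63% under the 30% base rate and roughly 90% under the 70% base rate, so even a diagnostic description should leave the two groups with clearly different estimates.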
Before leaving the topic of
base-rate neglect, we want to offer one further example illustrating the way in which
the phenomenon might well have serious practical consequences. Here is a problem that Casscells et al.
(1978) presented to a group of faculty, staff and fourth-year students at
Harvard Medical School.
If a test to detect a disease whose prevalence is 1/1000 has a false
positive rate of 5%, what is the chance that a person found to have a positive
result actually has the disease, assuming that you know nothing about the
person’s symptoms or signs? ____%
Under
the most plausible interpretation of the problem, the correct Bayesian answer is 2%. But only
eighteen percent of the Harvard audience gave an answer close to 2%. Forty-five percent of this distinguished
group completely ignored the base-rate information and said that the answer was
95%.
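The 2% figure can be checked directly with Bayes' theorem. The problem does not state how often the test detects the disease when it is present, so the sketch below assumes a sensitivity of 100%; that assumption is one way of reading "the most plausible interpretation", and the function and parameter names are illustrative.

```python
def p_disease_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """P(disease | positive test) by Bayes' theorem.

    sensitivity defaults to 1.0, an assumption not stated in the problem.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


answer = p_disease_given_positive(prevalence=1 / 1000, false_positive_rate=0.05)
print(f"{answer:.1%}")
```

Put in terms of frequencies: of 1000 people tested, about one has the disease and about 50 of the remaining 999 test positive anyway, so only about one positive result in 51 is a true positive.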
2.4. Overconfidence
One of the most extensively
investigated and most worrisome clusters of phenomena explored by psychologists
interested in reasoning and judgment involves the degree of confidence that
people have in their responses to factual questions – questions like:
In each of the following pairs, which city has more
inhabitants?
(a) Las Vegas    (b) Miami
(a) Sydney       (b) Melbourne
(a) Hyderabad    (b) Islamabad
(a) Bonn         (b) Heidelberg
In each of the following pairs, which historical
event happened first?
(a) Signing of the Magna Carta    (b) Birth of Mohammed
(a) Death of Napoleon             (b) Louisiana Purchase
(a) Lincoln’s assassination       (b) Birth of Queen Victoria
After
each answer subjects are also asked:
How confident are you that your answer is correct?
50%    60%    70%    80%    90%    100%
In an
experiment using relatively hard questions it is typical to find that for the
cases in which subjects say they are 100% confident, only about 80% of their
answers are correct; for cases in which they say that they are 90% confident,
only about 70% of their answers are correct; and for cases in which they say
that they are 80% confident, only about 60% of their answers are correct. This tendency toward overconfidence seems to
be very robust. Warning subjects that
people are often overconfident has no significant effect, nor does offering
them money (or bottles of French champagne) as a reward for accuracy. Moreover, the phenomenon has been
demonstrated in a wide variety of subject populations including undergraduates,
graduate students, physicians and even CIA analysts. (For a survey of the literature see Lichtenstein, Fischhoff &
Phillips, 1982.)
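Overconfidence in this sense is a calibration failure: at each stated confidence level, the proportion of correct answers falls short of the level stated. A minimal sketch of how such a calibration table might be computed follows. The response data are hypothetical, constructed only to mirror the pattern just described, and the function name is illustrative.

```python
from collections import defaultdict


def calibration(responses):
    """Map each stated confidence level to the observed proportion correct."""
    tally = defaultdict(lambda: [0, 0])   # confidence -> [correct, total]
    for confidence, correct in responses:
        tally[confidence][0] += int(correct)
        tally[confidence][1] += 1
    return {c: hits / total for c, (hits, total) in sorted(tally.items())}


# Hypothetical responses: (stated confidence, whether the answer was correct).
responses = ([(1.0, True)] * 8 + [(1.0, False)] * 2 +
             [(0.9, True)] * 7 + [(0.9, False)] * 3 +
             [(0.8, True)] * 6 + [(0.8, False)] * 4)

for confidence, hit_rate in calibration(responses).items():
    gap = confidence - hit_rate
    print(f"stated {confidence:.0%}, correct {hit_rate:.0%}, gap {gap:+.0%}")
```

A perfectly calibrated subject would show a gap of zero at every level; a consistently positive gap is what the text calls overconfidence.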
2.5. Anchoring
In
their classic paper, “Judgment under uncertainty,” Tversky and Kahneman (1974)
showed that quantitative reasoning processes – most notably the production of
estimates – can be strongly influenced by the values that are taken as a
starting point. They called this phenomenon anchoring. In one
experiment, subjects were asked to estimate quickly the products of numerical
expressions. One group of subjects was given five seconds to estimate the
product of
8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
while a second group was given the same amount of time
to estimate the product of
1 × 2 × 3 × 4 × 5 × 6 × 7 × 8.
Under these time constraints, most of the subjects can only
do some steps of the computation and then have to extrapolate or adjust.
Tversky and Kahneman predicted that because the adjustments are usually
insufficient, the procedure should lead to underestimation. They also predicted
that because the result of the first step of the descending sequence is higher
than the ascending one, subjects would produce higher estimates in the first
case than in the second. Both predictions were confirmed. The median estimate
for the descending sequence was 2250, while that for the ascending one was only 512.
Moreover, both groups systematically underestimated the value of the numerical
expressions presented to them since the correct answer is 40,320.
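The arithmetic behind the two predictions can be made concrete by listing the running products that a subject would reach after the first few multiplications in each order. The sketch below is only an illustration of the arithmetic; it makes no claim about how many steps subjects actually completed in five seconds.

```python
from itertools import accumulate
from operator import mul

descending = [8, 7, 6, 5, 4, 3, 2, 1]
ascending = descending[::-1]

# Running products after each multiplication step.
desc_partial = list(accumulate(descending, mul))
asc_partial = list(accumulate(ascending, mul))

print(desc_partial[:4])                   # first few steps, descending order
print(asc_partial[:4])                    # first few steps, ascending order
print(desc_partial[-1], asc_partial[-1])  # both orders end at the same product
```

After three multiplications the descending sequence has already reached 1680 while the ascending sequence has reached only 24, so an estimate extrapolated from the early steps starts from a much higher anchor in the descending case, even though both sequences end at 40,320.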
It’s hard to see how the above
experiment can provide grounds for serious concern about human rationality
since it results from imposing serious constraints on the time that people
are given to perform the task. Nevertheless, other examples of anchoring are
genuinely bizarre and disquieting. In one experiment, for example, Tversky and
Kahneman asked subjects to estimate the percentage of African countries in the
United Nations. But before making these estimates, subjects were first shown an
arbitrary number that was determined by spinning a ‘wheel of fortune’ in their
presence. Some, for instance, were shown the number 65 while others the number
10. They were then asked to say if the
correct estimate was higher or lower than the number indicated on the wheel and
to produce a real estimate of the percentage of African members in the UN. The
median estimates were 45% for subjects whose “anchoring” number was 65 and 25%
for subjects whose number was 10. The rather disturbing implication of this
experiment is that people’s estimates can be affected quite substantially by a
numerical “anchoring” value even when they must be fully aware that the
anchoring number has been generated by a random process which they surely know
to be entirely irrelevant to the task at hand![2]
3. The Pessimistic Interpretation: Shortcomings in Reasoning Competence
The experimental results we’ve been
recounting and the many related results reported in the extensive literature in
this area are, we think, intrinsically unsettling. They are even more alarming
if, as has occasionally been demonstrated, the same patterns of reasoning and
judgment are to be found outside the laboratory. None of us want our illnesses to be diagnosed by physicians who
ignore well-confirmed information about base-rates. Nor do we want public officials to be advised by CIA analysts who
are systematically overconfident. The experimental results themselves do not
entail any conclusions about the nature or the normative status of the
cognitive mechanisms that underlie people’s reasoning and judgment. But a number of writers have urged that
these results lend considerable support to a pessimistic hypothesis about those
mechanisms, a hypothesis which may be even more disturbing than the results
themselves. On this pessimistic view, the examples of problematic reasoning,
judgments and decisions that we’ve sketched are not mere performance errors. Rather,
they indicate that most people’s underlying reasoning competence is irrational or at least normatively problematic.
In order to explain this view more clearly, we first need to explain the
distinction between competence and performance on which it is based and say
something about the normative standards of reasoning that are being assumed by
advocates of this pessimistic interpretation of the experimental results.
3.1. Competence and Performance
The competence/performance distinction, as we will characterize it, was first
introduced into cognitive science by Chomsky, who used it in his account of the
explanatory strategy of theories in linguistics. (Chomsky, 1965, Ch. 1; 1975;
1980) In testing linguistic theories,
an important source of data is the “intuitions” or unreflective judgments that
speakers of a language make about the grammaticality of sentences, and about
various linguistic properties and relations. To explain these intuitions, and
also to explain how speakers go about producing and understanding sentences of
their language in ordinary discourse, Chomsky and his followers proposed that a
speaker of a language has an internally represented grammar of that language –
an integrated set of generative rules and principles that entail an infinite
number of claims about the language.
For each of the infinite number of sentences in the speaker’s language,
the internally represented grammar entails that it is grammatical; for each
ambiguous sentence in the speaker’s language, the grammar entails that it is
ambiguous, etc. When speakers make the
judgments that we call linguistic intuitions, the information in the internally
represented grammar is typically accessed and relied upon, though neither the
process nor the internally represented grammar is accessible to
consciousness. Since the internally
represented grammar plays a central role in the production of linguistic
intuitions, those intuitions can serve as an important source of data for
linguists trying to specify what the rules and principles of the internally
represented grammar are.
A speaker’s intuitions are not, however,
an infallible source of information about the grammar of the speaker’s
language, because the grammar cannot produce linguistic intuitions by
itself. The production of intuitions is
a complex process in which the internally represented grammar must interact
with a variety of other cognitive mechanisms including those subserving
perception, motivation, attention, short term memory and perhaps a host of
others. In certain circumstances, the
activity of any one of these mechanisms may result in a person offering a
judgment about a sentence which does not accord with what the grammar actually
entails about that sentence. This might
happen when we are drunk or tired or in the grip of rage. But even under
ordinary conditions when our cognitive mechanisms are not impaired in this way,
we may still fail to recognize a sentence as grammatical due to limitations on
attention or memory. For example, there is considerable evidence indicating
that the short-term memory mechanism has difficulty handling center embedded
structures. Thus it may well be the
case that our internally represented grammars entail that the following
sentence is grammatical:
What what what he wanted cost would
buy in Germany was amazing.
even
though our intuitions suggest, indeed shout, that it is not.
Now in the jargon that Chomsky
introduced, the rules and principles of a speaker’s internalized grammar
constitute the speaker’s linguistic competence. By contrast, the judgments a speaker makes about
sentences, along with the sentences the speaker actually produces, are part of
the speaker’s linguistic performance.
Moreover, as we have just seen, some of the sentences a speaker produces
and some of the judgments the speaker makes about sentences, will not
accurately reflect the speaker’s linguistic competence. In these cases, the speaker is making a performance
error.
There are some obvious analogies
between the phenomena studied in linguistics and those studied by philosophers
and cognitive scientists interested in reasoning. In both cases there is spontaneous and largely unconscious
processing of an open-ended class of inputs; people are able to understand
endlessly many sentences, and to draw inferences from endlessly many
premises. Also, in both cases, people
are able to make spontaneous intuitive judgments about an effectively infinite
class of cases – judgments about grammaticality, ambiguity, etc. in the case of
linguistics, and judgments about validity, probability, etc. in the case of
reasoning. Given these analogies, it is
plausible to explore the idea that the mechanism underlying our ability to
reason is similar to the mechanism underlying our capacity to process
language. And if Chomsky is right about
language, then the analogous hypothesis about reasoning would claim that people
have an internally represented, integrated set of rules and principles of
reasoning – a “psycho-logic” as it has been called – which is usually accessed
and relied upon when people draw inferences or make judgments about them. As in the case of language, we would expect
that neither the processes involved nor the principles of the internally
represented psycho-logic are readily accessible to consciousness. We should also expect that people’s
inferences, judgments and decisions would not be an infallible guide to what
the underlying psycho-logic actually entails about the validity or plausibility
of a given inference. For here, as in
the case of language, the internally represented rules and principles must
interact with lots of other cognitive mechanisms – including attention,
motivation, short term memory and many others.
The activity of these mechanisms can give rise to performance errors – inferences, judgments or decisions that do not
reflect the psycho-logic which constitutes a person’s reasoning competence.
There is, however, an important
difference between reasoning and language, even if we assume that a
Chomsky-style account of the underlying mechanism is correct in both cases. For in the case of language, it makes no
clear sense to offer a normative assessment of a normal person’s competence. The rules and principles that
constitute a French speaker’s linguistic competence are significantly different
from the rules and principles that underlie language processing in a Chinese
speaker. But if we were asked which
system was better or which one was correct, we would have no idea what was
being asked. Thus, on the language side
of the analogy, there are performance errors, but there is no such thing as a competence error or a normatively
problematic competence. If two
otherwise normal people have different linguistic competences, then they simply
speak different languages or different dialects. On the reasoning side of the analogy, however, things look very
different. It is not clear whether
there are significant individual and group differences in the rules and
principles underlying people’s performance on reasoning tasks, as there so
clearly are in the rules and principles underlying people’s linguistic
performance.[3] But if there are significant interpersonal
differences in reasoning competence, it surely appears to make sense to ask
whether one system of rules and principles is better than another.[4]
3.2. The Standard Picture
Clearly, the claim that one system of rules is
superior to another assumes – if only tacitly – some standard or metric against
which to measure the relative merits of reasoning systems. And this raises the
normative question of what standards we ought to adopt when evaluating human
reasoning. Though advocates of the pessimistic interpretation rarely offer an
explicit and general normative theory of rationality, perhaps the most
plausible reading of their work is that they are assuming some version of what
Edward Stein calls the Standard Picture:
According to this picture, to be rational is to reason in accordance
with principles of reasoning that are based on rules of logic, probability
theory and so forth. If the standard
picture of reasoning is right, principles of reasoning that are based on such
rules are normative principles of reasoning, namely they are the principles we
ought to reason in accordance with.
(Stein 1996, p. 4)
Thus
the Standard Picture maintains that the appropriate criteria against which to
evaluate human reasoning are rules derived from formal theories such as
classical logic, probability theory and decision theory.[5] So, for example, one might derive something
like the following principle of reasoning from the conjunction rule of
probability theory:
Conjunction Principle: One ought not to assign a
lower degree of probability to the occurrence of event A than one does to the
occurrence of A and some (distinct) event B (Stein 1996, 6).
If
we assume this principle is correct, there is a clear answer to the question of
why the patterns of inference discussed in section 2.2 (on the “conjunction
fallacy”) are normatively problematic: they violate the conjunction principle.
More generally, given principles of this kind, one can evaluate the specific
judgments and decisions issued by human subjects and the psycho-logics that
produce them. To the extent that a person’s judgments and decisions accord with
the principles of the Standard Picture, they are rational and to the extent
that they violate such principles, the judgments and decisions fail to be
rational. Similarly, to the extent that
a reasoning competence produces judgments and decisions that accord with the
principles of the Standard Picture, the competence is rational and to the
extent that it fails to do so, it is not rational.
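The step from the conjunction rule of probability theory to the Conjunction Principle can be written out as a one-line derivation. This is a standard textbook argument offered as an illustration, not a reconstruction of Stein's own presentation; it assumes P(A) > 0 so that the conditional probability is defined (if P(A) = 0, then P(A and B) = 0 as well, and the inequality holds trivially).

```latex
\begin{align*}
P(A \wedge B) &= P(A)\,P(B \mid A) \\
              &\le P(A) \qquad \text{since } 0 \le P(B \mid A) \le 1 .
\end{align*}
```

A subject who ranks "bank teller and active in the feminist movement" as more probable than "bank teller" assigns P(A and B) > P(A), which this inequality rules out.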
Sometimes, of course, it is far from clear how these
formal theories are to be applied – a problem that we will return to in section
7. Moreover, as we’ll see in section 8, the Standard Picture is not without its
critics. Nonetheless, it does have some notable virtues. First, it seems to
provide reasonably precise standards against which to evaluate human reasoning.
Second, it fits very neatly with the intuitively plausible idea that logic and
probability theory bear an intimate relationship to issues about how we ought
to reason. Finally, it captures an intuition about rationality that has
long held a prominent position in philosophical discussions, namely that the
norms of reason are “universal principles” – principles that apply to all
actual and possible cognizers irrespective of who they are or where they are
located in space and time. Since the principles of the Standard Picture are
derived from formal/mathematical theories – theories that, if correct, are necessarily
correct – they appear to be precisely the sort of principles that one needs to
adopt in order to capture the intuition that norms of reasoning are universal
principles.
3.3
The Pessimistic Interpretation
We are now, finally, in a position
to explain the pessimistic hypothesis that some authors have urged to account
for the sorts of experimental results sketched in Section 2. According to this hypothesis, the errors
that subjects make in these experiments are very different from the sorts of
reasoning errors that people make when their memory is overextended or when
their attention wanders. They are also
different from the errors people make when they are tired, drunk or emotionally
upset. These latter cases are all examples of performance errors – errors that people make when they infer in ways
that are not sanctioned by their own psycho-logic. But, according to the pessimistic
interpretation, the sorts of errors described in Section 2 are competence errors. In these
cases people are reasoning, judging and making decisions in ways that
accord with their psycho-logic. The subjects in these experiments do not use
the right rules – those sanctioned by the Standard Picture – because they do
not have access to them; they are not part of the subjects’ internally
represented reasoning competence. What
they have instead is a collection of simpler rules or “heuristics” that often yield the right answer but that also
often fail to do so. So, according to this pessimistic hypothesis, the
subjects make mistakes because their psycho-logic is normatively defective;
their internalized rules of reasoning are less than fully rational. It is not at all clear that Kahneman and Tversky would endorse this interpretation of the experimental results, though
a number of other leading researchers clearly do.[6]
According to Slovic, Fischhoff and Lichtenstein, for example, “It appears that people lack the correct
programs for many important judgmental tasks….
We have not had the opportunity to evolve an intellect capable of
dealing conceptually with uncertainty.” (1976, p. 174)
To sum up:
According to the pessimistic interpretation, what experimental results
of the sort discussed in section 2 suggest is that our reasoning is subject to
systematic competence errors. But is this view warranted? Is it really the most
plausible response to what we've been calling the evaluative project, or is
some more optimistic view in order? In
recent years, this has become one of the most hotly debated questions in
cognitive science, and numerous challenges have been developed in order to show
that the pessimistic interpretation is unwarranted. In the remaining sections
of this paper we consider and evaluate some of the more prominent and plausible
of these challenges.
4.
The Challenge From Evolutionary Psychology
In recent years Gerd Gigerenzer, Leda Cosmides, John
Tooby and other leading evolutionary psychologists have been among the most
vocal critics of the pessimistic account of human reasoning, arguing that the
evidence for human irrationality is far less compelling than advocates of the
heuristics and biases tradition suggest. In this section, we will attempt to
provide an overview of this recent and intriguing challenge. We start in
section 4.1 by outlining the central theses of evolutionary psychology. Then in
4.2 and 4.3 we discuss how these core ideas have been applied to the study of
human reasoning. Specifically, we’ll discuss two psychological hypotheses – the
cheater detection hypothesis and the frequentist hypothesis – and
evidence that’s been invoked in support of them. Though they are ostensibly
descriptive psychological claims, a number of prominent evolutionary
psychologists have suggested that these hypotheses and the experimental data
that has been adduced in support of them provide us with grounds for rejecting
the pessimistic interpretation of human reasoning. In section 5, we consider
the plausibility of this claim.
4.1
The Central Tenets of Evolutionary Psychology
Though the interdisciplinary field of evolutionary
psychology is too new to have developed any precise and widely agreed upon body
of doctrine, there are two theses that are clearly central. First, evolutionary
psychologists endorse an account of the structure of the human mind which is
sometimes called the massive modularity hypothesis (Sperber, 1994;
Samuels, 1998). Second, evolutionary psychologists commit themselves to a
methodological claim about the manner in which research in psychology ought to
proceed. Specifically, they endorse the claim that adaptationist considerations
ought to play a pivotal role in the formation of psychological hypotheses.
4.1.1
The Massive Modularity Hypothesis
Roughly stated, the massive modularity hypothesis
(MMH) is the claim that the human mind is largely or perhaps even entirely
composed of highly specialized cognitive mechanisms or modules. Though
there are different ways in which this rough claim can be spelled out, the
version of MMH that evolutionary psychologists defend is heavily informed by
the following three assumptions:
Computationalism. The human mind is an information processing
device that can be described in computational terms – “a computer made out of
organic compounds rather than silicon chips” (Barkow et al., 1992, p. 7). In
expressing this view, evolutionary psychologists clearly see themselves as
adopting the computationalism that is prevalent in much of cognitive
science.
Nativism. Contrary to what has surely been the dominant view
in psychology for most of the Twentieth Century, evolutionary psychologists
maintain that much of the structure of the human mind is innate. Evolutionary
psychologists thus reject the familiar empiricist proposal that the innate
structure of the human mind consists of little more than a general-purpose
learning mechanism. Instead they embrace the nativism associated with
Chomsky and his followers (Pinker, 1997).
Adaptationism. Evolutionary psychologists
invariably claim that our cognitive architecture is largely the product of
natural selection. On this view, our minds are composed of adaptations
that were “invented by natural selection during the species’ evolutionary
history to produce adaptive ends in the species’ natural environment” (Tooby
and Cosmides, 1995, p. xiii). Our minds, evolutionary psychologists maintain,
are designed by natural selection in order to solve adaptive problems: “evolutionary
recurrent problem[s] whose solution promoted reproduction, however long or
indirect the chain by which it did so” (Cosmides and Tooby, 1994, p. 87).
Evolutionary
psychologists conceive of modules as a type of computational mechanism – viz.
computational devices that are domain-specific as opposed to
domain-general.[7] Moreover, in
keeping with their nativism and adaptationism, evolutionary psychologists also
typically assume that modules are innate and that they are adaptations produced
by natural selection. In what follows we will call cognitive mechanisms that
possess these features Darwinian modules.[8]
The version of MMH endorsed by evolutionary psychologists thus amounts to the
claim that:
MMH. The human mind is largely or perhaps even entirely
composed of a large number of Darwinian modules – innate, computational
mechanisms that are domain-specific adaptations produced by natural selection.
This
thesis is far more radical than earlier modular accounts of cognition, such
as the one endorsed by Jerry Fodor (Fodor, 1983). According to Fodor, the
modular structure of the human mind is restricted to input systems (those
responsible for perception and language processing) and output systems (those
responsible for producing actions).
Though evolutionary psychologists accept the Fodorian thesis that such peripheral
systems are modular in character, they maintain, pace Fodor, that
many or perhaps even all so-called central capacities, such as
reasoning, belief fixation and planning, can also “be divided into
domain-specific modules” (Jackendoff, 1992, p. 70). So, for example, it has been
suggested by evolutionary psychologists that there are modular mechanisms for
such central processes as ‘theory of mind’ inference (Leslie, 1994;
Baron-Cohen, 1995), social reasoning (Cosmides and Tooby, 1992), biological
categorization (Pinker, 1994) and probabilistic inference (Gigerenzer, 1994 and
1996). On this view, then, “our
cognitive architecture resembles a confederation of hundreds or thousands of
functionally dedicated computers (often called modules) designed to solve
adaptive problems endemic to our hunter-gatherer ancestors” (Tooby and
Cosmides, 1995, p. xiv).
4.1.2
The Research Program of Evolutionary Psychology
A central goal of evolutionary
psychology is to construct and test hypotheses about the Darwinian modules
which, MMH maintains, make up much of the human mind. In pursuit of this goal, research may proceed in two quite
different stages. The first, which
we’ll call evolutionary analysis, has as its goal the generation of
plausible hypotheses about Darwinian modules.
An evolutionary analysis tries to determine as much as possible about
the recurrent, information processing problems that our forebears would have
confronted in what is often called the environment of evolutionary
adaptation or the EEA – the environment in which our ancestors
evolved. The focus, of course, is on adaptive
problems whose successful solution would have directly or indirectly
contributed to reproductive success. In some cases these adaptive problems were
posed by physical features of the EEA, in other cases they were posed by
biological features, and in still other cases they were posed by the social
environment in which our forebears were embedded. Since so many factors are involved in determining the sorts of
recurrent information processing problems that our ancestors confronted in the
EEA, this sort of evolutionary analysis is a highly interdisciplinary
exercise. Clues can be found in many
different sorts of investigations, from the study of the Pleistocene climate to
the study of the social organization in the few remaining hunter-gatherer
cultures. Once a recurrent adaptive problem has been characterized, the
theorist may hypothesize that there is a module which would have done a good job at solving that problem in the EEA.
An important part of the effort to
characterize these recurrent information processing problems is the
specification of the sorts of constraints that a mechanism solving the problem
could take for granted. If, for
example, the important data needed to solve the problem was almost always
presented in a specific format, then the mechanism need not be able to handle
data presented in other ways. It could
“assume” that the data would be presented in the typical format. Similarly, if it was important to be able to
detect people or objects with a certain property that is not readily observable,
and if, in the EEA, that property was highly correlated with some other
property that is easier to detect, the system could simply assume that people
or objects with the detectable property also had the one that was hard to
observe.
It is important to keep in mind that
evolutionary analyses can only be used as a way of suggesting plausible
hypotheses about mental modules. By themselves evolutionary analyses
provide no assurance that these hypotheses are true. The fact that it would
have enhanced our ancestors’ fitness if they had developed a module that solved a certain problem is no guarantee that they did develop
such a module, since there are many reasons why natural selection and the other
processes that drive evolution may fail to produce a mechanism that would
enhance fitness (Stich, 1990, Ch. 3).
Once an evolutionary analysis has succeeded in
suggesting a plausible hypothesis, the next stage in the evolutionary
psychology research strategy is to test the hypothesis by looking for evidence
that contemporary humans actually have a module with the properties in question.
Here, as earlier, the project is highly interdisciplinary. Evidence can come from experimental studies
of reasoning in normal humans (Cosmides, 1989; Cosmides and Tooby, 1992,
1996; Gigerenzer, 1991a; Gigerenzer and Hug, 1992), from
developmental studies focused on the emergence of cognitive skills (Carey and
Spelke, 1994; Leslie, 1994; Gelman and Brenneman, 1994), or from the
study of cognitive deficits in various abnormal populations (Baron-Cohen,
1995). Important evidence can also be
gleaned from studies in cognitive anthropology (Barkow, 1992; Hutchins, 1980), history, and even from such
surprising areas as the comparative study of legal traditions (Wilson and Daly,
1992). When evidence from a number of
these areas points in the same direction, an increasingly strong case can be
made for the existence of a module suggested by evolutionary analysis.
In 4.2 and 4.3 we consider two applications of this
two-stage research strategy to the study of human reasoning. Though the interpretation of the studies we
will sketch is the subject of considerable controversy, a number of authors
have suggested that they show there is something deeply mistaken about the
pessimistic hypothesis set out in Section 3.
That hypothesis claims that people lack normatively appropriate rules or
principles for reasoning about problems like those set out in Section 2. But when we look at variations on these
problems that may make them closer to the sort of recurrent problems our
forebears would have confronted in the EEA, performance improves
dramatically. And this, it is argued,
is evidence for the existence of at least two normatively sophisticated
Darwinian modules, one designed to deal with probabilistic reasoning when
information is presented in a frequency format, the other designed to deal with
reasoning about cheating in social exchange settings.
4.2
The Frequentist Hypothesis
The experiments reviewed in Sections
2.2 and 2.3 indicate that in many cases people are quite bad at reasoning about
probabilities, and the pessimistic interpretation of these results claims that
people use simple (“fast and dirty”) heuristics in dealing with these problems because their cognitive systems have no
access to more appropriate principles for reasoning about probabilities. But, in a series of recent and very
provocative papers, Gigerenzer (1994; Gigerenzer and Hoffrage, 1995) and
Cosmides and Tooby (1996) argue that from an evolutionary point of view this
would be a surprising and paradoxical result. “As long as chance has been loose
in the world,” Cosmides and Tooby note, “animals have had to make judgments
under uncertainty.” (Cosmides and Tooby, 1996, p. 14; for the remainder of this
section, all quotes are from Cosmides and Tooby, 1996, unless otherwise
indicated.) Thus making judgments when
confronted with probabilistic information posed adaptive problems for all sorts
of organisms, including our hominid ancestors, and “if an adaptive problem has
endured for a long enough period and is important enough, then mechanisms of
considerable complexity can evolve to solve it” (p. 14). But as we saw in the
previous section, “one should expect a mesh between the design of our cognitive
mechanisms, the structure of the adaptive problems they evolved to solve, and
the typical environments that they were designed to operate in – that is, the
ones that they evolved in” (p. 14). So in launching their evolutionary analysis
Cosmides and Tooby’s first step is to ask: “what kinds of probabilistic
information would have been available to any inductive reasoning mechanisms
that we might have evolved?” (p. 15)
In the modern world we are
confronted with statistical information presented in many ways: weather
forecasts tell us the probability of rain tomorrow, sports pages list batting
averages, and widely publicized studies tell us how much the risk of colon
cancer is reduced in people over 50 if they have a diet high in fiber. But
information about the probability of single events (like rain tomorrow) and
information expressed in percentage terms would have been rare or unavailable
in the EEA.
What was available in the environment in
which we evolved was the encountered frequencies of actual events – for
example, that we were successful 5 times out of the last 20 times we hunted in
the north canyon. Our hominid ancestors
were immersed in a rich flow of observable frequencies that could be used to
improve decision-making, given procedures that could take advantage of
them. So if we have adaptations for
inductive reasoning, they should take frequency information as input. (pp.
15-16)
After a cognitive system has
registered information about relative frequencies it might convert this
information to some other format. If,
for example, the system has noted that 5 out of the last 20 north canyon hunts
were successful, it might infer and store the conclusion that there is a .25
chance that a north canyon hunt will be successful. However, Cosmides and Tooby argue, “there are advantages to
storing and operating on frequentist representations because they preserve
important information that would be lost by conversion to single-event
probability. For example, ... the
number of events that the judgment was based on would be lost in
conversion. When the n
disappears, the index of reliability of the information disappears as well.”
(p. 16)
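The point about the lost n can be made concrete with a small sketch. The record type and its field names below are hypothetical and are introduced only for illustration. The first record uses the north canyon figures from the text, while the second record (50 successes in 200 hunts) is a made-up comparison case. Both records convert to the same single-event probability, yet only the frequency format preserves how many observations stand behind the estimate.

```python
from typing import NamedTuple


class FrequencyRecord(NamedTuple):
    """A frequentist representation: successes observed out of n trials."""
    successes: int
    trials: int

    def as_single_event_probability(self) -> float:
        # Conversion discards the number of trials behind the estimate.
        return self.successes / self.trials


north_canyon = FrequencyRecord(successes=5, trials=20)      # figures from the text
larger_sample = FrequencyRecord(successes=50, trials=200)   # hypothetical comparison

# After conversion the two records are indistinguishable ...
same_probability = (
    north_canyon.as_single_event_probability()
    == larger_sample.as_single_event_probability()
)

# ... but as frequency records they remain distinct, because n is retained.
same_record = north_canyon == larger_sample
```

A system that stored only the converted value could not tell the estimate based on 20 hunts from the one based on 200.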
These and other considerations about the environment
in which our cognitive systems evolved lead Cosmides and Tooby to hypothesize
that our ancestors “evolved mechanisms that took frequencies as input,
maintained such information as frequentist representations, and used these
frequentist representations as a database for effective inductive reasoning.”[9] Since evolutionary psychologists expect the mind to contain many specialized modules, Cosmides and
Tooby are prepared to find other modules involved in inductive reasoning that
work in other ways.
We are not hypothesizing that every cognitive mechanism involving
statistical induction necessarily operates on frequentist principles, only that
at least one of them does, and that this makes frequentist principles an
important feature of how humans intuitively
engage the statistical dimension of the world. (p. 17)
But,
while their evolutionary analysis does not preclude the existence of inductive
mechanisms that are not focused on frequencies, it does suggest that when a mechanism
that operates on frequentist principles is engaged, it will do a good job, and
thus the probabilistic inferences it makes will generally be normatively
appropriate ones. This, of course, is
in stark contrast to the pessimistic hypothesis, which claims that people
simply do not have access to normatively appropriate strategies in this area.
From their hypothesis, Cosmides and
Tooby derive a number of predictions:
(1) Inductive reasoning performance will differ depending on whether subjects
are asked to judge a frequency or the probability of a single event.
(2) Performance on frequentist versions of problems will be superior to
non-frequentist versions.
(3) The more subjects can be mobilized to form a frequentist representation, the better
performance will be.
(4) ... Performance on frequentist problems will satisfy some of the constraints that a
calculus of probability specifies, such as Bayes’ rule. This would occur because some inductive
reasoning mechanisms in our cognitive architecture embody aspects of a calculus
of probability. (p. 17)
To test these predictions Cosmides
and Tooby ran an array of experiments designed around the medical diagnosis
problem which Casscells et al. used to demonstrate that even very
sophisticated subjects ignore information about base rates. In their first experiment Cosmides and Tooby
replicated the results of Casscells et al. using exactly the same wording that
we reported in section 2.3. Of the 25
Stanford University undergraduates who were subjects in this experiment, only 3
(= 12%) gave the normatively appropriate Bayesian answer of “2%”, while 14
subjects (= 56%) answered “95%”.[10]
In another experiment, Cosmides and
Tooby gave 50 Stanford students a similar problem in which relative frequencies
rather than percentages and single event probabilities were emphasized. The “frequentist” version of the problem
read as follows:
1 out of every 1000
Americans has disease X. A test has
been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the
test comes out positive. But sometimes
the test also comes out positive when it is given to a person who is completely
healthy. Specifically, out of every
1000 people who are perfectly healthy, 50 of them test positive for the
disease.
Imagine
that we have assembled a random sample of 1000 Americans. They were selected by lottery. Those who conducted the lottery had no
information about the health status of any of these people.
Given the information above:
on average,
How many people who test positive for the disease
will actually have the disease?
_____ out of _____.[11]
On this problem the results were dramatically different. 38 of the 50 subjects (= 76%) gave the correct Bayesian answer.[12]
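For readers who wish to check the arithmetic, the following sketch computes the answer in both formats using the figures given in the frequentist version of the problem. The function names are our own, and rounding the expected number of healthy positives (49.95) to a whole person is an assumption made for illustration. The first function applies Bayes’ rule to single-event probabilities; the second simply counts people in a sample of 1000.

```python
from typing import Tuple


def posterior_from_probabilities(
    base_rate: float, hit_rate: float, false_positive_rate: float
) -> float:
    """P(disease | positive test) by Bayes' rule."""
    true_positives = base_rate * hit_rate
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def answer_from_counts(
    sample_size: int, sick_per_1000: int, false_positives_per_1000_healthy: int
) -> Tuple[int, int]:
    """Return (sick people testing positive, all people testing positive)."""
    sick = sample_size * sick_per_1000 // 1000
    healthy = sample_size - sick
    # Assumption: round the expected count to a whole number of people.
    healthy_positive = round(healthy * false_positives_per_1000_healthy / 1000)
    # Every sick person tests positive, as the problem stipulates.
    return sick, sick + healthy_positive


# Single-event format: base rate 1/1000, no misses, 50/1000 false positives.
posterior = posterior_from_probabilities(0.001, 1.0, 0.05)

# Frequency format: "_____ out of _____" for a random sample of 1000.
with_disease, tested_positive = answer_from_counts(1000, 1, 50)
```

Both routes agree: roughly 1 out of every 51 people who test positive has the disease, which is approximately the “2%” that counts as the Bayesian answer.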