The Evolution of Inference

Malcolm R. Forster

Draft: May 24, 1999

Imagine, for a moment, that the elementary laws of logic grew out of the evolutionary development of language, driven by the adaptive advantages of communication. Consider two such primordial human beings. Suppose, as described in Skyrms (1996), they have learned to communicate facts *A* and *B* in signaling games (Lewis 1969). Members of the population, such as our prehistoric pair, are occasionally faced with the following ‘game’. Let one of the players be the *receiver* and the other the *sender*. The receiver needs to know whether *B* is true, but only possesses information about whether *A* is true. In some environmental contexts, *A* is sufficient for *B*; in others it is not. The sender knows nothing about *A* or *B*, but does know whether the present environment is one in which *A* is sufficient for *B*. This is a higher-order signaling game in which both players can benefit from sharing the information that they possess. How does a communication strategy evolve, and is it evolutionarily stable?

To simplify the presentation, suppose that the sender is the only one who sends information, and the receiver is therefore the only one who receives information. If the receiver infers *B* and *B* is true, then both receive a payoff of 1. If the receiver infers *B* and *B* is false, then both receive a payoff of 0. If the receiver does not infer *B* and *B* is true, then both receive a payoff of 0. If the receiver does not infer *B* and *B* is false, then both receive a payoff of 1.

The receiver knows whether *A* or not-*A* is true, and the sender knows in which of two environments the players are situated, *E* or *F*. *E* is an environment in which *A* is always followed by *B* (and so, when *A* holds, the players will receive the payoff if and only if the receiver infers *B* from *A*). The environment *F* is different. Here *B* follows *A* just as frequently as it follows not-*A*; the two events are probabilistically independent. Suppose the sender can choose to send a signal always, never, or solely when in environment *E* (the case in which the sender sends a signal solely when in environment *F* would impart the same information, and so can be omitted without loss of generality). Let us denote the signal by *C*. Let us suppose that four inference strategies are available to the receiver:

*Always*: Always infer *B* (ignoring all information),

*Never*: Never infer *B* (again ignoring all information),

*Induction*: Always infer *B* from *A* (ignoring *C*),

*Deduction*: Infer *B* from *A* if and only if *C* is present.

It is intuitive to think that there is nothing to be lost by the sender sending the signal *C* if and only if environment *E* is present. Certainly, if the receiver ignores the information (as in the inference strategies *Always*, *Never*, and *Induction*), then it makes no difference what information is carried by the signal *C* (Dretske 1981). However, it is not obvious whether *Deduction* does better. To investigate that question, one needs to make some assumptions about the frequency of *B* in the four possible environments: *E* and *A*, *E* and not-*A*, *F* and *A*, *F* and not-*A*. By hypothesis, the relative frequency in the first case is one, since *A* ensures the occurrence of *B* in environment *E*. For simplicity, assume that the frequency of *B* in the other three cases is *e*. We must also make assumptions about the frequencies of the four environments themselves, which we will initially assume to be equally probable (¼ each).

First, consider the case in which *C* is sent if and only if the environment is *E*. Then it is possible to prove that the expected payoffs are *U*(*Always*) = ¼ + ¾*e*, *U*(*Never*) = ¾(1 − *e*), *U*(*Induction*) = ¼ + ¼*e* + ½(1 − *e*), and *U*(*Deduction*) = ¼ + ¾(1 − *e*). On the one hand, *Deduction* always does better than *Never*. However, *Always* can do better than *Deduction* if *e* is greater than ½. Since *Always* involves no ‘reasoning’ at all, we may say that reasoning is only worthwhile in environments where *B* is relatively rare, and therefore hard to predict *a priori* (Sober 1994). Moreover, under the same condition (*e* > ½), *Always* will also do better than *Induction*. So let us restrict our attention to those situations in which reasoning strategies do better than the no-reasoning strategies, and investigate the differences between *Induction* and *Deduction*. If the frequency of *B* is sufficiently small outside of the case in which *A* and *E* hold (*e* < ½), then *Deduction* does better than *Induction*.
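These expected payoffs can be checked by direct enumeration over the four equiprobable cases. The sketch below is mine, not part of the original model; the function and strategy names are merely convenient labels. It computes each strategy's payoff when the sender signals exactly in environment *E*:

```python
from itertools import product

def expected_payoff(infer, e, send):
    """Average payoff over the four equiprobable environment/state pairs.

    infer(a, c): receiver's decision to infer B, given whether A holds
    and whether the signal C was received.
    send(env): the sender's policy for transmitting C in environment env.
    e: the frequency of B outside the case in which E and A both hold.
    """
    total = 0.0
    for env, a in product("EF", (True, False)):
        p_b = 1.0 if (env == "E" and a) else e  # A is sufficient for B in E
        c = send(env)
        # payoff 1 for a correct verdict about B, 0 otherwise
        total += 0.25 * (p_b if infer(a, c) else 1.0 - p_b)
    return total

strategies = {
    "Always":    lambda a, c: True,
    "Never":     lambda a, c: False,
    "Induction": lambda a, c: a,        # infer B from A, ignoring C
    "Deduction": lambda a, c: a and c,  # infer B from A iff C is present
}

send_iff_E = lambda env: env == "E"  # signal sent if and only if E

for e in (0.1, 0.4, 0.6):
    print(e, {n: round(expected_payoff(f, e, send_iff_E), 3)
              for n, f in strategies.items()})
```

At *e* = 0.4, for example, the enumeration gives *Deduction* = 0.70 against *Always* = 0.55; at *e* = 0.6 the ordering reverses, illustrating the *e* > ½ threshold.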

What if *C* is never sent? Then the expected payoffs are *U*(*Always*) = ¼ + ¾*e*, *U*(*Never*) = ¾(1 − *e*), *U*(*Induction*) = ¼ + ¼*e* + ½(1 − *e*), and *U*(*Deduction*) = ¾(1 − *e*). By the same argument as before, we assume that *e* < ½. Now *Deduction* collapses into *Never*, and so does worse than *Induction* whenever *e* > 0. That is, *Deduction* is no longer better than *Induction*, because the missing premise, *C*, is never supplied. *Deduction* is therefore sensitive to the quality of the information it is using: its advantage over *Induction* depends on the signal actually being sent, and being reliably correlated with *E*.
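The collapse can be seen by enumerating the same model with the signal withheld (again a sketch of my own; the labels are not the author's). With no signal, *Deduction* behaves exactly like *Never*:

```python
from itertools import product

def expected_payoff(infer, e, send):
    # payoff 1 for a correct verdict about B, over four equiprobable cases
    total = 0.0
    for env, a in product("EF", (True, False)):
        p_b = 1.0 if (env == "E" and a) else e  # A is sufficient for B in E
        total += 0.25 * (p_b if infer(a, send(env)) else 1.0 - p_b)
    return total

no_signal = lambda env: False  # the sender never transmits C
e = 0.3
u_deduction = expected_payoff(lambda a, c: a and c, e, no_signal)
u_never     = expected_payoff(lambda a, c: False,   e, no_signal)
u_induction = expected_payoff(lambda a, c: a,       e, no_signal)
print(u_deduction, u_never, u_induction)
```

At *e* = 0.3 this gives 0.525 for both *Deduction* and *Never*, and 0.675 for *Induction*.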

What if *C* is always sent? In that case, *Deduction* and *Induction* are equivalent: they both have an expected payoff of ¼ + ¼*e* + ½(1 − *e*).

Finally, let us lump all the cases together and ask which is optimal (under the condition *e* < ½). It is clear by inspection that the case in which the receiver uses *Deduction* and the sender sends information about the environment *E* has the highest expected payoff, ¼ + ¾(1 − *e*). As *e* → 0, this tends to 1, the maximum payoff of any possible strategy.

Is the *Deduction* strategy evolutionarily stable? That is, is there any other reasoning strategy that can obtain a higher expected payoff in interacting with the established members of the population, and thereby invade the population? I believe that the answer is clearly ‘no’: any mutant sender who fails to correlate *C* with *E* will do worse, and any mutant receiver who fails to take advantage of whatever information is sent will do worse. Any mutation in which the correlation between *C* and *E* is less than perfect must do worse. This conclusion relies on the fact that *e* is sufficiently small and that there is no cost to sending a signal (unlike the evolutionary example of sending predator alarm signals, which alert the predator to the location of the sender).
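The stability claim can be spot-checked for the pure strategies considered above. This is a sketch, not a proof: it tests only the listed mutants, at one small value of *e*, and assumes costless signaling as in the text.

```python
from itertools import product

def expected_payoff(infer, e, send):
    # payoff 1 for a correct verdict about B, over four equiprobable cases
    total = 0.0
    for env, a in product("EF", (True, False)):
        p_b = 1.0 if (env == "E" and a) else e  # A is sufficient for B in E
        total += 0.25 * (p_b if infer(a, send(env)) else 1.0 - p_b)
    return total

e = 0.1  # B is rare outside the (E, A) case
send_iff_E = lambda env: env == "E"
deduction = lambda a, c: a and c
incumbent = expected_payoff(deduction, e, send_iff_E)

# receiver mutants play against the established sender
receiver_mutants = {
    "Always":    lambda a, c: True,
    "Never":     lambda a, c: False,
    "Induction": lambda a, c: a,
}
# sender mutants play against the established (Deduction) receiver
sender_mutants = {
    "never send":  lambda env: False,
    "always send": lambda env: True,
}

ok = (all(expected_payoff(m, e, send_iff_E) < incumbent
          for m in receiver_mutants.values())
      and all(expected_payoff(deduction, e, s) < incumbent
              for s in sender_mutants.values()))
print(ok)  # every listed mutant earns strictly less than the incumbent pair
```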

The situation described so far is extraordinarily simple. Real situations are bound to be vastly more complicated. One obvious complication is that the receiver is poised to make many possible inferences at any one time; not merely *B* from *A*, but also *K* from *L*, *Q* from *P*, and so on. Different signals from the sender will be relevant to different inferences. It makes sense, therefore, for the signals to be coded in a way that connects the information sent to the relevant inference. Obviously, a code involving an ‘if…then’ pattern would serve such a purpose. The signal *C* would have the form ‘if *A* then *B*’. I think it is easy to imagine that such a coding would have adaptive advantages, and could become established in a more complex evolutionary situation.

Now suppose that the *Deduction* strategy evolves and is implemented in this way. What is the meaning of a signal ‘If *A* then *B*’? From our point of view, we can say ‘If *A* then *B*’ means that environment *E* is present. But what does it mean *to the receiver*, who knows nothing of environments, or at least nothing about the connection of *C* with environments? We could say that ‘if *A* then *B*’ has the meaning of a material conditional, or we could say that it means ‘if *A* were the case, then *B* would be the case’, which is a subjunctive conditional. Is there any reason to favor one interpretation over the other in this primitive evolutionary setting?

One might argue against the subjunctive conditional interpretation on the basis that its content goes beyond anything that the receiver actually uses, at least in the counterfactual case in which *A* is false. There is no adaptive distinction between receiving and not receiving the signal ‘if *A* then *B*’ when *A* is false. Granted, the way I have set up the problem, there is a meta-level justification for the subjunctive interpretation. Evolution forces a strict correlation between the signal ‘if *A* then *B*’ and the presence of environment *E*, and the presence of *E* appears to ground the truth of the subjunctive conditional. Or does it? Consider the following example (adapted from van Fraassen 1980). Suppose that *A* = ‘Tom has lit the fuse, which he attached to the dynamite’, *B* = ‘The dynamite will explode’, and *E* = ‘Tom is the dynamite expert.’ *E* is sufficient for *B* *in the presence* of *A* because, Tom being an expert, the fuse is reliably attached to the dynamite. Now imagine that *A* is false: Tom does not light the fuse. Does it now follow that if Tom were to have lit the fuse, then the dynamite would have exploded? No. In the present case the lecture room is crowded, and Tom, an expert who knows the dangers, would only have lit the fuse if he had first ensured that there would be no explosion. The point is that the condition *A* contains part of what goes into establishing the counterfactual connection between *A* and *B*, and so *E* alone is insufficient. Therefore, the signal ‘if *A* then *B*’ does not have to carry the information that *B* would be the case if *A* were the case. For me, that proves that the receiver does not receive that information (that it does not mean that to the receiver), even when the signal does carry that information.

If we reject the subjunctive conditional because of its excess content, then perhaps we should choose the weakest possible interpretation that will do the job. So we come to the material conditional. To prove that it is the weakest possible interpretation, consider the possible worlds interpretation of logical entailment. On this view, the meaning of an element of an inference system (a statement, claim, sentence, representation, or signal) is determined by the set of possible worlds in which it is true. For any element *P*, denote this set by *s*(*P*). Then *P* *entails* *Q* if and only if *s*(*P*) is a subset of *s*(*Q*). That is, *P* *entails* *Q* if and only if any possible world in which *P* is true is also a world in which *Q* is true. Now consider the following question. What is the weakest claim *X* such that *A* & *X* entails *B*? The set-theoretic formulation of this problem is: what is the largest set *s*(*X*) such that the intersection of *s*(*X*) and *s*(*A*) is contained in *s*(*B*)?

First note that we should include in *s*(*X*) *every* possible world outside of *s*(*A*), because their inclusion or exclusion makes no difference to the entailment property. The only requirement for the entailment is that the part of *s*(*X*) within *s*(*A*) must be inside *s*(*B*). Therefore *s*(*X*) contains all worlds outside of *s*(*A*). Which worlds *inside* *s*(*A*) can it contain? It cannot contain worlds that are not *B*-worlds; so, to be as large as possible, it should contain all the *B*-worlds. Therefore, the largest *s*(*X*) contains all worlds that are not *A*-worlds and all *B*-worlds. In other words, it contains all worlds except those in which the conditional is ‘obviously’ false: worlds in which *A* is true and *B* is false. This leads to the standard truth table for the material conditional.
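This set-theoretic argument can be verified by brute force over the four possible worlds generated by the truth values of *A* and *B* (a sketch of my own; the variable names are merely labels):

```python
from itertools import combinations

# a possible world assigns a truth value to A and to B
worlds = [(a, b) for a in (False, True) for b in (False, True)]
s_A = {w for w in worlds if w[0]}  # worlds where A is true
s_B = {w for w in worlds if w[1]}  # worlds where B is true

# every candidate s(X) for which A & X entails B: s(X) ∩ s(A) ⊆ s(B)
candidates = [set(c)
              for n in range(len(worlds) + 1)
              for c in combinations(worlds, n)
              if set(c) & s_A <= s_B]

largest = max(candidates, key=len)
material = {w for w in worlds if not w[0] or w[1]}  # the ¬A-or-B worlds
print(largest == material)  # the weakest X is the material conditional
```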

Given that *any* interpretation of the conditional can be represented in a possible worlds framework, it follows that the material conditional is the weakest of all possible interpretations that do the job. ‘Doing the job’ means ensuring the *modus ponens* entailment of *B* from *A* and ‘if *A* then *B*’. (In this context, however, this interpretation of ‘doing the job’ is merely a working hypothesis.) So, our argument is this: within the ‘veil of ignorance’ in which the receiver operates, the meaning of the signal received by the receiver is the weakest one possible, and this is the material conditional.

Now comes a puzzle. Suppose that the environment *F* is actually composed of two separate environments *G* and *H*, and the sender discovers that *G* is just like *E*: within *G*, *A* is sufficient for *B*. Senders who make this discovery can now correlate the signal ‘if *A* then *B*’ with the condition *E* or *G*. Such a strategy will have greater evolutionary utility, and will invade the population. Why? Because *E* or *G* is a weaker sufficient condition than *E* for the entailment of *B* from *A*. So, doesn’t the signal now change its meaning to something weaker than before? But how is this possible if it already had the weakest possible meaning?
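The claimed advantage of the broader policy can be checked numerically. The sketch below assumes, purely for illustration, that *E* retains probability ½ while *F* splits evenly into *G* and *H*; a *Deduction* receiver is paired with each of the two sender policies:

```python
def deduction_payoff(send, e):
    """Expected payoff for a Deduction receiver (infer B from A iff signalled).

    Assumption (mine): E has probability 1/2, and G and H, the split of F,
    have 1/4 each; A and not-A remain equiprobable. A is sufficient for B
    in both E and G; elsewhere B has frequency e.
    """
    total = 0.0
    for env, p_env in (("E", 0.5), ("G", 0.25), ("H", 0.25)):
        for a in (True, False):
            p_b = 1.0 if (env in ("E", "G") and a) else e
            infer = a and send(env)
            # payoff 1 for a correct verdict about B, 0 otherwise
            total += p_env * 0.5 * (p_b if infer else 1.0 - p_b)
    return total

e = 0.2
narrow = deduction_payoff(lambda env: env == "E", e)         # signal iff E
broad  = deduction_payoff(lambda env: env in ("E", "G"), e)  # signal iff E or G
print(narrow, broad)
```

Under these assumptions the broader policy earns strictly more (by ⅛, independently of *e*), because tokens sent in *G* let the receiver exploit the sufficiency of *A* for *B* there.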

Part of the resolution of this puzzle involves the distinction between types and tokens. Each instance of the signal ‘if *A* then *B*’ is called a token signal. All these tokens are of the same kind, which we refer to as the type. The notation ‘if *A* then *B*’ is ambiguous in that it sometimes refers to a token, sometimes to a type. The ambiguity is usually resolved by the context, but we need to pay attention.

So, consider token signals that are sent when the environment *E* is present. It is not paradoxical to consider that these token signals have the same meaning as before. After all, why should their meaning be affected by the fact that similar tokens would now be sent in environment *G*, whereas before they would not have been? Now consider tokens sent in environment *G*. Before the change, no signal would have been sent; and if there were no previous tokens, then there is no problem about a change in meaning. The meanings of tokens are the same before and after the change. What has changed is the number of tokens sent.

Perhaps we want to talk about the meaning of types? If the meaning of types supervenes on the meaning of possible tokens, then there is no meaning change at this level either. Is this paradoxical? Well, no, because types are not being sent or interpreted. Tokens may be interpreted in virtue of their type, but the type itself does not need to be treated as a kind of pilot wave accompanying the token signal.

Nevertheless, there is an important conflict of intuitions here if one introduces the idea of meaning for a sender. *Surely* the meaning for the sender does change when the signal is correlated with *E* or *G* rather than just *E*? And if senders are also receivers and receivers are also senders, then surely the meanings of some tokens change. So, how can the material conditional be the right interpretation in every case? The puzzle reappears.

Perhaps we need to make a distinction between sender’s meaning and receiver’s meaning, so that the material conditional is only the right interpretation for received tokens of ‘if *A* then *B*’. Perhaps this is how other interpretations of the conditional enter natural language? Or perhaps we should deny that the sender’s meaning of ‘if *A* then *B*’ changes at all. Perhaps meaning is essentially a social concept and therefore depends only on a receiver’s interpretation? These are questions that may produce interesting answers within an evolutionary framework.

Dretske, Fred (1981): *Knowledge and the Flow of Information*, Cambridge: Bradford/MIT Press.

Lewis, David (1969): *Convention*, Cambridge, MA: Harvard University Press.

Skyrms, Brian (1996): *Evolution of the Social Contract*, Cambridge: Cambridge University Press.

Sober, Elliott (1994): *From a Biological Point of View: Essays in Evolutionary Philosophy*, Cambridge: Cambridge University Press.

van Fraassen, Bas (1980): *The Scientific Image*, Oxford: Oxford University Press.