One of the classes I’m taking this term is about modelling the evolution of communication systems. Everything in the class is done via simulation, which is probably the best way to do it, and certainly necessary at the point where it starts to involve genetic algorithms and such. However, some of the earlier content in the class dealt with problems that I suspected were solvable by a purely mathematical approach, so as somebody with a maths degree I felt it necessary to rise to the challenge and try to derive the solutions mathematically. This post is my attempt to do that.

Let us begin by thinking very abstractly about a system which takes something in and gives something out. Suppose there is a finite, positive number *m* of things which may be taken in (possible inputs), which we shall call input 1, input 2, … and input *m*. Suppose likewise that there is a finite, positive number *n* of things which may be given out (possible outputs), which we shall call output 1, output 2, … and output *n*.

One way in which the behaviour of such a system could be modelled is as a straightforward mapping from inputs to outputs. However, this might be too deterministic: perhaps the system doesn’t always output the same output for a given input. So let’s use a more general model, and think of the system as a mapping from inputs to *probability distributions* over outputs. For every pair (*i*, *j*) of integers such that 1 ≤ *i* ≤ *m* and 1 ≤ *j* ≤ *n*, let *p*_{i, j} denote the probability that input *i* is mapped to output *j*. The mapping as a whole is determined by the *mn* probabilities of the form *p*_{i, j}, and therefore it can be thought of as an *m*-by-*n* matrix **A**:

$$\mathbf{A} = \begin{pmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,n} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m,1} & p_{m,2} & \cdots & p_{m,n} \end{pmatrix}$$

The rows of **A** correspond to the possible inputs and the columns of **A** correspond to the possible outputs. Probabilities are non-negative real numbers, so **A** is a non-negative real matrix. Also, the probabilities of mutually exclusive, exhaustive outcomes sum to 1, so the sum of each row of **A** is 1. This condition can be expressed as a system of linear equations:

$$\begin{aligned} p_{1,1} + p_{1,2} + \cdots + p_{1,n} &= 1 \\ p_{2,1} + p_{2,2} + \cdots + p_{2,n} &= 1 \\ &\;\;\vdots \\ p_{m,1} + p_{m,2} + \cdots + p_{m,n} &= 1 \end{aligned}$$

Alternatively, and more compactly, it may be expressed as the matrix equation

$$\mathbf{A}\mathbf{x} = \mathbf{y}$$

where **x** is the *n*-dimensional vector whose components are all equal to 1 and **y** is the *m*-dimensional vector whose components are all equal to 1.
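As a quick sanity check, the two conditions on **A** (non-negative entries, and **A** times the all-ones vector giving the all-ones vector) can be tested numerically. This is only a minimal sketch assuming NumPy; the function name `is_row_stochastic` and the example matrix are made up for illustration.

```python
import numpy as np

def is_row_stochastic(A, tol=1e-12):
    """Return True if A is non-negative and each of its rows sums to 1."""
    A = np.asarray(A, dtype=float)
    x = np.ones(A.shape[1])  # n-dimensional vector of ones
    y = np.ones(A.shape[0])  # m-dimensional vector of ones
    # The row-sum condition is exactly the matrix equation A x = y.
    return bool(np.all(A >= 0) and np.allclose(A @ x, y, atol=tol))

# A hypothetical system with m = 2 inputs and n = 3 outputs.
A = np.array([[0.5, 0.25, 0.25],
              [0.0, 1.0, 0.0]])
print(is_row_stochastic(A))
```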

In general, if **x** is an *n*-dimensional vector, and we think of **x** as a random variable determined by the output of the system (taking the value of its *j*th component when the output is output *j*), then **A****x** is the vector of expected values of **x** conditional on each input. That is, for every integer *i* such that 1 ≤ *i* ≤ *m*, the *i*th component of **A****x** is the expected value of **x** conditional on input *i* being the input to the system.

Accordingly, if we have not just one, but *p* *n*-dimensional vectors **x**_{1}, **x**_{2}, … and **x**_{p} (where *p* is a positive integer), we can think of these *p* vectors as the columns of an *n*-by-*p* matrix **B**, and then we can read off all the expected values from the matrix product

$$\mathbf{A}\mathbf{B} = \begin{pmatrix} \mathbf{A}\mathbf{x}_1 & \mathbf{A}\mathbf{x}_2 & \cdots & \mathbf{A}\mathbf{x}_p \end{pmatrix}$$

like so: for every pair (*i*, *k*) of integers such that 1 ≤ *i* ≤ *m* and 1 ≤ *k* ≤ *p*, the (*i*, *k*) entry of **AB** is the expected value of **x**_{k} conditional on input *i* being the input to the system.
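To make the "expected values from a matrix product" reading concrete, here is a small sketch assuming NumPy. The matrix `A` and the vectors `x1` and `x2` are arbitrary illustrative values; each entry of the product is compared against the expected value written out as an explicit probability-weighted sum.

```python
import numpy as np

# Hypothetical system with m = 2 inputs and n = 3 outputs.
A = np.array([[0.5, 0.25, 0.25],
              [0.0, 1.0, 0.0]])

# Two arbitrary 3-dimensional vectors, stacked as the columns of B (so p = 2).
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([10.0, 0.0, -10.0])
B = np.column_stack([x1, x2])

expected = A @ B  # entry (i, k) is the expected value of x_k given input i

def expected_value(A, x, i):
    """Expected value of x given input i, as a probability-weighted sum."""
    return sum(A[i, j] * x[j] for j in range(len(x)))

for i in range(A.shape[0]):
    for k, x in enumerate([x1, x2]):
        assert np.isclose(expected[i, k], expected_value(A, x, i))

print(expected)
```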

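In the case where **B** happens to be another non-negative real matrix whose rows each sum to 1, i.e. such that, writing **1**_{k} for the *k*-dimensional vector whose components are all equal to 1,

$$\mathbf{B}\mathbf{1}_p = \mathbf{1}_n$$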

so that the entries of **B** can be interpreted as probabilities, the matrix **B** as a whole can be interpreted as *another* input-output system whose possible inputs happen to be the same as the possible outputs of **A**. In order to emphasize this identity, let us now call the possible outputs of **A** (= the possible inputs of **B**) the *signals*: signal 1, signal 2, … and signal *n*. The other things—the possible inputs of **A**, and the possible outputs of **B**—can be thought of as *meanings*. Note that there is no need at the moment for the input meanings (the possible inputs of **A**) to be the same as the output meanings (the possible outputs of **B**); we make a distinction between the input meanings and the output meanings.

Together, **A** and **B** can be thought of as comprising a “product system” which works like this: an input meaning goes into **A**, a signal comes out of **A**, the signal goes into **B**, and an output meaning comes out of **B**. For every integer *k* such that 1 ≤ *k* ≤ *p*, the random variable **x**_{k} (the *k*th column of **B**) can now be interpreted as the probability of the product system outputting output meaning *k*, as a random variable whose value is determined by the signal. That is, for every integer *j* such that 1 ≤ *j* ≤ *n*, the *j*th component of **x**_{k} (the (*j*, *k*) entry of **B**) is the probability of output meaning *k* coming out if the signal happens to be signal *j*. It follows by the law of total probability that the probability of output meaning *k* coming out, if *i* is the input meaning, is the expected value of **x**_{k} conditional on *i* being the input meaning. Now, by what we said a couple of paragraphs above, we have that for every integer *i* such that 1 ≤ *i* ≤ *m*, the expected value of **x**_{k} conditional on *i* being the input meaning is the (*i*, *k*) entry of **AB**. So the “product system”, as a matrix, is the matrix product **AB**. That’s why we call it the “product system”, see? 🙂
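The law-of-total-probability argument can be checked numerically. The following is a sketch assuming NumPy, with made-up speaker and hearer matrices (2 input meanings, 3 signals, 2 output meanings); `product_entry` is a hypothetical helper that sums over the intermediate signal explicitly.

```python
import numpy as np

# Hypothetical speaker: 2 input meanings -> 3 signals.
A = np.array([[0.5, 0.25, 0.25],
              [0.0, 1.0, 0.0]])
# Hypothetical hearer: 3 signals -> 2 output meanings.
B = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])

def product_entry(A, B, i, k):
    """P(output meaning k | input meaning i), summing over every signal j."""
    return sum(A[i, j] * B[j, k] for j in range(A.shape[1]))

AB = A @ B

# Each entry of AB agrees with the explicit sum over signals ...
for i in range(A.shape[0]):
    for k in range(B.shape[1]):
        assert np.isclose(AB[i, k], product_entry(A, B, i, k))

# ... and AB is itself a valid input-output system: its rows sum to 1.
assert np.allclose(AB.sum(axis=1), 1.0)
print(AB)
```

Note that indices here are 0-based, as usual in Python, whereas the text counts meanings and signals from 1.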

In the case where the possible input meanings are the same as the possible output meanings and *m* = *p*, we may think about the “product system” as a *communicative dyad*. The speaker is **A**, the hearer is **B**. The speaker is trying to express a meaning, the input meaning, and producing a signal in order to do so, and the hearer is interpreting that signal to have some meaning, the output meaning. The output meaning the hearer understands is not necessarily the same as the input meaning the speaker was trying to express. If it is different, we may regard the communication as unsuccessful; if it is the same, we may regard the communication as successful.

The key question is: what is the probability that the communication is successful? Given the considerations above, it’s very easy to answer. If the input meaning is *i*, we’re just looking for the probability of output meaning *i* coming out given this input meaning. That probability is simply the (*i*, *i*) entry of **AB**, i.e. the *i*th entry along **AB**’s main diagonal.

What if the input meaning isn’t fixed? Then the answer will in general depend on the probability distribution over the possible input meanings. But in the simplest case, where the distribution is uniform (no input meaning is any more probable than any other), the probability of successful communication is just the mean of the input meaning-specific probabilities, that is, the sum of the main diagonal entries of **AB**, divided by *m* (the number of the main diagonal entries, i.e. the number of meanings). In linear algebra, we call the sum of the main diagonal entries of a square matrix its *trace*, and we denote it by tr(**C**) where **C** is the matrix. So our formula for the communication success probability *p* (a new use of the letter, unrelated to the dimension *p* above, which now equals *m*) is

$$p = \frac{\operatorname{tr}(\mathbf{A}\mathbf{B})}{m} \tag{2}$$

If the probability distribution over the input meanings isn’t uniform, the probability of successful communication is just the weighted average of the input meaning-specific probabilities, with the weights being the respective input meaning probabilities. The general formula can therefore be written as

$$p = \operatorname{tr}(\mathbf{D}\mathbf{A}\mathbf{B}) \tag{3}$$

where **D** is the diagonal matrix of size *m* whose main diagonal is the probability distribution over the input meanings (i.e. for every integer *i* such that 1 ≤ *i* ≤ *m*, the *i*th diagonal entry of **D** is the probability of input meaning *i* being the one the speaker tries to express). It doesn’t matter whether **D** is left-multiplied or right-multiplied, because the trace of the product is the same in either case: tr(**DAB**) = tr(**ABD**). In the case where the probability distribution over the input meanings is uniform, the diagonal entries of **D** are all equal to 1/*m*, i.e. **D** = (1/*m*)**I**_{m}, where **I**_{m} is the identity matrix of size *m*, and therefore (3) reduces to (2).
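The trace formula translates almost directly into code. Below is a minimal sketch assuming NumPy; the function name `success_probability` and its optional `prior` argument (the distribution over input meanings, defaulting to uniform) are illustrative choices and are not taken from the script mentioned later.

```python
import numpy as np

def success_probability(A, B, prior=None):
    """Probability that the hearer recovers the speaker's meaning: tr(D A B).

    A is the m-by-n speaker matrix, B the n-by-m hearer matrix, and prior
    the distribution over the m input meanings (uniform if omitted).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m = A.shape[0]
    if prior is None:
        prior = np.full(m, 1.0 / m)  # uniform case: D = (1/m) I_m
    D = np.diag(prior)
    return float(np.trace(D @ A @ B))

# Hypothetical dyad with 2 meanings and 2 signals.
A = np.array([[1.0, 0.0],
              [0.5, 0.5]])
B = np.eye(2)

uniform = success_probability(A, B)
skewed = success_probability(A, B, prior=[0.2, 0.8])
print(uniform, skewed)
```

In the uniform case the result coincides with tr(**AB**)/*m*, and swapping the order to tr(**ABD**) gives the same number, as claimed above.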

To leave you fully convinced that this formula works, here are some simulations. The 5 graphs below were generated using a Python script which you can view on GitHub. Each one involves 3 possible meanings, 3 possible signals, randomly-generated speaker and hearer matrices and a randomly-generated probability distribution over the input meanings. If you look at the code, you’ll see that the blue lines are generated by simulating communication in the obvious way, by randomly drawing an input meaning, randomly drawing a signal based on that particular input meaning, and finally randomly drawing an output meaning based on that particular signal. The position on the *x*-axis corresponds to the number of trials (individual simulated communicative acts) carried out so far and the position on the *y*-axis corresponds to the proportion of those trials involving a successful communication (one where the output meaning ended up being the same as the input meaning). For each graph, there were 10 sets of 500 trials; each individual set of trials corresponds to one of the light blue lines, while the darker blue line gives the results averaged over those 10 sets. The horizontal green line indicates the success probability as calculated by our formula. This should be close to the success proportion for a large number of trials, so we should see the blue and green lines converging on the right side of each graph. That is what we see, so the formula works.
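For readers who want to try this without the plotting, here is a stripped-down sketch of the same kind of experiment. It is not the script linked above: the function name, the seed, and the use of Dirichlet draws to produce random rows that sum to 1 are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def simulate_success_rate(A, B, prior, trials, rng):
    """Proportion of simulated communicative acts that succeed."""
    m, n = A.shape
    successes = 0
    for _ in range(trials):
        i = rng.choice(m, p=prior)           # draw an input meaning
        j = rng.choice(n, p=A[i])            # speaker draws a signal
        k = rng.choice(B.shape[1], p=B[j])   # hearer draws an output meaning
        successes += int(k == i)
    return successes / trials

rng = np.random.default_rng(0)
m = n = 3  # 3 meanings and 3 signals, as in the graphs

# Each Dirichlet draw is a random probability vector, so rows sum to 1.
A = rng.dirichlet(np.ones(n), size=m)  # speaker matrix, m-by-n
B = rng.dirichlet(np.ones(m), size=n)  # hearer matrix, n-by-m
prior = rng.dirichlet(np.ones(m))      # distribution over input meanings

predicted = float(np.trace(np.diag(prior) @ A @ B))
observed = simulate_success_rate(A, B, prior, trials=5000, rng=rng)
print(f"formula: {predicted:.3f}  simulation: {observed:.3f}")
```

With 5000 trials the two numbers should typically agree to within a couple of percentage points; increasing `trials` tightens the agreement.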