I’ve been thinking about how to use a computer program to randomly generate a probability distribution of a given finite size. This has turned out to be an interesting problem.

My first idea was to generate *n* − 1 uniform variates in [0, 1] (where *n* is the desired size), sort them, add 0 to the front of the list and 1 to the back of the list, and then take the non-negative differences between the adjacent variates. In Python code:

```
import random

def randpd1(size):
    # Generate size - 1 uniform variates, sort them, pad the list with
    # the endpoints 0 and 1, and take the differences between neighbours
    variates = [random.random() for i in range(size - 1)]
    variates.sort()
    return [j - i for i, j in zip([0] + variates, variates + [1])]
```
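To see how the difference step works, here's a worked example with a hypothetical set of already-sorted variates (the specific numbers are made up for illustration):

```
import math

# Hypothetical sorted variates, chosen for illustration
variates = [0.2, 0.5, 0.9]
# Pad with the endpoints 0 and 1, then take adjacent differences
pd = [j - i for i, j in zip([0] + variates, variates + [1])]
print(pd)  # roughly [0.2, 0.3, 0.4, 0.1]
# The differences telescope, so the result sums to 1 (up to rounding)
print(math.isclose(sum(pd), 1))
```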

My second idea was to simply generate *n* uniform variates in [0, 1], add them together, and take the ratio of each individual variate to the sum.

```
def randpd2(size):
    # Generate size uniform variates and normalize them by their sum
    variates = [random.random() for i in range(size)]
    s = sum(variates)
    return [i/s for i in variates]
```
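Again a worked example of the normalization step, with made-up variates:

```
# Hypothetical variates, chosen for illustration
variates = [1.0, 3.0]
s = sum(variates)
pd = [v / s for v in variates]
print(pd)  # [0.25, 0.75]
```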

Both of these functions do reliably generate probability distributions, i.e. lists of non-negative real numbers (encoded as Python `float` objects) that sum to 1, although very large lists generated by `randpd2` sometimes sum to something slightly different from 1 due to floating-point imprecision.

```
>>> sample1 = [randpd1(2**(i//1000 + 1)) for i in range(10000)]
>>> all(all(p >= 0 for p in pd) for pd in sample1)
True
>>> all(sum(pd) == 1 for pd in sample1)
True
>>> sample2 = [randpd2(2**(i//1000 + 1)) for i in range(10000)]
>>> all(all(p >= 0 for p in pd) for pd in sample2)
True
>>> all(sum(pd) == 1 for pd in sample2)
False
>>> all(math.isclose(sum(pd), 1) for pd in sample2)
True
```
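The non-unit sums from `randpd2` are ordinary floating-point rounding rather than a bug in the function; a minimal illustration of the effect, independent of the functions above:

```
# Ten copies of 0.1 don't sum to exactly 1.0 in binary floating point
total = sum([0.1] * 10)
print(total == 1.0)           # False
print(abs(total - 1.0) < 1e-9)  # True: the error is tiny
```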

But do they both generate a *random* probability distribution? In precise terms: for a given `size` argument, are the probability distributions of the return values `randpd1(size)` and `randpd2(size)` always uniform?

I don’t really know how to answer this question. In fact, it’s not even clear to me that there *is* a uniform distribution over the probability distributions of size *n* for every positive integer *n*. The problem is that the probability distributions of size *n* are the solutions in [0, 1]ⁿ of the equation *x*₁ + *x*₂ + ⋯ + *x*ₙ = 1, where *x*₁, *x*₂, … and *x*ₙ are dummy variables, and therefore they comprise a set *S* whose dimension is *n* − 1 (not *n*). Because *S* is missing a dimension, continuous probability distributions over it cannot be defined in the usual way via probability density mappings on [0, 1]ⁿ. Any such mapping would have to assign probability density 0 to every point in *S*, because for every such point **x**, there’s a whole additional dimension’s worth of points in every neighbourhood of **x** which are not in *S*. But then the integral of the probability density mapping over *S* would be 0, not 1, and it would not be a probability density mapping.

But perhaps you can map *S* onto a subset of ℝⁿ⁻¹, and do something with a uniform distribution over the image. In any case, I’m finding thinking about this very confusing, so I’ll leave it for readers to ponder over. Given that I don’t currently know what a uniform probability distribution over the probability distributions of size *n* even *looks like*, I don’t know how to test whether one exists.

I can look at the marginal distributions of the *individual items* in the returned values of `randpd1` and `randpd2`. But these marginal distributions are not straightforwardly related to the joint distribution of the list as a whole. In particular, uniformity of the joint distribution does not imply uniformity of the marginal distributions, and uniformity of the marginal distributions does not imply uniformity of the joint distribution.

But it’s still interesting to look at the marginal distributions. First of all, they allow validation of another desirable property of the two functions: the marginal distributions are the same for each item (regardless of its position in the list). I’m not going to demonstrate this here because it would be tedious, but it does look like this is the case. Therefore we can speak of “the marginal distribution” without reference to any particular item. Second, they reveal that `randpd1` and `randpd2` do not do exactly the same thing. The marginal distributions are different for the two functions. Let’s first look just at the case where `size` is 2.

```
>>> data1 = [randpd1(2)[0] for i in range(100000)]
>>> plt.hist(data1)
```

```
>>> data2 = [randpd2(2)[0] for i in range(100000)]
>>> plt.hist(data2)
```

The first plot looks like it’s been generated from a uniform distribution over [0, 1]; the second plot looks like it’s been generated from a non-uniform distribution which concentrates the probability density at ½. It’s easy to see why the distribution is uniform for `randpd1`: the function works by generating a single uniform variate `p` and then returning `[p, 1 - p]`, and given that the distribution of `p` is uniform the distribution of `1 - p` is also uniform. The function `randpd2`, on the other hand, works by generating two uniform variates `p` and `q` and returning `[p/(p + q), q/(p + q)]`. However, I don’t know what the distribution of `p/(p + q)` and `q/(p + q)` is exactly, given that `p` and `q` are uniformly distributed. This is another thing I hope readers who know more about probability and statistics than me might be able to enlighten me on.
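A quick Monte Carlo check (a sketch, not a derivation) that the first item of `randpd2(2)` really does pile up around ½; `randpd2` is repeated here so the snippet is self-contained:

```
import random

def randpd2(size):
    variates = [random.random() for i in range(size)]
    s = sum(variates)
    return [i / s for i in variates]

random.seed(0)  # for reproducibility of this sketch
data = [randpd2(2)[0] for _ in range(100_000)]
# Compare how often the first item lands near 1/2 versus near the edges
middle = sum(0.4 <= x <= 0.6 for x in data) / len(data)
edges = sum(x <= 0.1 or x >= 0.9 for x in data) / len(data)
print(middle > edges)  # the middle window collects far more mass
```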

Here are the graphs for size 3:

```
>>> data1 = [randpd1(3)[0] for i in range(100000)]
>>> plt.hist(data1)
```

```
>>> data2 = [randpd2(3)[0] for i in range(100000)]
>>> plt.hist(data2)
```

The marginal distribution for `randpd1` is no longer uniform; rather, it’s right triangular, with the right angle at the point (0, 0) and height 2. That means that, roughly speaking, a given item in the list returned by `randpd1(3)` is twice as likely to be close to 0 as it is to be close to ½, and the probability density falls towards 0 as the item gets close to 1.
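The triangular shape can be spot-checked numerically; here's a sketch, with `randpd1` repeated so the snippet is self-contained:

```
import random

def randpd1(size):
    variates = [random.random() for i in range(size - 1)]
    variates.sort()
    return [j - i for i, j in zip([0] + variates, variates + [1])]

random.seed(0)  # for reproducibility of this sketch
data = [randpd1(3)[0] for _ in range(100_000)]
# Under the density f(x) = 2(1 - x), an interval near 0 should collect
# roughly twice the mass of an equal-width interval around 1/2
near_zero = sum(x <= 0.1 for x in data) / len(data)           # ~0.19
near_half = sum(0.45 <= x <= 0.55 for x in data) / len(data)  # ~0.10
print(round(near_zero / near_half, 1))  # should be close to 2
```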

In general, the marginal distribution for `randpd1` is the distribution of the minimum of a sample of uniform variates in [0, 1] of size *n* − 1, where *n* is the value of `size`. This is because `randpd1` works by generating such a sample, and the minimum of that sample always ends up being the first item in the returned list, and the marginal distributions of the other items are the same as the marginal distribution of the first item.

It turns out to be not too difficult to derive an exact formula for this distribution. For every *x* ∈ [0, 1], the minimum is greater than *x* if and only if all *n* − 1 variates are greater than *x*. Therefore the probabilities of these two events are the same. The probability of an individual variate being greater than *x* is 1 − *x* (because, given that the variate is uniformly distributed, *x* is the probability that the variate is less than or equal to *x*) and therefore, given that the variates are independent of each other, the probability of all being greater than *x* is (1 − *x*)ⁿ⁻¹. It follows that the probability of the minimum being less than or equal to *x* is 1 − (1 − *x*)ⁿ⁻¹. That is, the cumulative distribution mapping (CDM) *f* of the marginal distribution for `randpd1` is given by

*f*(*x*) = 1 − (1 − *x*)ⁿ⁻¹.

The probability distribution defined by this CDM is a well-known one called the beta distribution with parameters (1, *n* − 1). That’s a nice result!
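The Beta(1, *n* − 1) CDM, which is 1 − (1 − *x*)ⁿ⁻¹, can be compared against the empirical distribution; here's a sketch for *n* = 5, with `randpd1` repeated so the snippet is self-contained:

```
import random

def randpd1(size):
    variates = [random.random() for i in range(size - 1)]
    variates.sort()
    return [j - i for i, j in zip([0] + variates, variates + [1])]

random.seed(0)  # for reproducibility of this sketch
n = 5
data = [randpd1(n)[0] for _ in range(100_000)]
results = {}
for x in (0.1, 0.25, 0.5):
    empirical = sum(v <= x for v in data) / len(data)
    theoretical = 1 - (1 - x) ** (n - 1)
    results[x] = (empirical, theoretical)
    # The two columns should agree to a couple of decimal places
    print(x, round(empirical, 3), round(theoretical, 3))
```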

The marginal distribution for `randpd2`, on the other hand, is similar to the one for size 2 except that the mean is now something like ⅓ rather than ½, and because the support is still the whole interval [0, 1] this results in a left-skewing of the distribution. Again, I don’t know how to characterize this distribution exactly. Here are the graphs for sizes 4 and 5:

```
>>> data = [randpd2(4)[0] for i in range(100000)]
>>> plt.hist(data)
```

```
>>> data = [randpd2(5)[0] for i in range(100000)]
>>> plt.hist(data)
```

It looks like the marginal distribution generally has mean 1/*n*, or something close to that, for every positive integer *n*, while still having density approaching 0 at the left limit.
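The observation that the mean is close to 1/*n* is easy to spot-check; a sketch, with `randpd2` repeated so the snippet is self-contained:

```
import random

def randpd2(size):
    variates = [random.random() for i in range(size)]
    s = sum(variates)
    return [i / s for i in variates]

random.seed(0)  # for reproducibility of this sketch
means = {}
for n in (2, 3, 4, 5):
    means[n] = sum(randpd2(n)[0] for _ in range(100_000)) / 100_000
    print(n, round(means[n], 3))  # should be close to 1/n in each case
```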

In conclusion… this post doesn’t have a conclusion, it just has a bunch of unanswered questions which I’d like to know the answers to.

- Is the concept of a uniform distribution over the probability distributions of size *n* sensical?
- If so, do the returned values of `randpd1` and `randpd2` have that distribution?
- If not, what distributions do they have?
- What’s the exact form of the marginal distribution for `randpd2`?
- Which is better: `randpd1` or `randpd2`? Or if one isn’t clearly better than the other, what is each one best suited to being used for?
- Are there any other algorithms one could use to generate a random probability distribution?

Probability distributions on a finite set {1, …, n} are the same thing as points of the simplex conv{e_1, …, e_n}, which can be given its usual Lebesgue measure. With respect thereto, randpd1 is uniform: after sorting, i.e. doing a bunch of reflections in the hyperplanes x_i = x_j, you have chosen uniformly from the simplex conv{0, e_1, e_1+e_2, …, e_1+…+e_(n-1)} which you then map linearly onto the former one.
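A sketch consistent with this uniformity claim, checking one sub-region of the simplex for *n* = 3 (with `randpd1` repeated so the snippet is self-contained):

```
import random

def randpd1(size):
    variates = [random.random() for i in range(size - 1)]
    variates.sort()
    return [j - i for i, j in zip([0] + variates, variates + [1])]

random.seed(0)  # for reproducibility of this sketch
sample = [randpd1(3) for _ in range(100_000)]
# If (x1, x2) is uniform on the triangle {x1, x2 >= 0, x1 + x2 <= 1}
# (area 1/2), the sub-triangle {x1 + x2 <= 1/2} (area 1/8) should
# receive 1/4 of the points
frac = sum(pd[0] + pd[1] <= 0.5 for pd in sample) / len(sample)
print(round(frac, 2))  # should be about 0.25
```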

randpd2, otoh, is the pushforward of Lebesgue measure on the unit n-cube by projection through the origin. That’s kind of funny; I’d expect its utility to be mostly because it’s an approximation to some other measure which is easy to hack up. Pushforward from an orthant of a ball (equivalently sphere) might have some applications, though?