Preface

This preface provides some mathematical background needed to understand the first chapter.

The Binomial Distribution

Example 1.1 - Binomial

A coin has probability $p$ of coming up heads. The coin is tossed $N$ times. What is the probability distribution of the number of heads, $r$? What are the mean and variance of $r$?

The number of heads $r$ has a binomial distribution. Each particular sequence with $r$ heads has probability $p^r (1-p)^{N-r}$, and there are $\binom{N}{r}$ ways that it can occur:

$$P(r \mid p, N) = \binom{N}{r}\, p^r (1-p)^{N-r}.$$

The mean is:

$$\mathbb{E}[r] = \sum_{r=0}^{N} r\, P(r \mid p, N).$$

The variance is:

$$\operatorname{var}[r] = \mathbb{E}[r^2] - \big(\mathbb{E}[r]\big)^2 = \sum_{r=0}^{N} r^2\, P(r \mid p, N) - \big(\mathbb{E}[r]\big)^2.$$

Computing these sums directly is cumbersome. It is easier to observe that $r$ (the number of heads) is the sum of $N$ independent random variables, where the $n$-th random variable indicates whether the $n$-th toss results in a head.

In general, for any random variables $x$ and $y$, if $x$ and $y$ are independent:

$$\mathbb{E}[x + y] = \mathbb{E}[x] + \mathbb{E}[y], \qquad \operatorname{var}[x + y] = \operatorname{var}[x] + \operatorname{var}[y].$$

So the mean:

$$\mathbb{E}[r] = \sum_{n=1}^{N} p = N p.$$

And the variance (note that we subtract $p^2$, as the expectation of a single trial is $p$):

$$\operatorname{var}[r] = \sum_{n=1}^{N} \big(p - p^2\big) = N p (1 - p).$$
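
As a quick numerical sanity check, here is a minimal Python sketch (the function and variable names are illustrative, not from the book) that computes the mean and variance directly from the binomial probability mass function and compares them with $Np$ and $Np(1-p)$:

```python
from math import comb

def binomial_pmf(r, N, p):
    """P(r | p, N) = C(N, r) * p^r * (1 - p)^(N - r)."""
    return comb(N, r) * p**r * (1 - p) ** (N - r)

N, p = 20, 0.3
mean = sum(r * binomial_pmf(r, N, p) for r in range(N + 1))
second_moment = sum(r**2 * binomial_pmf(r, N, p) for r in range(N + 1))
variance = second_moment - mean**2

print(mean, N * p)                # both 6.0
print(variance, N * p * (1 - p))  # both 4.2 (up to floating-point error)
```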

Stirling's Approximation

We will use the Poisson distribution and the Gaussian approximation to derive Stirling's approximation for the factorial function.

Start with a Poisson distribution with mean $\lambda$:

$$P(r \mid \lambda) = e^{-\lambda} \frac{\lambda^r}{r!}, \qquad r \in \{0, 1, 2, \dots\}.$$

For large $\lambda$, the Poisson distribution is well approximated in the vicinity of $r \approx \lambda$ by a Gaussian distribution with mean $\lambda$ and variance $\lambda$:

$$e^{-\lambda} \frac{\lambda^r}{r!} \simeq \frac{1}{\sqrt{2\pi\lambda}} \exp\!\left(-\frac{(r-\lambda)^2}{2\lambda}\right).$$

Plugging in $r = \lambda$, we get:

$$e^{-\lambda} \frac{\lambda^\lambda}{\lambda!} \simeq \frac{1}{\sqrt{2\pi\lambda}}.$$

Rearranging for $\lambda!$, we get Stirling's approximation for the factorial function:

$$\lambda! \simeq \lambda^\lambda e^{-\lambda} \sqrt{2\pi\lambda}, \qquad \text{or equivalently} \qquad \ln \lambda! \simeq \lambda \ln \lambda - \lambda + \tfrac{1}{2} \ln (2\pi\lambda).$$
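
The following Python sketch (names are illustrative) compares $\lambda!$ with $\lambda^\lambda e^{-\lambda}\sqrt{2\pi\lambda}$; the relative error shrinks as $\lambda$ grows, roughly like $1/(12\lambda)$:

```python
import math

def stirling(lam):
    """Stirling's approximation: lam! ~ lam^lam * e^(-lam) * sqrt(2*pi*lam)."""
    return lam**lam * math.exp(-lam) * math.sqrt(2 * math.pi * lam)

for lam in (5, 10, 20, 50):
    exact = math.factorial(lam)
    relative_error = 1 - stirling(lam) / exact
    print(lam, relative_error)  # roughly 1 / (12 * lam)
```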

Now we apply Stirling's approximation to $\binom{N}{r} = \frac{N!}{r!\,(N-r)!}$ and re-organize, working with logarithms base 2:

$$
\begin{aligned}
\log_2 \binom{N}{r} &= \log_2 N! - \log_2 r! - \log_2 (N-r)! \\
&\simeq N \log_2 N - r \log_2 r - (N-r)\log_2 (N-r) - \frac{1}{2}\log_2 \frac{2\pi\, r (N-r)}{N} \\
&= r \log_2 \frac{N}{r} + (N-r)\log_2 \frac{N}{N-r} - \frac{1}{2}\log_2 \frac{2\pi\, r (N-r)}{N} \\
&= N \left[ \frac{r}{N}\log_2 \frac{N}{r} + \frac{N-r}{N}\log_2 \frac{N}{N-r} \right] - \frac{1}{2}\log_2 \left[ 2\pi N\, \frac{r}{N}\, \frac{N-r}{N} \right].
\end{aligned}
$$

Note that:

  • We are using the base-2 form of the approximation, $\log_2 x! \simeq x \log_2 x - x \log_2 e + \frac{1}{2}\log_2 (2\pi x)$
  • In the second line, we replace all the factorials with this approximation, and notice that the $-x \log_2 e$ part of the approximation cancels out between the three terms, since $-N + r + (N-r) = 0$
  • In the third line, we group terms together (writing $N \log_2 N = r \log_2 N + (N-r)\log_2 N$) and use the trick $-\log_2 x = \log_2 \frac{1}{x}$ to flip signs
  • In the last line, we artificially add and subtract $\log_2 N$ in the correction term (and factor $N$ out of the first two terms) to create the final form
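
To confirm that the algebra holds, here is a minimal Python sketch (the helper name is illustrative) comparing the exact $\log_2 \binom{N}{r}$ with the last line of the derivation; for $N = 100$, $r = 30$ the two values agree to about two decimal places:

```python
import math

def log2_binom_approx(N, r):
    """Last line of the derivation above, with x = r / N."""
    x = r / N
    entropy_part = N * (x * math.log2(N / r) + (1 - x) * math.log2(N / (N - r)))
    correction = 0.5 * math.log2(2 * math.pi * N * x * (1 - x))
    return entropy_part - correction

N, r = 100, 30
print(math.log2(math.comb(N, r)), log2_binom_approx(N, r))
```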

Now, the whole point of this derivation is to write the approximation in a form that resembles binary entropy, which we define next.

Binary Entropy

The binary entropy function is ($\log_2$ denotes the logarithm with base 2):

$$H_2(x) \equiv x \log_2 \frac{1}{x} + (1 - x) \log_2 \frac{1}{1 - x}.$$

Note that the binary entropy function is just the special case of the general Shannon entropy when we have only two possible outcomes, assuming $x$ represents a probability.

Let us rewrite $\log_2 \binom{N}{r}$ using the binary entropy function:

$$\log_2 \binom{N}{r} \simeq N H_2\!\left(\frac{r}{N}\right) - \frac{1}{2}\log_2 \left[ 2\pi N\, \frac{r}{N}\, \frac{N-r}{N} \right].$$

Or equivalently,

$$\binom{N}{r} \simeq \frac{2^{N H_2(r/N)}}{\sqrt{2\pi N\, \frac{r}{N}\, \frac{N-r}{N}}}.$$

So we see that the binary entropy function nicely describes the combinatorial explosion of the binomial coefficient. We can observe that since the binary entropy function is maximized at $x = \frac{1}{2}$, the binomial coefficient $\binom{N}{r}$ is likewise maximized at $r = N/2$.
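
A short Python sketch (helper names are illustrative) comparing the exact binomial coefficient with $2^{N H_2(r/N)} / \sqrt{2\pi N \frac{r}{N}\frac{N-r}{N}}$, and confirming that both peak at $r = N/2$:

```python
import math

def H2(x):
    """Binary entropy in bits."""
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

def binom_approx(N, r):
    """2^(N * H2(r/N)) / sqrt(2*pi*N * (r/N) * ((N-r)/N))."""
    x = r / N
    return 2 ** (N * H2(x)) / math.sqrt(2 * math.pi * N * x * (1 - x))

N = 100
for r in (10, 25, 50, 75):
    print(r, math.comb(N, r), round(binom_approx(N, r)))

# Both the exact coefficient and the approximation are maximized at r = N/2.
print(max(range(1, N), key=lambda r: math.comb(N, r)))     # 50
print(max(range(1, N), key=lambda r: binom_approx(N, r)))  # 50
```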