Preface
The book provides some math as preface to understand the first chapter.
The Binomial Distribution
Example 1.1 - Binomial
A coin has probability of coming up heads. The coin is tossed times. What is the probability distribution of the number of heads, ? What are the mean and variance of ?
The number of heads has a binomial distribution. Each instance has probability as such, and there are ways that it can occur.
The mean is:
The variance is:
Computing these sums are cumbersome. It is easier to observe that (the number of heads) is the sum of independent random variables - each random variable is whether the th toss results in a head.
In general, for any random variables and , if and are independent:
So the mean:
And the variance (note that we subtract as the expectation of a single trial is ):
Stirling's Approximation
We will use the poisson distribution and the gaussian approximation to derive stirling's approximation for the factorial function.
Start with a poisson distribution with mean :
For large , the poisson distribution is well approximated in the vicinity of by a gaussian distribution with mean and variance :
Plugging , we get:
Applying , we get stirling's approximation for the factorial function:
Now we apply stirling's approximation to and re-organize:
Note that:
- We are using the approximation of
- In the second line, we replace all the factorials with the approximation, and notice that the part of the approximation will cancel out between all the terms
- In the third line, we group terms together and use the trick to flip signs
- In the last line, we artificially add and subtract to create the final form
Now, the whole point of this derivation is to write the approximation in a form that resembles binary entropy. We now define that concept here.
Binary Entropy
The binary entropy function is ( denotes logarithm with base ):
Note that the binary entropy function is just the special case of the general shannon entropy when we have only two possible outcomes, assuming represents a probability.
Let us rewrite using the binary entropy function:
Or equivalently,
So we see that the binary entropy function nicely describes the combinatorial explosion of the binomial function. We can observe that since the binary entropy function is maximized when , the binomial combination is maximized likewise.