Posterior Predictive Distribution

Suppose we're trying to solve Tenenbaum Number Guessing. This is Concept Learning, which is a classification problem.

We have

  • Prior: the likelihood of a hypothesis before we've seen any data. So for example, the concept "powers of 2" would have a larger prior than "powers of 2, except 32" or "powers of 2, but with 38". Let's say this is simply p(h).
  • Likelihood: the probability of seeing the data given a hypothesis, assuming the examples are drawn uniformly from the concept; see Occam's Razor. This is p(D|h) = \frac{1}{(\text{size of hypothesis})^n}, where n is the number of examples in D.
  • Posterior: the probability of a particular concept, given the data. This is likelihood times the prior, normalized.
p(h|D) = \frac{p(h)p(D|h)}{\sum_{h'}{p(h')p(D|h')}}
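
As a small sketch of the pieces so far (the hypothesis space, prior weights, and observed data below are made-up toy values, not from the source), the posterior can be computed directly:

```python
# Toy sketch of the number game over numbers 1..100.
# The hypothesis space and prior weights here are hypothetical.
hypotheses = {
    # name: (extension of the concept, prior weight)
    "powers of 2":            ({1, 2, 4, 8, 16, 32, 64}, 0.5),
    "even numbers":           (set(range(2, 101, 2)),    0.4),
    "powers of 2, except 32": ({1, 2, 4, 8, 16, 64},     0.1),
}

D = [2, 8, 16]  # observed examples, assumed drawn uniformly from the concept

def likelihood(D, extension):
    # Size principle: p(D|h) = 1/|h|^n if every example fits h, else 0.
    if all(x in extension for x in D):
        return 1.0 / len(extension) ** len(D)
    return 0.0

# Posterior: prior times likelihood, normalized over all hypotheses.
unnorm = {name: prior * likelihood(D, ext)
          for name, (ext, prior) in hypotheses.items()}
Z = sum(unnorm.values())
posterior = {name: w / Z for name, w in unnorm.items()}
```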

Now the posterior predictive distribution uses these to figure out the probability that the next number x belongs to our concept.

p(x \in C|D) = \sum_h p(y=1|x, h)\,p(h|D)

This is the average of each hypothesis's prediction, weighted by its posterior probability.
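
Continuing the toy sketch above (it reuses the hypothetical `hypotheses` and `posterior` defined there), the predictive probability for a new number is the posterior-weighted vote of the hypotheses whose extension contains it, since p(y=1|x, h) is simply 1 if x is in h and 0 otherwise:

```python
def predictive(x, hypotheses, posterior):
    # p(x in C | D) = sum over h of 1[x in h] * p(h|D)
    return sum(p for name, p in posterior.items()
               if x in hypotheses[name][0])

print(predictive(64, hypotheses, posterior))  # high: every surviving hypothesis contains 64
print(predictive(32, hypotheses, posterior))  # lower: "powers of 2, except 32" votes no
print(predictive(7,  hypotheses, posterior))  # 0.0: no surviving hypothesis contains 7
```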

This method of classification is also called Bayesian model averaging.