Posterior Predictive Distribution
Suppose we're trying to solve Tenenbaum Number Guessing. This is Concept Learning, which is a Classification problem.
We have
- Prior: the probability of a hypothesis before we've seen any data. For example, the concept "powers of 2" would have a larger prior than "powers of 2, except 32" or "powers of 2, but with 38", since conceptually unnatural hypotheses are less plausible a priori. Let's say this is simply $p(h)$.
- Likelihood: the probability of seeing the data under a given hypothesis; see Occam's Razor (smaller, more specific hypotheses assign higher likelihood to data they contain). This is $p(D \mid h)$.
- Posterior: the probability of a particular concept, given the data. This is the likelihood times the prior, normalized. This is $p(h \mid D)$, written out after this list.
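Written out, with $h$ ranging over hypotheses and $D$ denoting the observed numbers (notation chosen here, since the note doesn't fix one):

$$p(h \mid D) = \frac{p(D \mid h)\, p(h)}{\sum_{h'} p(D \mid h')\, p(h')}$$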
Now the posterior predictive distribution uses these to compute the probability that the next number belongs to the concept.
This is a weighted average of each hypothesis's prediction, with the weights given by the posterior.
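Concretely, writing $\tilde{x}$ for the next number and $C$ for the true concept (again, symbols chosen here for illustration):

$$p(\tilde{x} \in C \mid D) = \sum_{h} p(\tilde{x} \in C \mid h)\, p(h \mid D)$$

In the number game, $p(\tilde{x} \in C \mid h)$ is simply $1$ if $\tilde{x}$ is in $h$'s extension and $0$ otherwise, so each hypothesis casts a yes/no vote and the posterior weights the votes.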
This method of classification is also called Bayesian model averaging.
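As a minimal sketch in Python, assuming a tiny hand-picked hypothesis space, made-up prior values, and made-up observed data (the real number game uses a much larger hypothesis space):

```python
# Toy number game: prior, likelihood, posterior, and posterior predictive.
# The hypotheses, prior weights, and data below are illustrative assumptions.
hypotheses = {
    "powers of 2":            {1, 2, 4, 8, 16, 32, 64},
    "even numbers":           set(range(2, 101, 2)),
    "powers of 2, except 32": {1, 2, 4, 8, 16, 64},
}
prior = {"powers of 2": 0.5, "even numbers": 0.4, "powers of 2, except 32": 0.1}

data = [2, 8, 16]  # numbers we've been told belong to the hidden concept

def likelihood(data, extension):
    """p(D|h) under strong sampling: (1/|h|)^N if h explains the data, else 0."""
    if not all(x in extension for x in data):
        return 0.0
    return (1.0 / len(extension)) ** len(data)

# Posterior p(h|D): likelihood times prior, normalized over all hypotheses.
unnorm = {h: likelihood(data, ext) * prior[h] for h, ext in hypotheses.items()}
Z = sum(unnorm.values())
posterior = {h: w / Z for h, w in unnorm.items()}

def predictive(x):
    """Posterior predictive p(x in C | D): each hypothesis's 0/1 vote,
    weighted by its posterior probability (Bayesian model averaging)."""
    return sum(posterior[h] * (x in ext) for h, ext in hypotheses.items())

for x in (4, 32, 10):
    print(f"p({x} in concept | data) = {predictive(x):.3f}")
```

With these made-up numbers, 4 gets probability 1 (every hypothesis consistent with the data contains it), 32 somewhat less (the "except 32" hypothesis votes no), and 10 much less (only "even numbers" votes yes).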