Posterior Predictive Distribution

Suppose we're trying to solve Tenenbaum Number Guessing. This is Concept Learning, which is a classification problem.

We have

  • Prior: the likelihood of a hypothesis before we've seen any data. So for example, the concept "powers of 2" would have a larger prior than "powers of 2, except 32" or "powers of 2, but with 38". Let's say this is simply p(h).
  • Likelihood: the probability of seeing the data given a hypothesis, assuming the examples are drawn uniformly from the concept; see Occam's Razor. This is p(D|h) = \frac{1}{(\text{size of hypothesis})^n}, where n is the number of examples in D.
  • Posterior: the probability of a particular concept, given the data. This is likelihood times the prior, normalized.
p(h|D) = \frac{p(h)p(D|h)}{\sum_{h'}{p(h')p(D|h')}}
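
As a small sketch of the pieces so far (the hypothesis space, prior weights, and observed data below are made-up toy values, not from the source), the posterior can be computed directly:

```python
# Toy sketch of the number game over numbers 1..100.
# The hypothesis space and prior weights here are hypothetical.
hypotheses = {
    # name: (extension of the concept, prior weight)
    "powers of 2":            ({1, 2, 4, 8, 16, 32, 64}, 0.5),
    "even numbers":           (set(range(2, 101, 2)),    0.4),
    "powers of 2, except 32": ({1, 2, 4, 8, 16, 64},     0.1),
}

D = [2, 8, 16]  # observed examples, assumed drawn uniformly from the concept

def likelihood(D, extension):
    # Size principle: p(D|h) = 1/|h|^n if every example fits h, else 0.
    if all(x in extension for x in D):
        return 1.0 / len(extension) ** len(D)
    return 0.0

# Posterior: prior times likelihood, normalized over all hypotheses.
unnorm = {name: prior * likelihood(D, ext)
          for name, (ext, prior) in hypotheses.items()}
Z = sum(unnorm.values())
posterior = {name: w / Z for name, w in unnorm.items()}
```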

Now the posterior predictive distribution uses these to figure out the probability that the next number x belongs to our concept.

p(x \in C|D) = \sum_h p(y=1|x, h)\,p(h|D)

This is the average of each hypothesis's prediction, weighted by its posterior probability.
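
Continuing the toy sketch above (it reuses the hypothetical `hypotheses` and `posterior` defined there), the predictive probability for a new number is the posterior-weighted vote of the hypotheses whose extension contains it, since p(y=1|x, h) is simply 1 if x is in h and 0 otherwise:

```python
def predictive(x, hypotheses, posterior):
    # p(x in C | D) = sum over h of 1[x in h] * p(h|D)
    return sum(p for name, p in posterior.items()
               if x in hypotheses[name][0])

print(predictive(64, hypotheses, posterior))  # high: every surviving hypothesis contains 64
print(predictive(32, hypotheses, posterior))  # lower: "powers of 2, except 32" votes no
print(predictive(7,  hypotheses, posterior))  # 0.0: no surviving hypothesis contains 7
```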

This method of classification is also called Bayesian model averaging.