For a minimum length code, length of symbols should be in the opposite order of the probability of the symbol occuring.

We want a instantaneous, uniquely decodable code for given set of input symbols sis_i, along with their probabilities pip_i.


If pip_i are in descending order, then lil_i should be in ascending order

p1p2...pqp_1 \geq p_2 \geq ... \geq p_q
l1l2...lql_1 \leq l_2 \leq ... \leq l_q


Suppose pip_i are in order, but at least one pair of lil_i are not.

Their contribution to the average length LL is

before=pnln+pmlmbefore = p_nl_n + p_ml_m

After interchanging the two codes

after=pnlm+pmlmafter = p_nl_m + p_ml_m

Difference is

beforeafter=pn(lnlm)+pm(lmln)=(pnpm)(lnlm)before - after = p_n(l_n - l_m) + p_m(l_m - l_n) = (p_n - p_m)(l_n - l_m)

One of these two is factors is negative, hence the difference is negative. So if we interchange, we get a smaller contribution to the average length LL. This means that in a minimum length code, the two running inequalities must be true.

Source: The Art of Doing Science and Engineering: Learning to learn