K Nearest Neighbours

  • non-parametric classifier
  • Simply look at the K points in the training set that are closest to the input, counts how many members of each class are in this set, and returns the probability
p(y=cx,D,K)=1KiNk(x,D)I(yi=c)p(y=c|x, D, K) = \frac{1}{K}\sum_{i \in N_k(x, D)}{I(y_i = c)}


Nk(x,D)N_k(x, D) are the indices of the K nearest points to xx in DD.

I(e)I(e) is the indicator function defined as followes:

I(e)={1 if e is true2 if e is falseI(e) = \{\begin{array}{lr} 1 \text{ if } e \text{ is true} \\ 2 \text{ if } e \text{ is false} \end{array}
  • Generally works quite well if the distance metric is good and has enough labeled training data.
  • The main problem is that they do not work well with high dimensional inputs due to the need for dense datasets and the Curse of Dimensionality.