Lesson 6

  • gini: how likely is it that, if you go into the sample and grab two items at random, they're the same class; the purer the group, the higher that probability (see the sketch after this list).

  • less preprocessing needed for decision trees; no real need for dummy variables.

  • don't have to worry about outliers or about how categorical variables are encoded.

  • for tabular data, you could start with decision trees as a baseline approach.

  • really hard to mess them up.

  • don't split too deep: at some point the leaf nodes won't have enough data to make reliable predictions.

  • bagging: train a number of models on different random subsets of the data and take the average of their predictions; the individual models' errors tend to average out (see the sketch after this list).

  • random forests give you the importances of the columns, i.e. which features the model relies on most (see the sketch after this list).

  • https://explained.ai/gradient-boosting/
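
A minimal sketch of the gini idea from the first bullet, for a binary 0/1 target; the DataFrame `df` and the column name `dep` are hypothetical:

```python
import pandas as pd

def gini(df: pd.DataFrame, dep: str) -> float:
    """Probability that two rows drawn at random (with replacement) from df
    have the same value in the binary 0/1 column `dep`."""
    p = df[dep].mean()          # proportion of rows in the positive class
    return p**2 + (1 - p)**2    # both positive or both negative; impurity is 1 minus this
```

A group that is all one class gives 1.0 and a 50/50 group gives 0.5, so a good split produces child groups with values close to 1.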
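
A from-scratch sketch of the bagging bullet using scikit-learn trees, assuming `X_train` and `X_valid` are pandas DataFrames of features and `y_train` is the matching target Series (all names hypothetical); a random forest is roughly this plus a random subset of columns considered at each split:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bag_predict(X_train, y_train, X_valid, n_models=20, min_samples_leaf=25):
    """Bagging: fit n_models trees on bootstrap samples, average their predictions."""
    rng = np.random.default_rng(42)
    all_preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), len(X_train))   # bootstrap sample (with replacement)
        tree = DecisionTreeRegressor(min_samples_leaf=min_samples_leaf)  # keeps leaves from getting too small
        tree.fit(X_train.iloc[idx], y_train.iloc[idx])
        all_preds.append(tree.predict(X_valid))
    return np.stack(all_preds).mean(axis=0)                 # average over the models
```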
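
Column importances come straight from a fitted random forest in scikit-learn; a sketch using the same hypothetical `X_train` (a DataFrame) and `y_train`:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=25, n_jobs=-1)
rf.fit(X_train, y_train)

# one value per column (they sum to 1): how much that column improved the
# split criterion, summed over every split in every tree
fi = pd.Series(rf.feature_importances_, index=X_train.columns).sort_values(ascending=False)
print(fi.head(10))
```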

  1. Create an effective validation set (see the split sketch below).
  2. Iterate rapidly to find changes which improve results on the validation set.
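
For example, when the rows have a time component, a validation set that is later in time than the training data usually mimics real use better than a random split; a sketch with a hypothetical file and `saledate` column:

```python
import pandas as pd

df = pd.read_csv("train.csv", parse_dates=["saledate"])  # hypothetical file and column

df = df.sort_values("saledate")
cut = int(len(df) * 0.8)          # most recent ~20% of rows become the validation set
train_df, valid_df = df.iloc[:cut], df.iloc[cut:]
```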

dls.test_dl: build a DataLoader for new (test) data that reuses the training DataLoaders' preprocessing (sketch below).
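
A sketch of how this is typically used, assuming `learn` is an already-trained fastai tabular Learner and `test_df` is a DataFrame of new rows (both hypothetical here):

```python
# build a DataLoader for the new rows, reusing the training DataLoaders' preprocessing
test_dl = learn.dls.test_dl(test_df)

# run the trained model over it; get_preds returns (predictions, targets)
preds, _ = learn.get_preds(dl=test_dl)
```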