Lesson 6
-
gini impurity - how mixed a group is: 1 minus the probability that two items grabbed at random from the sample (with replacement) are the same class. 0 means every item is the same.
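A minimal sketch of that calculation, to make the "grab two items" idea concrete. The function name `gini_impurity` and the toy labels are made up for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 - sum(p_k^2): the chance two draws (with replacement) differ."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # Probability that two independent draws land on the same class.
    p_same = sum((c / n) ** 2 for c in counts.values())
    return 1.0 - p_same

pure = ["cat"] * 4
mixed = ["cat", "cat", "dog", "dog"]
print(gini_impurity(pure))   # 0.0
print(gini_impurity(mixed))  # 0.5
```

A split is good when it lowers this number in the groups it creates.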
-
decision trees need less preprocessing: no real need for dummy variables.
-
don't have to worry about outliers or special handling for categorical variables: a split only uses how a column's values are ordered.
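A small sketch of why dummy variables are not needed: each category gets an integer code, and threshold splits on that code can still isolate a single level. The helper `to_codes` and the colour data are hypothetical, and sorted order is an arbitrary choice of coding.

```python
def to_codes(values):
    """Map each distinct category to an integer code (sorted order, arbitrary)."""
    levels = sorted(set(values))
    lookup = {level: i for i, level in enumerate(levels)}
    return [lookup[v] for v in values], levels

colors = ["red", "green", "blue", "green", "red"]
codes, levels = to_codes(colors)
print(levels)  # ['blue', 'green', 'red']
print(codes)   # [2, 1, 0, 1, 2]

# Two threshold splits (code > 0.5, then code <= 1.5) isolate "green" (code 1),
# which is what a one-hot "is_green" column would have provided.
is_green = [0.5 < c <= 1.5 for c in codes]
print(is_green)  # [False, True, False, True, False]
```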
-
for tabular data, decision trees are a good baseline approach to start with.
-
they're really hard to mess up.
-
can't split too deep, because at some point the leaf nodes won't have enough data to give reliable predictions.
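One common way to stop leaves from getting too small is a minimum leaf size. The sketch below is a toy single-feature split finder, not any library's implementation. The names `best_split` and `min_samples_leaf` are chosen here for illustration, and squared error is an assumed split criterion.

```python
def best_split(xs, ys, min_samples_leaf=1):
    """Find the threshold on one numeric feature minimising total squared error.

    Splits that would leave fewer than `min_samples_leaf` rows on either
    side are skipped. Returns (threshold, error), or None if no split is allowed.
    """
    min_samples_leaf = max(1, min_samples_leaf)
    pairs = sorted(zip(xs, ys))
    n = len(pairs)

    def sse(values):
        # Sum of squared errors around the mean of the group.
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for i in range(min_samples_leaf, n - min_samples_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, err)
    return best

xs = [1, 2, 3, 4, 5, 6]
ys = [10, 10, 10, 10, 10, 50]
print(best_split(xs, ys, min_samples_leaf=1))  # (5.5, 0.0): one row alone in a leaf
print(best_split(xs, ys, min_samples_leaf=3))  # only the 3/3 split is allowed
print(best_split(xs, ys, min_samples_leaf=4))  # None: no split leaves 4 rows per side
```

With a leaf size of 1 the tree happily carves off a single unusual row. A larger minimum forces each leaf's prediction to be an average over several rows.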
-
bagging: train a number of models, each on a different random subset of the data, and take the average of their predictions.
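A rough sketch of bagging in plain Python. The base model here (`fit_nearest`, a nearest-neighbour lookup) is a stand-in chosen to keep the example short, not a decision tree, and all names and data are hypothetical.

```python
import random
from statistics import mean

def bagged_predict(rows, targets, fit, x_new, n_models=25, seed=0):
    """Average the predictions of models fit on bootstrap samples of the data."""
    rng = random.Random(seed)
    n = len(rows)
    preds = []
    for _ in range(n_models):
        # Bootstrap: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([rows[i] for i in idx], [targets[i] for i in idx])
        preds.append(model(x_new))
    return mean(preds)

def fit_nearest(rows, targets):
    """Toy high-variance model: predict the target of the closest training row."""
    def predict(x):
        j = min(range(len(rows)), key=lambda i: abs(rows[i] - x))
        return targets[j]
    return predict

rows = [0, 1, 2, 3, 4, 5, 6, 7]
targets = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8]
single = fit_nearest(rows, targets)(3.4)   # 5.8, copied from the row at x=3
bagged = bagged_predict(rows, targets, fit_nearest, x_new=3.4)
print(single, bagged)
```

Each bootstrap model makes different errors because it sees different rows, so averaging tends to smooth those errors out. A random forest applies the same idea with decision trees as the base model.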
-
random forests can report the importance of each column (feature importance).
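A sketch of reading column importances, assuming scikit-learn and NumPy are installed. The data is synthetic: the column names `signal` and `noise` are invented, and only `signal` actually drives the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)            # column that determines the target
noise = rng.normal(size=n)             # column unrelated to the target
X = np.column_stack([signal, noise])
y = 3.0 * signal + rng.normal(scale=0.1, size=n)

forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, random_state=0)
forest.fit(X, y)

# feature_importances_ holds one impurity-based score per column.
for name, score in zip(["signal", "noise"], forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```

Columns with scores near zero are candidates for dropping, which makes the remaining data easier to study.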
- Create an effective validation set
- Iterate rapidly to find changes which improve results on a validation set.
dls.test_dl - builds a test DataLoader from new items, reusing the validation-set transforms of dls.