@[toc]
- Training error: model error on the training data
- Generalization error: model error on new data

| Generalization error | Training error: Low | Training error: High |
|---|---|---|
| Low | Good | Bug? |
| High | Overfitting | Underfitting |


| Model complexity | Data complexity: Low | Data complexity: High |
|---|---|---|
| Low | Normal | Underfitting |
| High | Overfitting | Normal |

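
To make the two tables above concrete, here is a minimal sketch that compares training error with error on fresh data for models of different complexity. Everything in it is an assumption chosen for illustration: the toy data (`y = sin(x)` plus noise), the k-nearest-neighbor regressor, and the helper names `make_data`, `knn_predict`, and `mse`. A small `k` plays the role of a high-complexity model and a large `k` a low-complexity one.

```python
import math
import random

def make_data(n, rng, noise=0.3):
    """Sample n points from y = sin(x) + Gaussian noise (a toy assumption)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def knn_predict(x, train_xs, train_ys, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))[:k]
    return sum(train_ys[i] for i in nearest) / k

def mse(xs, ys, train_xs, train_ys, k):
    """Mean squared error of the k-NN regressor on the points (xs, ys)."""
    total = sum((knn_predict(x, train_xs, train_ys, k) - y) ** 2
                for x, y in zip(xs, ys))
    return total / len(xs)

rng = random.Random(0)
train_xs, train_ys = make_data(100, rng)
test_xs, test_ys = make_data(1000, rng)  # stands in for "new data"

results = {}
for k in (1, 10, 100):  # small k = high complexity, large k = low complexity
    train_err = mse(train_xs, train_ys, train_xs, train_ys, k)
    test_err = mse(test_xs, test_ys, train_xs, train_ys, k)
    results[k] = (train_err, test_err)
    print(f"k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With `k=1` the training error is exactly zero, because every training point is its own nearest neighbor, yet the error on new data is not: that gap is the overfitting cell of the table. With `k=100` the model predicts the training mean everywhere, so both errors stay high, which corresponds to underfitting.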
Model Complexity
- The capacity of a set of functions to fit data points
- In ML, model complexity usually refers to:
  - The number of learnable parameters
  - The value range of those parameters
- It’s hard to compare complexity across different types of ML models
  - E.g. trees vs. neural networks
- A more precise measure of complexity: VC dimension
  - VC dim of a classification model: the maximum number of examples the model can shatter
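
The notion of shattering can be checked by brute force on tiny examples. The sketch below is an illustration under stated assumptions, not a general VC-dimension calculator: it uses a 2-D linear classifier, decides separability by running the perceptron algorithm with a fixed epoch cap (the cap is a heuristic cutoff), and the names `linearly_separable` and `can_shatter` are hypothetical helpers.

```python
from itertools import product

def linearly_separable(points, labels, max_epochs=1000):
    """Run the perceptron algorithm; return True if it separates the labels.

    The perceptron converges whenever a separating line exists, so reaching
    max_epochs is treated here as "not separable" (a heuristic cutoff).
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified point
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False

def can_shatter(points):
    """True if a 2-D linear classifier realizes every +1/-1 labeling."""
    return all(linearly_separable(points, labels)
               for labels in product((-1, 1), repeat=len(points)))

three_points = [(0, 0), (1, 0), (0, 1)]         # in general position
four_points = [(0, 0), (1, 1), (1, 0), (0, 1)]  # includes the XOR labeling

print(can_shatter(three_points))
print(can_shatter(four_points))
```

The three points can be labeled in all 8 ways by some line, while the four corner points cannot (the XOR labeling fails). This is consistent with the known result that a linear classifier in two dimensions has VC dimension 3.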
Data Complexity
- Multiple factors matter:
  - Number of examples
  - Number of features in each example
  - The separability of the classes
- Again, it is hard to compare across very different data
  - E.g. a char vs. a pixel
- A more precise measure: Kolmogorov complexity
  - Data is simple if it can be generated by a short program
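
Kolmogorov complexity itself is not computable, but compressed size is a common rough stand-in, since a compressed file plus its decompressor is one particular "program" that regenerates the data. The sketch below assumes zlib as the compressor; `compressed_size` is a hypothetical helper, and the two byte strings are made-up inputs.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (compression level 9)."""
    return len(zlib.compress(data, 9))

rng = random.Random(0)
repetitive = b"ab" * 500                                # 1000 bytes from one short rule
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # 1000 pseudo-random bytes

print(compressed_size(repetitive), compressed_size(noisy))
```

The repetitive string compresses to a small fraction of its length, while the noisy one barely shrinks. Note the limits of the proxy: the "noisy" bytes come from a seeded generator, so a short program for them does exist; zlib simply cannot find it. Compressed size is only an upper-bound style estimate.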
Generalization error
- Generalization error bound (an informal statement):
  $|\text{error on unseen data} - \text{training error}| \le O\left(\sqrt{D/M}\right)$, up to logarithmic factors
  - D: VC-dim, M: number of training examples
- Generalization error also depends on the training algorithm
  - Adding regularization can penalize complex models
  - Models trained with stochastic gradient methods often generalize better
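
To get a feel for how the bound behaves, the sketch below evaluates the informal scaling $\sqrt{D/M}$ for a fixed VC dimension and growing training sets. The $\sqrt{D/M}$ form is an assumption here: it drops the constants and logarithmic factors of the formal VC bound, so the numbers show a trend, not an actual guarantee. `informal_gap_bound` and the value `D = 100` are hypothetical.

```python
import math

def informal_gap_bound(vc_dim: int, num_examples: int) -> float:
    """sqrt(D / M): informal scaling only, ignoring constants and log factors."""
    return math.sqrt(vc_dim / num_examples)

D = 100  # hypothetical VC dimension of the model class
for M in (1_000, 10_000, 100_000):
    print(f"D={D}, M={M}: gap scale ~ {informal_gap_bound(D, M):.3f}")
```

The gap scale shrinks as M grows and widens as D grows: to halve it at a fixed D, roughly four times as many training examples are needed.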
Model Selection
- Pick a model with the proper complexity for your data
  - Minimize the generalization error
  - Also consider business metrics
- Pick a model family, then select proper hyper-parameters
  - Trees: #trees, maximal depth
  - Neural networks: architecture, depth (#layers), width (#hidden units), regularization
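
When sweeping depth and width for a neural network, it can help to see how each choice changes the number of learnable parameters, one of the complexity proxies mentioned earlier. The sketch below assumes a plain fully connected network with a bias per unit; `mlp_param_count` is a hypothetical helper, and the input size 784 and output size 10 are arbitrary example values.

```python
from typing import Sequence

def mlp_param_count(in_dim: int, hidden_widths: Sequence[int], out_dim: int) -> int:
    """Count weights plus biases of a fully connected network."""
    sizes = [in_dim, *hidden_widths, out_dim]
    # Each layer has fan_in * fan_out weights and fan_out biases.
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

for depth in (1, 2, 4):
    for width in (64, 256):
        n = mlp_param_count(784, [width] * depth, 10)
        print(f"depth={depth} width={width}: {n:,} parameters")
```

Parameter count is only a rough guide. In practice the candidates would still be compared by their error on held-out data, since that is the estimate of generalization error that model selection tries to minimize.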
Summary
- We care about generalization error
- Model complexity: the ability to fit various functions
- Data complexity: the richness of information
- Model selection: match model and data complexities