Machine Learning: Decision Tree Models
Published On 2022/06/15 Wednesday, Singapore
Decision tree models can be used for both regression and classification tasks. They are also the building blocks for popular ensemble models such as random forests.
A decision tree is a sequence of simple decision rules: one feature and one threshold at a time. Unlike linear models, decision trees are non-parametric: they are not defined by a fixed mathematical decision function and have no weights or intercepts to optimize.
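To make the "sequence of rules" idea concrete, here is a minimal sketch (the synthetic data and feature names are assumptions for illustration) that fits a small tree and prints its learned rules with scikit-learn's export_text:

```python
# Fit a tiny tree and print its rules; the data and feature
# names below are assumptions, not from the original post.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0.5).astype(int)  # label depends on a single threshold

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["f0", "f1"]))  # one rule per line
```

Each printed line is one feature/threshold comparison, which is all a decision tree is.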
Regression
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
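The snippet above assumes X_train and y_train already exist. A self-contained sketch, on assumed synthetic data, looks like this:

```python
# Runnable regression sketch; the sine target and split here
# are assumptions for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel()  # smooth 1-D target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeRegressor(max_depth=3)  # 8 leaves: a piecewise-constant fit
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on held-out data
```

The prediction is piecewise constant: each leaf predicts the mean target of its training samples.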
Classification
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
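As with the regressor, X_train and y_train are assumed to exist. A self-contained classification sketch, with synthetic labels assumed for illustration:

```python
# Runnable classification sketch; the diagonal decision boundary
# below is an assumption for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # diagonal boundary

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out data
```

Because each rule tests one feature against one threshold, the tree approximates the diagonal boundary with axis-aligned steps.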
Overfitting & Underfitting
There are two groups of hyperparameters that control the trade-off between underfitting and overfitting:
- The depth of the tree: max_depth
- The shape of the tree: min_samples_leaf, min_samples_split, max_leaf_nodes, or min_impurity_decrease
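To see the trade-off these hyperparameters control, compare a shallow and a deep tree on noisy data (a sketch; the noisy sine target is an assumption):

```python
# A shallow tree underfits; a very deep tree memorizes the
# training noise. The synthetic data here is an assumption.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)  # noisy target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
train_scores = {}
for depth in (2, 20):
    t = DecisionTreeRegressor(max_depth=depth).fit(X_train, y_train)
    train_scores[depth] = t.score(X_train, y_train)
    print(depth, train_scores[depth], t.score(X_test, y_test))
```

The deep tree scores near-perfectly on the training set but that gain does not carry over to held-out data, which is the signature of overfitting.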
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor
param_grid = {"max_depth": np.arange(2, 10, 1)}
model = GridSearchCV(DecisionTreeRegressor(), param_grid=param_grid)
model.fit(X_train, y_train)
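A complete run of the search, on assumed synthetic data, cross-validates each candidate depth, refits the best one, and exposes it via best_params_:

```python
# Full grid-search sketch; the noisy sine target is an assumption.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)

param_grid = {"max_depth": np.arange(2, 10, 1)}
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      param_grid=param_grid, cv=5)
search.fit(X, y)  # cross-validates every depth, then refits the best
print(search.best_params_)
```

After fitting, search itself acts as the best model: search.predict uses the refitted best estimator.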
Even with the best trade-off, a single decision tree is limited. It becomes far more powerful as the building block of ensemble models.
Reference & Resources
- Tree-based Models, Scikit-learn MOOC