Machine Learning: Decision Tree Models
Published On 2022/06/15 Wednesday, Singapore
Decision tree models can be used for both regression and classification tasks. They are also the building blocks for popular ensemble models such as random forests.
A decision tree is a sequence of simple decision rules: one feature and one threshold at a time. Unlike linear models, decision trees are non-parametric: they are not defined by a fixed mathematical decision function and have no weights or intercepts to optimize.
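To make the "sequence of rules" idea concrete, here is a minimal sketch (the synthetic data and feature names are assumptions for illustration) that fits a small tree and prints its learned rules with scikit-learn's export_text:

```python
# Fit a tiny tree and print its rules; the data and feature
# names below are assumptions, not from the original post.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0.5).astype(int)  # label depends on a single threshold

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["f0", "f1"]))  # one rule per line
```

Each printed line is one feature/threshold comparison, which is all a decision tree is.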
Regression
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
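The snippet above assumes X_train and y_train already exist. A self-contained sketch, on assumed synthetic data, looks like this:

```python
# Runnable regression sketch; the sine target and split here
# are assumptions for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel()  # smooth 1-D target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeRegressor(max_depth=3)  # 8 leaves: a piecewise-constant fit
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on held-out data
```

The prediction is piecewise constant: each leaf predicts the mean target of its training samples.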
Classification
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
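As with the regressor, X_train and y_train are assumed to exist. A self-contained classification sketch, with synthetic labels assumed for illustration:

```python
# Runnable classification sketch; the diagonal decision boundary
# below is an assumption for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # diagonal boundary

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out data
```

Because each rule tests one feature against one threshold, the tree approximates the diagonal boundary with axis-aligned steps.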
Overfitting & Underfitting
There are two groups of hyperparameters that control the trade-off between underfitting and overfitting:
- The depth of the tree: max_depth
- The shape of the tree: min_samples_leaf, min_samples_split, max_leaf_nodes, or min_impurity_decrease
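To see the trade-off these hyperparameters control, compare a shallow and a deep tree on noisy data (a sketch; the noisy sine target is an assumption):

```python
# A shallow tree underfits; a very deep tree memorizes the
# training noise. The synthetic data here is an assumption.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)  # noisy target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
train_scores = {}
for depth in (2, 20):
    t = DecisionTreeRegressor(max_depth=depth).fit(X_train, y_train)
    train_scores[depth] = t.score(X_train, y_train)
    print(depth, train_scores[depth], t.score(X_test, y_test))
```

The deep tree scores near-perfectly on the training set but that gain does not carry over to held-out data, which is the signature of overfitting.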
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor
param_grid = {"max_depth": np.arange(2, 10, 1)}
model = GridSearchCV(DecisionTreeRegressor(), param_grid=param_grid)
model.fit(X_train, y_train)
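A complete run of the search, on assumed synthetic data, cross-validates each candidate depth, refits the best one, and exposes it via best_params_:

```python
# Full grid-search sketch; the noisy sine target is an assumption.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)

param_grid = {"max_depth": np.arange(2, 10, 1)}
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      param_grid=param_grid, cv=5)
search.fit(X, y)  # cross-validates every depth, then refits the best
print(search.best_params_)
```

After fitting, search itself acts as the best model: search.predict uses the refitted best estimator.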
Even with the best trade-off, a single decision tree is limited. It becomes far more powerful as the building block of ensemble models.
Reference & Resources
- Tree-based Models, Scikit-learn MOOC