Use a Real Tree to Visualize Your Tree-Based Model
Classic Tree Visualization
Tree-based models are popular in industry because they are “white box” models with strong interpretability. Any user can follow the logic behind the model and see how an outcome is predicted.
Regarding the visualization of trees, the most classic one is like this:
This visualization is not bad: it gives a clear picture of the tree. It is like a classic car, timeless but old-fashioned.
Since this is the 21st century, why can’t tree model visualization be upgraded to a fancy yet practical sports car? Maybe something like the one below:
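For reference, a box-and-arrow view of this kind is what scikit-learn's own exporters typically produce. The following is a minimal sketch, assuming scikit-learn is installed and using the iris dataset purely as a stand-in (neither comes from the original post):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Assumption: iris is only a stand-in dataset for illustration.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the Graphviz DOT source as a string.
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
)
print(dot_source[:200])
```

Rendering the DOT text with Graphviz (for example `dot -Tpng`) gives the familiar diagram of boxes.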
What Information Is Behind the Tree View
At first glance, I had the impression that the picture was too fancy to use officially. But I soon changed my mind after digging into the details of the plot.
- First of all, the structure actually looks like a tree, which does justice to the name “tree model”.
- On a closer look, it uses different colors to indicate the categories (targets). The splitting condition is marked at each fork, so the division logic is clearly shown.
- The depth of the tree is also neatly reflected.
- Compared to the classic tree view, the diameter of a branch (the width of the fork) is put to good use. It represents the number (proportion) of samples rather than being decoration: the more samples assigned under a split condition, the thicker the branch. When we notice that the lowest branches are too thin and fragile, shouldn’t we consider the risk of overfitting, and try adjusting the minimum number of samples to avoid it?
- Are there too many branches to view comfortably? First, you can save the plot as an SVG file and zoom in. Alternatively, the tree can be pruned to show only the major layers, e.g. the top 3 layers.
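The branch-thickness reading can be cross-checked numerically. The following is a minimal sketch, assuming scikit-learn and the iris dataset as a stand-in. It reads the per-leaf sample counts that the branch widths encode, then refits with `min_samples_leaf` and `max_depth` set. Note that `max_depth=3` trains a shallower tree, which is not the same thing as hiding the lower layers of an existing plot.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris is only a stand-in dataset for illustration.
iris = load_iris()
X, y = iris.data, iris.target


def leaf_sizes(clf):
    """Return the number of training samples that reach each leaf."""
    tree = clf.tree_
    is_leaf = tree.children_left == -1  # scikit-learn marks leaves with child index -1
    return tree.n_node_samples[is_leaf]


# Unconstrained tree: tends to grow very small ("thin") leaves.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
print("unconstrained: depth", full.get_depth(), "smallest leaf", leaf_sizes(full).min())

# Constrained tree: every leaf holds at least 5 samples and the depth is capped at 3.
small = DecisionTreeClassifier(min_samples_leaf=5, max_depth=3, random_state=0).fit(X, y)
print("constrained:   depth", small.get_depth(), "smallest leaf", leaf_sizes(small).min())
```

If the smallest leaf of the unconstrained tree holds only one or two samples, that is the numeric counterpart of a fragile twig in the plot.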
Best Case for Classification
In practice, I use it to interpret anomaly detection models. Because the task is anomaly detection, only a few samples are abnormal. When you plot the tree view, you see one very thick branch, which means most of the data samples are normal.
The plot shows the condition at each split point and the proportion of samples in each branch. Likewise, I keep an eye on the forks that are “dancing in the wind” to see whether they are really reasonable splits.
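To look for thin branches without eyeballing the plot, one option is to list each leaf with its share of the training samples. This is a minimal sketch, assuming scikit-learn and a synthetic imbalanced dataset from `make_classification` (roughly 3% positives standing in for anomalies). The 1% cutoff for a "thin" leaf is an arbitrary illustrative threshold, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data with roughly 3% "abnormal" samples (class 1).
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.97, 0.03], random_state=0,
)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

tree = clf.tree_
leaf_ids = np.where(tree.children_left == -1)[0]                 # leaves have no children
share = tree.n_node_samples[leaf_ids] / tree.n_node_samples[0]   # fraction of all samples per leaf
majority = tree.value[leaf_ids, 0, :].argmax(axis=1)             # majority class index per leaf

THIN = 0.01  # hypothetical cutoff for a "thin" branch
for node, s, c in sorted(zip(leaf_ids, share, majority), key=lambda t: -t[1]):
    flag = "  <- thin, worth a second look" if s < THIN else ""
    print(f"leaf {node}: {s:.1%} of samples, majority class {clf.classes_[c]}{flag}")
```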
More Than A Single Tree
Of course, we can also draw a “forest” rather than a single tree.
We can also try AdaBoostClassifier.
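Both ensembles keep their fitted members in `estimators_`, so each member can be drawn in turn. This is a minimal sketch, assuming scikit-learn, matplotlib, and iris as a stand-in dataset. The helper `draw_ensemble` is hypothetical, and the `model=` and `ax=` keyword arguments of `pybaobabdt.drawTree` are assumed from the pybaobabdt README, so check them against the installed version.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Assumption: iris is only a stand-in dataset for illustration.
iris = load_iris()
X, y, features = iris.data, iris.target, list(iris.feature_names)

forest = RandomForestClassifier(n_estimators=4, random_state=0).fit(X, y)
boosted = AdaBoostClassifier(n_estimators=4, random_state=0).fit(X, y)


def draw_ensemble(model, features, path, cols=2):
    """Draw every member tree of a fitted ensemble into one figure and save it."""
    # Imported lazily so the fitting code above runs without the plotting stack.
    import matplotlib.pyplot as plt
    import pybaobabdt

    trees = model.estimators_
    rows = -(-len(trees) // cols)  # ceiling division
    fig = plt.figure(figsize=(5 * cols, 5 * rows))
    for idx, tree in enumerate(trees):
        ax = fig.add_subplot(rows, cols, idx + 1)
        # Assumed signature: model= passes the parent ensemble, ax= the target axes.
        pybaobabdt.drawTree(tree, model=model, size=5, dpi=72, features=features, ax=ax)
    fig.savefig(path)
    return fig


# Usage (requires pybaobabdt and matplotlib):
# draw_ensemble(forest, features, "forest.png")
# draw_ensemble(boosted, features, "adaboost.png")
```

Keeping `n_estimators` small here is only to keep the figure readable; a real forest would need a larger grid or a selection of trees.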
Are You Interested?
If you are still on this page and would like to use it to explain your models, just go for it:
$ pip install pybaobabdt
Then draw your tree with a single line of code:
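import pybaobabdt
from sklearn.tree import DecisionTreeClassifier

# Fit your own tree first: X, y and features (the list of feature names) come from your data.
clf = DecisionTreeClassifier().fit(X, y)
# Draw the fitted tree; a tree taken from a random forest can be passed the same way.
ax = pybaobabdt.drawTree(clf, size=10, dpi=72, features=features)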
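Since zooming into an SVG file was suggested earlier, here is a hedged end-to-end sketch that fits a tree and writes the plot to SVG. It assumes scikit-learn, iris as a stand-in dataset, and that `drawTree` returns a matplotlib Axes (as the `ax =` assignment above suggests). The helper name `draw_to_svg` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris is only a stand-in dataset for illustration.
iris = load_iris()
features = list(iris.feature_names)
clf = DecisionTreeClassifier(min_samples_leaf=5, random_state=0).fit(iris.data, iris.target)


def draw_to_svg(clf, features, path="tree.svg"):
    """Draw a fitted tree with pybaobabdt and save it as a vector file."""
    # Imported lazily so the fitting code above runs without pybaobabdt installed.
    import pybaobabdt

    ax = pybaobabdt.drawTree(clf, size=10, dpi=72, features=features)
    # Assumption: ax is a matplotlib Axes, so its parent figure can be saved directly.
    ax.get_figure().savefig(path, format="svg")
    return path


# Usage (requires pybaobabdt):
# draw_to_svg(clf, features)
```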