Choosing a Split: Strategic Splitting in Decision Trees

Nethmi de Silva
4 min read · Jul 31, 2024

Image generated with DALL·E

Today, we’re diving into the fascinating world of decision trees to master the art of choosing the best split, whether we’re classifying data or making predictions through regression. Especially if you’re just starting out, understanding these concepts will enhance your ability to build accurate and effective models. Let’s get started!

Imagine you’re building a decision tree. You’re standing at a crossroads, deciding which path to take. The path you choose will determine how well your tree performs. This brings us to the quest for the perfect split. So, how do you make the best choice? It all comes down to purity. Yes, purity. The purer your splits, the more accurate your model. Let’s break it down.

The Importance of Purity in Classification

In classification tasks, our goal is to split the data into subsets that are as pure as possible. Think of it like sorting candies by color — you want each group to be all red, all blue, or all green, not a mix. In data terms, purity is measured by entropy.

Entropy is our way of measuring disorder or impurity in the dataset. Lower entropy means higher purity. Here’s the formula for you:
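
In the two-class case, writing p₁ for the fraction of examples in a node that belong to the positive class (and 1 − p₁ for the rest), entropy is

H(p_1) = -p_1 \log_2(p_1) - (1 - p_1) \log_2(1 - p_1)

A 50/50 node hits the maximum entropy of 1, while a node containing only one class has entropy 0: perfectly pure.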

Example: Finding the Best Split for Classification

Imagine we have three features (Ear shape, Face shape, Whiskers) as in the following figure and we need to decide which one to use as our root node.

So, earlier we talked about using entropy to measure how pure a split is. You might be wondering, “Why not just compute the weighted average entropy for all possible splits and pick the one with the lowest value?” Great thinking! That’s actually a solid approach. However, there’s a more conventional and widely used way to express it. This nifty formula is called information gain, and it helps us efficiently identify the best split.

Information gain is simply the reduction in entropy: it tells us how much a split lowers the entropy, weighted by how many examples land in each branch. Think of it as our reward for making a good choice. Here’s the formula:
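
Using the entropy H from above, write w_left and w_right for the fractions of examples that end up in the left and right branches, and p₁ for the positive-class fraction at the parent node and in each branch. Then:

\text{Information Gain} = H\left(p_1^{\text{root}}\right) - \left( w_{\text{left}} \, H\left(p_1^{\text{left}}\right) + w_{\text{right}} \, H\left(p_1^{\text{right}}\right) \right)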

So, now we can calculate the information gain for each split, as in the figure (there’s also a quick sketch of the calculation in code below).
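
If you want to see the arithmetic end to end, here is a minimal Python sketch. The ten-animal cat/not-cat table is an assumed toy encoding (my own stand-in for the kind of data shown in the figure), chosen so the numbers line up with the values discussed here:

```python
import numpy as np

# Toy dataset (assumed for illustration): 10 animals, binary features, 1 = cat / 0 = not cat
ear_pointy   = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0])
face_round   = np.array([1, 0, 1, 0, 1, 1, 0, 1, 1, 1])
has_whiskers = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0, 0])
is_cat       = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

def entropy(p):
    """Binary entropy of a node whose positive-class fraction is p."""
    if p in (0, 1):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(labels, feature):
    """Entropy at the parent node minus the weighted entropy of the two branches."""
    left, right = labels[feature == 1], labels[feature == 0]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return entropy(labels.mean()) - (w_left * entropy(left.mean())
                                     + w_right * entropy(right.mean()))

for name, feature in [("Ear shape", ear_pointy),
                      ("Face shape", face_round),
                      ("Whiskers", has_whiskers)]:
    print(f"{name}: information gain = {information_gain(is_cat, feature):.2f}")
```

With this toy data, Ear shape comes out at roughly 0.28, well ahead of Whiskers and Face shape.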

Looks like the feature Ear shape wins with the highest information gain (0.28). It’s our best choice for the root node. Easy, right?

The Importance of Reducing Variance in Regression

Now, let’s switch gears to regression. Here, we’re all about minimizing variance. Variance measures how spread out our data points are. The smaller the variance, the more precise our predictions.

Variance is like the scatter of your data points. Here’s the formula:
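
Writing ȳ for the mean of the n target values in a node, and using the sample-variance convention (dividing by n − 1, which is what the worked numbers below follow):

\text{Variance} = \frac{1}{n - 1} \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2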

Example: Finding the Best Split for Regression

Consider the same scenario we discussed previously, now framed as a regression problem: instead of predicting a class, we predict a continuous target value.

Now, unlike classification, where we care about entropy, here we focus on variance. The goal is simple: the lower the variance, the better the split. So, instead of measuring impurity, we look at how spread out our data points are and aim to minimize that spread for the best results.

Similar to the classification case, you might consider calculating the weighted average variance and picking the smallest one as the best split. That’s again a solid approach! However, here we use the concept of reduction in variance. It’s not just about finding the lowest variance; it’s about how much we can reduce it. Think of it as the regression counterpart of information gain: instead of reducing entropy, we reduce variance.

So now we can calculate the reduction in variance for each split, as in the figure (again, a quick sketch in code follows below).
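
Here is the same sketch adapted for regression, assuming the same toy animals with made-up weights as the continuous target; the weight values are placeholders chosen so the output matches the reductions quoted here:

```python
import numpy as np

# Same toy animals as before, now with an assumed continuous target: weight
ear_pointy   = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0])
face_round   = np.array([1, 0, 1, 0, 1, 1, 0, 1, 1, 1])
has_whiskers = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0, 0])
weight       = np.array([7.2, 8.8, 15.0, 9.2, 8.4, 7.6, 11.0, 10.2, 18.0, 20.0])

def variance(y):
    """Sample variance (divide by n - 1) of a node's target values."""
    return np.var(y, ddof=1)

def variance_reduction(y, feature):
    """Variance at the parent node minus the weighted variance of the two branches."""
    left, right = y[feature == 1], y[feature == 0]
    w_left, w_right = len(left) / len(y), len(right) / len(y)
    return variance(y) - (w_left * variance(left) + w_right * variance(right))

for name, feature in [("Ear shape", ear_pointy),
                      ("Face shape", face_round),
                      ("Whiskers", has_whiskers)]:
    print(f"{name}: reduction in variance = {variance_reduction(weight, feature):.2f}")
```

With these placeholder weights, Ear shape gives a reduction of about 8.84, ahead of Whiskers (about 6.22) and Face shape (about 0.64).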

Feature Ear shape wins here with the largest reduction in variance (8.84). It’s our best choice for the split.

To sum up, choosing the right split in a decision tree is crucial for building powerful models. For classification, we focus on minimizing entropy to maximize information gain. For regression, we aim to reduce variance to improve precision. By getting to know these concepts, you’ll be well on your way to creating decision trees that excel in accuracy and performance. Now, go ahead and experiment with your own datasets. Let’s keep exploring the world of decision trees together!
