Analyzing decision tree bias towards the minority class
There is a widespread and longstanding belief that machine learning models are biased towards the majority class when learning from imbalanced binary response data, leading them to neglect or ignore the minority class. Motivated by a recent simulation study that found that decision trees can be biased towards the minority class, our paper aims to reconcile the conflict between that study and other published works. First, we critically evaluate past literature on this problem, finding that failing to consider the conditional distribution of the outcome given the predictors has led to incorrect conclusions about the bias in decision trees. We then show that, under specific conditions, decision trees fit to purity are biased towards the minority class, debunking the belief that decision trees are always biased towards the majority class. This bias can be reduced by adjusting the tree-fitting process to include regularization methods like pruning and setting a maximum tree depth, and/or by using post-hoc calibration methods. Our findings have implications for the use of popular tree-based models, such as random forests. Although random forests are often composed of decision trees fit to purity, our work adds to recent literature indicating that this may not be the best approach.
💡 Research Summary
The paper challenges the long‑standing belief that decision trees are inherently biased toward the majority class in imbalanced binary classification problems. After a concise introduction that highlights the importance of correctly modeling rare events in domains such as cancer detection, flood forecasting, suicide ideation, and terrorism, the authors review the literature and point out a common methodological flaw: many studies evaluate bias using a fixed 0.5 decision threshold without considering the conditional distribution of the outcome given the predictors. This can make a perfectly calibrated tree appear to “neglect” the minority class simply because the optimal threshold for rare events is far below 0.5.
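The threshold argument above can be illustrated with a small simulation. This is our own toy sketch, not the paper's code: we assume a perfectly calibrated risk score q = P(Y = 1 | X) drawn from a Beta(1, 19) distribution, so positives are rare (about 5% prevalence). A fixed 0.5 cutoff then flags almost nothing as positive, even though the score itself is correct, while a cutoff near the prevalence recovers most of the minority class.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (our assumption, not the paper's): a perfectly calibrated
# risk score q = P(Y = 1 | X) in a rare-event problem (~5% positives).
n = 100_000
q = rng.beta(1, 19, size=n)   # E[q] = 0.05, so positives are rare
y = rng.random(n) < q         # Y ~ Bernoulli(q), i.e. q is calibrated

def evaluate(threshold):
    """Recall of the rule 'predict positive when q >= threshold'."""
    pred = q >= threshold
    return pred[y].mean()

# A fixed 0.5 cutoff misses nearly all positives; a cutoff near the
# prevalence does not -- without the score itself being biased.
print(f"recall at 0.5:  {evaluate(0.5):.3f}")
print(f"recall at 0.05: {evaluate(0.05):.3f}")
```

The point is that the apparent "neglect" of the minority class here is entirely an artifact of the evaluation threshold, not of the fitted probabilities.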
The core contribution consists of two theoretical scenarios. In the first, the outcome Y is deterministic given a single covariate X, which follows a uniform distribution on opposite intervals for the two classes. The optimal decision boundary is a true threshold t. When a tree is grown to purity, it selects a split at the average of the maximum X among negative observations and the minimum X among positive observations. By deriving the expected value of this estimated threshold T̂ as a function of the class prevalence p, sample size n, and true t, the authors show that E[T̂] generally differs from t, with the discrepancy depending on p and n — the formal basis for their claim that trees fit to purity can be biased toward the minority class.