Surprise #1: They Can Overfit Aggressively
The first big shock for many beginners is how perfectly a decision tree can learn the data you give it, and why that’s a bad thing. This is called overfitting. Left unconstrained, a tree keeps splitting until nearly every training example sits in a leaf that matches its label, noise included. Imagine you’re training a model to identify cat photos. An overfit tree doesn’t just learn the general features of a cat (whiskers, pointy ears); it memorizes the exact pixel pattern of every cat in your training photos, including the background noise and lighting quirks. When you show it a new, unseen cat photo, it gets confused: "This cat has a slightly different fur pattern than the 1,432 cats I memorized, so it must not be a cat."
It’s like a student who crams for a test by memorizing the answers to practice questions. They’ll ace that specific test but fail spectacularly when given new questions on the same topic. For practitioners, this means a model that scores 100% on its training data can perform far worse on data it has never seen, leading to flawed predictions about customer behavior or market trends.
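To make the memorization problem concrete, here is a minimal sketch in plain Python. It is not how any particular library builds trees; `build_tree`, `make_data`, the 20% label noise, and the random seed are all assumptions chosen for illustration. The true rule is simply "label is 1 when x > 0.5", and the sketch compares a tree grown with no depth limit against a one-question stump.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(points, max_depth=None, depth=0):
    """Greedily grow a tree on (x, label) pairs; leaves are 0/1 predictions."""
    labels = [y for _, y in points]
    majority = int(2 * sum(labels) >= len(labels))
    if len(set(labels)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        left, right = points[:i], points[i:]
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right]))
        if best is None or score < best[0]:
            threshold = (points[i - 1][0] + points[i][0]) / 2
            best = (score, threshold, left, right)
    if best is None:
        return majority
    _, threshold, left, right = best
    return (threshold,
            build_tree(left, max_depth, depth + 1),
            build_tree(right, max_depth, depth + 1))

def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's label."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)

def make_data(n, rng, noise=0.2):
    """True rule: label is 1 when x > 0.5; a `noise` fraction of labels is flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)  # arbitrary seed, only for repeatability
train, test = make_data(200, rng), make_data(1000, rng)

full_tree = build_tree(train)           # no depth limit: grows until leaves are pure
stump = build_tree(train, max_depth=1)  # a single question

full_train, full_test = accuracy(full_tree, train), accuracy(full_tree, test)
stump_train, stump_test = accuracy(stump, train), accuracy(stump, test)
print(f"full tree: train={full_train:.2f} test={full_test:.2f}")
print(f"stump:     train={stump_train:.2f} test={stump_test:.2f}")
```

The unrestricted tree fits every training point, flipped labels included, so its training score is perfect while its test score drops. In a setup like this you should expect the stump to score lower on training data but hold up about equally well on the test set, because it has no room to memorize the noise.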
Surprise #2: Tiny Data Changes Create Huge Tree Changes
You’d expect a robust model to be stable. If you add a few more data points, the underlying logic shouldn't completely flip, right? Wrong. Decision trees are notoriously unstable. Removing or slightly altering even a small fraction of your training data can cause the algorithm to generate a wildly different tree structure. Because every split depends on the splits above it, one changed decision near the top reshuffles everything below. The root question might change, branches might get reordered, and individual predictions can shift dramatically.
Think of it like a game of Jenga. The tower might be standing, but pulling out a single, seemingly unimportant block from the bottom can make the entire structure unstable and force a complete rebuild. This instability is a major surprise because it undermines a model's reliability. If your model for predicting loan defaults changes drastically every time you add a week's worth of new applications, you can't trust its logic or its output from one day to the next.
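The root-question flip can be reproduced by hand on a toy dataset. The sketch below is an illustration under stated assumptions: the twelve loan applications, the feature names `missed_payment` and `high_debt`, and the helper functions are all hypothetical, and the split criterion is weighted Gini impurity (a common choice, not necessarily the one your library uses).

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def weighted_gini(rows, feature):
    """Impurity left over after splitting `rows` on a binary feature."""
    left = [r["default"] for r in rows if r[feature] == 0]
    right = [r["default"] for r in rows if r[feature] == 1]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(rows)

def root_feature(rows, features):
    """The feature a greedy tree would ask about first."""
    return min(features, key=lambda f: weighted_gini(rows, f))

def row(missed, debt, default):
    return {"missed_payment": missed, "high_debt": debt, "default": default}

FEATURES = ["missed_payment", "high_debt"]

# Twelve made-up loan applications (hypothetical data).
applications = (
    [row(0, 0, 0)] * 5
    + [row(1, 0, 0)]
    + [row(0, 0, 1)]      # index 6: the single row we will remove
    + [row(1, 0, 1)]
    + [row(1, 1, 1)] * 4
)

root_full = root_feature(applications, FEATURES)

without_one = applications[:6] + applications[7:]
root_minus_one = root_feature(without_one, FEATURES)

print("root with all 12 rows:  ", root_full)       # high_debt
print("root after dropping one:", root_minus_one)  # missed_payment
```

With all twelve rows, `high_debt` wins the root (weighted impurity 0.25 versus about 0.278). Drop one application and `missed_payment` wins instead (about 0.152 versus 0.156), so every branch below the root is rebuilt around a different first question.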
Surprise #3: They’re ‘Greedy’ and Not Truly Optimal
Decision tree algorithms are "greedy." This doesn't mean they're malicious; it means they make the locally best decision at each step, without looking ahead to see if that choice will lead to the best overall outcome. At every node, the algorithm asks, "What single question can I ask right now to split the data most cleanly?" It then commits to that question and never revisits it.
This is like trying to drive from New York to Los Angeles by only ever taking the road that points most directly west at each intersection. You might make great initial progress, but you'll eventually get stuck in a local neighborhood or hit a dead end, when a slightly less direct turn early on would have put you on a major interstate. Because of this greedy approach, a decision tree comes with no guarantee of being the globally optimal tree, and in practice it rarely is. There are likely smaller or more accurate trees out there, but finding the provably best one is an NP-hard problem in general, so an exhaustive search is impractical for anything beyond tiny datasets.
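A small constructed example shows the greedy trap in action. In this sketch (hypothetical data and helper names, Gini impurity assumed as the split criterion), the label is `x1 XOR x2`, so neither feature looks useful on its own, while a decoy feature `x3` matches the label 75% of the time. The code compares the greedy two-level tree against a brute-force search over every two-level tree, which is only feasible because there are three features.

```python
from itertools import product

FEATURES = [0, 1, 2]  # column indices for x1, x2, x3; the label is the last column

# Build 16 rows: label = x1 XOR x2, and the decoy x3 copies the label
# in 3 of every 4 rows and is wrong in the 4th.
rows = []
for x1, x2 in product([0, 1], repeat=2):
    y = x1 ^ x2
    for copy in range(4):
        x3 = y if copy < 3 else 1 - y
        rows.append((x1, x2, x3, y))

def gini(subset):
    if not subset:
        return 0.0
    p = sum(r[-1] for r in subset) / len(subset)
    return 2 * p * (1 - p)

def split(subset, f):
    return [r for r in subset if r[f] == 0], [r for r in subset if r[f] == 1]

def greedy_choice(subset):
    """Pick the feature with the lowest weighted Gini impurity right now."""
    def score(f):
        left, right = split(subset, f)
        return len(left) * gini(left) + len(right) * gini(right)
    return min(FEATURES, key=score)

def correct_after_split(subset, f):
    """Rows classified correctly if each side of the split predicts its majority."""
    total = 0
    for side in split(subset, f):
        ones = sum(r[-1] for r in side)
        total += max(ones, len(side) - ones)
    return total

def depth2_accuracy(root, left_f, right_f):
    left, right = split(rows, root)
    hits = correct_after_split(left, left_f) + correct_after_split(right, right_f)
    return hits / len(rows)

# Greedy: commit to the best-looking root, then the best-looking children.
greedy_root = greedy_choice(rows)
g_left, g_right = split(rows, greedy_root)
greedy_acc = depth2_accuracy(greedy_root, greedy_choice(g_left), greedy_choice(g_right))

# Exhaustive: try every (root, left child, right child) combination.
best_acc, best_tree = max(
    (depth2_accuracy(r, lf, rf), (r, lf, rf))
    for r, lf, rf in product(FEATURES, repeat=3)
)

print("greedy root feature index:", greedy_root)  # 2, the decoy x3
print("greedy accuracy:", greedy_acc)              # 0.75
print("best depth-2 accuracy:", best_acc)          # 1.0
```

The greedy tree grabs the decoy first because it is the only feature that reduces impurity immediately, and it never recovers: it tops out at 75%. Splitting on `x1` first looks worthless at the root, yet it sets up a second split on `x2` that classifies every row correctly.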
Surprise #4: ‘Interpretability’ Isn't Always So Simple
The number one selling point for decision trees is their interpretability. They are often called "white-box" models because you can supposedly look inside and understand exactly how they work. This is true for a tiny, textbook example with three splits and four leaves. But a real-world decision tree built on a complex dataset can have hundreds or even thousands of nodes.
When you’re faced with a flowchart that’s deeper than the org chart of a multinational corporation, the claim of easy interpretation falls apart. You can still trace a single prediction from root to leaf, but making sense of the model as a whole can feel as confusing as trying to understand the inner workings of a neural network. The surprise isn't that they *can* be simple, but that they rarely *stay* simple when applied to messy, real-world problems. The promise of crystal-clear logic often gives way to a tangled web of rules that’s anything but intuitive.
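Some quick arithmetic shows how fast a tree outgrows a readable flowchart. The sketch below assumes a binary tree (each question has two outcomes) and computes upper bounds only; `max_tree_size` is a hypothetical helper, and real trees are usually smaller than the bound because many branches stop early.

```python
def max_tree_size(depth):
    """Upper bounds for a binary tree whose deepest leaf sits `depth` splits below the root."""
    leaves = 2 ** depth
    nodes = 2 ** (depth + 1) - 1  # internal split nodes plus leaves
    return leaves, nodes

for depth in (2, 5, 10, 20):
    leaves, nodes = max_tree_size(depth)
    print(f"depth {depth:>2}: up to {leaves:>9,} leaves and {nodes:>9,} nodes")
```

A depth of 2 gives the textbook picture: at most 4 leaves and 7 nodes. At depth 10 the ceiling is already 1,024 leaves, and at depth 20 it passes a million, which is far more rules than anyone can review by eye.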






