What Is a ‘Baseline,’ Anyway?
In machine learning, a baseline is a simple, well-understood model that serves as a reference point. [1, 2, 6] Think of it as the current champion you have to beat. If you’re claiming your new algorithm is revolutionary, you first need to prove it’s better than the existing, often simpler, methods. [2, 4] The baseline could be a basic statistical model, a 'dummy' algorithm that just predicts the most common outcome, or the previous state-of-the-art model in your specific area. [1, 4] A baseline's performance provides context. [2] An accuracy score of 90% sounds great in isolation, but it means very little if a simple, five-line solution can achieve 89%. The baseline grounds a research claim in reality, forcing authors to answer the most important question: is your complex new solution actually an improvement?
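To make the 'dummy' baseline concrete, here is a minimal Python sketch of a majority-class predictor. The toy labels and the function names are invented for illustration and are not taken from any cited paper. They assume a dataset where nine out of ten examples share one class.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical, imbalanced toy labels: nine negatives for every positive.
train_labels = [0] * 90 + [1] * 10
test_labels = [0] * 18 + [1] * 2

predict = majority_class_baseline(train_labels)
baseline_acc = accuracy([predict(None) for _ in test_labels], test_labels)
print(f"majority-class baseline accuracy: {baseline_acc:.2f}")  # prints 0.90
```

On data this imbalanced, a predictor that never looks at its input already reaches 90% accuracy, so a new model reporting 90% has shown nothing yet.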
The Temptation of the Straw Man
Here’s where the trouble begins. In the hyper-competitive world of academic publishing, there’s immense pressure to show a significant leap forward. This creates a powerful incentive to compare a new model against a deliberately weak or poorly tuned baseline, a practice known as building a 'straw man.' It’s like a heavyweight boxer bragging about knocking out an amateur. Many researchers, in a rush to publish, give in to that incentive. A recent systematic review of papers in one subfield found that 79% of articles claiming to outperform standard methods were, in fact, comparing themselves to weak baselines. [25, 27, 29] This inflates the perceived contribution of the new research, creating what some call 'overoptimistic' results that pollute the field and mislead other scientists. [24, 25, 27]
A Reproducibility Crisis in the Making
The problem of weak baselines is a major contributor to what's known as the 'reproducibility crisis' in machine learning. [13, 16, 17] When new research is built on the foundation of a paper with exaggerated claims, the whole structure becomes wobbly. Other researchers waste time trying to replicate or build upon results that were never as strong as they appeared. [16, 22] This has led conference reviewers at top-tier venues like ICML to become increasingly skeptical. [11, 20] They are now trained to hunt for this specific flaw. A poorly chosen baseline is no longer seen as a minor oversight but as a potential red flag indicating a lack of scientific rigor. [13] For authors submitting to ICML 2026, whose review process is notoriously rigorous, getting the baseline right isn't just good practice—it's a matter of survival. [7, 15]
How to Build an Honest Baseline
Choosing a strong baseline is a sign of intellectual honesty. It shows you're committed to pushing the field forward, not just your publication numbers. So, what does a good baseline look like? It should be a strong, commonly accepted method for the task at hand. If it’s a well-studied problem, this usually means comparing against the current state-of-the-art. Researchers should also ensure they've put in a reasonable effort to tune the baseline's hyperparameters, giving it a fair shot to perform well. Finally, the best papers often include multiple baselines, from the simple to the complex. [8] This provides a much richer context for the results and demonstrates a thorough, good-faith evaluation. It’s more work, but it’s what separates a paper that simply gets accepted from one that truly makes an impact.
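One way to put this advice into practice is to run every method, baseline or not, through the same tuning and evaluation procedure. The Python sketch below is an illustration under stated assumptions, not a prescribed protocol: the synthetic task, the hyperparameter grids, and the k-NN model standing in for "the new method" are all hypothetical.

```python
import random
import statistics


def make_split(seed, n=300):
    """Hypothetical noisy 1-D task; returns train, validation, and test splits."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x + rng.gauss(0.0, 0.3) > 0.2 else 0
        points.append((x, y))
    third = n // 3
    return points[:third], points[third:2 * third], points[2 * third:]


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_majority(train, _unused):
    # Simple baseline: always predict the most common training label.
    label = 1 if 2 * sum(y for _, y in train) > len(train) else 0
    return lambda x: label


def fit_threshold(train, t):
    # Strong baseline for this task: a single cut-off, chosen by tuning.
    return lambda x: 1 if x > t else 0


def fit_knn(train, k):
    # Stand-in for the "new method": majority vote among the k nearest points.
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return 1 if 2 * sum(y for _, y in nearest) > k else 0
    return predict


# Each entry pairs a fitting function with its own hyperparameter grid.
METHODS = {
    "majority class (simple baseline)": (fit_majority, [None]),
    "tuned threshold (strong baseline)": (fit_threshold, [-0.4, -0.2, 0.0, 0.2, 0.4]),
    "k-NN (stand-in for new method)": (fit_knn, [1, 3, 5, 9, 15]),
}


def tune_and_test(fit, grid, train, val, test):
    """Pick the best hyperparameter on validation data, then score once on test."""
    best = max(grid, key=lambda p: accuracy(fit(train, p), val))
    return accuracy(fit(train, best), test)


def compare(seeds=range(5)):
    """Mean and standard deviation of test accuracy per method across seeds."""
    results = {}
    for name, (fit, grid) in METHODS.items():
        scores = [tune_and_test(fit, grid, *make_split(seed)) for seed in seeds]
        results[name] = (statistics.mean(scores), statistics.stdev(scores))
    return results


if __name__ == "__main__":
    for name, (mean, spread) in compare().items():
        print(f"{name:36s} {mean:.3f} +/- {spread:.3f}")
```

The design choice that matters is that the tuned baselines get the same validation-based search and the same held-out test data as the new method, and that results are reported across several seeds rather than from one favorable run.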