The researchers reviewed how LLMs are currently used as a technological tool and how they may develop into powerful scientific assistants as their capabilities grow.
They found that LLMs face challenges due to hallucinations that produce plausible but incorrect results, limiting reliability in research and business use. Their black-box nature reduces transparency and trust, while embedded biases from training data risk reinforcing disparities. As a result, AI outputs require verification through human oversight or algorithmic confidence testing.
Hallucinations and accuracy: A double-edged sword
One of the central challenges with LLMs is their tendency to produce hallucinations, which sound plausible but are factually incorrect. The paper said that while such speculative results can sometimes spark creative hypotheses, they are risky if relied upon for experimental validation.
The paper emphasises that LLMs must be treated neither as entirely trustworthy nor wholly unreliable. Instead, researchers are encouraged to adopt a model of “algorithmic confidence,” a continuous measure of trustworthiness that quantifies how likely an AI-generated output is to be accurate.
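The paper does not prescribe how such a confidence score should be computed, but one common proxy is self-consistency: sample the model several times and measure how strongly it converges on a single answer. The sketch below illustrates the idea in Python; the `generate` callable and the 0-to-1 score are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def algorithmic_confidence(generate, prompt, n_samples=10):
    """Rough confidence estimate via self-consistency: sample the model
    several times and report how often the most common answer recurs.
    A score near 1.0 suggests the model converges on one answer; a low
    score flags the output for human verification."""
    answers = [generate(prompt) for _ in range(n_samples)]
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / n_samples

# `generate` is a hypothetical callable that queries an LLM and returns
# a short, normalised answer string for the given prompt.
```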
As per the World Bank’s “Digital Progress and Trends Report 2025, Strengthening AI Foundations,” current AI’s inherent flaws may limit its broader economic significance.
The hallucinations that GenAI tools produce are rooted in the mathematical and logical structure of LLMs, making them unreliable in business environments where mistakes can be costly.
AI has already shown remarkable capabilities: DeepMind’s AlphaFold, for instance, dramatically transformed protein structure prediction, a longstanding scientific challenge, by using deep learning to predict protein folding accurately. Furthermore, AI4Science could reverse the slowdown in scientific productivity seen in recent years, where literature search and peer-review evaluation are bottlenecks.
Yet LLMs are not yet equipped to operate as independent scientific agents. The authors stress that all AI-assisted research should be verified either by human experts or through algorithmic confidence testing. This ensures that errors, biases, or hallucinations do not propagate into published research or influence critical decisions.
They found that during the research process, human involvement remains essential to improve the safety, reliability, and effectiveness of LLMs. In literature reviews, humans provide deeper perspectives and guide LLM agents toward the needs of scientists.
During reasoning, humans can identify uncertain reasoning steps and correct errors, improving the accuracy of chain-of-thought methods. Human scientists also support disambiguation and troubleshooting in LLM-powered systems, help select generated hypotheses to reduce workload, and play a critical role in implementing experiments and correcting invalid experimental plans.
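As an illustration of that workflow, the minimal sketch below routes low-confidence reasoning steps to a human reviewer before a chain of thought is accepted; the step format, the confidence threshold, and the `review_step` hook are assumptions made for the example, not a mechanism described in the paper.

```python
def verify_chain_of_thought(steps, review_step, threshold=0.7):
    """Walk a list of (text, confidence) reasoning steps and send any
    step below the confidence threshold to a human reviewer, who may
    return a corrected version. Returns the verified chain."""
    verified = []
    for text, confidence in steps:
        if confidence < threshold:
            text = review_step(text)  # human expert corrects or confirms the step
        verified.append(text)
    return verified
```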
Transparency and interpretability challenges
Another major concern is the black-box nature of LLMs. The inner workings of these models are often opaque, making it difficult to understand why they produce certain outputs.
This lack of interpretability can limit trust, particularly in high-stakes research contexts. Scientists are exploring methods such as neuron activation visualisations, probing, and logit lens techniques to improve transparency. Ironically, LLMs themselves are also being used to explain other black-box systems, showcasing their potential while highlighting the need for continued scrutiny.
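To make the logit lens idea concrete, the sketch below projects each layer’s hidden state through the final layer norm and unembedding matrix of a small open model (GPT-2 via the Hugging Face transformers library), revealing the model’s intermediate next-token guesses layer by layer; the choice of model and library is an assumption for illustration, not something specified in the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# A small open decoder-only model; the same trick works for most LLMs.
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Logit lens: decode what the model "believes" the next token is at
# every layer by applying the final layer norm and unembedding matrix
# to the hidden state of the last input position.
for layer, hidden in enumerate(out.hidden_states):
    h = model.transformer.ln_f(hidden[:, -1, :])
    logits = model.lm_head(h)
    print(f"layer {layer:2d} -> {tok.decode(logits.argmax(-1))!r}")
```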
Bias, fairness, and access
LLMs also carry ethical considerations beyond accuracy. While they have the potential to democratise access to scientific knowledge, helping researchers from non-English-speaking backgrounds participate more fully in global science, they can also perpetuate existing biases present in training data. Such biases may influence outputs and reinforce disparities in research, underscoring the importance of careful oversight.
Balancing AI creativity and scientific rigour
The paper notes that LLMs can contribute creatively to hypothesis generation, expanding the boundaries of research. However, over-reliance on AI risks undermining scientific rigour if speculative outputs are treated as validated results. Maintaining human oversight and applying rigorous verification processes are essential to ensure that AI complements rather than compromises research integrity.
As per the World Bank report cited above, current AI tools are primarily pattern-recognition engines without true understanding, logical reasoning, or common sense.
Much scientifically valuable knowledge is tacit and context-specific, making it difficult for AI to interpret or apply reliably. Hallucinations in AI outputs, if unchecked, can have serious consequences in both scientific and business applications. Until AI can interact reliably with the physical world and understand context in complex scenarios, human judgment remains indispensable.
Responsible integration is key
The perspective paper concludes that LLMs hold immense promise for accelerating scientific discovery, but their ethical and interpretability limitations cannot be ignored.
Responsible integration requires human oversight, transparency, and careful verification of AI-generated insights. By applying these safeguards, researchers can harness AI’s capabilities while preserving the integrity of scientific inquiry.
The World Bank report also found that AI will become more useful when it can understand and interact with the physical world. At present, AI models are used mainly to optimise software and generate information, text, and images, affecting a narrow range of activities that are mostly confined to virtual spaces.
For AI to have a broader impact, it needs to be able to perceive, comprehend, and respond to physical environments reliably, even in novel and unique situations. By bridging the gap between digital intelligence and physical action, AI could revolutionise more industries and solve practical challenges in ways that go far beyond current applications.
As AI continues to evolve, the scientific community faces a critical question: how much autonomy should be entrusted to machines, and how can humans ensure that innovation remains rigorous, ethical, and reliable? The answer, the authors argue, will shape the next era of science.