AI fake citations taint research

AI hallucinations have produced an estimated 146,900 fake citations in research papers.
Researchers found these non-existent references in papers hosted on four databases.
arXiv now plans to ban authors who submit hallucinated AI content.

Summarized by AI

AI's tendency to fabricate information is polluting scientific papers with fake citations, which could undermine the research record and global trust in scholarly findings.

The Rise of Fake Citations

The integrity of scientific research hinges on the trustworthiness of its cited sources. However, a recent investigation connected to Cornell and UCLA has uncovered a disturbing phenomenon: 146,900 AI-generated citations that appear to be entirely fabricated. These fake references have turned up in scientific papers hosted across four prominent research databases. The underlying problem is a known limitation of the large language models behind tools like Gemini and ChatGPT: they can produce information that sounds convincing but is factually incorrect, a phenomenon often called 'hallucination.' When researchers use these tools to draft citations without carefully verifying them, the models may invent references to publications that do not exist. Although much scientific research sits behind paywalls, its impact is far-reaching, shaping everything from the development of life-saving medicines to new solutions for climate change. Hallucinated citations in these foundational documents therefore pose a substantial risk, because they can erode public faith in the reliability and quality of science.

Analyzing the Sloppy Science

To quantify the problem, the research team analyzed a dataset of 111 million references drawn from 2.5 million scientific papers, looking for citations that did not correspond to any actual published work. The team acknowledged that some mismatches could be simple typographical errors, but its analysis still revealed a significant number of AI-induced hallucinations. To establish a baseline, the researchers also measured the rate of unmatchable citations in papers published before 2023, before AI chatbots were widely adopted. They found a marked and sudden increase in non-existent references that coincided with the surge in large language model usage. Crucially, these problematic citations were not confined to a few outlier papers; they were spread across many publications. That breadth strongly suggests that numerous researchers are relying on AI-generated references without checking that they are genuine, and are propagating the problem as a result.

Understanding the Warning Signs

Usha Haley, a professor of management at Wichita State University, has voiced serious concern about the spread of these fake citations, calling it a critical warning sign for the scientific community. She said that fraudulent or AI-generated citations undermine trust in the scholarly record, which is the foundation of peer review and the cumulative advancement of knowledge. Haley also stressed that this growing skepticism is not coming from outside critics but from within academia itself, particularly from early-career scholars.

The four databases where the questionable citations were found (arXiv, bioRxiv, SSRN, and PubMed Central) are influential scientific repositories. Researchers commonly upload papers to these platforms before formal journal publication, which makes the work more visible and gives the global scientific community immediate access to it. The paper documenting AI-hallucinated citations is itself hosted on arXiv, underscoring how immediate the problem is. In response, arXiv has begun introducing measures to curb false citations, including a policy to ban authors who submit work containing hallucinated citations or any unchecked AI-generated content. Steinn Sigurdsson, arXiv's scientific director, has previously said that the scientific corpus is being diluted by a large volume of AI content that is either actively incorrect or meaningless. That noise, he said, makes genuine scientific progress harder to discern and can lead researchers astray.

AI Generated Content