What's Happening?
Researchers at the Center for AI Safety and Scale AI have developed a new test called 'Humanity’s Last Exam' to evaluate the capabilities of current artificial intelligence (AI) models against human-level knowledge. Launched in January 2025, the exam comprises 2,500 questions across more than 100 subjects, crafted with input from over 1,000 experts at 500 institutions worldwide. The test is designed to be extremely challenging: questions are precise, unambiguous, and not easily answerable through internet searches. Initial testing of models such as OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro produced low scores, with OpenAI’s o1 reaching only 8.3%. As of February 2026, the highest recorded score is 48.4%, achieved by Google’s Gemini 3 Deep Think, while human experts typically score around 90% within their own fields.
Why It's Important?
The development of 'Humanity’s Last Exam' represents a significant step in assessing AI's progress toward human-level intelligence. The test challenges AI models to demonstrate that they can process and reason about complex, domain-specific knowledge rather than rely on simple data retrieval. The results highlight how far current AI remains from artificial general intelligence (AGI), the point at which machines could perform any intellectual task a human can. The exam's rigorous standards and broad scope provide a more comprehensive measure of AI capabilities, pushing the boundaries of what these technologies can achieve and informing future AI development strategies.
What's Next?
As AI models continue to evolve, researchers anticipate that these systems may soon surpass the 50% accuracy threshold on 'Humanity’s Last Exam'. That would mark a significant advancement in AI capabilities, though not necessarily the achievement of AGI. Ongoing development will likely focus on improving models' ability to handle complex questions that cannot be answered by search alone, which could lead to breakthroughs in fields such as scientific research, technology, and education. The exam's results may also influence policy discussions around AI safety and ethics, as stakeholders consider the implications of increasingly capable machines.
Beyond the Headlines
The introduction of 'Humanity’s Last Exam' raises important ethical and philosophical questions about the nature of intelligence and the role of AI in society. As AI models become more sophisticated, there is a growing need to address issues related to transparency, accountability, and the potential impact on employment and privacy. The exam also underscores the importance of interdisciplinary collaboration in AI research, as experts from diverse fields contribute to the development of more robust and reliable AI systems. These discussions will be crucial in shaping the future of AI and ensuring that its benefits are realized in a responsible and equitable manner.