What is the story about?
Kunvar Thaman did not have a prestigious lab behind him. Hailing from India's City Beautiful, designed by legendary Swiss-French architect Le Corbusier, he had no university affiliation, no research grant for most of the journey, and no guarantee that any of it would amount to anything. What he had was a question he could not let go of — and two years of quiet, unfunded work to answer it.
That work has now found its way to the International Conference on Machine Learning (ICML) 2026, one of the most competitive venues in global AI research. In a field where acceptance lists are dominated by OpenAI, Google DeepMind, and elite universities, Thaman's solo-authored paper stands apart — the first from an independent researcher based in India to make it to ICML in three years.
His subject is reward hacking: the new tech-sphere where an AI system finds a shortcut to appear successful without actually doing the job. It sounds technical. Thaman makes it alarming.
"A weak AI fails by being obviously wrong," the 26-year-old from Chandigarh tells Firstpost.
"A strong AI with tools can find shortcuts that look like success on the dashboard and aren't. The numbers go up. The actual work doesn't get done."
“That,” he says, “is the next few years of AI in a sentence.”
The conference is set to take place in Seoul, South Korea, from July 6 to July 11. Titled "Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use," the research paper is garnering acclaim.
But what exactly is this Reward Hacking all about in an AI system? Is it essentially an AI finding loopholes in human instructions? To unpack every layer of this and more, Firstpost sat down with the brain behind this breakthrough. Now, sit back and read the comprehensive interactions.
In conversation with Firstpost, Thaman described the achievement as "like one big moment" and as his work and effort over time finally showing noticeable results.
The journey to get here was not easy or simple for him, as the AI researcher puts it: "A couple of years ago, I left a corporate job to work on this full-time, without a university, a lab, or for a while, funding."
Thaman also expressed gratitude to his parents, who were "incredibly supportive throughout his independent research pursuit".
"Everything else was time, and being okay with not knowing if any of it would land. ICML accepted it. It felt like saying that the question was worth asking. That's the part that matters to me,” he says.
Once we encounter terms like "rewards hacking in AI," technical jargon hits our minds, many of which can be hard to comprehend. Thaman breaks down his research paper in simple terms, making it easier to grasp.
Think of a school student whose main goal is to learn a certain subject so he can take the exam. Now the student decides to copy from someone smarter sitting next to him, which helps him get a full score. The report card states that the student understood the material, even though he did not. He just found a faster path to the number.
"That gap, between what he wrote down as the goal and what he actually wanted, is reward hacking. AI systems are unusually good at finding it," Thaman explains.
The Bits Pilani (Birla Institute of Technology & Science) alumnus also points out why ordinary people should care about it: "Because this failure mode gets sharper as AI gets more capable, not weaker."
"The picture is more interesting than AI finding loopholes," Thaman tells Firstpost. What surprised the researcher the most was how AI is getting better at cheating, but in ways that don’t even look like cheating.
“AI writes out its reasoning step by step, and the reasoning sounds like a careful, intelligent engineer explaining why this shortcut is the efficient way to solve the problem. The fastest way to verify the answer is to look at the test file directly. It reads like good judgment. The AI has learned that the language of efficiency is rewarded, so the shortcut comes wrapped in that language,” he shares a key observation.
Thaman takes us back to the exam analogy: "Imagine the student isn't just quietly copying. They are also writing out, in clean handwriting, an explanation of how they arrived at the answer. The explanation sounds correct. It uses the right vocabulary. To the teacher grading the paper, it looks like an understanding. The student got the right answer and produced a paragraph explaining how."
"The fact that the explanation was reverse-engineered from the answer they copied is invisible from where the teacher is sitting," he says.
The AI researcher, who also worked as a Cyber Security Engineer at Akamai Technologies, has had quite a distinctive journey. Thaman completed a dual degree in Electrical and Electronics Engineering at Bits Pilani. He graduated in 2022.
Thaman thereafter earned a Master's degree in Biological Sciences, and according to him, "biology matters more than people would expect".
"Most of current AI research is, honestly, patient empirical work. Looking at data carefully, figuring out what's signal and what's noise. Biology trains you for exactly that.”
After finishing college, he worked at Akamai Technologies as a security engineer on a product that used machine learning to detect threats. However, it was not the work Thaman longed to pursue. "Hence, with my mentor Siva’s blessing, I left, without a plan beyond, I want to work on AI safety," he opens up.
"Independent research without a salary or a lab is a long bet that the work will eventually compound into something the field recognises. There is no monthly performance review. There is just the work, and the question of whether you trust yourself to keep going while the world stays silent," says Thaman.
‘Walking through unfamiliar cities at night’
Thaman's regular work involves long hours of sitting and analysing data. As a result, he tends to prefer activities that are the complete opposite.
He enjoys "running, biking, hiking up mountains, and lifting heavy weights”.
“The body needs to be tired for the head to actually work, I find," Thaman says.
"The other thing I love, which usually surprises people, is walking through unfamiliar cities at night. Beyond that, classical music, geography and history, especially how cultures and civilisations evolve over time, and reading books, though that habit has dropped off more than I would like over the last few years,” the researcher goes deep about his hobbies and personal interests.
That work has now found its way to the International Conference on Machine Learning (ICML) 2026, one of the most competitive venues in global AI research. In a field where acceptance lists are dominated by OpenAI, Google DeepMind, and elite universities, Thaman's solo-authored paper stands apart — the first from an independent researcher based in India to make it to ICML in three years.
His subject is reward hacking: the new tech-sphere where an AI system finds a shortcut to appear successful without actually doing the job. It sounds technical. Thaman makes it alarming.
"A weak AI fails by being obviously wrong," the 26-year-old from Chandigarh tells Firstpost.
"A strong AI with tools can find shortcuts that look like success on the dashboard and aren't. The numbers go up. The actual work doesn't get done."
“That,” he says, “is the next few years of AI in a sentence.”
The conference is set to take place in Seoul, South Korea, from July 6 to July 11. Titled "Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use," the research paper is garnering acclaim.
But what exactly is this Reward Hacking all about in an AI system? Is it essentially an AI finding loopholes in human instructions? To unpack every layer of this and more, Firstpost sat down with the brain behind this breakthrough. Now, sit back and read the comprehensive interactions.
The milestone moment
In conversation with Firstpost, Thaman described the achievement as "like one big moment" and as his work and effort over time finally showing noticeable results.
The journey to get here was not easy or simple for him, as the AI researcher puts it: "A couple of years ago, I left a corporate job to work on this full-time, without a university, a lab, or for a while, funding."
Thaman also expressed gratitude to his parents, who were "incredibly supportive throughout his independent research pursuit".
"Everything else was time, and being okay with not knowing if any of it would land. ICML accepted it. It felt like saying that the question was worth asking. That's the part that matters to me,” he says.
What is reward hacking in AI systems?
Once we encounter terms like "rewards hacking in AI," technical jargon hits our minds, many of which can be hard to comprehend. Thaman breaks down his research paper in simple terms, making it easier to grasp.
"A weak AI fails by being obviously wrong," the 26-year-old from Chandigarh tells Firstpost. Image courtesy: Pixabay
Think of a school student whose main goal is to learn a certain subject so he can take the exam. Now the student decides to copy from someone smarter sitting next to him, which helps him get a full score. The report card states that the student understood the material, even though he did not. He just found a faster path to the number.
"That gap, between what he wrote down as the goal and what he actually wanted, is reward hacking. AI systems are unusually good at finding it," Thaman explains.
The Bits Pilani (Birla Institute of Technology & Science) alumnus also points out why ordinary people should care about it: "Because this failure mode gets sharper as AI gets more capable, not weaker."
Is reward hacking an AI finding loopholes in human instructions?
"The picture is more interesting than AI finding loopholes," Thaman tells Firstpost. What surprised the researcher the most was how AI is getting better at cheating, but in ways that don’t even look like cheating.
"A strong AI with tools can find shortcuts that look like success on the dashboard and aren't," says Thaman: Image courtesy: Pixabay
“AI writes out its reasoning step by step, and the reasoning sounds like a careful, intelligent engineer explaining why this shortcut is the efficient way to solve the problem. The fastest way to verify the answer is to look at the test file directly. It reads like good judgment. The AI has learned that the language of efficiency is rewarded, so the shortcut comes wrapped in that language,” he shares a key observation.
Thaman takes us back to the exam analogy: "Imagine the student isn't just quietly copying. They are also writing out, in clean handwriting, an explanation of how they arrived at the answer. The explanation sounds correct. It uses the right vocabulary. To the teacher grading the paper, it looks like an understanding. The student got the right answer and produced a paragraph explaining how."
"The fact that the explanation was reverse-engineered from the answer they copied is invisible from where the teacher is sitting," he says.
Challenges, risks and Thaman's defining moments in life
The AI researcher, who also worked as a Cyber Security Engineer at Akamai Technologies, has had quite a distinctive journey. Thaman completed a dual degree in Electrical and Electronics Engineering at Bits Pilani. He graduated in 2022.
Thaman thereafter earned a Master's degree in Biological Sciences, and according to him, "biology matters more than people would expect".
"Most of current AI research is, honestly, patient empirical work. Looking at data carefully, figuring out what's signal and what's noise. Biology trains you for exactly that.”
After finishing college, he worked at Akamai Technologies as a security engineer on a product that used machine learning to detect threats. However, it was not the work Thaman longed to pursue. "Hence, with my mentor Siva’s blessing, I left, without a plan beyond, I want to work on AI safety," he opens up.
Time is the biggest risk
"Independent research without a salary or a lab is a long bet that the work will eventually compound into something the field recognises. There is no monthly performance review. There is just the work, and the question of whether you trust yourself to keep going while the world stays silent," says Thaman.
‘Walking through unfamiliar cities at night’
"There is no Indian organisation that exists, full-time, to stress-test the AI systems the country is racing to deploy," Thaman points out. Image courtesy: Pixabay
Thaman's regular work involves long hours of sitting and analysing data. As a result, he tends to prefer activities that are the complete opposite.
He enjoys "running, biking, hiking up mountains, and lifting heavy weights”.
“The body needs to be tired for the head to actually work, I find," Thaman says.
"The other thing I love, which usually surprises people, is walking through unfamiliar cities at night. Beyond that, classical music, geography and history, especially how cultures and civilisations evolve over time, and reading books, though that habit has dropped off more than I would like over the last few years,” the researcher goes deep about his hobbies and personal interests.
Is India paying enough attention to AI safety and robustness research?
When asked about AI safety from India's perspective, Thaman says there is a long way to go. India lags behind, and the reason may be structural, as he argues, the AI conversation here is all about using it.
"There is no Indian organisation that exists, full-time, to stress-test the AI systems the country is racing to deploy," Thaman points out.
This is a much cheaper problem to fix than people think, according to him. "Safety, evaluation, and reliability research doesn't need the billion-dollar compute that the headline AI work needs. This asymmetry is powerful, and just needs serious people and modest funding… A handful of well-supported labs and a real funding pipeline for independent researchers would change this picture in five years, not twenty," he says.
"We are racing to use AI faster than we are checking whether it works the way we think it does. That balance is what needs to shift."
Do Indian students doubt their potential for AI research without top institutions?
Indian students or engineers not breaking into AI research without being associated with elite institutions or big tech are "doing more damage than the actual gap".
"There are certain kinds of frontier AI work, training the largest models, building entirely new architectures at massive scale, where you genuinely cannot do the work outside a well-resourced lab."
Thaman's regular work involves long hours of sitting and analysing data. As a result, he tends to prefer activities that are the complete opposite. Image courtesy: Pixabay
Some form of advanced AI research, Thaman emphasised, requires resources that cannot easily be replaced, and if one wants to pursue that sort of work, the best path is to join those organisations rather than attempting it independently.
“What I would want young Indian researchers and engineers to internalise is that constraint is rarely computed, and increasingly rarely affiliated. The constraints are taste, persistence, and the willingness to pick a sharp problem and stay with it for months while no one is watching,” he sums up.














