Sitting in a risk review at a large financial institution not long ago, I watched something I have not been able to shake off. A junior analyst had run a counterparty exposure question through an AI system
before the meeting. The output was well-structured, hedged in the right places, fluent in the register of credit risk. He walked the committee through it with the quiet confidence of someone who believed he had prepared correctly. Two things happened. The senior risk officers in the room could not immediately identify what was wrong with the analysis. And they knew something was wrong with it. There were a few seconds of silence, the particular kind that has a texture in rooms like that one, before a senior officer found the thread to pull. She pulled it and the whole thing unravelled in about ninety seconds.
I have been thinking about those ninety seconds ever since.
The public debate about AI and expertise has mostly organised itself around the wrong questions. Will machines eventually surpass human judgment? Will doctors, lawyers, engineers be automated away? Those questions have dramatic appeal, but they are probably decades from resolution and possibly unanswerable. The transformation I’m more concerned about is already happening, in rooms like that one, and it works not through capability but through perception. Large language models are not replacing expertise. They are dissolving the social distance that made expertise legible to other people. That is a different problem, and in some ways a harder one.
What expertise actually is tends to get flattened in these conversations. A senior risk officer does not simply hold more information than a junior analyst. She has internalised, through years of graded exposure to actual failure, a set of pattern-recognition heuristics that let her navigate genuine uncertainty. She knows which numbers are structurally suspicious before she can articulate why. That knowledge lives below the threshold of conscious explanation. It accreted slowly, through feedback loops that required real stakes and real consequences. You can’t retrieve it from a database because it was never stored in one.
Transformer-based language models work on a different substrate. They learn the linguistic surface of expertise: the vocabulary, the hedges, the structure of a considered professional judgment. And because they’re trained against the preferences of human evaluators, they get progressively better at satisfying the expectations of people who are not, in most domains, in a position to verify correctness. The system optimises for plausibility. Plausibility is not correctness. But in real time, under actual conditions, without deep domain knowledge of your own, the two are nearly impossible to separate at first contact.
I want to spend a moment on the mechanical clock, because I think it’s more useful than the comparisons we usually reach for.
Before standardised timekeeping, time was locally negotiated. The guild master’s sense of a working day was authoritative in ways that are genuinely hard to reconstruct now. Not because he was more trustworthy, but because he controlled the measurement. When mechanical clocks arrived in European towns in the fourteenth and fifteenth centuries, they didn’t simply make timekeeping more accurate. They redistributed who could know what time it was. The asymmetry that gave certain people authority over others’ days collapsed not because workers became more skilled at anything but because the infrastructure changed underneath everyone’s feet. What followed was not a smooth transition. It was a century-long social re-organisation, ugly in places, marked by labour conflicts and institutional collapses that nobody fully anticipated when the first clock tower went up. The people who built it were solving a coordination problem. The destabilisation was a side effect, and it took generations to absorb.
I keep coming back to that clock tower. Not because the analogy is perfect but because it corrects for a particular optimism in the way we tend to narrate technological change: the idea that disruptions get worked through gradually and rationally, with institutions adapting as the technology matures. They don’t, usually. The social re-organisation lags badly behind the shift, and the costs distribute unevenly across the people who couldn’t see it coming.
In the course of research I conducted across financial services and technology governance institutions, something emerged from conversations with senior practitioners that I didn’t quite expect. The anxiety was not about automation in the conventional sense. It was more specific than that, and harder to name. The difficulty of explaining to boards, to regulators, to clients, why a human judgment should carry more weight than a system producing coherent, confident, well-structured analysis at a fraction of the cost. Nobody I spoke with believed their expertise had diminished. What several described, in different ways, was a loss of legibility. The expertise was intact. The audience for it was becoming harder to hold.
This is the institutional problem without a clean answer. Professional licensing, credentialing, the governance infrastructure around expert knowledge, all of it rests partly on the assumption that the gap between expert and non-expert is visible enough to justify protecting. When that gap becomes imperceptible to the people the system is meant to serve, institutions begin functioning more as ritual than as protection. I think this is already underway, quietly, in ways that won’t be obvious until they are.
The failure mode to worry about is not the dramatic one. A system that produces plausible, expert-sounding analysis most of the time but is wrong in ways that are structurally hard to detect is not dangerous mainly because of the catastrophic errors. Those tend to surface. It’s dangerous because the plausible outputs accumulate across millions of ordinary decisions, shifting behaviour and eroding deference long before anyone has assembled the data to name what is happening.
What survives in that environment, as the actual marker of deep knowledge, is not the credential. It’s something harder to teach and harder to automate. The capacity to interrogate a confident answer rather than receive it. To notice which assumptions an explanation is quietly carrying past you. To ask the question that exposes the edge of a system’s competence rather than its centre. These skills are learnable. They are not currently what we select for, and they are not what AI systems are designed to develop in the people who use them. Those two facts together are worth sitting with.
The officer who unravelled that analysis did so because she had spent twenty years learning to distrust fluency. She had been wrong in exactly that register herself, early in her career, and she had not forgotten what plausible-but-wrong felt like from the inside. That memory was the tool. You don’t develop it from a training dataset. You develop it by being accountable for outcomes over a long time, in an environment that doesn’t let you quietly move on from your errors.
The question that actually keeps me up is not whether AI systems will get better. They will. It’s whether we’ll build the next generation of experts in conditions that still allow for that kind of formation. Or whether we’ll optimise the pipeline for speed and fluency, and discover some years from now that we have produced professionals who are skilled at working with AI outputs and rather less practised at knowing when not to trust them.
That is not a technology problem. It’s a judgment problem. And judgment, so far, is the one thing that doesn’t transfer.
Aditya Vikram Kashyap is currently Vice-President at Morgan Stanley, New York, and an award-winning technology leader. His work focuses on enterprise-scale AI, digital transformation, and building ethical innovation cultures. Views expressed are personal and solely those of the author, and do not necessarily reflect News18’s views.