From Words to Tokens
When you type a sentence into a generative AI, the first thing it does is chop that sentence into smaller pieces called tokens. [1] Think of it like a chef prepping ingredients. [15] A token isn't always a full word; it can be a part of a word (like 'un-' or '-ing'), a single character, or a common word like 'the'. [3, 4] For example, the sentence "AI is fascinating!" might be broken into tokens like ["AI", " is", " fascinat", "ing", "!"]. This process, called tokenization, is the fundamental first step that allows a machine to handle the messiness of human language. [5] Instead of needing a dictionary with every word imaginable (including typos and slang), the AI uses a more manageable vocabulary of these recurring pieces. [3, 15]
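To make the chopping step concrete, here is a minimal sketch in Python. It assumes a tiny, hand-picked vocabulary (the `VOCAB` set below is invented for illustration and is not any real model's vocabulary) and uses a simple greedy longest-match rule. Real tokenizers typically learn their vocabulary from data, for example with byte-pair encoding, so treat this as a simplification rather than how any particular system works.

```python
# Hypothetical toy vocabulary, hand-picked for this example only.
VOCAB = {"AI", " is", " fascinat", "ing", "!", " un", "believ", "able"}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization with a single-character fallback."""
    tokens = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until a piece matches.
        # A length of 1 always succeeds, so unknown text becomes characters.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

print(tokenize("AI is fascinating!"))
# ['AI', ' is', ' fascinat', 'ing', '!']
```

Because of the single-character fallback, even a typo or an unseen slang word still gets tokenized; it just costs more tokens than a familiar word would.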
The Mathematics of Meaning
Once the text is broken into tokens, the AI's real work begins. It can't work with the tokens directly; it needs to convert them into numbers. [1, 11] Each token is first looked up as an ID in the vocabulary, and that ID is then mapped to a long list of numbers called a 'vector embedding'. [6, 8] Unlike the ID, which is just an arbitrary index, the vector represents the token's 'meaning' in a vast, multi-dimensional mathematical space. [9] Words with similar meanings, or that appear in similar contexts, have vectors that are 'close' to each other in this space. For instance, the vectors for 'king' and 'queen' would be near each other, while the vector for 'king' would be far from 'car'. [13] In this way, the AI doesn't understand meaning like a person does, through experience, but rather through geometry: the relationships and distances between these numerical points. [13]
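The idea of 'closeness' can be sketched with a few lines of Python. The three-dimensional vectors below are invented for illustration (real embeddings are learned during training and usually have hundreds or thousands of dimensions), and cosine similarity is one common way to measure closeness, chosen here as an assumption rather than as the measure any specific model uses.

```python
import math

# Hypothetical embeddings, made up for this example.
EMBEDDINGS = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "car":   [0.10, 0.05, 0.95],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

king_queen = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
king_car = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["car"])

print(f"king vs queen: {king_queen:.3f}")
print(f"king vs car:   {king_car:.3f}")
```

With these made-up numbers, 'king' and 'queen' point in nearly the same direction, while 'king' and 'car' do not, which is the geometric relationship described above.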
Predicting the Next Best Token
With language converted into a series of mathematical vectors, the generative AI model can finally get to work on its core task: predicting what comes next. [12] When you ask it a question, the model processes your tokenized input and then calculates, for every token in its vocabulary, the probability that it comes next in the sequence. [11] It essentially asks, "Based on the billions of text examples I've been trained on, which tokens are statistically likely to follow this sequence?" In the simplest case it picks the most probable token; in practice, systems often sample from among the likeliest candidates, which is why the same prompt can produce different answers. It adds the chosen token to the sequence and then repeats the process over and over. This chain of predictions, with each new token influencing the next, is how a seemingly simple mathematical operation generates coherent sentences, paragraphs, and entire articles. [2]
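The predict-append-repeat loop can be sketched as follows. This is a toy under stated assumptions: the `NEXT_TOKEN_LOGITS` table is invented and looks only at the previous token, whereas a real model computes its scores from the whole sequence with a neural network. The softmax step and the greedy "pick the most probable token" rule are standard, but the numbers are not from any real system.

```python
import math

# Hypothetical raw scores ("logits") for the next token, keyed by the
# previous token. Invented for illustration only.
NEXT_TOKEN_LOGITS = {
    "<start>": {"The": 2.0, "cat": 0.5, "sat": 0.1, ".": -1.0},
    "The":     {"The": -1.0, "cat": 2.5, "sat": 0.2, ".": -0.5},
    "cat":     {"The": -1.0, "cat": -0.5, "sat": 2.2, ".": 0.3},
    "sat":     {"The": -0.5, "cat": -0.2, "sat": -1.0, ".": 2.0},
}

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def generate(max_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    sequence = ["<start>"]
    while len(sequence) - 1 < max_tokens:
        probs = softmax(NEXT_TOKEN_LOGITS[sequence[-1]])
        next_token = max(probs, key=probs.get)
        sequence.append(next_token)
        if next_token == ".":  # treat the full stop as end of text
            break
    return sequence[1:]

print(generate())
# ['The', 'cat', 'sat', '.']
```

Replacing the `max(...)` line with a random draw weighted by `probs` would turn this greedy loop into sampling, at the cost of no longer producing the same output every time.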
Why This Explains AI's Quirks
Understanding tokenization helps demystify both the power and the strangeness of AI. Because the model is fundamentally a token prediction machine, its 'knowledge' is based on statistical patterns, not genuine understanding. [12] This is why AI can sometimes produce 'hallucinations': responses that sound plausible but are factually incorrect. The model is simply stringing together tokens that are statistically likely to appear together, even if the resulting statement doesn't reflect reality. [9] It also explains why context limits are measured in tokens, not words, and why AI might struggle with novel slang, complex reasoning, or sarcasm. These are situations where statistical patterns from its training data are less likely to provide a good guide for the next token. [2]
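The gap between word counts and token counts is easy to see in a small sketch. The `rough_tokens` function below is a hypothetical stand-in that chops words into pieces of at most four characters; it does not reproduce any real tokenizer, and the limit passed to `fits_context` is an arbitrary number chosen for the example.

```python
import re

def rough_tokens(text, max_piece=4):
    """Hypothetical stand-in tokenizer: split into words and punctuation,
    then chop each chunk into pieces of at most max_piece characters."""
    pieces = []
    for chunk in re.findall(r"\w+|[^\w\s]", text):
        pieces.extend(chunk[i:i + max_piece] for i in range(0, len(chunk), max_piece))
    return pieces

def fits_context(text, limit):
    """Check the text against a limit counted in tokens, not words."""
    return len(rough_tokens(text)) <= limit

text = "Tokenization is fascinating!"
words = text.split()
tokens = rough_tokens(text)

print(len(words), "words")
print(len(tokens), "tokens:", tokens)
print("fits a limit of 7 tokens?", fits_context(text, 7))
```

Three words become eight tokens here, so a budget that looks generous when counted in words can be used up much sooner when counted in tokens.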
















