From Words to Tokens
The first step in how a language model processes text is called tokenization. Instead of reading a sentence as a whole, the model breaks the text into smaller chunks called 'tokens'. [1, 5] These can be whole words, parts of words, or even individual characters and punctuation marks. [2, 13] For example, the sentence "AI is powerful" might become three tokens: "AI", "is", and "powerful". A longer word like "unbelievable" might be split into subwords such as "un", "believ", and "able", which lets the model handle words it has never seen before by recognizing their component parts. [6] This process turns an unstructured string of characters into a structured list that a machine can begin to process. [3]
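The subword splitting described above can be sketched as a greedy longest-match against a vocabulary. This is a minimal sketch, assuming a tiny hand-picked vocabulary (`VOCAB`, hypothetical) and plain whitespace splitting; real tokenizers such as BPE or WordPiece learn vocabularies of tens of thousands of entries from data and handle punctuation and spacing more carefully. The helper names `tokenize_word` and `tokenize` are illustrative only.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
VOCAB = {"AI", "is", "powerful", "un", "believ", "able"}


def tokenize_word(word: str, vocab: set[str]) -> list[str]:
    """Split one word by greedy longest-match against the vocabulary."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it from the right.
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No vocabulary entry matches here: fall back to a single character.
            end = start + 1
        tokens.append(word[start:end])
        start = end
    return tokens


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split on whitespace (a simplification), then split each word into subwords."""
    tokens = []
    for word in text.split():
        tokens.extend(tokenize_word(word, vocab))
    return tokens


print(tokenize("AI is powerful", VOCAB))  # ['AI', 'is', 'powerful']
print(tokenize("unbelievable", VOCAB))    # ['un', 'believ', 'able']
```

Any word the vocabulary does not cover falls back to single characters, which is one simple way a tokenizer can avoid failing on unseen input.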
The Dictionary of Numbers
Computers can't work with words directly; they require numbers. [4, 13] After tokenization, each token is looked up in the model's vocabulary, a fixed list that can contain tens of thousands of tokens, and replaced with that entry's ID number. [4, 17] So, in our example, "AI" might become 502, "is" might be 4, and "powerful" might be 987. The sentence "AI is powerful" is no longer text; it's a sequence of numbers: [502, 4, 987]. This numerical conversion is the fundamental bridge between human language and the mathematical world of AI. [3, 22] At this stage, however, these numbers are just arbitrary labels, like numbers in a phonebook: the fact that one ID is larger than another says nothing about the words themselves. They don't yet capture the meaning of the words they represent.
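The conversion from tokens to IDs and back amounts to a pair of dictionary lookups. The sketch below assumes a hypothetical three-entry vocabulary built from the example IDs in the text (502, 4, 987); a real vocabulary is fixed when the tokenizer is built and is far larger.

```python
# Hypothetical vocabulary using the example IDs from the text.
TOKEN_TO_ID = {"AI": 502, "is": 4, "powerful": 987}
# Reverse mapping, used to turn IDs back into tokens.
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}


def encode(tokens: list[str]) -> list[int]:
    """Replace each token with its vocabulary ID."""
    return [TOKEN_TO_ID[tok] for tok in tokens]


def decode(ids: list[int]) -> list[str]:
    """Replace each ID with the token it stands for."""
    return [ID_TO_TOKEN[i] for i in ids]


ids = encode(["AI", "is", "powerful"])
print(ids)          # [502, 4, 987]
print(decode(ids))  # ['AI', 'is', 'powerful']
```

Nothing about 502 or 987 reflects the meaning of the words; the IDs are pure labels.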
Creating a Map of Meaning
This is where the real transformation happens. Each simple numerical ID is used to look up a much richer representation called a 'word embedding' or 'vector'. [8, 12, 19] Instead of a single number, each token is represented by a list of hundreds or even thousands of numbers: a vector. [4] This vector acts like a set of coordinates, placing the word at a point in a vast, multi-dimensional space. [15, 20] In this space, words with similar meanings tend to be positioned close together. [8, 19] The vector for 'cat' will typically be near the vector for 'kitten', and 'king' will be near 'queen'. The result is a giant 'map' where the position and direction of each vector encode its semantic relationship to all other words. [16]
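An embedding lookup uses the token ID as a row index into a table of vectors, and "closeness" is commonly measured with cosine similarity. This is a minimal sketch: the 3-dimensional vectors below are hand-picked for illustration (hypothetical, not learned), whereas a real model learns its table during training and uses hundreds or thousands of dimensions. The IDs here are simply 0 to 3 rather than the example IDs used earlier.

```python
import math

# Hypothetical toy vocabulary; each token's position in this list is its ID.
VOCAB = ["cat", "kitten", "king", "queen"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Embedding table: one row (vector) per token ID. Values are hand-picked for
# illustration only; a trained model learns them.
EMBEDDING_TABLE = [
    [0.90, 0.80, 0.10],  # cat
    [0.85, 0.90, 0.05],  # kitten
    [0.10, 0.20, 0.90],  # king
    [0.15, 0.10, 0.95],  # queen
]


def embed(token: str) -> list[float]:
    """Look up a token's vector: its ID selects one row of the table."""
    return EMBEDDING_TABLE[TOKEN_TO_ID[token]]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; near 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(round(cosine_similarity(embed("cat"), embed("kitten")), 3))  # 0.995
print(round(cosine_similarity(embed("cat"), embed("king")), 3))    # 0.303
```

Cosine similarity compares direction rather than length, which is why it is a common choice for asking whether two embeddings point toward the same region of the map.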
The Power of Matrix Math
These vectors are then organized into large tables of numbers, known as matrices. [9, 21] All of the model's processing is carried out through mathematical operations, chiefly multiplications, on these matrices. [9, 18, 25] Famously, word embeddings can capture analogies through simple vector arithmetic. For example, the calculation 'vector(King) - vector(Man) + vector(Woman)' produces a vector whose nearest neighbor, once the three input words are excluded, is typically 'vector(Queen)'. [7] The AI isn't thinking about royalty or gender; it's simply calculating the spatial relationships between these points on its high-dimensional map. [23] This ability to perform mathematical operations on representations of concepts is what allows AI to generate coherent, contextually relevant text. [14]
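The analogy arithmetic, and the way stacking vectors into a matrix lets a single matrix-vector product compare a query against every word at once, can be sketched in plain Python. The 3-dimensional vectors are hypothetical and hand-picked so the axes loosely read as (royalty, gender, fruit); learned embeddings have no such tidy axes and reproduce analogies only approximately. Excluding the three input words from the search follows the usual convention for this kind of analogy test. The names `analogy` and `matvec` are illustrative.

```python
import math

# Hypothetical hand-picked embeddings; axes loosely mean (royalty, gender, fruit).
WORDS = ["king", "queen", "man", "woman", "prince", "apple"]
EMBEDDINGS = [
    [0.95, 0.90, 0.00],   # king
    [0.95, -0.90, 0.05],  # queen
    [0.10, 0.85, 0.00],   # man
    [0.10, -0.85, 0.05],  # woman
    [0.80, 0.90, 0.05],   # prince
    [0.00, 0.00, 0.95],   # apple
]


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to length 1 so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Matrix-vector product: one dot product per row of the matrix."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def analogy(a: str, b: str, c: str) -> str:
    """Return the word closest to vector(a) - vector(b) + vector(c), excluding a, b, c."""
    idx = {w: i for i, w in enumerate(WORDS)}
    va, vb, vc = (EMBEDDINGS[idx[w]] for w in (a, b, c))
    target = normalize([x - y + z for x, y, z in zip(va, vb, vc)])
    # Stack unit-length rows into a matrix so one product scores every word.
    unit_matrix = [normalize(row) for row in EMBEDDINGS]
    scores = matvec(unit_matrix, target)  # cosine similarity for each word
    candidates = [(s, w) for s, w in zip(scores, WORDS) if w not in (a, b, c)]
    return max(candidates)[1]


print(analogy("king", "man", "woman"))  # queen
```

Real models do the same kind of row-by-row scoring with matrices that have tens of thousands of rows, which is why fast matrix multiplication matters so much in practice.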
Predicting the Next Number
Ultimately, a large language model is a prediction engine. When it 'writes' a response, it's not crafting sentences based on understanding. Instead, using the mathematical patterns it learned from analyzing billions of texts, it calculates a probability for every token in its vocabulary and selects the next token (number) in the sequence: either the most probable one or one sampled according to those probabilities. [2, 12] It then appends that number to the sequence, converts it back into text for display, and repeats the process to predict the next number. It continues this way, token by token, until it produces a special end-of-sequence token or reaches a length limit. The human-like fluency we see is the result of countless matrix multiplications and probability calculations, not genuine comprehension. [9, 24]
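The predict, append, repeat loop can be sketched with a stand-in for the model. This is a minimal sketch, assuming a hypothetical bigram table (`LOGITS`) that scores each possible next ID from the previous ID alone; a real LLM computes these scores from the entire sequence using many matrix multiplications. A softmax turns the raw scores into probabilities, and the loop uses greedy selection (always the most probable ID) with a hypothetical end-of-sequence ID of 0.

```python
import math

END = 0  # hypothetical end-of-sequence ID
ID_TO_TOKEN = {502: "AI", 4: "is", 987: "powerful"}

# Hypothetical stand-in for a model: raw scores (logits) for the next ID,
# conditioned only on the previous ID.
LOGITS = {
    502: {4: 3.0, 987: 1.0, END: 0.1},
    4: {987: 2.5, 502: 0.5, END: 0.2},
    987: {END: 2.0, 4: 0.3, 502: 0.1},
}


def softmax(logits: dict[int, float]) -> dict[int, float]:
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def generate(prompt_ids: list[int], max_new_tokens: int = 10) -> list[int]:
    """Greedy decoding: repeatedly append the most probable next ID."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(LOGITS[ids[-1]])
        next_id = max(probs, key=probs.get)
        if next_id == END:  # stop at the end-of-sequence ID
            break
        ids.append(next_id)
    return ids


ids = generate([502])
print(ids)                                    # [502, 4, 987]
print(" ".join(ID_TO_TOKEN[i] for i in ids))  # AI is powerful
```

Replacing the `max` call with `random.choices(list(probs), weights=list(probs.values()))[0]` would sample the next ID instead of always taking the top one, which is closer to how many chat systems produce varied responses.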