AI Tokens

Last Updated: January 6, 2026

AI tokens are the fundamental units of text that large language models (LLMs) process. They are the building blocks that enable AI to interpret prompts and generate responses.

At-a-Glance

  • A token averages about 4 characters of English text, or roughly three-quarters of a word; 1,000 tokens equal about 750 words.
  • Context windows vary across models. GPT-4o supports 128K tokens, while Gemini 3 Pro reaches 1M tokens.

ELI5 (Explain Like I'm 5)

Imagine you are building a castle out of Lego blocks. You build four walls, then a roof, and so on, and each of those walls is made of individual Lego blocks.

Now imagine that sentences are like castles and words are like the walls. To build words and sentences, the AI uses small building pieces called tokens.

The LLM breaks everything you type into tokens and uses them to build an answer to be sent back to you.

The more Lego pieces you use, the more time and space it takes to build the castle. Similarly, longer messages use more tokens, cost more, and take longer to generate.

How Does Tokenization Work?

Tokenization allows AI models to handle a vast vocabulary more efficiently, recognizing patterns and relationships between common word parts.

Tokens are not exactly words. A token can be a word, part of a word, a number, punctuation, or even a space. 

For example, “ChatGPT” may be one token, but a word like “misunderstanding” may be split into “mis,” “understand,” and “ing,” each counted as a separate token.
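You can see these splits directly with a tokenizer library. Below is a minimal sketch using OpenAI's open-source tiktoken package; other models ship different vocabularies, so the exact pieces will vary.

```python
# Minimal sketch using OpenAI's open-source tiktoken library.
# Other models use different vocabularies, so the exact splits vary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-era models

for token_id in enc.encode("misunderstanding"):
    # decode_single_token_bytes shows the text fragment behind each token ID
    print(token_id, enc.decode_single_token_bytes(token_id).decode("utf-8"))
```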

When a user submits a prompt, the AI first converts this natural language text into a sequence of tokens. 

The model then processes these input tokens, using its training to generate a response, which is also produced as a sequence of tokens.

Finally, these output tokens are translated back into natural language for the user.
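This text-to-token mapping is lossless in both directions, which is what makes the round trip possible. A quick sketch, again assuming tiktoken:

```python
# Encoding and decoding are inverse operations: decoding the token IDs
# reproduces the original string exactly.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens go in, tokens come out."
ids = enc.encode(text)          # natural language -> token IDs
print(ids)                      # the list of integers the model actually sees
assert enc.decode(ids) == text  # token IDs -> the same natural language
```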

Different languages can produce very different token counts for the same content, and emojis, symbols, and code all increase token usage.
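The sketch below compares token counts across a few kinds of input (tiktoken assumed; exact counts depend on the tokenizer):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Hello, how are you today?",
    "German": "Guten Tag, wie geht es Ihnen heute?",
    "Emoji": "🚀🔥🎉",
    "Code": "for i in range(10): print(i)",
}
for label, text in samples.items():
    print(f"{label}: {len(enc.encode(text))} tokens")
```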

Why Tokens Matter in Real-World Usage

In practice, token usage affects three things that users care about most: cost, performance, and prompt design. 

Cost Control

Tokens are the hidden currency of AI. Since LLM billing is token-based, verbose prompts and long outputs directly impact cost, especially in high-volume or team environments.

Models designed for deeper reasoning typically cost more per token. This is because they perform additional internal computation per token, such as multi-step reasoning, internal planning, or verification before producing an answer. 

Therefore, to control costs, be aware of your task's complexity and whether it really requires a model with a higher per-token price.
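Because billing is per token, usually quoted per million tokens with separate input and output rates, a back-of-the-envelope estimate is simple. The prices below are placeholders for illustration, not real rates:

```python
# Hypothetical prices for illustration only; check your provider's rate card.
INPUT_PRICE_PER_M = 2.50    # $ per 1M input tokens (assumed figure)
OUTPUT_PRICE_PER_M = 10.00  # $ per 1M output tokens (assumed figure)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 1,000-token prompt with a 500-token answer:
print(f"${estimate_cost(1_000, 500):.4f}")  # $0.0075 at the assumed rates
```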

Performance

Every model has a maximum context window. If your conversation exceeds it, older context is truncated or the request fails. This is why long chats sometimes forget earlier details. If a model stops behaving as expected in a long conversation, check whether you have exceeded its context window.
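One common mitigation is to trim the oldest messages before each request. A minimal sketch, assuming tiktoken for counting (real chat APIs also add a small per-message token overhead):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within max_tokens."""
    kept, total = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        n = len(enc.encode(msg))
        if total + n > max_tokens:
            break                        # drop this message and everything older
        kept.append(msg)
        total += n
    return list(reversed(kept))          # restore chronological order

history = ["first message", "second message", "the latest question"]
print(trim_history(history, max_tokens=50))
```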

Faster, latency-optimized models (often labeled “Flash,” “Fast,” or “Mini”) usually cost less per token because they perform fewer internal reasoning steps and are optimized for speed and throughput.

Prompt Design

Token awareness improves prompt design: include the right context (inputs, constraints, examples) and clear instructions while cutting repetitive or irrelevant text. This reduces token waste, especially in production or team workflows.
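A simple habit is a pre-flight count before sending a prompt. A sketch assuming tiktoken and an example budget of 4,000 tokens:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
BUDGET = 4_000  # example budget; set this from your model's limits and pricing

prompt = "Summarize the attached report in three bullet points."
n = len(enc.encode(prompt))
print(f"{n} tokens ({n / BUDGET:.1%} of budget)")
if n > BUDGET:
    print("Over budget: cut repeated examples or irrelevant context.")
```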

Quote

“A lot of weird behaviors and problems of LLMs actually trace back to tokenization.” - Andrej Karpathy

Stop Overpaying for AI.

Access every top AI model in one place. Compare answers side-by-side in the ultimate BYOK workspace.

Get Started Free