Glossary
50 terms. No jargon. Real numbers.
Pricing & economics
Token
You're probably spending more than you need to — here's how to fix it
Input price
This is the number your unit economics are built on
Output price
It is usually higher than input; knowing why helps you choose
Per 1M tokens
Once you understand this unit, you can price any model in under a minute
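Prices are quoted per 1M tokens. Turning them into a per-request figure takes one multiplication and one division. The Python sketch below uses $3 and $15 per 1M tokens as hypothetical placeholder prices, not any provider's list price.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Cost of one request when prices are quoted per 1M tokens."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000

# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
print(round(cost_usd(2_000, 500, 3.0, 15.0), 6))  # → 0.0135
```

Multiply the result by your expected monthly request volume and you have the cost per query your budget depends on.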
Free tier
Free has a ceiling — knowing where it is means no surprise invoice
Rate limit
Hit this in production at 2am and you will never forget it exists
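The standard defence against rate limits is to retry with exponential backoff and random jitter, so that many clients do not all retry at the same instant. The sketch below assumes your client raises some error on HTTP 429. `RateLimitError` is a hypothetical stand-in for that error, and the delay values are illustrative.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Retry fn() on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The delay cap doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads retries out


# Demo: a call that is rate limited twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "ok"


print(call_with_backoff(flaky, sleep=lambda s: None))  # → ok
```

The `sleep` parameter is there only so the demo runs instantly. In production, leave it at the default.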
Context caching
You might be paying full price for tokens the provider already has in memory
Batch pricing
Some API calls cost half as much if you can wait
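If batch calls cost half as much, the saving depends on how much of your traffic can actually wait. The sketch below blends real-time and batch spend. It assumes a flat 50% discount, taken from the entry above. Check your provider's actual rate, and note that `batch_share` is a hypothetical input you would estimate yourself.

```python
BATCH_DISCOUNT = 0.5  # "half as much", per the entry above; check your provider


def monthly_cost(requests: int, cost_per_request: float, batch_share: float) -> float:
    """Blend real-time and batch spend; batch_share = fraction sent via batch."""
    realtime = requests * (1 - batch_share) * cost_per_request
    batch = requests * batch_share * cost_per_request * (1 - BATCH_DISCOUNT)
    return realtime + batch


# 1M requests at $0.01 each, half of them able to wait for batch processing.
print(round(monthly_cost(1_000_000, 0.01, 0.5), 2))  # → 7500.0
```

Sending everything in real time would cost $10,000 at the same volume. Moving half the traffic to batch saves a quarter of the bill.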
Price per request
The number your CFO actually cares about
Overage
The cost you did not plan for
Batch API
Half price, same model, no latency guarantee
Cost per query
The number your budget actually depends on
Architecture & capabilities
Context window
It determines what your product can actually do — and what it cannot
Function calling
This is what turns a chatbot into a product that actually does things
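The model never runs your function itself. It replies with a function name and arguments; your code runs the function and sends the result back. The sketch below shows that dispatch step. The `{"name": ..., "arguments": "<JSON text>"}` shape is an assumption, since each provider's format differs, and `get_weather` is a hypothetical tool.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would call a weather service."""
    return {"city": city, "temp_c": 18}


TOOLS = {"get_weather": get_weather}  # names the model is allowed to call


def run_tool_call(tool_call: dict) -> str:
    """Execute a model-requested call and return a JSON string to send back."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool {tool_call['name']}"})
    # In this assumed format, arguments arrive as JSON text.
    args = json.loads(tool_call["arguments"])
    return json.dumps(fn(**args))


print(run_tool_call({"name": "get_weather", "arguments": '{"city": "Oslo"}'}))
# → {"city": "Oslo", "temp_c": 18}
```

The allow-list in `TOOLS` is the design choice that matters. The model can only trigger functions you have explicitly registered.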
Vision / image input
Send the image and let the model read it
Streaming (SSE)
The difference between a product that feels alive and one that feels broken
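With streaming, the response arrives as a sequence of Server-Sent Events rather than one block. The sketch below is a minimal parser for the SSE wire format:

- `data:` lines accumulate into the current event.
- A blank line ends the event.
- Lines starting with `:` are comments.
- An unfinished event at the end of the stream is dropped.

It ignores the `event:`, `id:` and `retry:` fields. It does not decode the JSON payloads providers send, because that format varies by provider.

```python
from typing import Iterable, Iterator


def sse_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # a blank line dispatches the event
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue  # comment, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            # The format strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
    # Any pending, unterminated event is discarded at end of stream.


stream = ["data: Hel\n", "\n", ": keep-alive\n", "data: lo\n", "\n"]
print("".join(sse_events(stream)))  # → Hello
```

In practice your SDK usually does this for you. The sketch shows why text can appear on screen before the whole answer is finished.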
Model Context Protocol (MCP)
Why AI tools suddenly started working together
RAG
Give a model your data without retraining it — at a fraction of fine-tuning cost
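RAG has two steps: retrieve the passages most relevant to a question, then place them in the prompt. Production systems usually rank passages by embedding similarity in a vector store. The sketch below swaps in plain word overlap so it runs with no dependencies. The prompt wording and the documents are illustrative only.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased words and numbers in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by word overlap with the query (a stand-in for embeddings)."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping to the EU takes 3-5 business days.",
]
print(build_prompt("How many days do refunds take?", docs))
```

The model itself is unchanged. Only the prompt carries your data, which is why this approach costs a fraction of fine-tuning.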
Max output tokens
Your output might be getting silently cut off
Tool use
The feature that turns a chatbot into a software agent
System prompt
The instruction layer your users never see
Grounding
The difference between a model that guesses and one that cites
Prompt engineering
Get better results from the same model at the same price
Temperature
The knob that controls creativity vs consistency
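Temperature divides the model's raw scores (logits) before they are turned into probabilities with softmax. A low temperature makes the top choice dominate; a high one flattens the distribution. The sketch below uses made-up logits for three candidate tokens. Many APIs treat a temperature of 0 as (near-)greedy decoding, so the function rejects 0 rather than dividing by it.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At 0.2 almost all probability sits on the first token, so output is consistent. At 2.0 the alternatives get a real chance, so output is more varied.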
Agents
The word everyone uses and almost nobody defines precisely
Structured output
Get JSON, not paragraphs
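Even when you ask for JSON, check it before your code relies on it. The sketch below uses only the standard `json` module and a hypothetical two-field schema. Larger projects often use JSON Schema or a validation library instead.

```python
import json

REQUIRED = {"name": str, "price": (int, float)}  # hypothetical schema


def parse_product(reply: str) -> dict:
    """Parse a model reply as JSON and check required fields and types."""
    data = json.loads(reply)  # raises json.JSONDecodeError if not valid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or wrong type: {key}")
    return data


print(parse_product('{"name": "Widget", "price": 9.99}'))
```

If validation fails, a common pattern is to send the error back to the model and ask it to try again.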
Models & training
LLM
The term behind everything on this site — and what it actually means
Open weights
Run it yourself or pay forever — this is the real infrastructure decision
Hallucination
Understanding why it happens tells you how to reduce it
Fine-tuning
When RAG is not enough — knowing the difference saves you months
Model parameters
The number in the model name — what 70B actually means
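"70B" means roughly 70 billion parameters, and every one has to be stored somewhere. A quick estimate of the memory needed for the weights alone is parameters × bits per parameter ÷ 8. The sketch below uses 1 GB = 10^9 bytes. It ignores the extra memory needed while the model runs, so treat the results as a floor.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of params x bits / 8 bits-per-byte = billions of bytes = GB
    return params_billions * bits_per_param / 8


for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: {weight_memory_gb(70, bits):.0f} GB")
# → 140 GB, 70 GB, 35 GB
```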
Multimodal
Models that see, hear, and read — and what that costs
Reasoning models
The models that think before they answer
Model family
Why "Claude" means six different things
Quantisation
Run a 70B model on a high-memory laptop
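Quantisation stores each weight with fewer bits, trading a little accuracy for a lot of memory. The sketch below shows the simplest variant: symmetric, per-tensor int8, with one shared scale for the whole list. Production schemes typically use a separate scale per channel or per small group of weights, and often go down to 4 bits. The weights here are made up.

```python
def quantise_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8: store each weight as round(w / scale)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = max_abs / 127  # the largest weight maps to +/-127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q: list[int], scale: float) -> list[float]:
    """Approximate the original weights from the int8 values."""
    return [v * scale for v in q]


w = [0.12, -0.5, 0.33, 0.01]
q, scale = quantise_int8(w)
print(q)  # → [30, -127, 84, 3]
print([round(x, 3) for x in dequantise(q, scale)])
```

Each weight now fits in one byte instead of two (16-bit) or four (32-bit). The cost is a rounding error of at most half the scale per weight.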
Infrastructure & integration
API
Everything you build on top of AI runs through this
REST API
Recognise this pattern once and you understand most of the internet
API key
Leak this once and you will understand why everyone warns about it
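The usual way to avoid leaking a key is to keep it out of your source code entirely and read it from the environment at runtime. In the sketch below, the variable name `MY_PROVIDER_API_KEY` is a hypothetical placeholder. Most SDKs document their own default variable name.

```python
import os


def load_api_key(var: str = "MY_PROVIDER_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var}; never commit keys to source control")
    return key


try:
    key = load_api_key()
except RuntimeError as err:
    print(err)  # report which variable is missing, never the key itself
```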
SDK
Saves you hours — or locks you in. Know which before you choose.
Latency
Your users measure seconds, not tokens
API endpoint
The URL your application talks to
Webhook
Get notified when something changes — without polling
Throughput
How many requests your provider can actually handle
Async vs sync
The architecture decision behind every API call
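A synchronous call blocks until the response arrives. Asynchronous calls let several waits overlap. The sketch below simulates a network round trip with sleeps. `DELAY` and `fake_call` are hypothetical stand-ins for a real API request.

```python
import asyncio
import time

DELAY = 0.2  # pretend network round trip, in seconds (illustrative)


def sync_calls(n: int) -> list[str]:
    """Blocking style: each call waits for the previous one to finish."""
    results = []
    for i in range(n):
        time.sleep(DELAY)  # the thread does nothing else while it waits
        results.append(f"result {i}")
    return results


async def fake_call(i: int) -> str:
    await asyncio.sleep(DELAY)  # yields control so other calls can proceed
    return f"result {i}"


async def async_calls(n: int) -> list[str]:
    """Concurrent style: all n waits overlap."""
    return list(await asyncio.gather(*(fake_call(i) for i in range(n))))


start = time.perf_counter()
sync_calls(5)
print(f"sync:  {time.perf_counter() - start:.2f}s")  # roughly 5 x DELAY
start = time.perf_counter()
asyncio.run(async_calls(5))
print(f"async: {time.perf_counter() - start:.2f}s")  # roughly 1 x DELAY
```

Async wins when your code mostly waits on the network. It does not raise your provider's rate limit, so pair it with backoff.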
Benchmarks & evaluation
MMLU
Everyone cites this score — understanding what it measures helps you decide how …
HumanEval
Before you trust a model with your code, know what this test contains
AI benchmarks
The numbers everyone cites and almost nobody understands
Benchmark gaming
Why the highest score might not be the best model