
Model parameters

The number in the model name — what 70B actually means
Worked example

What does 70B mean?

When you see Llama 3.3 70B, the 70B is the parameter count. 70 billion parameters. It is not a version number. It is not a benchmark score. It is a measure of the model’s size — specifically, the number of numerical values learned during training that determine how the model responds.

A parameter is a weight in a neural network: a single number, adjusted billions of times during training until the model's outputs match the training data. The final values of all those numbers, frozen after training, are the model weights. The count of those numbers is the parameter count.
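To make "count of those numbers" concrete, here is a minimal sketch that counts the weights and biases in a toy fully connected network. The layer widths are invented for illustration; transformer models count attention and embedding weights the same way, since every learned number is one parameter.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a dense network with the given layer widths."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out   # one weight per input-output connection
        total += n_out          # one bias per output unit
    return total

# A tiny network: 784 inputs -> 128 hidden units -> 10 outputs
# 784*128 + 128 + 128*10 + 10 = 101,770 parameters
print(count_parameters([784, 128, 10]))
```

A 70B model is the same idea scaled up: enough layers and enough width that the total lands around 70 billion.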

The A-to-B that makes it concrete

GPT-3 in 2020: 175 billion parameters. Trained on roughly 45TB of raw text, filtered down to a few hundred gigabytes. Took weeks to train on hundreds of GPUs, at an estimated cost of several million dollars. State of the art for its time.

Llama 3.3 70B in 2024: 70 billion parameters. Runs on a single high-end GPU when quantized. Available as open weights. Performance competitive with models that were closed, expensive, and inaccessible three years earlier.

The number went down. The capability did not. Training efficiency improved faster than parameter counts grew.

What the number tells you — and what it does not

More parameters generally means more capability, more nuance, better reasoning. A 70B model is generally more capable than a 7B model from the same family. A 405B model is generally more capable than a 70B.

But parameter count across model families is not directly comparable. A 7B model from 2024 can outperform a 70B model from 2022. Architecture improvements, training data quality, and fine-tuning all matter as much as raw size.

The number is a useful guide within a model family. Across families, compare benchmark scores and test on your actual task.

Why this matters to you

Parameter count predicts hardware requirements for open-weights models. A 7B model runs on a laptop. A 70B model needs a workstation-class GPU, or two consumer GPUs, even with quantization. A 405B model needs a multi-GPU server. If you are evaluating whether to self-host a model, the parameter count is the first number to check against your available hardware.
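The check is simple arithmetic: weights-only memory is parameter count times bytes per parameter. A rough sketch (assumed precisions: fp16 at 2 bytes per parameter, 4-bit quantization at 0.5; it ignores activations and KV cache, which add more on top):

```python
def weights_memory_gib(params_billion, bytes_per_param):
    """GiB needed just to hold the weights at a given precision."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# Weights-only memory at fp16 (2 bytes) vs 4-bit quantized (0.5 bytes)
for size in (7, 70, 405):
    print(f"{size}B: fp16 ~{weights_memory_gib(size, 2):.0f} GiB, "
          f"4-bit ~{weights_memory_gib(size, 0.5):.0f} GiB")
```

This is why 7B fits on a laptop (a few GiB quantized), while 70B at fp16 already exceeds any single consumer GPU.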

For hosted APIs, parameter count correlates loosely with price and latency. Larger models cost more and respond more slowly. Smaller models are cheaper and faster. The right size is the smallest one that does your task well.