Infrastructure
Hardware
What runs what. VRAM is the constraint — here's the map.
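The "Fits:" lines below all come from one piece of arithmetic: a model's weights take roughly params × bits-per-weight / 8 bytes, plus some runtime margin. A minimal sketch of that rule of thumb (the flat 20% overhead for KV cache and runtime is an assumption, not a measured figure):

```python
# Rough VRAM estimate for running an LLM locally.
# Assumption: weights dominate; KV cache and runtime overhead
# are folded into a flat ~20% margin.

QUANT_BITS = {"full": 16, "Q8": 8, "Q6": 6, "Q4": 4, "Q3": 3}

def vram_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
    """Approximate memory needed: params * bits/8, times an overhead margin."""
    weights_gb = params_b * QUANT_BITS[quant] / 8
    return round(weights_gb * overhead, 1)

def fits(params_b: float, quant: str, memory_gb: float) -> bool:
    return vram_gb(params_b, quant) <= memory_gb

print(vram_gb(70, "Q4"))     # 42.0 -> needs a 48GB card, not a 32GB one
print(fits(34, "full", 96))  # True: 34B FP16 is ~81.6GB with margin
```

Run any model below through this before buying: if `fits` says no at Q4, no amount of software cleverness will save you.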
Apple Silicon
Unified memory = VRAM: the GPU can address nearly all of system RAM. It's the reason Macs became local AI machines.
Fits: 405B Q8, 70B full precision with room to spare
The ultimate local AI machine. From $7,999
Fits: 405B Q4, 70B full precision
Serious AI workstation. 819GB/s bandwidth
Fits: 70B Q4-Q6, 34B full
Entry M3 Ultra config
Fits: 70B Q4, 34B full
Max chip, not Ultra. Solid for most models. From $3,299
Fits: 70B Q4, portable
Laptop that runs 70B models
Fits: 34B Q4, most 7-13B comfortably
Practical daily driver for local AI. From $2,499
Fits: 34B full, 70B Q3
Tiny form factor, real AI power. From $1,799
Fits: 13B Q4, 7B full
Entry point — tight on anything over 13B. From $799
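Why the 819GB/s bandwidth figure above matters: single-stream decode is memory-bound, since every generated token streams the entire weight set once. So tokens/sec is roughly bandwidth divided by model bytes. A sketch under that assumption (the 0.6 efficiency factor is an assumed real-world fudge, not a benchmark):

```python
def decode_tok_s(bandwidth_gb_s: float, params_b: float, bits: int,
                 efficiency: float = 0.6) -> float:
    """Each decoded token reads every weight once, so decode speed is
    roughly bandwidth / model-bytes, times an assumed efficiency factor."""
    model_gb = params_b * bits / 8
    return bandwidth_gb_s / model_gb * efficiency

# 70B Q4 (~35GB of weights) on 819GB/s unified memory:
print(round(decode_tok_s(819, 70, 4), 1))  # 14.0 tok/s, roughly
```

This is why "fits" and "fast" are different questions: a 70B Q4 fits a 48GB machine, but bandwidth decides whether it's pleasant to use.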
NVIDIA Desktop / Workstation
Consumer and professional GPUs with CUDA.
Fits: 70B Q8, 34B full
Workstation class. Blackwell architecture, 96GB is a game changer
Fits: 200B models locally
Grace Blackwell desktop supercomputer. GB10 chip, 1 PFLOP FP4. ~$3,000
Fits: 13B Q8, 34B Q4
New flagship. 32GB finally addresses the VRAM ceiling. ~$1,999
Fits: 7-13B Q4
Blackwell consumer mid-range. ~$999
Fits: 7-13B Q4
Good value Blackwell. ~$749
Fits: 7B Q4-Q6
Entry Blackwell. ~$549
Fits: 13B Q4, 7B full
Previous gen but 24GB still useful. Used ~$1,200
Fits: 13B Q4, 7B full
Used market gem — 24GB for ~$700
AMD Consumer GPUs
ROCm support is improving but still trails CUDA.
Data Center / Cloud GPUs
What the providers run. Relevant if you rent rather than buy.
Fits: 405B+ full precision (multi-GPU)
Latest Blackwell. 2x perf over H100. Shipping 2025
Fits: Trillion+ parameter models
Grace CPU + 2x B200 in NVL72 racks
Fits: 70B full precision, 405B Q4 (multi-GPU)
H100 successor. More HBM, faster. ~$25-30K
Fits: 70B full, 405B Q4 (multi-GPU)
Workhorse. Widely available on cloud ~$2-3/hr
Fits: 34B full, 70B Q4
Good balance of VRAM and cost
Fits: 70B Q8
Previous gen but still everywhere. ~$1-2/hr on cloud
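The rent-vs-buy question above comes down to hours of use. A simple break-even sketch (the electricity cost per hour is an assumed placeholder, and this ignores resale value and cloud spot discounts):

```python
def breakeven_hours(hardware_cost: float, cloud_rate_per_hr: float,
                    power_cost_per_hr: float = 0.15) -> float:
    """Hours of GPU time at which owning beats renting.
    power_cost_per_hr is an assumed electricity figure, not from the listing."""
    return hardware_cost / (cloud_rate_per_hr - power_cost_per_hr)

# A ~$1,999 consumer card vs a ~$2/hr cloud GPU:
print(round(breakeven_hours(1999, 2.00)))  # 1081 hours
```

If your workload is bursty, rent; if the GPU runs most days, the hardware above pays for itself within a year.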