Infrastructure
Hardware
What runs what. VRAM is the constraint — here's the map.
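The "Fits:" lines below all come from one piece of arithmetic: a model's weights take roughly params × bits-per-weight / 8 bytes, plus some runtime margin. A minimal sketch of that rule of thumb (the flat 20% overhead for KV cache and runtime is an assumption, not a measured figure):

```python
# Rough VRAM estimate for running an LLM locally.
# Assumption: weights dominate; KV cache and runtime overhead
# are folded into a flat ~20% margin.

QUANT_BITS = {"full": 16, "Q8": 8, "Q6": 6, "Q4": 4, "Q3": 3}

def vram_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
    """Approximate memory needed: params * bits/8, times an overhead margin."""
    weights_gb = params_b * QUANT_BITS[quant] / 8
    return round(weights_gb * overhead, 1)

def fits(params_b: float, quant: str, memory_gb: float) -> bool:
    return vram_gb(params_b, quant) <= memory_gb

print(vram_gb(70, "Q4"))     # 42.0 -> needs a 48GB card, not a 32GB one
print(fits(34, "full", 96))  # True: 34B FP16 is ~81.6GB with margin
```

Run any model below through this before buying: if `fits` says no at Q4, no amount of software cleverness will save you.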
Apple Silicon
Unified memory = VRAM: the GPU can address nearly all of system RAM. It's the reason Macs became local AI machines.
Fits: 405B Q8, 70B full precision with room to spare
The ultimate local AI machine. From $7,999
Fits: 405B Q4, 70B full precision
Serious AI workstation. 819GB/s bandwidth
Fits: 70B Q4-Q6, 34B full
Entry M3 Ultra config
Fits: 70B Q4, 34B full
Max chip, not Ultra. Solid for most models. From $3,299
Fits: 70B Q4, portable
Laptop that runs 70B models
Fits: 34B Q4, most 7-13B comfortably
Practical daily driver for local AI. From $2,499
Fits: 34B full, 70B Q3
Tiny form factor, real AI power. From $1,799
Fits: 13B Q4, 7B full
Entry point — tight on anything over 13B. From $799
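Why the 819GB/s bandwidth figure above matters: single-stream decode is memory-bound, since every generated token streams the entire weight set once. So tokens/sec is roughly bandwidth divided by model bytes. A sketch under that assumption (the 0.6 efficiency factor is an assumed real-world fudge, not a benchmark):

```python
def decode_tok_s(bandwidth_gb_s: float, params_b: float, bits: int,
                 efficiency: float = 0.6) -> float:
    """Each decoded token reads every weight once, so decode speed is
    roughly bandwidth / model-bytes, times an assumed efficiency factor."""
    model_gb = params_b * bits / 8
    return bandwidth_gb_s / model_gb * efficiency

# 70B Q4 (~35GB of weights) on 819GB/s unified memory:
print(round(decode_tok_s(819, 70, 4), 1))  # 14.0 tok/s, roughly
```

This is why "fits" and "fast" are different questions: a 70B Q4 fits a 48GB machine, but bandwidth decides whether it's pleasant to use.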
NVIDIA Desktop / Workstation
Consumer and professional GPUs with CUDA.
Fits: 70B Q8, 34B full
Workstation class. Blackwell architecture, 96GB is a game changer
Fits: 200B models locally
Grace Blackwell desktop supercomputer. GB10 chip, 1 PFLOP FP4. ~$3,000
Fits: 13B Q8, 34B Q4
New flagship. 32GB finally addresses the VRAM ceiling. ~$1,999
Fits: 7-13B Q4
Blackwell consumer mid-range. ~$999
Fits: 7-13B Q4
Good value Blackwell. ~$749
Fits: 7B Q4-Q6
Entry Blackwell. ~$549
Fits: 13B Q4, 7B full
Previous gen but 24GB still useful. Used ~$1,200
Fits: 13B Q4, 7B full
Used market gem — 24GB for ~$700
AMD Consumer GPUs
ROCm support is improving but still trails CUDA.
Data Center / Cloud GPUs
What the providers run. Relevant if you rent rather than buy.
Fits: 405B+ full precision (multi-GPU)
Latest Blackwell. 2x perf over H100. Shipping 2025
Fits: Trillion+ parameter models
Grace CPU + 2x B200 in NVL72 racks
Fits: 70B full precision, 405B Q4 (multi-GPU)
H100 successor. More HBM, faster. ~$25-30K
Fits: 70B full, 405B Q4 (multi-GPU)
Workhorse. Widely available on cloud ~$2-3/hr
Fits: 34B full, 70B Q4
Good balance of VRAM and cost
Fits: 70B Q8
Previous gen but still everywhere. ~$1-2/hr on cloud
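The rent-vs-buy question above comes down to hours of use. A simple break-even sketch (the electricity cost per hour is an assumed placeholder, and this ignores resale value and cloud spot discounts):

```python
def breakeven_hours(hardware_cost: float, cloud_rate_per_hr: float,
                    power_cost_per_hr: float = 0.15) -> float:
    """Hours of GPU time at which owning beats renting.
    power_cost_per_hr is an assumed electricity figure, not from the listing."""
    return hardware_cost / (cloud_rate_per_hr - power_cost_per_hr)

# A ~$1,999 consumer card vs a ~$2/hr cloud GPU:
print(round(breakeven_hours(1999, 2.00)))  # 1081 hours
```

If your workload is bursty, rent; if the GPU runs most days, the hardware above pays for itself within a year.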