| ID | Model | Prompt | Backend | Ver | Fmt | Quant | KV Cache | TTFT | Decode | Prefill | Tokens | Total | Peak RSS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

TTFT = time to first token (warm, with prefix cache where available) · Cold = first-request TTFT (shown in parentheses when it exceeds 1.5× the warm value) · Decode = generation speed in tokens/s · Prefill = prompt evaluation speed in tokens/s · Tokens = tokens generated · Total = wall-clock time for the full response · Peak RSS = peak RAM of the whole process tree during inference. All values are the median of 3 runs, except Peak RSS, which is the maximum. Backends: mlx-lm 0.31.2–0.31.3, mlx-vlm 0.4.3/0.4.4, Ollama 0.19–0.21, oMLX 0.3.4, llama.cpp b5220–b8920, vllm-mlx 0.1–0.2.9, LM Studio, Docker Model Runner.
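
The aggregation rules in the legend can be sketched in a few lines of Python. This is an illustrative sketch only, not the script used to produce the table: the function names `summarize_ttft` and `summarize_peak_rss`, the two-decimal formatting, and the placement of the cold value directly after the warm one are all assumptions.

```python
from statistics import median

# Threshold taken from the legend: cold TTFT is shown only when > 1.5x warm.
COLD_RATIO = 1.5


def summarize_ttft(warm_runs, cold_ttft):
    """Return the TTFT cell text: median warm value, plus cold in parentheses
    when the cold (first-request) value exceeds COLD_RATIO times the warm median.

    Hypothetical helper; the output format is an assumption.
    """
    warm = median(warm_runs)
    if cold_ttft > COLD_RATIO * warm:
        return f"{warm:.2f} ({cold_ttft:.2f})"
    return f"{warm:.2f}"


def summarize_peak_rss(rss_runs):
    """Peak RSS is reported as the maximum across runs, not the median."""
    return max(rss_runs)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(summarize_ttft([0.30, 0.25, 0.40], cold_ttft=0.90))
    print(summarize_ttft([0.30, 0.25, 0.40], cold_ttft=0.40))
    print(summarize_peak_rss([5.1, 5.4, 5.2]))
```

Using the median for timing columns keeps a single slow run from skewing the result, while taking the maximum for Peak RSS reports the worst-case memory footprint.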