Skip to content

Local LLM Hardware

Considerations for running a local LLM (e.g. Magic Context historian) on the main CachyOS system.

GPU Options

For inference only (not training), VRAM is the main constraint:

GPU VRAM Max Model Size (Q4) Tok/s (approx) Est. Price Notes
GTX 1050 2GB ~3B params 15-30 ~$30-40 Tight VRAM, 1.5-3B models only
GTX 1050 Ti 4GB ~7B params 15-25 ~$50-80 Sweet spot budget option
GTX 1060 6GB 6GB ~9B params 20-35 ~$60-90 Solid mid-range
Intel Arc A310 4GB ~7B params 20-30 ~$60-80 Good encoding too, LP bracket available
P40 (Tesla) 24GB ~34B params 20-30 ~$150-200 Needs power adapter + active cooling mod

CPU-Only Fallback

llama.cpp runs on CPU at ~2-5 tok/s for a 3B model on modern hardware. Fine for a background historian task. Zero cost, no GPU needed.

Use Case: Magic Context Historian

The historian model compresses conversation in the background — it doesn't need to be fast. A 1.5-3B Q4 model is adequate. Running it locally avoids per-token API costs.

  • [[Index]]