Local LLM Hardware¶
Considerations for running a local LLM (e.g. Magic Context historian) on the main CachyOS system.
GPU Options¶
For inference only (not training), VRAM is the main constraint:
| GPU | VRAM | Max Model Size (Q4) | Tok/s (approx) | Est. Price | Notes |
|---|---|---|---|---|---|
| GTX 1050 | 2GB | ~3B params | 15-30 | ~$30-40 | Tight VRAM, 1.5-3B models only |
| GTX 1050 Ti | 4GB | ~7B params | 15-25 | ~$50-80 | Sweet spot budget option |
| GTX 1060 6GB | 6GB | ~9B params | 20-35 | ~$60-90 | Solid mid-range |
| Intel Arc A310 | 4GB | ~7B params | 20-30 | ~$60-80 | Good encoding too, LP bracket available |
| P40 (Tesla) | 24GB | ~34B params | 20-30 | ~$150-200 | Needs power adapter + active cooling mod |
CPU-Only Fallback¶
llama.cpp runs on CPU at ~2-5 tok/s for a 3B model on modern hardware. Fine for a background historian task. Zero cost, no GPU needed.
Use Case: Magic Context Historian¶
The historian model compresses conversation in the background — it doesn't need to be fast. A 1.5-3B Q4 model is adequate. Running it locally avoids per-token API costs.
Related¶
- [[Index]]