Local LLM Hardware

Considerations for running a local LLM (e.g. Magic Context historian) on the main CachyOS system.

GPU Options

For inference only (not training), VRAM is the main constraint:

| GPU | VRAM | Max model size (Q4) | Tok/s (approx.) | Est. price | Notes |
|---|---|---|---|---|---|
| GTX 1050 | 2 GB | ~3B params | 15-30 | ~$30-40 | Tight VRAM, 1.5-3B models only |
| GTX 1050 Ti | 4 GB | ~7B params | 15-25 | ~$50-80 | Sweet-spot budget option |
| GTX 1060 6GB | 6 GB | ~9B params | 20-35 | ~$60-90 | Solid mid-range |
| Intel Arc A310 | 4 GB | ~7B params | 20-30 | ~$60-80 | Also good for video encoding, low-profile bracket available |
| Tesla P40 | 24 GB | ~34B params | 20-30 | ~$150-200 | Needs power adapter + active cooling mod |
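
The "max model size" column roughly follows from the size of the quantized weights. A minimal sketch of that arithmetic, assuming a flat 4 bits per weight (an assumption: real Q4 files carry extra per-block scale data, and the KV cache needs memory on top, so actual usage is higher). The function names are illustrative, not part of any library.

```python
# Rough weights-only VRAM estimate for a quantized model.
# Assumption: a flat 4 bits per weight; real Q4 files are somewhat larger,
# and the KV cache and runtime buffers need memory on top of the weights.


def weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def headroom_gb(vram_gb: float, params_billion: float,
                bits_per_weight: float = 4.0) -> float:
    """VRAM left for KV cache and buffers once the weights are loaded."""
    return vram_gb - weights_gb(params_billion, bits_per_weight)


# (card, VRAM in GB, largest model in billions of params), taken from the table
CARDS = [
    ("GTX 1050", 2, 3),
    ("GTX 1050 Ti", 4, 7),
    ("GTX 1060 6GB", 6, 9),
    ("Intel Arc A310", 4, 7),
    ("Tesla P40", 24, 34),
]

if __name__ == "__main__":
    for name, vram, params in CARDS:
        print(f"{name}: weights ~{weights_gb(params):.1f} GB, "
              f"headroom ~{headroom_gb(vram, params):.1f} GB")
```

Under this estimate the 2 GB and 4 GB cards have only about 0.5 GB of headroom at their listed maximum, which is why the smaller models are the comfortable choice there.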

CPU-Only Fallback

llama.cpp runs a 3B model on a modern CPU at roughly 2-5 tok/s. That is fine for a background historian task, and it costs nothing extra because no GPU is needed.
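
To put the 2-5 tok/s range in perspective, here is a small sketch of how long one summary would take to generate. The 500-token summary length is a hypothetical figure chosen for illustration, and prompt-processing time is ignored.

```python
# Estimate wall-clock time to generate one summary at a given speed.
# Assumption: SUMMARY_TOKENS is a hypothetical output length; time spent
# reading the prompt is not included.


def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Seconds needed to generate output_tokens at tok_per_s."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s


SUMMARY_TOKENS = 500  # hypothetical summary length

if __name__ == "__main__":
    for rate in (2.0, 5.0):
        secs = generation_seconds(SUMMARY_TOKENS, rate)
        print(f"{rate:.0f} tok/s -> {secs:.0f} s per summary")
```

A few minutes per summary is slow for interactive chat but tolerable for a task that runs in the background.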

Use Case: Magic Context Historian

The historian model compresses conversation in the background, so it doesn't need to be fast. A 1.5-3B Q4 model is adequate. Running it locally avoids per-token API costs.
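
Whether a cheap GPU pays for itself depends on how many tokens the historian processes. A rough break-even sketch follows; the daily token volume and the API price are hypothetical placeholders rather than real quotes, and electricity is ignored.

```python
# Break-even estimate: days until a one-off hardware purchase costs less
# than paying per token. All example inputs below are hypothetical.


def api_cost(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def break_even_days(hardware_cost: float, tokens_per_day: int,
                    price_per_million: float) -> float:
    """Days of API usage whose cost equals the hardware price."""
    daily = api_cost(tokens_per_day, price_per_million)
    if daily <= 0:
        raise ValueError("daily API cost must be positive")
    return hardware_cost / daily


if __name__ == "__main__":
    # 65 is the midpoint of the GTX 1050 Ti price range; the token volume
    # and price per million tokens are made-up values for illustration.
    days = break_even_days(hardware_cost=65, tokens_per_day=200_000,
                           price_per_million=0.50)
    print(f"Break-even after ~{days:.0f} days")
```

With CPU-only inference the hardware cost is zero, so any API spend avoided is a saving from the first day.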

[[Index]]