RTX vs Apple Silicon
Local AI hardware is basically a religious war with better benchmarks.
NVIDIA RTX GPUs are the sports cars: fast VRAM, CUDA, absurd token throughput if the model fits.
Apple Silicon is the weirdly elegant camper van: unified memory means you can often fit much larger models locally, especially on something like an M4 Max with up to 128GB RAM.
So the tradeoff is simple:
RTX = faster kitchen
Mac = bigger fridge
I run Qwen 3.6 27B locally on an RTX 5090 inside Thoth because 32GB VRAM is the sweet spot for my daily driver setup: fast, private, and no API round trips.
But Thoth is designed local-first, not NVIDIA-first.
Ollama, llama.cpp, OpenAI-compatible local endpoints, the point is that your AI should run where you want it to run.
Your machine. Your models. Your memory. Your data. Cloud optional. Local by default.