Retrospective
Running Local LLMs on a Mac Studio Taught Me How to Make Slow Models Powerful
After moving from dual 3090 Ti GPUs to an M3 Ultra, I found that for individuals and small teams, slow local LLMs work best as queue-based workers rather than as real-time chat replacements.
Feb 4, 2026 · 10 min read