
Benjamin Cowen (Modal) on Why Fine-Tuning Domain-Specific Models Is Becoming the Default — and How Serverless Makes It Practical
Just read a sharp write-up on StartupHub summarizing a talk by **Benjamin Cowen**, Forward Deployed ML Engineer at Modal, on the shift from pure API usage to fine-tuned domain-specific models.
**Key takeaways that resonated:**
- **The Model Spectrum**: Frontier APIs on one end (easy, no infra, but black-box and prone to cost creep) vs. building everything from scratch on your own servers on the other (total control, but you’re now running a GPU cluster). Serverless platforms like Modal sit in a sweet spot for a lot of teams.
- Companies are treating base models as raw materials. The real product is your fine-tuned version on proprietary data.
- Real examples called out:
  - Intercom’s Fin Apex reportedly beats GPT-5.4 at ~1/5th the cost
  - Pinterest’s CEO on “orders of magnitude” cost reduction via fine-tuning open-source models
- **Modal’s approach** makes this much more approachable. Unified GPUs + sandboxed environments let you run SFT, RL, hyperparameter sweeps, etc., with very concise codebases (examples ~300 lines of Python). It auto-scales containers and abstracts away most of the infrastructure pain so you spend time on data and modeling instead of Kubernetes.
- **Signals it’s time to fine-tune**: Evals plateauing despite prompt work, latency/throughput issues, unit economics breaking, or you already have solid evals + data pipelines in place. If you’ve done the agent/eval/data-collection work, fine-tuning is often the natural next step.
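
To make the "signals" bullet concrete, here is a rough sketch of how I'd turn it into a checklist. Everything in it is my own assumption rather than anything from the talk: the `ProjectState` fields, the `fine_tune_signals` helper, and the thresholds (a 3-round plateau window, a 0.01 score tolerance) are all hypothetical placeholders you would tune for your own project.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    # All fields are hypothetical; swap in whatever you actually track.
    eval_scores: list[float]          # eval score after each round of prompt work, oldest first
    p95_latency_ms: float             # observed p95 latency
    latency_budget_ms: float          # latency your product can tolerate
    cost_per_request: float           # what you pay per request
    revenue_per_request: float        # what a request earns you
    has_evals_and_data_pipeline: bool


def fine_tune_signals(state: ProjectState,
                      plateau_window: int = 3,
                      plateau_eps: float = 0.01) -> list[str]:
    """Return the names of the signals that currently fire."""
    recent = state.eval_scores[-plateau_window:]
    # "Plateau" here means the last few rounds of prompt work barely moved the score.
    plateaued = (len(recent) == plateau_window
                 and max(recent) - min(recent) < plateau_eps)
    signals = {
        "evals_plateaued": plateaued,
        "latency_over_budget": state.p95_latency_ms > state.latency_budget_ms,
        "unit_economics_broken": state.cost_per_request >= state.revenue_per_request,
        "evals_and_data_ready": state.has_evals_and_data_pipeline,
    }
    return [name for name, fired in signals.items() if fired]


example = ProjectState(
    eval_scores=[0.61, 0.70, 0.702, 0.705, 0.701],
    p95_latency_ms=1800,
    latency_budget_ms=1000,
    cost_per_request=0.04,
    revenue_per_request=0.03,
    has_evals_and_data_pipeline=True,
)
print(fine_tune_signals(example))
```

The more of these that fire at once, the stronger the case. A single signal (say, latency alone) might still be cheaper to fix with caching or a smaller hosted model.
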
Cowen frames fine-tuning as moving from a research-heavy task to a more standard engineering practice, and serverless infra is a big part of why it’s becoming realistic for more teams.
**Questions for the community:**
- Have you moved (or tried to move) from pure frontier APIs to fine-tuning your own models? What were the biggest wins or surprises?
- Anyone using Modal, Fireworks, Together.ai, self-hosted vLLM/TGI, or similar for fine-tuning/inference? How’s the experience compared to the article’s description?
- What signals made you decide it was time to fine-tune in your own projects?
Would love to hear real-world experiences, especially around cost, latency, and the actual engineering effort involved.