Prototyping Solana AI agents without burning OpenAI credits: what's everyone using for the LLM layer?
Seeing a lot of AI-agent-on-Solana projects here lately (MCP servers, sol-agent-wallet, agentic research workflows). One recurring pain when prototyping these: the LLM bill. You're iterating on prompt > tool-call > on-chain action loops, the agent fires dozens of calls per test run, and OpenAI/Anthropic credits evaporate before you've even got the happy path working.
For the prototyping / dev-loop phase specifically (not production), what's everyone's stack for cheap or free inference?
For full transparency since it's relevant: I run a free LLM chat API (apifreellm.com: POST /api/v1/chat, bearer key from a sign-in; there's also an OpenAI-style /v1/chat/completions). Honest caveats so nobody rage-quits: the free tier is rate-limited (a fixed delay between requests), it's general-purpose chat, not a frontier model, and it's best for dev/agent prototyping and non-critical loops. I would not put it on a latency-critical production path. It's the kind of thing that's handy for hammering an agent loop while you're still building it; then you swap to a paid model for prod.
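For anyone wondering what "swap later" looks like in practice, here's a rough Python sketch of a dev-loop client: it throttles to a fixed delay between requests and treats the base URL, key, and model as plain parameters, so pointing it at a different OpenAI-style endpoint is a config change. Assumptions on my part for illustration: the names `FixedDelayThrottle`, `build_chat_request`, and `chat` are made up, the request body is the usual OpenAI chat-completions shape (`model` plus `messages`), and the delay value is whatever your provider actually enforces. Check the provider's docs for the real path and response format before relying on it.

```python
import json
import time
import urllib.request


class FixedDelayThrottle:
    """Keeps consecutive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the throttle can be tested offline
        self._sleep = sleep
        self._last = None     # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) an OpenAI-style chat completions request.

    Assumes the provider serves the endpoint at <base_url>/v1/chat/completions.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(throttle, base_url, api_key, model, messages, timeout=60):
    """Wait out the rate limit, send the request, return the parsed JSON."""
    throttle.wait()
    request = build_chat_request(base_url, api_key, model, messages)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The agent loop only ever calls `chat(...)`, so moving from a free endpoint to a paid one means changing `base_url`, `api_key`, and `model`, and setting the throttle interval to 0.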
Genuinely curious how others handle this, because the answers seem to be: (a) local models (Ollama/llama.cpp), which are fine until you need bigger context; (b) free tiers of the big providers, which are generous but cut you off mid-iteration; (c) just eating the cost. Is anyone running agents long-term on free/cheap inference, or is it always "prototype cheap > production paid"?
Also: for agents that actually execute on-chain, how are you sandboxing the LLM's tool calls so a bad completion can't drain a wallet? That part scares me more than the inference cost.
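To make the question concrete, the baseline I have in mind is a policy check that sits between the model's output and whatever signs transactions: allowlisted tools, allowlisted recipients, a per-call cap, and a per-run cap. Minimal Python sketch below. Everything in it is hypothetical (the `SpendPolicy` and `PolicyViolation` names, the `transfer_sol` tool, and the `{"name": ..., "args": {...}}` call shape); it's a starting point for discussion, not a complete sandbox.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when a model-proposed tool call falls outside the policy."""


@dataclass
class SpendPolicy:
    allowed_tools: frozenset
    allowed_recipients: frozenset
    max_lamports_per_call: int
    max_lamports_per_run: int
    spent_lamports: int = 0  # running total of authorized transfers

    def authorize(self, call):
        """Validate one tool call; raise PolicyViolation if it is not allowed."""
        if not isinstance(call, dict):
            raise PolicyViolation("tool call must be a dict")
        name = call.get("name")
        args = call.get("args") or {}
        if not isinstance(args, dict):
            raise PolicyViolation("args must be a dict")
        if name not in self.allowed_tools:
            raise PolicyViolation(f"tool not allowlisted: {name!r}")

        # Only the (hypothetical) transfer tool moves funds, so only it
        # gets the recipient and amount checks.
        if name == "transfer_sol":
            to = args.get("to")
            lamports = args.get("lamports")
            if not isinstance(to, str) or to not in self.allowed_recipients:
                raise PolicyViolation("recipient not allowlisted")
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(lamports, bool) or not isinstance(lamports, int):
                raise PolicyViolation("lamports must be an integer")
            if lamports <= 0:
                raise PolicyViolation("lamports must be positive")
            if lamports > self.max_lamports_per_call:
                raise PolicyViolation("per-call cap exceeded")
            if self.spent_lamports + lamports > self.max_lamports_per_run:
                raise PolicyViolation("per-run cap exceeded")
            self.spent_lamports += lamports
```

The idea is that the signer only ever sees calls that made it through `authorize`, and the check lives in ordinary code the model can't talk its way around. It limits the blast radius of a bad completion; it doesn't catch a call that is within the caps but still wrong.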