u/antonusaca

▲ 2 r/ollama

Best $20 setup for content writing & local file access?

Hey Reddit, need some help optimizing a workflow setup for my wife without completely overpaying.

Our current home setup uses the $10/mo Google One family plan (2TB). The web version of Gemini is great and rarely gives us limit issues, but she needs to work locally with files and folders for content creation (blogs, social copy, deep content planning—no video or image work).

I tried putting her on the new Antigravity Desktop app to let her work out of her local directories. Huge mistake—30 minutes of multi-file agent work and she completely exhausted a weekly limit. The rate limits on these local desktop apps feel way tighter than standard web chats.

(For context: I run Ollama and OpenCode Go with open-source models for my own programming work, not content writing.)

I have a $200 Codex plan for my business, but sharing it on two devices sounds like a recipe for a messy, overlapping history. I’m debating whether to buy her a separate $20 Gemini Advanced sub to keep it simple, or pivot her over to OpenAI / GPT-5.5.

  1. Between Gemini Advanced and OpenAI ($20 tiers), which model actually writes better content? We need something that excels at long-form blogs and strategic planning without sounding robotic.
  2. How do I bypass these local app limits without buying another flat subscription? Is there a smarter way to let her work with local folders without hitting an immediate wall?

Thanks for any advice!

reddit.com
u/antonusaca — 1 day ago
▲ 27 r/ollama

I built ollamatps.com to compare Ollama Cloud models by 24h TPS + intelligence

Hey everyone, I recently built ollamatps.com for my own needs and thought I’d share it here in case it helps others too.

It shows the last 24 hours of data for Ollama Cloud models, sorted by average TPS, and I also added the Artificial Analysis Intelligence Index so it’s easier to compare speed vs. intelligence in one place.

My personal takeaway: GLM-4.7 looks like the best speed/intelligence balance, with an average of 93 TPS. My favorite is still Kimi K2.6, but in my tests it’s much slower, around 32 TPS.
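To put those two averages in perspective, here is a rough sketch that converts TPS into wall-clock time for a single response. The 1000-token response length is an assumption chosen for illustration, and the calculation ignores time-to-first-token and any queueing on the provider side.

```python
def generation_seconds(tokens: int, tps: float) -> float:
    """Seconds needed to stream `tokens` output tokens at a steady `tps`."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return tokens / tps


# Average speeds quoted in the post.
AVERAGE_TPS = {"GLM-4.7": 93.0, "Kimi K2.6": 32.0}

# Assumed response length, for illustration only.
RESPONSE_TOKENS = 1000

for model, tps in AVERAGE_TPS.items():
    seconds = generation_seconds(RESPONSE_TOKENS, tps)
    print(f"{model}: {seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions the gap is roughly 11 seconds versus 31 seconds per response, which adds up quickly in multi-step agent runs.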

Link: https://architects-movies-termination-agreed.trycloudflare.com/ollama-tps-aa-comparison.html

Happy to hear feedback or model suggestions.

reddit.com
u/antonusaca — 6 days ago

▲ 8 r/kimi

How many concurrent requests can each plan handle?

Allegretto costs $39 per month, while Allegro costs $99 per month. How many concurrent requests does each plan allow?

Consider the scenario where the main agent (Hermes) can run multiple subagents.

My current provider, Ollama Cloud, charges $20 for the Pro plan with a maximum of 3 concurrent requests, and $100 for the Max plan with a maximum of 10 concurrent requests.
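For the main-agent-plus-subagents scenario, a concurrency cap like this is usually respected on the client side with a semaphore. The following is a minimal sketch, not Hermes's actual mechanism: `call_model` is a hypothetical stand-in for a real provider call, and the limit of 3 is the Pro figure quoted above.

```python
import asyncio

MAX_CONCURRENT = 3  # Pro plan limit quoted in the post


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real provider request."""
    await asyncio.sleep(0.01)  # simulate network and generation time
    return f"done: {prompt}"


async def run_subagents(prompts: list[str], limit: int = MAX_CONCURRENT) -> tuple[list[str], int]:
    """Run all prompts, never exceeding `limit` requests in flight.

    Returns the results in input order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(prompt: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while `limit` requests are active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_model(prompt)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(p) for p in prompts))
    return list(results), peak


if __name__ == "__main__":
    outputs, peak = asyncio.run(run_subagents([f"task {i}" for i in range(8)]))
    print(len(outputs), "results, peak concurrency", peak)
```

With a wrapper like this, extra subagents simply queue instead of triggering provider-side rejections, so the plan's limit mostly determines throughput rather than whether the run succeeds.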

reddit.com
u/antonusaca — 13 days ago
▲ 24 r/kimi+1 crossposts

I’m currently using Ollama Cloud heavily ($100/month), mostly with Kimi K2.6. However, Ollama can be quite slow at times during the day, so I’m considering switching to Kimi directly. For additional models, I can use OpenCode Go ($10/month).

Just curious whether anyone has recent experience with both Kimi and Ollama and would be willing to share it.

reddit.com
u/antonusaca — 16 days ago

▲ 9 r/ollama

I’m thinking about running a local LLM for coding and embedding. I have both a PC and a MacBook. This will be my first time doing this, and I can install Linux on my PC if necessary. I’m looking for advice on which good modern models can run on my devices. Ideally, I’d like 50 TPS or more, if possible.

Here are my current specifications:

- PC: AMD Ryzen 7 7700X, 48GB DDR5, RTX 4060 Ti 8GB

- MacBook: Apple M2 Max, 32GB
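A back-of-the-envelope way to shortlist models for these two machines is to estimate the size of the quantized weights and compare it with available memory. This is only a sketch under stated assumptions: the model sizes (7B, 14B, 32B), the 4-bit quantization, and the 20% headroom reserved for the KV cache and runtime overhead are illustrative guesses, and the estimate says nothing about the resulting TPS.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.2) -> bool:
    """True if the weights fit after reserving `headroom` of the memory.

    The headroom fraction is an assumption covering KV cache and overhead.
    """
    return weight_gb(params_billion, bits_per_weight) <= memory_gb * (1 - headroom)


# Memory figures from the post; model sizes are hypothetical examples.
DEVICES = {"RTX 4060 Ti (8 GB VRAM)": 8, "M2 Max (32 GB unified)": 32}
MODEL_SIZES_B = [7, 14, 32]
BITS = 4  # assumed 4-bit quantization

for device, memory_gb in DEVICES.items():
    for size in MODEL_SIZES_B:
        verdict = "fits" if fits(size, BITS, memory_gb) else "does not fit"
        print(f"{device}: {size}B @ {BITS}-bit ~ {weight_gb(size, BITS):.1f} GB -> {verdict}")
```

By this estimate, only the smallest example fits entirely in the 8GB card, while all three fit in the MacBook's 32GB, so the MacBook is likely the more flexible starting point for larger models.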

reddit.com
u/antonusaca — 19 days ago
▲ 3 r/ollama

The service is slow for me. I have the $100 Ollama Cloud plan and try to use it to the maximum, but at this speed I haven't found a model that is both good and fast, except Gemini 3 Flash. I guess that's because Gemini 3 Flash is not self-hosted by Ollama.

reddit.com
u/antonusaca — 20 days ago
▲ 33 r/coolify+1 crossposts

I spent the last week manually deploying Paperclip (an open-source AI agent orchestration dashboard) on my Coolify v4 server.

After fighting through 8 different gotchas, including the wrong branch (master, not main), a UID mismatch on Docker volumes, missing curl inside the container breaking health checks, and a 64-character auth secret I had to generate manually, I got it working.
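For the 64-character auth secret mentioned above, a minimal sketch of one way to generate it follows. It assumes a hexadecimal string is acceptable; the post does not say which character set Paperclip expects, and `make_auth_secret` is a hypothetical helper name.

```python
import secrets


def make_auth_secret(length: int = 64) -> str:
    """Return a random hex string of exactly `length` characters.

    Each random byte becomes two hex characters, so `length` must be even.
    """
    if length <= 0 or length % 2:
        raise ValueError("length must be a positive even number")
    return secrets.token_hex(length // 2)


if __name__ == "__main__":
    print(make_auth_secret())
```

The `secrets` module draws from the operating system's cryptographically secure random source, which makes it a better fit for credentials than `random`.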

Then I asked myself: "Why should anyone else go through this?"

So I captured the entire deployment as an OpenCode skill. Now you can literally paste this into any AI agent with Coolify MCP access:

>"Deploy Paperclip on my Coolify server."

...and the agent walks through every step: project creation, environment variables, persistent storage mount, permission fixes, deployment, and even post-deploy CEO onboarding.

Repo: https://github.com/antongulin/coolify-paperclip-deployer
Skill format: 8-phase workflow with all gotchas documented + eval benchmarks (100% pass rate vs. 17% baseline without the skill)

What I love about this: the skill wasn't written by hand in one go. I used the opencode-skill-creator — a free, open-source tool that takes your raw battle-tested knowledge, generates eval test cases, runs A/B benchmarks, and optimizes the description until it triggers reliably.

It's basically applying QA discipline to AI agent skills: capture → evaluate → optimize → ship.

Anyone else running non-marketplace apps on Coolify? What's your workflow for not repeating the same manual steps every time?

u/antonusaca — 23 days ago
▲ 17 r/opencodeCLI+1 crossposts

Just wanted to share my experience with anyone who is interested in DeepSeek V4 Pro on Ollama Cloud.

I had been waiting for DeepSeek V4 Pro to be available on Ollama Cloud for the past few days, but unfortunately, it hasn’t been working. Every other minute, responses are dropped. By contrast, my OpenCode Go subscription (which runs at approximately 50 TPS) works reliably and quickly.

I sincerely hope that Ollama will resolve this issue soon. I’m currently subscribed to the Ollama Cloud Max plan, which costs $100, and I expect to receive good service.

reddit.com
u/antonusaca — 24 days ago