Somewhat capable and quick local model for CPU?
I exhausted my budget for the week and started looking at local models.
My Hermes lives on a VPS with no GPU and 24 GB of RAM. I tested three models and gave each the same task: look up a skill by name (the skill contains the project context) and tell me about the project.
Phi4-mini: Failed hard. After half an hour of waiting it produced a long, rambling answer with no usable content.
Llama3.1:8b: Stayed better on track, but apparently couldn't figure out how to use the skill view tool. Waited a bit less, ~25 minutes.
Qwen2.5:7b: Took about 15 minutes to view the skill, then another ~36 minutes to produce a somewhat usable answer that did reference project-specific knowledge.
I think I could make Qwen2.5 work, but it's too slow. Nothing meaningful will get done at that pace, even if the quality were OK (and I'm sure it will be noticeably worse than my regular cloud model).
Is there any model out there that is usable for Hermes on CPU at a somewhat usable speed? It doesn't need to be super fast, but I'd rather wait 5 minutes than 50 for a simple response.
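To make the 5-minute target concrete, here is a rough back-of-the-envelope sketch in Python. It assumes total wait is just prompt processing plus generation, repeated once per agent turn. The function names and every number in the usage lines are hypothetical placeholders, not measurements from my VPS; plug in whatever prompt-processing and generation rates your runtime reports.

```python
def estimate_wait_seconds(prompt_tokens, output_tokens,
                          prefill_tps, decode_tps, turns=1):
    """Estimate wall-clock seconds for a task.

    prefill_tps: prompt-processing speed in tokens/second (assumed input).
    decode_tps:  generation speed in tokens/second (assumed input).
    turns:       number of model calls (e.g. one tool call + one answer = 2).
    """
    per_turn = prompt_tokens / prefill_tps + output_tokens / decode_tps
    return turns * per_turn


def required_decode_tps(budget_seconds, prompt_tokens, output_tokens,
                        prefill_tps, turns=1):
    """Generation speed needed to finish within budget_seconds.

    Returns None when prompt processing alone already exceeds the budget,
    in which case only a shorter prompt or faster prefill can help.
    """
    prefill_time = turns * prompt_tokens / prefill_tps
    remaining = budget_seconds - prefill_time
    if remaining <= 0:
        return None
    return turns * output_tokens / remaining


if __name__ == "__main__":
    # Placeholder numbers: an 8000-token agent prompt, 400-token replies,
    # two turns (view the skill, then answer).
    wait = estimate_wait_seconds(8000, 400, prefill_tps=20, decode_tps=4, turns=2)
    print(f"estimated wait: {wait / 60:.1f} minutes")
    print(required_decode_tps(300, 2000, 400, prefill_tps=20))
```

With a long agent prompt, the prompt-processing term can dominate on CPU, so a smaller model alone may not be enough if the context stays the same size.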