
MacBook with 48 GB RAM running DeepSeek V4 Flash locally
US4 V6 — Apple Edition. Universal State Runtime for local LLM inference on Apple Silicon (M1..M5+). C++17/20 + MLX + Metal + NEON + ANE

Suppose a PM ships a care coordination agent. In week one, a patient says, "I've been getting chest pain in the evenings." The agent logs the note and the demo looks great. In week three, the same patient comes back and asks, "Should I be worried about that pain again?" The agent replies, "What pain?"
By default, agents forget everything the moment a turn ends. If you want continuity, you have to build it yourself.
Memory is therefore a product decision, not a model feature. Your job is to design what gets summarized, what gets stored, and what gets retrieved.
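The three decisions above (store, retrieve, summarize) can be sketched in a few lines of Python. This is only an illustration under assumptions: `MemoryStore`, its method names, and the keyword-overlap ranking are hypothetical and not taken from any agent framework. A real system would more likely use embeddings for retrieval and a model call for summaries.

```python
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    # Lowercase tokens with surrounding punctuation stripped.
    return {w.strip(".,?!\"'").lower() for w in text.split()} - {""}


@dataclass
class MemoryStore:
    """Hypothetical per-user note store (illustrative, not a real library API)."""
    notes: dict[str, list[str]] = field(default_factory=dict)

    def store(self, user_id: str, note: str) -> None:
        # Decision: what gets stored. Here, every note is kept verbatim.
        self.notes.setdefault(user_id, []).append(note)

    def retrieve(self, user_id: str, query: str, k: int = 3) -> list[str]:
        # Decision: what gets retrieved. Here, notes sharing words with the query,
        # ranked by overlap count (assumed heuristic; no stopword filtering).
        query_words = _words(query)
        scored = [(len(query_words & _words(n)), n)
                  for n in self.notes.get(user_id, [])]
        ranked = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: s[0], reverse=True)
        return [note for _, note in ranked[:k]]

    def summarize(self, user_id: str, max_chars: int = 200) -> str:
        # Decision: what gets summarized. Here, naive truncation of joined notes.
        return " | ".join(self.notes.get(user_id, []))[:max_chars]


memory = MemoryStore()
memory.store("patient-1", "I've been getting chest pain in the evenings.")
memory.store("patient-1", "Prefers morning appointments.")
recalled = memory.retrieve("patient-1", "Should I be worried about that pain again?")
```

With the week-one note stored, the week-three question matches on the word "pain", so `recalled` holds the chest pain note and the agent has something to put back into its prompt.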
You can check out this video from SkillAgents on YouTube for more details.
Hi all, I need to generate a fairly extensive curriculum for maths and physics, and I was wondering what the best model(s) for this would be. The curriculum would include templates for quiz generation, which would need to be written in Rust, so the model needs to be good at more than just explaining raw content. It's my first time building something of this scale with agents and I'm a bit lost as to which models make sense here. I've done some testing with Opus and Sonnet, but those two are pretty expensive. Any help or suggestions would be greatly appreciated!
So I've been experimenting with a lot of local LLMs lately. I've tried a bunch of Qwen and Gemma models with different quantisations, but I feel I'm still not able to max out the tokens per second (tps) I can get out of my machine because I've picked the wrong LLM server. I'm using a MacBook M4 Pro with 24 GB of unified memory, with Ollama hooked up to Claude Code. If anyone has tried multiple combinations, I'd appreciate a suggestion for a good pairing of an LLM server and a CLI tool like opencode.