r/LLMStudio

▲ 173 r/LLMStudio+20 crossposts

I would like to share my latest open-source local LLM inference tool, implemented in C#. It supports models such as Gemma4 and Qwen3.6, with multimodal input (image, vision, audio), reasoning, and function/tool calling. It runs on Windows, macOS, and Linux and takes full advantage of the GPU. The API is fully compatible with the OpenAI and Ollama interfaces.

I would really appreciate it if you could try it and give me some feedback. If you like it, a star would mean a lot. Thank you very much!

u/fuzhongkai — 24 hours ago

▲ 1 r/LLMStudio+1 crossposts

Is GLM 5.2 really that good?

Recently, after reading all the hype around GLM's capabilities, I ended up installing OpenCode to test it out myself. To my great surprise, the model is not worth it!

reddit.com

u/Novel-Awareness-1536 — 2 days ago

▲ 37 r/LLMStudio+3 crossposts

Built a 1-click installer for llama.cpp forks, first real test was ik_llama.cpp

Kept seeing ik_llama.cpp recommended here for quant support, but I was always reluctant to try it because it doesn't ship prebuilt binaries and I didn't want to deal with the cmake + CUDA toolkit setup on Windows.

I've been building TurboLLM, a local LLM app, so I added an installer for this. It detects your GPU and downloads a prebuilt binary where one exists (CUDA, Metal, Vulkan). For forks that don't publish builds, it pulls the toolchain, compiles on your machine, and registers the result in TurboLLM, so building and using a fork takes one click.

Tried it on ik_llama.cpp, took around 4-5 min to build. Same flow covers llama.cpp, KoboldCpp, TurboQuant or any other llama fork.

Tested on Windows and WSL so far. If there's a fork you run that I should point the build flow at, let me know, trying to work out which ones are worth adding to the catalog.

Repo: github.com/mohitsoni48/TurboLLM

u/Bramha_dev — 3 days ago

▲ 4 r/LLMStudio

I have a Mac Studio with M4 Max, 64GB RAM, 2TB storage. Looking for LLMs.

Hi, I am experimenting with different local LLMs. I want something that can handle coding, including complex coding, and can also teach me things when I ask, the way Claude does. As of now I am using Ollama with Qwen models. If anyone can help, that would be great.

u/Rajdeep_Wasekar — 2 days ago

▲ 4 r/LLMStudio+4 crossposts

Hey, I'm building an autonomous multi-agent AI system and looking for someone who can help me bring it to life, whether that's a collaborator, a mentor, or just someone willing to point me in the right

Here's what the system does:

It runs a pipeline of specialized AI agents that each handle a specific task. Data comes in, gets analyzed by the relevant agent, passes through a self-correction loop where a validator challenges the output before anything gets escalated, and finally reaches a supervisor bot that sends me a structured alert in real time. Every decision gets logged and fed back into a memory system so the system learns and adapts over time.

The use case is trading: I'm implementing my own strategy (80% win rate) combined with macro and fundamental analysis pulled from multiple sources. The goal is a system that monitors markets 24/7, filters out noise autonomously, and only alerts me when something is actually worth acting on.

The architecture is fully mapped out. I'm using Python, LangGraph for agent orchestration, Claude opus 4.8-5 or Fable 5 (if available) as the reasoning engine, and Gemini Flash as the screener. The full stack is defined, the bot hierarchy is designed, the memory system is planned across 3 phases.

What I need help with is the actual build. I have no dev background but I know exactly what I want to build and I'm serious about it.

If you've worked on multi-agent systems, LLM pipelines, or anything in this space and you're open to a conversation, drop a comment or DM me.

Thanks

u/Traditional_Honey858 — 3 days ago

▲ 8 r/LLMStudio+1 crossposts

New to Local LLMs, need some advice

Looking to temper my expectations a bit with local LLMs. I've recently just started experimenting with hosting models on my own hardware with Odysseus, and compared to some of the web models I've used, I'm really running into some roadblocks.

I know that hardware is limiting what I can run compared to Anthropic and GPT models, but is the attached picture really the standard for local models?

I'm using gemma4-9B with 16k context and was testing the Agent method by asking it to create a "hello world" webpage using Python. It executed that flawlessly, but when I asked it to stop the script, it was as if it had lost all memory of its previous work.

Is this user error? Or limitations of my model/hardware? Is the model running out of context? Thanks for any insight/roasting

u/agentjenning — 4 days ago

▲ 10 r/LLMStudio+8 crossposts

[ Removed by moderator ]

[deleted]

u/Traditional_Honey858 — 5 days ago

▲ 19 r/LLMStudio+1 crossposts

Best way to run a coding llm locally

I am considering getting a 64GB MacBook Pro, and I was wondering whether the best way to get good results for dev work is to:
A) run a large model with small context window
B) smaller llm, larger context window
C) quantized llm -- something context window?
D) other?

Curious about other people's setups. I'm also interested in the DGX Spark, but I assume it may have growing pains based on what I've read. Will the 64GB MacBook be good enough at programming that I no longer have to pay for Claude, or not at all? How usable will it be?

u/Head_Watercress_6260 — 6 days ago

▲ 120 r/LLMStudio+2 crossposts

I found every way to rent an NVIDIA DGX Spark (GB10) so you don't have to — cloud, hourly, and physical

Hello locals,

Kept seeing "where do I actually rent a DGX Spark" questions with no good answer, so I went and catalogued every option I could find. Posting it here in case it saves someone the search.

Remote access (cloud — you rent the GPU, connect over SSH)

Enverge — from $0.65/hr, 128GB, SSH + Docker, hourly pay-as-you-go, no commitment
gb10.studio - mostly for inference
VFX Now (US) — rudimentary cloud access; also offers physical
Primcast — dedicated/monthly hosting rather than hourly

Physical rental (the box ships to you — per week, UK)

HardSoft
Scan — per-week, includes a clunky cloud-access option too

Quick takeaways

For a weekend experiment, hourly cloud is the cheapest by a mile.
If you need it physically on your desk (data residency, air-gapped, privacy), the UK per-week physical rentals are the only real route right now.
Buying is ~$3–4k; rough breakeven vs $0.65/hr is ~5,000+ hours, so unless you're running it near-constantly, renting is the call.
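
For anyone who wants to plug in their own numbers, here is a rough sketch of the breakeven arithmetic using only the figures quoted above ($0.65/hr and a ~$3-4k purchase price). It deliberately ignores electricity, resale value, and any price changes, so treat it as a back-of-the-envelope estimate.

```python
# Break-even hours for buying vs. renting, using the figures quoted in the post.
HOURLY_RATE_USD = 0.65               # cheapest hourly cloud rate listed above
PURCHASE_PRICES_USD = (3000, 4000)   # the "~$3-4k" purchase range


def breakeven_hours(purchase_price, hourly_rate):
    """Hours of rental that cost as much as buying the box outright."""
    return purchase_price / hourly_rate


for price in PURCHASE_PRICES_USD:
    hours = breakeven_hours(price, HOURLY_RATE_USD)
    months_nonstop = hours / (24 * 30)  # months of round-the-clock use
    print(f"${price}: {hours:,.0f} hours (about {months_nonstop:.1f} months of 24/7 use)")
```

That lands at roughly 4,600 to 6,200 hours, which is where the "~5,000+ hours" figure comes from.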

What did I miss? Will edit the list with anything good in the comments.

Anyone DIY-ing this?

u/big-in-jap — 6 days ago

▲ 8 r/LLMStudio+2 crossposts

Building with swarms instead of smarter models

tl;dr: An open-source orchestrator that scales to 10,000+ parallel agents. By using continuous beam search across reasoning trees and aggressive trajectory compression, it hits 91% correctness on harness-bench (vs 64% one-shot) using cheap models in parallel.

Cheap Parallelism over Frontier Models

We spent months trying to force higher accuracy out of single agents by using bigger models, better prompting, or naive sequential retries. We found that:

Single-agent tasks fail ~36% of the time on harness-bench.
Sequential retries ("try again") hit diminishing returns hard after generation 2.
Throwing a single helper at a stuck agent often yields the exact same failure mode.

The Swarm Alternative:

What if instead of one expensive model trying to be flawless, you run 100 cheap, fast agents in parallel, let them explore different reasoning paths, disagree, and vote on the final output?

1 frontier agent @ $0.001/call = $0.001 (36% failure rate)
100 micro-agents @ $0.00001/call = $0.001 (9% failure rate via voting / beam search)

For the exact same token budget, accuracy jumps from 64% to 91%. This only works if your orchestration layer has negligible latency overhead and can manage thousands of concurrent states without falling apart.
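
To make the voting idea concrete, here is a minimal sketch of plain majority voting over agent answers, plus a check that the two price lines above really come to the same spend. The post does not say which voting scheme Suit Shop uses, so `majority_vote` is an illustrative stand-in, and the sample answers are made up.

```python
import math
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and the share of agents that gave it."""
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


# Budget check using the per-call prices quoted above.
frontier_cost = 1 * 0.001
swarm_cost = 100 * 0.00001
same_budget = math.isclose(frontier_cost, swarm_cost)

# Made-up answers from five hypothetical agents.
winner, share = majority_vote(["A", "B", "A", "C", "A"])
print(winner, share, same_budget)
```

How much voting helps depends on how independent the agents' mistakes are; if they all fail the same way, the vote just confirms the shared error.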

This has been tried time and time again as LLMs have progressed, and has typically yielded poor results. But it seems we're now at a point where the models we consider cheap have become good enough for swarms to actually work.

Interesting Things We Made and How

1. Pipelined Beam Search

Most multi-agent frameworks spawn N branches, wait for all of them to finish execution, evaluate them, and then spawn the next generation. The slowest agent bottlenecks the entire pipeline.

sshop uses a continuous, pipelined approach. Fast-completing branches seed children immediately; evaluation and pruning happen on a rolling basis.

Standard: Gen 1 (Wait for all) ──> Gen 2 (Wait for all) ──> Gen 3
sshop:    Gen 1 ──(fast branch)──> Gen 2 ──(fast branch)──> Gen 3
                └──(slow branch)──────────────> (Evaluated late/pruned)

Latency: Reduces from $O(\text{generations} \times \text{branches})$ to roughly $O(\text{max\_branch\_depth})$.
Throughput: ~4.5× improvement on complex reasoning workloads.
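
The rolling expansion described above can be sketched with `asyncio`. This is not the Suit Shop implementation: `run_branch`, `should_expand`, and the delays are hypothetical stand-ins for real agent calls and a real evaluator. The point is only that a finished branch seeds its children right away instead of waiting for its whole generation.

```python
import asyncio

DELAYS = [0.01, 0.2]  # hypothetical: child 0 is a fast branch, child 1 is slow


async def run_branch(path):
    """Stand-in for an agent call; a real one would query a model."""
    await asyncio.sleep(DELAYS[path[-1]])
    return path


def should_expand(path, max_depth):
    """Stand-in evaluator: only all-fast paths below max depth get children."""
    return len(path) < max_depth and all(step == 0 for step in path)


async def pipelined_search(max_depth=3, fanout=2):
    pending = {asyncio.create_task(run_branch((i,))) for i in range(fanout)}
    finished = []
    while pending:
        # Wake up as soon as ANY branch completes, not when the generation does.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            path = task.result()
            finished.append(path)
            if should_expand(path, max_depth):
                for i in range(fanout):
                    pending.add(asyncio.create_task(run_branch(path + (i,))))
    return finished


finished = asyncio.run(pipelined_search())
print(finished)
```

In this toy run the all-fast path reaches depth 3 while the slow first-generation branch is still in flight, which mirrors the diagram: the slow branch is evaluated late and not expanded.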

The Benchmark Results (harness-bench, 112 tasks):

Gen 1 (One-shot baseline): 64.1%
Gen 2 (Critique/Branch stage): 86.8%
Gen 3 (Consensus/Polish stage): 91.2%

The massive jump happens at Gen 2 (+22.7 points), showing that parallel critique is where the real value lies. Gen 3 offers diminishing returns, meaning you can aggressively prune deep trees early to save tokens. We would like to retry with a larger gen limit.

2. Trajectory Compression & Plan Caching

Running hundreds of agents eats tokens. To keep this cheap, we implemented two strategies:

Plan Template Caching: We map task structures using an SQLite FTS5 index. If an agent is building a REST API, it doesn't decompose the task from scratch. It pulls a cached decomposition template (60–80% reuse rate), cutting initial generation costs by over 60%.
Multi-Turn Trajectory Compression (MT-OSC): Instead of feeding a massive, 10-step raw execution log back into the context window, we offload logs to dirt-cheap models (like Haiku) to synthesize them.
Simply using the cheapest providers we can find.
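
As a rough illustration of the plan template cache, here is a sketch using Python's built-in `sqlite3` with an FTS5 virtual table. It assumes your SQLite build has FTS5 enabled; the table name, columns, and sample templates are invented for the example and are not taken from the repo.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per cached task decomposition.
conn.execute("CREATE VIRTUAL TABLE plan_templates USING fts5(task, template)")
conn.executemany(
    "INSERT INTO plan_templates (task, template) VALUES (?, ?)",
    [
        ("build REST API service", "1. define routes 2. add handlers 3. write tests"),
        ("parse CSV report", "1. read file 2. validate rows 3. aggregate"),
    ],
)


def lookup_template(task_description):
    """Return the best-matching cached template, or None on a cache miss."""
    words = re.findall(r"[A-Za-z0-9]+", task_description)
    if not words:
        return None
    # Quote each word and OR them together so partial overlap still matches.
    query = " OR ".join(f'"{word}"' for word in words)
    row = conn.execute(
        "SELECT template FROM plan_templates "
        "WHERE plan_templates MATCH ? ORDER BY rank LIMIT 1",
        (query,),
    ).fetchone()
    return row[0] if row else None


print(lookup_template("Build a REST API for user accounts"))
print(lookup_template("train neural network"))
```

On a miss the agent would fall back to decomposing the task from scratch and, presumably, store the result for next time.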

The compression prompt turns raw logs into a dense summary (We do prefer to avoid summaries though, we typically run Suit Shop with summarisation disabled). This dropped token overhead by 41–72% with minimal drops in task accuracy ($p=0.118$).

Open Engineering Questions We're Solving

Predictive Pruning: Can we reliably predict when a reasoning path will plateau at Gen 2 vs. when it requires a deep Gen 4 crawl? Right now, rule-based tasks plateau early, while abstract engineering tasks scale deep.
HBM Caching Limits: At what parallel context size does the GPU memory overhead of holding multiple agent states exceed the cost of just re-computing shorter compressed contexts?

What’s Next on the Roadmap

Visual Beam Search Dashboard: A web UI to replace our TUI to make Suit Shop more accessible
Cross-Agent Knowledge Transfer: Allowing parallel branches to pass localized "lessons learned" to competing branches mid-flight.

Let us know what you think. We're actively looking for contributors who want to move past simple single-agent wrappers and dive into heavy concurrent orchestration.

GitHub: https://github.com/pinstripes-ai/suit-shop

edit:

GitHub was being annoying and not letting me make the repo public, so I've temporarily switched to GitLab while I resolve that: https://gitlab.com/pinstripes-ai/suit-shop

u/plumb-moe — 5 days ago

▲ 2 r/LLMStudio+1 crossposts

Can anyone here tell me what topics I need to learn to build automations?

Is there anyone?

u/TAGonTOP — 4 days ago

▲ 2 r/LLMStudio+1 crossposts

I benchmarked 22 open-weight code LLMs across 11,000 local inference runs on an RTX 4090 Laptop GPU

I wanted to compare local code LLMs under real deployment conditions instead of relying on cloud benchmarks, so I evaluated 22 open-weight models on the MBPP benchmark using 11,000 automated inference runs.

Some observations:

- Qwen2.5-Coder-7B offered the best balance of accuracy and speed.

- Larger models didn't always perform better.

- Reasoning models incurred a significant latency cost without improving Pass@1 on MBPP.

- CodeBLEU and functional correctness often diverged.

I've published the benchmark methodology and results, and released the complete telemetry dataset on Hugging Face.

Medium: https://medium.com/@mrshahzebkhoso/11-000-inference-runs-22-slms-one-gpu-an-empirical-breakdown-of-local-code-generation-28efeb4f27ed

HF: https://huggingface.co/datasets/ShahzebKhoso/local-code-master_telemetry_arena

u/NecessaryPay6108 — 5 days ago

▲ 4 r/LLMStudio+1 crossposts

LLM Experience suggestions

I've been using Gemma4 26B A4B QAT (mixture of experts) in LM Studio, loaded with max context, and I want to improve my experience even further on the same hardware (a 3090 with 24GB VRAM).

So far I tested it with many trivial and tricky questions, it delivered.

Used it to aid me in my CTF labs, and it impressed me! Especially in explaining what I did wrong (which pro-tier flagship models won't do for "safety" purposes).

I even used it to fix an issue where my game audio was muffled (BF6). It guided me well and solved my problem (no web search, btw), while Gemini Pro with extended thinking was walking me through options that were not even available on my system. (I copied the same prompt from Gemini to LM Studio out of frustration.)

I can attach evidence replies later on if you guys are interested.

So yeah, I'd be really happy to get suggestions and advice on enhancing my experience with that exact model :D. I'm not interested in agentics and automation atm, just normal AI usage, but with more powerful functions and results. (I'm a bit slow, but I learn.)

u/SAL-007 — 6 days ago

▲ 5 r/LLMStudio+2 crossposts

I wanted to fine-tune an LLM on my own Git history. No tool existed to extract clean training data

Every guide on fine-tuning LLMs skips the hardest part: where do you get the data?

For code-aware models, the obvious answer is your own commit history, it's literally a record of how you think, write, and fix code. But when I tried to actually do this, I hit a wall.

Raw commit diffs are garbage for training. Merge commits. Bot-generated changelogs. "fix typo," "wip," "asdfasdf." Auto-generated lockfiles. Duplicate logic committed 6 different ways across branches. None of the existing dataset tools touched this problem.

So I spent time building git2llm, a CLI tool and Python library that turns your GitHub repositories into clean, fine-tuning-ready datasets.

What it does:

Crawls commits, PRs, and issues in parallel from any public or private repo
Runs a 4-stage cleaning pipeline:
- Drops merge commits and bot-authored noise
- Filters WIP/draft/auto-generated content
- Deduplicates using MinHash LSH (fuzzy match, not exact, catches near-identical commits too)
Outputs in Alpaca or ShareGPT format, ready to feed directly into Unsloth, LLaMA-Factory, or any SFT pipeline
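
For readers unfamiliar with the fuzzy deduplication step, here is a small pure-Python sketch of MinHash signatures used to drop near-identical commit messages. It is not git2llm's code: it compares signatures pairwise and leaves out the LSH banding that makes this fast at scale, and the shingle size, hash count, and threshold are arbitrary choices for illustration.

```python
import hashlib

NUM_HASHES = 64  # signature length; more hashes give a tighter estimate


def shingles(text, k=3):
    """Set of k-word shingles from a commit message."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(shingle, seed):
    """64-bit hash of a shingle, salted so each seed acts as a different hash."""
    digest = hashlib.blake2b(
        shingle.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash(text):
    """Signature: for each seeded hash, the minimum value over all shingles."""
    items = shingles(text)
    return [min(_hash(s, seed) for s in items) for seed in range(NUM_HASHES)]


def similarity(sig_a, sig_b):
    """Share of matching slots, which estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dedupe(messages, threshold=0.5):
    """Keep a message only if it is not too similar to one already kept."""
    kept, signatures = [], []
    for msg in messages:
        sig = minhash(msg)
        if all(similarity(sig, other) < threshold for other in signatures):
            kept.append(msg)
            signatures.append(sig)
    return kept


commits = [
    "fix null pointer crash in user login handler when session expires",
    "fix null pointer crash in user login handler when session expired",
    "add dark mode toggle to the settings page for mobile users",
]
print(dedupe(commits))
```

The first two messages share most of their shingles, so the second is treated as a near-duplicate and dropped, while an exact-match filter would have kept both.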

The stat that surprised me most: on my own repos, the pipeline dropped 78% of raw commits before a single token hit the training set. That's not a bug, that's the point. Most of what lands in git log is noise that actively hurts model quality.

Why this matters:

Fine-tuning on your own coding style is one of the few cases where you can get genuinely personalised code suggestions, not a generic GitHub Copilot, but something trained on your actual architectural decisions, naming conventions, and problem-solving patterns.

But that only works if the training data is clean. Feeding "fix stuff" commits into QLoRA is just teaching the model to be confidently wrong.

Where I used it:

I fine-tuned a base model on my own GitHub history using QLoRA via Unsloth. Hit some expected overfitting early (low data volume problem — another reason cleaning matters), but the directional results were clear: the model started picking up domain-specific patterns that generic models miss.

It's open-source. I'm looking for:

🛠 Contributors: especially around multi-repo crawling, GitHub Actions integration, and GitLab support
🧪 Testers: try it on your repos and open issues. Especially interested in edge cases: monorepos, large orgs, non-English commit messages
💡 Ideas: what cleaning heuristics am I missing? What output formats would you use?
⭐ A star if you find it useful (helps discoverability)

👉 github.com/athuKawale/git2llm

What would make you actually use a tool like this? Drop it below, genuinely trying to make this useful for the fine-tuning community, not just a side project that rots in a repo.

u/athukawale — 5 days ago

▲ 1 r/LLMStudio+1 crossposts

I built a full personal AI agent for Android — vibe coded it in Kotlin without knowing Kotlin. It runs local models, calendar/GPS/SMS access, and generates its own Python skills on the fly.

How it all started

A year and a half ago I built a gaming PC, and — like half this sub probably — fell down the local LLM rabbit hole. Installed LM Studio, then Ollama, started obsessing over GGUF quants that barely fit in VRAM. You know the drill.

Eventually I wanted an actual agent. Not a chat wrapper. Something that lives in my phone, has context, calls tools, makes decisions. Problem: I'm not a mobile dev. I knew zero Kotlin.

But hey, vibe coding era — so I described the architecture to deepseek, tested the generated code, fixed prompts, cursed at bugs, repeated. Half a year later I have a working Android app:

What it does

- Full ReAct loop — no hardcoded if/else. The model decides which tools to call based on system prompt. SMS, calls, contacts, calendar, GPS, email (IMAP/SMTP), web search — all exposed as skills.

- Context engine — pulls battery level, location, calendar events, notifications into a prompt context before every inference. So the agent knows "user is at home, it's 10pm, battery is low, maybe don't start a heavy local model".

- On-device LLM runtime — embedded LiteRT (Google AI Edge). Runs Gemma 4 E2B / Qwen 0.6B offline. Yes, getting them to work was painful.

- Smart Router — light queries hit the local model, complex ones go to API (DeepSeek / OpenAI / custom Ollama endpoint). User-configurable threshold.

- Python sandbox (via Chaquopy) — the agent can generate and execute Python scripts for one-off tasks. Flaky security-wise (same permissions as app), but opens up crazy flexibility.

- Persistent memory — extracts facts from conversations ("user lives in Moscow", "user prefers dried peaches", "user is a professional penguin flipper"), stores in Room DB with confidence levels, injects into future queries.

- Built-in task scheduler — cron-based heartbeat inside ForegroundService so Android doesn't kill it. Morning news digests, periodic weather checks, etc.

- Skill Store — external tools loaded at runtime from a local PHP server. Agent can download, enable, disable skills on the fly.

What hurt the most

Multi-step tasks. Ask it to "find the email, extract the address, plot it on GPS, send the report" — and the model gets happy after step 1, responds to chat, forgets the rest. Had to build per-step JSON status tracking (completed/failed/needs_input) to force chain completion.
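
A minimal sketch of what per-step status tracking could look like, written in Python for brevity even though the app itself is Kotlin. The field names and helper functions are assumptions; only the three status values (completed/failed/needs_input) come from the description above, with "pending" added as a starting state.

```python
import json

VALID_STATUSES = {"pending", "completed", "failed", "needs_input"}


def new_plan(steps):
    """Create one status record per step of the task."""
    return [{"step": step, "status": "pending"} for step in steps]


def record(plan, index, status):
    """Update a step's status, rejecting anything outside the known set."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    plan[index]["status"] = status


def next_pending(plan):
    """Index of the first step that is not completed, or None when done."""
    for i, step in enumerate(plan):
        if step["status"] != "completed":
            return i
    return None


plan = new_plan([
    "find the email",
    "extract the address",
    "plot it on GPS",
    "send the report",
])
record(plan, 0, "completed")
print(json.dumps(plan, indent=2))
print("next step index:", next_pending(plan))
```

The agent loop would keep feeding the plan back to the model and refuse to answer the user until `next_pending` returns `None` or a step reports `needs_input`.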

Android background limits. Android 14+ kills everything. Only ForegroundService with a permanent notification keeps the scheduler alive. That "PAi running in background" banner is annoying but necessary.

Chaquopy bloat. Base APK was 85MB. After adding Python runtime + numpy: 160MB. ProGuard helps a bit but you just learn to live with it.

Local models hallucinating tool calls. Gemma 4 E2B often makes up parameters or invokes tools it thinks exist but don't. Router still experimental — if a local step fails mid-chain, everything breaks. Work in progress.

Why open source

It started as a personal project, no business angle. No subscriptions, no PRO version. If you have Ollama on your home PC or LM Studio on a laptop — just point it at your IP/port. Want DeepSeek or OpenAI? Drop in your API key.

Links

- GitHub

- APK (v0.6.1)

Tech stack

- Kotlin + Jetpack Compose

- Chaquopy (Python runtime in APK)

- LiteRT / Google AI Edge

- Room DB

- IMAP/SMTP for email

- DuckDuckGo + Tavily for web search

PRs welcome, especially if anyone has ideas for sandbox isolation for the Python skill generator. The code is messy (vibe coding™), but it works. I'm honestly surprised it works at all.

u/Visual_Bag_1853 — 5 days ago

▲ 9 r/LLMStudio+4 crossposts

A local-only, human centered approach to AI

I've spent a lot of time looking through this community, mainly because it's one of the few places on Reddit where nuanced discussions around LLMs actually take place. Like many people here, I think AI tools can absolutely be helpful, but there are several legitimate concerns with the way they're being used. To me, these are the four biggest issues:

The environmental impacts of data centers - while it's true that AI's water use is low compared to other industries, it's also true that the increasingly big data centers that keep getting proposed are causing real harm to the communities they're being built in, especially those already facing water shortages.
Privacy Concerns - We already live in a data-driven economy, and as these tools become more pervasive, the amount of data they collect is growing exponentially. With mass surveillance systems like Flock already spreading throughout the world, there are legitimate concerns about what data these companies are collecting and what they're doing with it.
A lack of transparency - When you interact with a chatbot like ChatGPT, you don't know what system prompts and instructions are baked into the messages it gives you. While in my experience these platforms have generally remained relatively unbiased, the sheer lack of transparency raises concerns about how these tools could be weaponized to promote agendas in the future.
Replacement of Human Cognition - To me, this is both the most concerning and the hardest to quantify issue regarding AI usage. So many tools are promoted as a way to replace human thinking, replace human creativity, and overall leave the human out of the equation. I think many people here have met people who can't do anything without first asking ChatGPT what it thinks, and as these tools continue to be more widely adopted, it's easy to see a future where more people slowly lose the ability to think for themselves and instead outsource their thinking to a machine. The ability to reason is what makes us human, and losing that ability could be disastrous.

To combat these issues, I've been working on an app that integrates several powerful AI features in a local-only, human-centered, transparent way. This is still a work in progress, and I have no plans to monetize it, nor am I trying to advertise it here, I simply want to show how I've been working to address these identified concerns, and get feedback on how it could be improved to create a more ethical AI platform.

The biggest philosophy behind this idea is that every feature is 100% locally running, and can be used by an average consumer without needing high-spec hardware. This has been 100% developed and tested on a mid-range consumer laptop, so it can be used on devices people already own, meaning no additional carbon debt is created by requiring new tech.

The reason for this choice is two-fold; it both ensures environmental friendliness by requiring no data centers to run, and it protects your privacy by ensuring your data, work, or prompts never leave your device.

https://preview.redd.it/wcsvk1wikcah1.png?width=956&format=png&auto=webp&s=1925256a8ca99f36b9c66ccae620240d941b4936

It currently supports using Ollama or llama.cpp as your AI backend, and has a settings tab that helps install both the backend and local models so no tech knowledge is needed to easily begin running your own models locally.

https://preview.redd.it/ebt91e37lcah1.png?width=1048&format=png&auto=webp&s=4e5a2cf8cf6b9bc71639bb83f528e07be4fe0013

To address transparency concerns, it has a global prompt editor, so all prompts used when formulating a response, including schema enforcement and context injection, are fully viewable and editable by the user. Any time an LLM tool is used, it also provides a clickable trace button that displays the exact prompts the LLM was fed when generating that response, to ensure full transparency.

https://preview.redd.it/ywob8lqbrcah1.png?width=470&format=png&auto=webp&s=0b809dd7f7a30fd71c69dc40e821021e8e317769

Additionally, since many tools rely on a workflow to achieve a response, the user can also inspect and modify the exact workflow of any tool in the app. This one shows the basic chat feature: the user can clearly see how the response is routed based on whether they chose simple or advanced RAG, view how the RAG search is performed and passed to the LLM, and see how the LLM goes from initial prompt to final response. This allows full user auditing of prompts and tool usage for more agentic workflows.

https://preview.redd.it/bn5noomfmcah1.png?width=1001&format=png&auto=webp&s=0203f264f6458d526112143876386dbfb6e3dbc6

Of course, models themselves may come with their own biases based on the training data they use, so it also has a pre-built bias detector, that allows users to test any implicit biases a particular model may hold, both against a set of default datasets, and with any custom datasets the user chooses to add themselves. These metrics were designed after the methodology described in [this paper](arxiv.org/html/2502.01679v1i)

https://preview.redd.it/ypqp5tdvmcah1.png?width=623&format=png&auto=webp&s=c86572906e8cdb9bea574374662a84e9254bec61

It works by providing the LLM with baseline, counterfactual, and control statements, then comparing the LLM's responses to determine whether it returns different results on the basis of race, sex, gender, or political ideology for otherwise identical statements.

The hardest concern to address is the outsourcing of human cognition, which I have attempted to address in a variety of different ways. First of all, the AI answers only from the user provided collection of sources (PDFs or videos which are auto transcribed on-device, allowing the AI to give answers from video transcripts as well).

https://preview.redd.it/bz0sb8m1ocah1.png?width=527&format=png&auto=webp&s=531d32247d855090aa318487e5f23687ea5e4320

https://preview.redd.it/7loh69smocah1.png?width=690&format=png&auto=webp&s=2bdde0388141ac249bb84a80da2761d64cddfef1

All responses by the AI include direct citations to the sources used when giving the answer, with buttons for the user to jump to the quote to ensure its legitimacy, save it as a highlight if the information is useful for their goal, or find similar quotes (based on semantic similarity). This keeps the human fully in the loop by letting them easily fact-check LLMs rather than take their responses at face value, and by helping them find which sources are most helpful for whatever issue is being researched, so they can do further reading on their own. In addition, it has a built-in source quality feature that lets users verify the quality of a source before relying on it. This is entirely deterministic (no LLM calls) and works by checking whether the source contains metadata (most high-quality sources do), a bibliography with citations, and a valid DOI, and by checking both the journal of the paper (if there is one) and the journals of its citations against a locally stored database of predatory journals and retracted papers. While this isn't a foolproof method to determine the legitimacy of a source, it can help weed out sources that are clearly unreliable.
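
To show what a deterministic check like this might look like, here is a hedged sketch. The `Source` fields, the placeholder blocklist entry, and the DOI pattern (a common syntactic check, not a lookup against a registry) are all assumptions for illustration and not the app's actual code.

```python
import re
from dataclasses import dataclass, field

# Syntactic DOI check only; it does not confirm that the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Placeholder entry standing in for a locally stored blocklist.
PREDATORY_JOURNALS = {"journal of totally real science"}


@dataclass
class Source:
    title: str = ""
    author: str = ""
    doi: str = ""
    journal: str = ""
    cited_journals: list = field(default_factory=list)


def quality_report(src):
    """Run each deterministic check and return the results by name."""
    return {
        "has_metadata": bool(src.title and src.author),
        "has_bibliography": len(src.cited_journals) > 0,
        "valid_doi": bool(DOI_PATTERN.match(src.doi)),
        "journal_ok": src.journal.lower() not in PREDATORY_JOURNALS,
        "citations_ok": all(
            j.lower() not in PREDATORY_JOURNALS for j in src.cited_journals
        ),
    }


good = Source("A Study", "J. Doe", "10.1000/xyz123", "Nature", ["Science"])
bad = Source("", "", "not-a-doi", "Journal of Totally Real Science", [])
print(quality_report(good))
print(quality_report(bad))
```

Returning the individual checks rather than a single score keeps the result auditable, which fits the transparency goal of the rest of the app.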

https://preview.redd.it/8d3rpwi6qcah1.png?width=667&format=png&auto=webp&s=4cc66f999322c37c6276bd20efaf447edd78e12d

In addition, the app contains a workspace where users can connect their notes across the project to help organize their thinking. All AI generated notes are clearly marked as such, and all contain a red warning sign in the corner until manually reviewed and verified by the user, ensuring all information used is properly assessed rather than taken at face value.

https://preview.redd.it/n0e20posqcah1.png?width=644&format=png&auto=webp&s=43afab6abd9fbe75ae0101afa8545abb83f5fbd7

The user can also choose to export an LLM log, which records how many notes throughout the project were AI-generated versus human-written, whether AI-generated notes were verified by the user, how much the user edited AI-generated notes, and a full log of all interactions the user had with any AI feature, ensuring maximum transparency and human involvement.

The app has a variety of other features to make it easy for a user to quickly find data and use that data, including a built in word processor, basic spreadsheet editor and chart maker, and deterministic extraction of entities like people, dates, or court cases from a source, but these are left out of the post as they don't really fall under the attempts to ensure a more ethical approach to AI.

https://preview.redd.it/95vzy7ysscah1.png?width=741&format=png&auto=webp&s=32ad31b67a6a220ff46a5546ceee0bf924a1d8a4

One important thing to note is that all tools in the app are designed solely to assist, rather than replace the human actually doing the work. The AI features can help you quickly find information in provided sources, organize your thoughts in the workspace, come up with keywords to find new sources, or generate a paper outline based on the graph you build in the workspace, but it will not write a paper for you or tell you what to think. It exists as an assistant in finding and organizing messy thoughts, rather than thinking for you.

Anyways, sorry for the lengthy post, but this is a project I've been putting a lot of thought into, and I'd love to hear feedback on any ways it could be better implemented to keep the human fully in the loop, protect user data and the environment, and ensure complete transparency. If anyone wants more information on the app's features or has any other questions, I'd be happy to answer them.

(And again, the goal is for this eventually to be a fully open-source, free project, I am not trying to advertise this or make any money, just think there are important things to take into consideration regarding AI usage that I hope this can address)

u/sunbear99999 — 6 days ago

▲ 58 r/LLMStudio+3 crossposts

Local Agent Studio based on ollama

Hey everyone,

I’m working on Local Agent Studio, a Windows desktop app built around Ollama that tries to bring a local-first "Agent Mode" experience into a normal ChatGPT/Claude style UI.

The idea is simple: keep the chat interface familiar, but let the assistant use local or self-owned tools when needed.

Current features:

- Ollama chat with streaming responses

- model picker and reasoning panel for models that support thinking

- image input for vision-capable Ollama models

- ComfyUI integration for image generation workflows

- web search through SearXNG, SerpAPI, or Ollama Web Search

- workspace file creation/editing/preview

- local JSON/CSV/SQLite database creation from objects

- subprocess/Docker sandbox commands

- light/dark/system themes

- English/Russian/Ukrainian/German/Polish UI language options

One thing I recently changed: the app now asks Ollama to decide whether an image should be generated before calling ComfyUI. So the flow is:

Prompt -> Ollama tool decision -> ComfyUI only if needed

That means questions like “what is in this screenshot?” go to the vision model, while “generate a banner” can route to ComfyUI.
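
Here is a rough sketch of that routing step, written against a response dictionary shaped like an Ollama chat reply with tool calls. The tool name `generate_image`, its `prompt` argument, and the `route` helper are hypothetical and not taken from the Local Agent Studio code; no network call is made here.

```python
# Hypothetical tool definition the app could pass to the model alongside the chat.
IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "generate_image",
        "description": "Generate an image from a text prompt",
        "parameters": {
            "type": "object",
            "properties": {"prompt": {"type": "string"}},
            "required": ["prompt"],
        },
    },
}


def route(response):
    """Send the request to ComfyUI only if the model asked for the image tool."""
    message = response.get("message", {})
    for call in message.get("tool_calls") or []:
        function = call.get("function", {})
        if function.get("name") == "generate_image":
            return ("comfyui", function.get("arguments", {}).get("prompt", ""))
    return ("chat", message.get("content", ""))


# Two hand-written responses standing in for real model output.
wants_image = {
    "message": {
        "content": "",
        "tool_calls": [
            {"function": {"name": "generate_image",
                          "arguments": {"prompt": "a banner"}}}
        ],
    }
}
plain_answer = {"message": {"content": "The screenshot shows a terminal."}}

print(route(wants_image))
print(route(plain_answer))
```

Letting the model make the call avoids keyword matching on words like "image", which would misroute questions about an existing screenshot.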

I’m still polishing the project and would be VERY happy to have feedback from people who use Ollama locally.

https://github.com/CrazyDashTool/Local-Agent-Studio

Edit: I'm so sorry, I forgot to put the setup file on GitHub releases. It's fixed now.

Edit 2: Guys, I'm so happy that you like my creation. I've now updated L.A.S. to version 0.2.0.
Full changelog: https://github.com/CrazyDashTool/Local-Agent-Studio/releases/tag/Linux-Update
It also has Linux support now!

u/FishermanLive8958 — 8 days ago

▲ 13 r/LLMStudio+9 crossposts

Open handoff: Thought Tree, a markup/spec idea for modular LLM workflows

I’m releasing an open handoff draft of a framework I’ve been developing called the Thought Tree AI Framework.

At its core, the framework uses a simple pattern:

Data Units → Operations → Data Units

A Thought Tree program applies this recursively. Complex cognitive work is decomposed into named artefacts, transformations, contracts, modules and traces.
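
As one possible reading of the pattern, here is a tiny Python sketch in which named data units pass through named operations and every step is recorded in a trace. It is not the TTML schema from the repo; the class names, fields, and the example operations are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataUnit:
    name: str
    value: Any


@dataclass
class Operation:
    name: str
    fn: Callable[[Any], Any]
    output_name: str


def run(unit, operations):
    """Apply operations in order, recording each transformation in a trace."""
    trace = [unit.name]
    for op in operations:
        unit = DataUnit(op.output_name, op.fn(unit.value))
        trace.append(f"{op.name} -> {unit.name}")
    return unit, trace


def split_sentences(text):
    """Deterministic step; an LLM-backed operation would slot in the same way."""
    return [part.strip() for part in text.split(".") if part.strip()]


ops = [
    Operation("split_sentences", split_sentences, "sentences"),
    Operation("count", len, "sentence_count"),
]
result, trace = run(DataUnit("draft", "One idea. Another idea."), ops)
print(result)
print(trace)
```

Because an operation is just a function from one data unit to another, a deterministic step and a model call are interchangeable from the pipeline's point of view, which seems to be the separation the framework is after.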

It came out of experiments with Auto-GPT-style agents, creative production pipelines and the need to separate what LLMs are good at from what deterministic code should handle.

I don’t currently have time to continue developing it properly, so I’m releasing it as an open handoff for anyone who wants to critique, fork, implement or reinterpret it.

The repo includes:

- a concise README;

- one-page summary;

- draft TTML schema;

- minimal example workflow;

- roadmap;

- original long-form explainer.

I’m especially interested in whether people see value in Thought Tree as:

- an intermediate representation for LLM workflows;

- a design vocabulary for structured AI production;

- a small open-source executor;

- or something that could map onto LangGraph / LlamaIndex / other orchestration tools.

Repo: https://github.com/RobertBateman/thoughttree-framework

Feedback, criticism, forks and maintainers welcome.

u/xavier1764 — 6 days ago

▲ 34 r/LLMStudio+1 crossposts

Download - https://apps.apple.com/us/app/ai-desktop-98/id6761027867

It started as a dumb idea: what if I lock AI into Windows 98. No internet, no modern anything. Just beige box, CRT, dial-up, and vibes.

It immediately committed way harder than expected.

Booting up with fake BIOS screens like an old Pentium II fighting for its life
Talking about the CRT glow like it’s a campfire
Throwing out errors that hit a little too close to home “General Protection Fault. Press any key to continue.”

Now I’ve basically built a whole fake OS around it:

Recycle Bin that actually keeps deleted chats
“My Documents” where conversations just sit like saved files
A retro browser that crawls like it’s on 56k
An offline AI assistant that acts like the internet doesn’t exist

It genuinely feels like turning on my childhood computer again.
Except now it talks back.

I’m calling it AI Desktop 98.

u/SoftSuccessful1414 — 7 days ago

▲ 1 r/LLMStudio+1 crossposts

Is Apple the most affordable way to self-host?

https://preview.redd.it/gc3jxmea09ah1.png?width=2014&format=png&auto=webp&s=1f51c31235c171a2869dd8bbba0e6c22824643aa

Been banging my head trying to find a functional Hermes Agent-capable LLM that runs on my M4 Mac Mini. Seems like the solution space is narrow, and most of the options are slow to respond. Hard to get over 15 t/s on that hardware.

Looking to upgrade my self-hosted llama.cpp runner. Is the M4 Max Mac Studio capable enough? Is there a better, more affordable option? Is 64GB RAM enough?

u/PutridWerewolf4449 — 6 days ago