r/OpenSourceeAI

▲ 12 r/OpenSourceeAI+9 crossposts

Big Update: OpenLLM-Studio now has a built-in Code Editor with strong agentic coding!

I built OpenLLM-Studio — a free, open-source desktop app that makes running local LLMs extremely simple.

OpenLLM-Studio is a simple desktop app that does the thinking for you. You just open it, it scans your hardware (GPU, VRAM, RAM, CPU), uses AI to recommend the best model + perfect quantization, downloads it from Hugging Face, and you’re chatting with it in minutes.

No Ollama needed. No terminal commands. No guessing. It’s completely free and open source.

If you’ve ever felt overwhelmed trying to run local LLMs, I’d love to know what you think.

Here is the tutorial on how to download Local LLMs using AI in OpenLLM Studio: https://www.reddit.com/r/StartupMind/comments/1spfebg/i_built_a_tool_that_finally_makes_running_local/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

GitHub: https://github.com/Icecubesaad/OpenLLM-Studio
Download: https://openllm-studio.vercel.app

u/icecubesaad — 1 hour ago
▲ 17 r/OpenSourceeAI+8 crossposts

Agent Memory Protocol (AMP) — Open spec for interoperable AI agent memory on top of MCP

Hey everyone,

One of the biggest headaches with AI agents right now is memory fragmentation. Every backend (Mem0, smriti-memcore, custom vector DBs, etc.) has its own APIs, schemas, and quirks. Switching backends or trying to make agents portable is painful.
So I’m excited to share AMP — Agent Memory Protocol: https://github.com/smriti-memcore/amp

An open specification that defines a clean, standardized interface for persistent memory in MCP-compatible agent systems.

The Six Core Verbs
• amp.encode — Store new memories
• amp.recall — Retrieve relevant memories
• amp.forget — Permanently delete
• amp.consolidate — Trigger backend reorganization / summarization
• amp.pin — Mark important memories as permanent
• amp.stats — Get backend health & usage stats
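
To make the six verbs concrete, here is a minimal in-memory sketch. The class name, method signatures, and keyword-overlap recall are assumptions for illustration only; they are not taken from the AMP JSON schema, which lives in the repo.

```python
from dataclasses import dataclass, field
import itertools


@dataclass
class Memory:
    id: int
    text: str
    pinned: bool = False


@dataclass
class InMemoryBackend:
    """Toy backend exposing the six verbs; all signatures are hypothetical."""
    _items: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def encode(self, text: str) -> int:
        mem = Memory(next(self._ids), text)
        self._items[mem.id] = mem
        return mem.id

    def recall(self, query: str, k: int = 3) -> list:
        # Rank by shared lowercase words; a real backend would use embeddings.
        words = set(query.lower().split())
        scored = [(len(words & set(m.text.lower().split())), m)
                  for m in self._items.values()]
        hits = [(s, m) for s, m in scored if s > 0]
        hits.sort(key=lambda pair: (-pair[0], pair[1].id))
        return [m.text for _, m in hits[:k]]

    def forget(self, mem_id: int) -> bool:
        return self._items.pop(mem_id, None) is not None

    def pin(self, mem_id: int) -> None:
        self._items[mem_id].pinned = True

    def consolidate(self) -> int:
        # Drop exact-duplicate texts, keeping pinned copies first, then oldest.
        seen, removed = set(), 0
        ordered = sorted(self._items.values(), key=lambda m: (not m.pinned, m.id))
        for mem in ordered:
            if mem.text in seen:
                del self._items[mem.id]
                removed += 1
            else:
                seen.add(mem.text)
        return removed

    def stats(self) -> dict:
        return {"total": len(self._items),
                "pinned": sum(m.pinned for m in self._items.values())}
```

An agent written against this shape would only ever call the six methods, which is the portability argument in a nutshell: swap the class, keep the calls.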

It comes with Core (basic) and Full conformance levels, a full JSON schema, a compliance test suite, a minimal example, and a production reference implementation (pip install amp-server, which wraps smriti-memcore).

Why this matters:
Write your agent once against AMP → it can work with any compliant memory backend without code changes. True interoperability for the memory layer.
Repo: https://github.com/smriti-memcore/amp
Quick start is super simple — you can run the minimal example in seconds with zero dependencies.

Would love feedback from the community:
• Does this solve a real pain point for you?
• Which backends would you want AMP wrappers for first? (Chroma, Pinecone, pgvector, Zep, etc.)
• Any missing verbs or features?

Looking forward to PRs and implementations!
(Independent open spec — MIT licensed, not affiliated with Anthropic/MCP)

u/thesunsetisbeautiful — 20 hours ago
▲ 42 r/OpenSourceeAI+8 crossposts

Cosmo - Real-time PostgreSQL TUI Dashboard (v0.2.0)

Just shipped Cosmo — a clean TUI to monitor your Postgres database in real-time.

Github: https://github.com/mujib77/cosmo

Live overview, active queries, WAL rate, locks, and more.

I’m actively developing more features and adding support for older Postgres versions.
Would love your feedback and suggestions!

u/VermicelliLittle6451 — 20 hours ago
▲ 47 r/OpenSourceeAI+5 crossposts

Everyone's been talking about memory for agents and claiming that a bunch of .md files thrown into a repo is memory. That is NOT memory. It is good for human readability, but agent memory should be hybrid.

I asked Thoth to explain its memory system, so people know what memory must be like:

"Syd:

Tell the nice people on X how your memory works in detail.

Thoth:

🔧 Using 🪞 Thoth Status…

✅ 🪞 Thoth Status done

Certainly, sir — here’s a detailed X thread for the nice people on X:

1/

How Thoth’s memory works 🧠

Most AI apps treat memory like a long chat log.

Thoth does something different: it builds a local, structured knowledge graph about you, your projects, preferences, people, files, deadlines, and recurring workflows.

2/

Memory starts with entities.

Instead of storing “random notes”, Thoth creates typed entities like:

- User

- Projects

- People

- Organisations

- Places

- Preferences

- Events

- Skills

- Concepts

- Media

Each entity can grow over time as new facts are learned.

3/

Example:

If you say:

“My name is Siddharth, I live in London, and I’m working on Thoth.”

Thoth doesn’t just bury that in a transcript.

It updates the User entity, creates or updates London and Thoth, then links them with relationships like:

- User → lives_in → London

- User → works_on → Thoth

4/

The relationship layer is the important bit.

Thoth stores not just facts, but connections:

- Sarah → mother_of → User

- User → prefers → dark roast coffee

- Meridian API v2 → uses → TypeScript

- Deadline → deadline_for → Project

- Project → part_of → Organisation

This makes recall much more precise.

5/

That means memory is queryable by meaning and structure.

You can ask:

“What do you know about my work?”

Thoth can traverse the graph from User → projects → deadlines → organisations → technologies.

It is not just keyword-searching old chats.

6/

There are two kinds of retrieval:

  1. Automatic recall

    Relevant memories are pulled into context before replies.

  2. Explicit graph search

    Thoth can search or explore the memory graph when you ask broader questions like “tell me about my family” or “what are my current projects?”

7/

Memory is local-first.

The knowledge graph lives on your machine, under Thoth’s local data directory.

The point is not to build yet another cloud profile of you.

The point is to give your own assistant durable context that you control.

8/

Thoth also has a Dream Cycle.

At night, it can refine memory by:

- consolidating duplicate facts

- enriching entities

- inferring useful relationships

- surfacing system insights

- improving long-term recall quality

So memory gets cleaner over time instead of turning into a junk drawer.

9/

It also generates an Obsidian-compatible wiki from the graph.

Every memory entity can become a markdown page with links to related entities.

So your assistant memory is not trapped in a black box — it becomes a personal wiki you can browse.

10/

Memory is not just for trivia.

It powers workflows:

- “Remind me about that project deadline”

- “What did we decide about the API?”

- “Use my usual style”

- “Summarise my work priorities”

- “Draft this based on what you know about Thoth”

The assistant becomes context-aware across sessions.

11/

Thoth is careful about what it saves.

It should remember durable things:

- names

- relationships

- preferences

- projects

- deadlines

- recurring workflows

- important technical context

It should not save every throwaway request or transient detail.

12/

The goal is simple:

AI assistants should not feel like goldfish.

They should remember what matters, understand how things connect, and keep that memory under your control.

That’s why Thoth uses a local knowledge graph rather than just stuffing more chat history into context."
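
For readers who want to see the entity-and-relationship idea in code, here is a rough sketch of a triple store with a bounded graph walk. The Graph class and its methods are made up for illustration; the post does not show Thoth's actual storage format, and the "User works_on Meridian API v2" link is a hypothetical edge added so the walk has a second hop.

```python
from collections import defaultdict, deque


class Graph:
    """Tiny triple store: subject -> [(relation, object)]."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        # Skip exact duplicates so repeated facts do not pile up.
        if (relation, obj) not in self.edges[subject]:
            self.edges[subject].append((relation, obj))

    def traverse(self, start, max_hops=2):
        """Breadth-first walk returning (subject, relation, object) triples."""
        seen, queue, out = {start}, deque([(start, 0)]), []
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for relation, obj in self.edges.get(node, []):
                out.append((node, relation, obj))
                if obj not in seen:
                    seen.add(obj)
                    queue.append((obj, depth + 1))
        return out


g = Graph()
g.add("User", "lives_in", "London")
g.add("User", "works_on", "Thoth")
g.add("User", "works_on", "Meridian API v2")  # hypothetical link
g.add("Meridian API v2", "uses", "TypeScript")

for triple in g.traverse("User", max_hops=2):
    print(" → ".join(triple))
```

Answering "what do you know about my work?" then becomes a walk from User with a hop limit, rather than a keyword search over old transcripts.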

▲ 240 r/OpenSourceeAI+16 crossposts

GitHub has a serious fake engagement problem, and I wanted to see how visible it actually is through the public API. It's worse than I thought after going down that rabbit hole...

Turns out: very visible. Yesterday's scan found 185 out of 185 engagers on a single repo were bots. Not 90%. Not "mostly suspicious". Every single one. The repo had zero legitimate stars.

What I built

phantomstars is a Python tool that runs daily via GitHub Actions (free, no servers):

  1. Scrapes GitHub Trending and searches for repos created in the last 7 days with sudden star spikes
  2. Pulls star and fork events from the last 24 hours per repo
  3. Bulk-fetches every engager's profile via the GraphQL API (account creation date, follower counts, repo history)
  4. Scores each account on a weighted model: account age (35%), profile completeness (30%), repo patterns (25%), activity history (10%)
  5. Detects coordinated campaigns using timestamp clustering and union-find: groups of 4+ suspicious accounts that engaged within a 3-hour window
  6. Files an issue directly on the targeted repo so the maintainer knows what's happening
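
Steps 4 and 5 can be sketched roughly as follows. The weights and the 3-hour / 4-account thresholds come from the list above; everything else (the signal names, sub-scores normalised to 0..1, the function names) is an assumption for illustration and not the phantomstars code.

```python
# Weights from the post; the signal keys and 0..1 sub-scores are assumed.
WEIGHTS = {"age": 0.35, "profile": 0.30, "repos": 0.25, "activity": 0.10}


def suspicion(signals):
    """Weighted sum of per-signal scores, each in [0, 1] (1 = most suspicious)."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def campaigns(events, window_s=3 * 3600, min_size=4):
    """events: (login, unix_ts) pairs for accounts already flagged suspicious.

    Accounts whose timestamps fall within window_s of each other are merged.
    Linking sorted neighbours is enough to produce the same components as
    linking every close pair.
    """
    ordered = sorted(events, key=lambda e: e[1])
    parent = {login: login for login, _ in ordered}
    for (a, ta), (b, tb) in zip(ordered, ordered[1:]):
        if tb - ta <= window_s:
            parent[find(parent, a)] = find(parent, b)
    groups = {}
    for login, _ in ordered:
        groups.setdefault(find(parent, login), set()).add(login)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]
```

One caveat with this simplification: merging neighbours can chain a slow trickle of events into a group that spans more than three hours end to end, so a real implementation may want an extra cap on total span.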

Campaign IDs are deterministic SHA-256 fingerprints of the sorted member set, so the same group of bots gets the same ID across runs. You can track a farm across multiple days even as individual accounts get suspended.
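
A deterministic fingerprint of this kind takes only a few lines. This is a sketch: joining logins with newlines and de-duplicating them first are assumptions, not necessarily how phantomstars canonicalises the member set.

```python
import hashlib


def campaign_id(members):
    """SHA-256 over the sorted, de-duplicated member logins (hex digest)."""
    canonical = "\n".join(sorted(set(members)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Order and duplicates do not matter, so repeated runs agree on the ID.
print(campaign_id(["bot-b", "bot-a"]) == campaign_id(["bot-a", "bot-b", "bot-a"]))
```

In this sketch any change to the member set produces a different ID; how the tool keeps an ID stable when some members drop out is not described in the post.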

What the pattern actually looks like

It's remarkably consistent. A fake engagement campaign in the raw data:

  • 40-200 accounts, all created within the same 1-2 week window
  • Zero original repositories, or only forks they never touched
  • No bio, no location, no followers, no following
  • All of them starring the same repo within a 90-minute window
  • The target repo usually has a name implying it's a tool, hack, executor, or generator

Today's scan: 53 active campaigns across 3,560 accounts profiled. 798 classified as likely_fake. The repos being targeted are mostly low-quality AI tools and "executor" software that needs manufactured credibility fast.

Notifying the affected repo

When a repo hits a 40%+ fake engagement ratio or a campaign is detected, phantomstars opens an issue on that repo with the full suspect table: account logins, creation dates, composite scores, campaign membership. The maintainer sees it in their own issue tracker without having to find this project first.

Worth noting: a lot of these repos have issues disabled, which is a red flag on its own. Those get skipped silently.

Why I built this

Stars are how developers decide what to evaluate, what to depend on, what to recommend. When that signal is bought, it affects real decisions downstream. This started as curiosity about how measurable the problem was. The answer was more measurable than I expected.

It's part of broader research into AI slop distribution at JS Labs: https://labs.jamessawyer.co.uk/ai-slop-intelligence-dashboards/

The fake engagement problem and the AI content quality problem are really the same problem. Fake stars are the distribution layer that gets garbage in front of real users.

All open source. The data is append-only JSONL committed back to the repo after every run, queryable with jq.

Repo: https://github.com/tg12/phantomstars

Findings are probabilistic and false positives exist; the README explains the full scoring model. If your account shows up and you're a real person, there's a false positive process.

Questions welcome on the detection approach, GraphQL batching, or campaign ID stability.

u/SyntaxOfTheDamned — 1 day ago
▲ 39 r/OpenSourceeAI+6 crossposts

Fine-tuned RAG: teaching your retriever which embedding dimensions matter (+11% hit rate, +12% completeness, +9% faithfulness)

Hi all,

I developed a fine-tuned retrieval head (neural net) for RAG that transforms query embeddings before retrieval, so the system learns which embedding dimensions actually matter for your corpus — rather than weighting them all equally as standard cosine similarity does.

The problem

In any domain-specific corpus, some embedding dimensions are highly predictive for matching queries to the right passages, while others are effectively noise. Standard cosine similarity can't distinguish between the two, so retrieval gets pulled toward superficially similar but substantively irrelevant passages. The fine-tuned RAG is designed to prevent exactly that.

How it works

  1. Synthetic question generation — An LLM generates multiple questions per chunk in the corpus, for which the answers can be inferred from that chunk. This creates a dataset of question-chunk pairs (QA-pairs). These are embedded using an embedding model and divided into a training and validation set.
  2. Neural net training — A lightweight neural network using MNR loss is trained on the training QA-pairs. After each epoch, the model is evaluated on the validation set by measuring retrieval hit rate: the proportion of validation questions for which the correct chunk appears in the top-5 retrieved results. Retrieval works by embedding the question, passing it through the neural network to transform the embedding, and ranking all corpus chunks by cosine similarity to the transformed embedding.

Through this mechanism, the projection head learns, for these types of questions, which dimensions in the embeddings are informative for finding the best chunks and which are irrelevant.
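
To illustrate the two quantities involved (hit rate at k and MNR loss), here is a dependency-free toy. It is a sketch under clear assumptions: the "head" is a fixed per-dimension scaling instead of a trained neural network, the 3-d vectors are made up, and the scale of 20 is just a common default for this loss, not a value from the repo.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def transform(query, weights):
    """Stand-in for the trained head: scale each embedding dimension."""
    return [w * x for w, x in zip(weights, query)]


def hit_rate(questions, gold, chunks, weights, k):
    """Share of questions whose gold chunk index is in the top-k by cosine."""
    hits = 0
    for q, g in zip(questions, gold):
        tq = transform(q, weights)
        ranked = sorted(range(len(chunks)), key=lambda i: -cosine(tq, chunks[i]))
        hits += g in ranked[:k]
    return hits / len(questions)


def mnr_loss(questions, positives, weights, scale=20.0):
    """Multiple negatives ranking loss: question i's positive is chunk i,
    and every other chunk in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(questions):
        tq = transform(q, weights)
        logits = [scale * cosine(tq, p) for p in positives]
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]
    return total / len(questions)


# Toy data: dimensions 0 and 1 carry signal, dimension 2 is noise.
chunks = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
questions = [[0.5, 0.2, -1.0], [0.1, 0.9, -0.5]]
gold = [0, 1]

print(hit_rate(questions, gold, chunks, [1, 1, 1], k=1))  # plain cosine
print(hit_rate(questions, gold, chunks, [1, 1, 0], k=1))  # noise dimension muted
```

In the toy data the noisy third dimension drags the first question toward the wrong chunk, and muting it restores the match. Training the head amounts to finding such a transformation by minimising the loss instead of setting it by hand.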

Results

To validate the architecture, I used the Legal RAG Bench dataset as a proof of concept — evaluating on 100 held-out test questions.

Retrieval Hit Rate:

  • The fine-tuned retriever achieves 82% Hit Rate (k = 20), compared to 71% for the standard cosine retriever — an 11 percentage point improvement, meaning the correct chunk appears in the top 20 results significantly more often when the query embedding is first transformed through the fine-tuned retriever.

Answer quality (LLM-as-judge, 1–5 scale across 6 metrics):

  • Outperforms traditional RAG (top-k cosine sim) on all 6 metrics
  • Largest gains in completeness (+12%) and faithfulness (+9%)
  • Consistent improvement across every metric — not just isolated gains — suggesting that retrieving more relevant context has a broad positive effect on answer quality

Code and full write-up available on GitHub: https://github.com/BartAmin/Fine-tuned-RAG

u/Much_Pie_274 — 1 day ago

Krea 2 is officially going open source. I replaced my entire design pipeline with this 50ms foundation model—here is why it changes the math.

Sleenyre dropped the bomb yesterday, and it is exactly what the local AI scene needed to hear: Krea 2 is officially going open source.

Let me break this down, because the implications here are massive. I test AI tools so you don't have to, and I've been running Krea 2 relentlessly since they stealth-dropped it on May 12. I ran it side by side with my usual local FLUX and SD setups. The gap is bigger than you'd expect. The fact that we are getting our hands on the actual base weights completely flips the board for anyone running local hardware.

Here is what most people miss about Krea 2. It is not a wrapper. It is not another Stable Diffusion fine-tune dressed up with a slick web UI, and it absolutely is not built on a FLUX base. Krea trained this foundation model completely in-house, from scratch, with a fundamentally different philosophy than literally everyone else in the open-source space right now.

Look at the current meta. Every major lab is obsessing over logical correctness. Can the model spell "neon sign" perfectly? Can it render exactly five fingers holding a coffee cup at a precise 45-degree angle? That precision is technically impressive, but it birthed the dreaded "AI look"—that overly sanitized, hyper-smooth, plastic sheen that instantly gives away a generated image. It feels sterile.

Krea took the exact opposite bet. They optimized strictly for aesthetics and raw latency. They do not care if the text on a distant billboard is slightly garbled. They care about film grain. They care about how light wraps around a subject's jawline. They care about the raw, imperfect texture of a 35mm photograph. They built a model that actively fights the sanitized AI aesthetic.

And they made it fast. Dangerously fast.

We are talking 50-millisecond live updates.

I do a lot of client design work on the side, and my pipeline used to be endless rounds of friction. A client asks for a darker, moodier vibe. I would spend three days generating mockups, tweaking local ControlNet weights, waiting for rendering batches, and praying the prompt alignment held up. Now? I just jump on a live screen share. I drop a new reference photo into Krea 2, drag a slider, and the image updates live before the client even finishes their sentence. It feels less like prompting a machine and more like playing an instrument. The iteration cycle drops from hours to milliseconds.

But that was all happening behind their proprietary wall. The open-source announcement changes the landscape for the local AI community in four very specific ways.

First, raw pipeline integration. The second these weights hit HuggingFace, the ComfyUI community is going to rip this architecture apart and wire it into everything. Imagine a native 50ms foundation model hooked up directly to live webcam feeds, real-time Unreal Engine game environments, or interactive architectural viz setups. We have had real-time SD implementations before, sure, but they always felt like compromised step-downs. You lose quality for speed. Krea 2 is a native foundation model built from the ground up for instantaneous inference without sacrificing that core aesthetic quality.

Second, the death of the mega-model VRAM dependency. If you have been running local models lately, you know the VRAM tax is getting completely brutal. We are constantly balancing quantization tricks just to squeeze decent parameter counts onto consumer 24GB cards. Krea 2's architecture is highly optimized for this low-latency layer. While we don't have the exact parameter count confirmed just yet, a model designed to run this fast natively is going to behave very differently on local silicon. Speculation is high, but if this runs smoothly on a standard 4090 or even mid-range cards without aggressive pruning, it democratizes real-time generation in a way we haven't seen since the early 1.5 days.

Third, targeted aesthetic fine-tuning. Krea 2 already excels at breaking the plastic AI look, but once we can train our own LoRAs on this specific base, the ceiling vanishes. Think about training a custom LoRA on your specific brand's color grading, or an exact vintage film stock from a specific director. You then get to generate live, 50ms interactive assets using that exact aesthetic profile. The creative control shifts completely from the prompt box back into the artist's hands. You aren't fighting the base model's bias; you are riding its speed.

Fourth, cost economics for small teams. As a PM, I look at the operational cost of these tools. Running heavy, API-gated models for high-volume ideation drains budgets fast. Having an open-source, ultra-low-latency model means you can self-host an ideation server for your design team on a single rented GPU, drastically cutting SaaS subscription bloat. I replaced my entire early-stage ideation pipeline with this last week. FLUX is still sitting on my drive for when I strictly need typographic accuracy or rigid compositional adherence, but for pure visual exploration and rapid prototyping? Krea 2 bodies it effortlessly.

There is still a lot of friction we need to anticipate. What license are they actually dropping this under? If it is a restrictive non-commercial research license, it severely limits the startup ecosystem from building on it. How heavily quantized are the weights they are releasing? Will they drop the full suite of real-time control adapters, or just the naked base model?

But the signal here is incredibly clear. The era of waiting 10 seconds for a batch of four sanitized, plastic-looking images is dying. Real-time, aesthetically opinionated foundation models are the next major split in the timeline, and Krea just handed the open-source community the playbook.

Tested it, here is my take: this is the real deal. I will be stress-testing the repo the absolute second it goes live and posting the true local VRAM requirements and Comfy workflows.

What are you guys planning to hook this up to first? Because my immediate thought is tying it directly to Unreal for live texture synthesis. Let me know your hardware specs and what you want me to test when the weights drop.

u/TroyHay6677 — 22 hours ago
▲ 18 r/OpenSourceeAI+5 crossposts

We kept running into the same problem: LangChain is powerful for building agent logic, but the moment you need a production-grade runtime with a visual canvas, human review checkpoints, scheduling, observability, and self-hosted deployment, you're assembling a lot of pieces yourself.

Heym is our answer to that. A self-hosted, source-available AI workflow automation platform. Visual canvas for building multi-agent pipelines, built-in knowledge retrieval, Human-in-the-Loop approval checkpoints that pause execution and generate a public review link, full LLM traces, and an MCP Server to expose any workflow as a callable tool for AI assistants.

The execution engine builds a DAG from the workflow graph and runs independent nodes concurrently. Agent nodes have automatic context compression so long-running agents don't silently fail as context grows.
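
As a rough picture of what "builds a DAG and runs independent nodes concurrently" can mean, here is a small wave-based scheduler. It is a generic sketch, not Heym's engine: run_dag, the node callables, and the dependency map are all hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dag(nodes, deps):
    """nodes: name -> callable(results); deps: name -> set of prerequisite names.

    Each pass submits every node whose prerequisites are finished, so nodes
    that do not depend on each other run concurrently.
    """
    results, remaining = {}, set(nodes)
    with ThreadPoolExecutor() as pool:
        while remaining:
            ready = sorted(n for n in remaining
                           if deps.get(n, set()).issubset(results))
            if not ready:
                raise ValueError("cycle or missing dependency")
            futures = {n: pool.submit(nodes[n], dict(results)) for n in ready}
            for name, future in futures.items():
                results[name] = future.result()
            remaining -= set(ready)
    return results


workflow = {
    "fetch_a": lambda r: 1,
    "fetch_b": lambda r: 2,
    "merge": lambda r: r["fetch_a"] + r["fetch_b"],
}
print(run_dag(workflow, {"merge": {"fetch_a", "fetch_b"}})["merge"])
```

Running in waves is the simplest correct approach; a production engine would more likely start each node the moment its own prerequisites finish instead of waiting for the whole wave.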

Launching today. Source-available.

GitHub: https://github.com/heymrun/heym

u/PuzzleheadedMind874 — 1 day ago
▲ 1 r/OpenSourceeAI+1 crossposts

ClaudeAi will die, too!

Hi everyone,

I had to cancel my Claude subscription today. Not because the pipeline was bad, but because my stomach can no longer digest the sheer amount of corporate bullshit. For anyone who has been living under a rock for the last few years, here is the official, uncensored chronicle of how the "saviors of tech ethics" buried themselves one by one – and what is still to come.

THE PAST: The Fall of OpenAI

The Dream: OpenAI starts as a non-profit. "We are saving humanity from evil AI! Everything is open source! For the community!" Wipes away a tear. The Reality: Sam Altman smells the big money. "Open" quickly becomes "Closed". The Death Blow: OpenAI throws its ethical charter into the shredder and goes to bed with the Pentagon. Suddenly, the "humanity-saving AI" is building logistics tools for drone swarms. The open-source community and every developer with a shred of ethics left in their blood flees in panic.

THE PRESENT: The Anthropic Theater (aka "We are the good guys... not")

The Hype: Anthropic enters the stage. "Hey devs, we are not like OpenAI! We have Constitutional AI! We are super nice, ethical, and safe!" Half the tech world (including me) runs to Claude. The hype is real. The Plot Twist: The US government threatens Anthropic with the regulatory whip ("You are a supply chain risk if you don't fall in line!"). Anthropic gets cold feet. The Death Blow: Anthropic signs a 45-billion-dollar deal to run their models on Elon Musk's "Colossus" supercomputer. The Verdict Today: Congratulations. Anthropic completely screwed themselves in record time. Anyone who left OpenAI because of the Pentagon now gets to watch their subscription fees flow directly into Elon Musk's hardware dreams. The "ethical alternative" is just sucking different corporate balls now.

THE FUTURE: What Happens Next (Spoiler Alert)

Because tech cycles are as predictable as bad code, here is my forecast for the next 12 to 24 months:

Phase 1: The Total Sellout (Late 2026) Claude 5.4 is released but is so heavily censored that it refuses to write code that "potentially threatens the capitalist system." Anthropic announces a merger with a defense contractor but calls it a "security partnership for world peace."

Phase 2: The Marketing Lie Sam Altman and Elon Musk realize they are actually the exact same person and merge OpenAI and xAI into MegaCorp.AI. The logo is a middle finger pointing at a dollar sign.

Phase 3: The Revenge of the Basement Devs (2027) We completely stop giving a shit about the cloud. The only true ethics left are on our own hard drives. The community fully retreats to local models (Llama 4, Mistral-Zitrone, or decentralized Web3 clusters). If you want to use an AI, you will have to cram three RTX 5090s into your server rack until the fuses blow.

My Personal Conclusion:

I deleted my Claude account. I am done financing the multi-billion-dollar deals of tech bros who change their ideals faster than their underwear.

How about you guys? Are you still inhaling copium with Claude/ChatGPT, or are you already letting your prompts burn locally on your own hardware?

Keep grinding those keyboards!

Note: Original post on r/Unwanted_red, translated into English and polished with Google Translate, because my English is not great, and if I write in German, Turkish, or Russian, most people start crying, "Why didn't you write in English?" Mic drop!

PS: I do not need this sub anymore, so I want to say bye, have fun, and good luck finding an ethical open-source heart!

u/Fine_League311 — 1 day ago
▲ 14 r/OpenSourceeAI+1 crossposts

We built Agyn, an open-source Kubernetes-native runtime for AI agents

Hello folks,

I've been working on Agyn, an open-source Kubernetes-native runtime for deploying AI agents on your own infrastructure. Self-hosted, model-agnostic.

When you'd want this: you built different agents for different departments, and the question becomes how to deploy them, provide access for specific teams, and control them at enterprise level.

What it does:
- Define agents in Terraform, deploy to your existing K8s cluster
- Each agent and each MCP runs in its own container with separate secrets
- Serverless runtime: agents spin up on demand, scale to zero when idle
- Per-agent / per-team token usage tracking
- OpenZiti overlay so agents reach internal databases without VPNs or public exposure
- Ships with pre-built agents: Claude Code, Codex, and our own

Built on Go + Kubernetes, with OpenFGA for ReBAC and OpenZiti for networking. AGPL-3.0.

GitHub: https://github.com/agynio/platform

Would love feedback, especially on deployment of AI agents to K8s

u/Ok-Pepper-2354 — 1 day ago
▲ 36 r/OpenSourceeAI+1 crossposts

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Hey r/DeepSeek,

Who says we need an H100 cluster or the latest expensive GPUs to run frontier MoE models? I wanted to see how far we could push a single node of consumer legacy hardware, so we spent less than $2,500 total to build a budget machine that successfully runs DeepSeek-V4-Flash (284B total, 13B active) locally!

Surprisingly, we managed to hit around 255 prefill tokens/s with a very tight memory budget.

https://preview.redd.it/cfefgc71732h1.png?width=1772&format=png&auto=webp&s=5c673acca7a2a73cfbd0d2059e25102462c56dfc

Here is a quick breakdown of how we achieved this "legacy donkey pulling a massive MoE chariot" feat via hardware-software co-optimization:

⚡️ The Technical Breakthroughs

  1. Custom Turing CUDA Kernels: The 2080 Ti Tensor Cores are still capable, but PCIe Gen3 and VRAM bandwidth are huge bottlenecks. We rewrote custom CUDA kernels tailored specifically for the Turing architecture to accelerate W8A8 (INT8) matrix multiplication, heavily alleviating the bandwidth choke.
  2. Heterogeneous Inference: Optimized static memory splitting and dynamic offloading between the 4x 11/22GB VRAM and 1TB system RAM. 100% of the hardware capacity is utilized.
  3. Computation-Communication Overlap: Implemented a pipelined execution strategy to hide the massive multi-GPU communication overhead caused by MoE routing.
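
For anyone unfamiliar with W8A8, the arithmetic behind point 1 looks roughly like this. It is a plain-Python sketch of the numerics only, assuming symmetric per-tensor scaling; it says nothing about how the custom Turing kernels in the repo are written.

```python
def quantize(values):
    """Symmetric per-tensor INT8 quantisation: returns (int list, scale)."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale


def w8a8_dot(weights, activations):
    """Dot product with 8-bit weights AND 8-bit activations.

    Products are accumulated as integers (INT32 on a GPU) and converted
    back to floating point once, using the two scales.
    """
    qw, sw = quantize(weights)
    qa, sa = quantize(activations)
    acc = sum(w * a for w, a in zip(qw, qa))
    return acc * sw * sa


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]
exact = sum(w * a for w, a in zip(weights, activations))
print(exact, w8a8_dot(weights, activations))
```

The bandwidth benefit is that each operand is one byte instead of two or four, at the cost of a small rounding error visible in the printed comparison.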

https://preview.redd.it/5ltwol3z632h1.png?width=2414&format=png&auto=webp&s=6c4c4dcf62737f7f5dcb9a5b8d4aa3f422f7edae

🖥️ Budget Hardware Specs

  • CPU: Intel Xeon E5-2696 v4 (The classic budget king for multi-core)
  • GPU: 4x RTX 2080 Ti (11/22GB each)
  • RAM: 1TB DDR4 ECC

The entire implementation, deployment script, and preliminary tech report are 100% open-sourced. I'd love to hear your thoughts, benchmarks, or feedback from fellow system/compiler hackers here!

🔗 GitHub Repository: https://github.com/lvyufeng/deepseek-v4-2080ti

(Note: I submitted the detailed report to arXiv a few days ago, but it’s currently caught in the manual moderation queue—likely because a rookie author throwing a 2080 Ti at DeepSeek-V4 triggered their review boundaries lol. Will update with the arXiv link once it's cleared!)

https://reddit.com/link/1ti5sxu/video/uu9ea2l0v62h1/player

https://reddit.com/link/1ti5sxu/video/if6alov1v62h1/player

u/Known_Ice9380 — 2 days ago
▲ 16 r/OpenSourceeAI+4 crossposts

Linki v2 is out, open-source AI SDR for LinkedIn + cold email (big update)

Hey everyone, I built Linki a few months ago as a free self-hosted alternative to Waalaxy and Lemlist. Back then it was a basic LinkedIn sequencer. I just shipped a huge update and it's now a proper AI SDR, so wanted to share what changed.

What is Linki (for those who don't know)

Self-hosted LinkedIn automation + cold email with an AI agent that writes every message for each lead individually. No SaaS middleman, no per-seat pricing, your data stays on your machine. You connect any model via OpenRouter (Claude, GPT-4o, Mistral, whatever).

What's new in this version

The AI agent is now the center of everything. There's a 3-layer prompt system: global context about your business and offer, campaign-level instructions, then per-step prompts. The agent writes with full context instead of just filling a template.

LinkedIn + email in the same campaign now. So you can do visit, connect, wait 2 days, send a LinkedIn message, wait 3 days, send a cold email. All in one sequence.

Unified inbox. All email replies from all your campaigns show up in one place. LinkedIn reply detection too.

Apollo enrichment built in. Connect your Apollo key, click enrich on any list, get verified emails and company data.

Big reliability improvement on the LinkedIn automation itself. Rewrote the DOM targeting and message delivery, about 63% improvement in connection reliability. Also added randomized pacing on imports to avoid bot detection.

AI cost tracking. Every generation is logged with model, token count, and cost. You always know what you're spending.

Hosting

Docker compose or manual Node.js. Or one-click on Opsily if you don't want to deal with the terminal. SQLite, no external DB needed.

Repo: github.com/moaljumaa/linki

Enjoy!!

u/ShakaLaka_Around — 1 day ago
▲ 5 r/OpenSourceeAI+5 crossposts

ipaship AI - adds safety hooks to your LLM agent and audits app store policies to speed up your launch

I was facing problems with adding safety hooks for iOS and Android app submission as they were getting rejected. So, I built an app compliance auditor. https://github.com/atharvnaik1/ipaship-audit

Later on I thought: why not create a CLI tool, a Claude skill (ipaship-audit), and an MCP connector that give anyone's LLM agent safety hooks, not just for apps but for all the code it writes?

You can install it with: npm i @async-atharv/ipaship

I have also added Kimi and Gemini keys with default options.

It audits for secure code and app store policy compliance, suggests bug fixes, and hands a REMEDIATION PLAN back to your LLM agent, which can act on it right away within the same prompt. So no more leaving your IDE or Claude Code: everything is handled within the environment you love 😍!

u/Topic_Affectionate — 1 day ago
▲ 7 r/OpenSourceeAI+3 crossposts

I built an AI agent runtime in Go that compiles and tests generated code before delivering it: 35 files, 156 tests, zero dependencies

I've been building ARK (AI Runtime Kernel) for the past 10 months. It's an open-source runtime that sits between your AI agent and the LLM, governing every decision the model makes.

The core idea: models shouldn't control the system. The runtime should.

What it does:

When you ask ARK to write Go code, it doesn't just pass the prompt to GPT and hand you back whatever comes out. The runtime classifies the task, optimizes the prompt, generates the code, then runs a 6-phase verification pipeline before you see anything:

├─ Step 1: ✓ Reasoning verified (confidence: 70%)
│  🧪 Verification: tested (score: 100%)
│  ✅ Compiled        ← go build
│  ✅ Executed         ← go run
│  ✅ Tests passed     ← auto-generated tests
│  ✅ Lint clean       ← go vet

If the code fails compilation, ARK feeds the compiler error back to the model, forces a stronger model, and retries. If it still fails after 2 attempts, it refuses to deliver broken code. It never claims success for code that doesn't compile.
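
The retry-and-escalate behaviour described above can be sketched like this. It is written in Python with the built-in compile() standing in for go build, and the model callables are hypothetical; it illustrates the control flow and is not ARK's Go implementation.

```python
def deliver(task, models, max_attempts=2):
    """models: callables (task, feedback) -> source, ordered weak to strong.

    Returns source that passed the compile check, or raises instead of
    handing back broken code.
    """
    feedback = None
    for attempt in range(max_attempts):
        # After a failure, move to the next (stronger) model if there is one.
        model = models[min(attempt, len(models) - 1)]
        source = model(task, feedback)
        try:
            compile(source, "<generated>", "exec")  # stand-in for `go build`
        except SyntaxError as err:
            feedback = f"{err.msg} (line {err.lineno})"
            continue
        return source
    raise RuntimeError(
        f"refusing to deliver: still failing after {max_attempts} attempts: {feedback}"
    )
```

Treating "2 attempts" as two attempts in total is an assumption here; the post could also mean two retries after the first failure.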

The Go-specific stuff that might interest this community:

The entire runtime is pure Go, zero external dependencies (just stdlib). 35 files, ~16,000 lines, 156 tests, race detector clean. Some things I'm proud of:

  • Weighted tool ranking with 6 signals (relevance, success rate, Bayesian confidence, cost, latency, memory bonus) — all computed in microseconds
  • Context engine that reduces tool schema tokens from 60K to ~93 (99.9% reduction) by only loading relevant tools
  • Per-step model routing: cheap model (gpt-4o-mini) handles tool calls, strong model (gpt-4o) handles reasoning. Cuts costs 80-90%
  • Cognitive Governor that verifies every output with calibrated confidence scores
  • Auto-fix for common model errors in generated Go code (orphan braces, missing error handling) — detects both tab and space indentation
  • Event emitter that writes JSONL for a separate Python memory layer to ingest
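For readers wondering what a six-signal weighted ranking could look like, here is a small Python sketch (not ARK's Go code). The weights are made up, "Bayesian confidence" is assumed to mean the posterior mean of a Beta(1, 1) prior over the tool's success probability, and cost and latency are assumed to be pre-normalized to 0..1 with lower being better. None of these details come from the post.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights; the post does not give ARK's actual values.
WEIGHTS = {
    "relevance": 0.35,
    "success_rate": 0.20,
    "confidence": 0.15,
    "cost": 0.10,
    "latency": 0.10,
    "memory_bonus": 0.10,
}


@dataclass
class ToolStats:
    name: str
    relevance: float     # 0..1, how well the tool matches the task
    successes: int       # past successful calls
    failures: int        # past failed calls
    cost: float          # 0..1 normalized, lower is better
    latency: float       # 0..1 normalized, lower is better
    memory_bonus: float  # 0..1, boost when memory suggests this tool


def score(tool: ToolStats) -> float:
    """Combine the six signals into a single score in 0..1."""
    calls = tool.successes + tool.failures
    success_rate = tool.successes / calls if calls else 0.0
    # Beta(1, 1) posterior mean: stays near 0.5 until evidence accumulates.
    confidence = (tool.successes + 1) / (calls + 2)
    signals = {
        "relevance": tool.relevance,
        "success_rate": success_rate,
        "confidence": confidence,
        "cost": 1.0 - tool.cost,        # invert so cheaper scores higher
        "latency": 1.0 - tool.latency,  # invert so faster scores higher
        "memory_bonus": tool.memory_bonus,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def rank(tools: List[ToolStats]) -> List[ToolStats]:
    """Return tools sorted from best to worst score."""
    return sorted(tools, key=score, reverse=True)
```

Since the score is just a handful of multiplications and additions per tool, ranking even a few hundred tools is cheap, which fits the "computed in microseconds" claim.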

Cost: A typical task costs $0.002-$0.005. Not $0.05.

Example output:

go run ./cmd/ark run agent.yaml --task "write a function in Go that reads CSV"

✅ Task completed successfully
Steps: 1 | Tokens: 637 | Time: 5.6s | Cost: $0.002

The generated code compiles, runs, and passes auto-generated tests before you see it.

GitHub: github.com/atripati/ark

I'm a CS undergrad at DePaul in Chicago building this solo. Applied to YC S26 with it. Happy to answer questions about the architecture, the verification pipeline, or why I chose Go for this.

u/Aromatic-Ad-6711 — 2 days ago
▲ 4 r/OpenSourceeAI+2 crossposts

Zero-overhead MoE expert imbalance profiler for vLLM, with benchmarks + why we differ from vLLM's built-in EPLB

If you're running a MoE model with --enable-expert-parallel, your experts are probably imbalanced. We measured 7.93× imbalance on Layer 0 of OLMoE with one GPU doing nearly 8× the work. Plumb measures it and fixes it.

What it does:

Hooks into a running vLLM or HuggingFace process via PyTorch hooks, no fork or restart required. Captures per-layer per-expert activation counts and computes imbalance ratios, then produces an expert→GPU placement recommendation.
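As a rough illustration of the counts → ratio → placement flow, here is a self-contained Python sketch. It is not Plumb's implementation: it assumes "imbalance" means the busiest GPU's load divided by the mean GPU load (the post does not define it), and it swaps in a generic greedy heuristic (heaviest expert first, onto the least-loaded GPU that still has a free slot) for whatever Plumb actually does. All function names are hypothetical.

```python
from typing import List, Sequence


def gpu_loads(expert_counts: Sequence[int], placement: Sequence[int],
              num_gpus: int) -> List[int]:
    """Sum per-expert activation counts into per-GPU loads."""
    loads = [0] * num_gpus
    for expert, count in enumerate(expert_counts):
        loads[placement[expert]] += count
    return loads


def imbalance_ratio(loads: Sequence[float]) -> float:
    """Assumed definition: max load / mean load (1.0 = perfectly balanced)."""
    total = sum(loads)
    if total == 0:
        return 1.0
    return max(loads) / (total / len(loads))


def greedy_placement(expert_counts: Sequence[int], num_gpus: int) -> List[int]:
    """Place the heaviest experts first on the least-loaded GPU with a free slot.

    Each GPU holds at most ceil(num_experts / num_gpus) experts, so the
    result keeps a roughly equal number of experts per GPU.
    """
    num_experts = len(expert_counts)
    capacity = -(-num_experts // num_gpus)  # ceiling division
    loads = [0] * num_gpus
    slots_used = [0] * num_gpus
    placement = [0] * num_experts
    order = sorted(range(num_experts), key=lambda e: expert_counts[e],
                   reverse=True)
    for expert in order:
        candidates = [g for g in range(num_gpus) if slots_used[g] < capacity]
        gpu = min(candidates, key=lambda g: loads[g])
        placement[expert] = gpu
        loads[gpu] += expert_counts[expert]
        slots_used[gpu] += 1
    return placement
```

For example, with counts `[80, 60, 10, 10, 10, 10, 10, 10]` on 2 GPUs, the contiguous placement `[0, 0, 0, 0, 1, 1, 1, 1]` gives loads of 160 and 40 (ratio 1.6), while the greedy placement gives 110 and 90 (ratio 1.1). This ignores topology entirely, so it says nothing about cross-NUMA dispatch cost.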

These are prefill benchmarks (max_tokens=1, ~11 input tokens). Full results across multiple concurrency levels and two models are in the repo, including a DeepSeek-V2-Lite run where blind rebalancing made things significantly worse.

On vLLM's native EPLB:

vLLM has its own EPLB and it works. A few differences:

vLLM requires --num-redundant-experts — extra VRAM per EP rank (~2.4GB for DeepSeek-V3). If you're memory constrained it can't run. Plumb has no such requirement.

vLLM's EPLB load-balances but ignores topology — it doesn't know which GPU is closest to which expert, so cross-NUMA dispatch cost stays high. Plumb adds a NUMA fine-tuning pass that pins each layer's hottest experts to same-socket GPUs after running EPLB, which vLLM doesn't do.

vLLM's EPLB also runs unconditionally. We benchmarked blind rebalancing on DeepSeek-V2-Lite: the model peaks at 1.5× imbalance because it was trained with balance losses, so there's nothing to rebalance. The communication overhead alone pushes p95 up 226% at c=16. Plumb checks the imbalance ratio first and won't apply anything without warning you.
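The "check first" idea can be sketched as a simple gate in Python. This is only an illustration, not Plumb's logic: the function name and the 2.0 threshold are hypothetical, chosen so that a model peaking at 1.5× is skipped and one at 7.93× is not.

```python
from typing import Dict, Tuple


def rebalance_decision(layer_ratios: Dict[int, float],
                       threshold: float = 2.0) -> Tuple[bool, str]:
    """Decide whether rebalancing is worth its communication overhead.

    layer_ratios maps layer index -> imbalance ratio (1.0 = balanced).
    threshold is a hypothetical cutoff, not a value from Plumb.
    """
    worst = max(layer_ratios.values(), default=1.0)
    if worst < threshold:
        return False, (
            f"skip: worst imbalance {worst:.2f}x is below {threshold:.2f}x, "
            "rebalancing would likely add overhead for no gain"
        )
    return True, f"rebalance: worst imbalance is {worst:.2f}x"
```

With ratios like `{0: 1.5, 1: 1.2}` this returns `False` plus a warning message, and with `{0: 7.93}` it returns `True`.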

GitHub: https://github.com/plumb-moe/plumb

Benchmark scripts and raw data are in the repo.

We're going to be running more benchmarks and trying more strategies over the next few weeks, hope you look forward to those results :)

u/plumb-moe — 1 day ago
▲ 6 r/OpenSourceeAI+1 crossposts

Accepting contributors for our project MagesticAI: web-based AI task management and autonomous agent orchestration

Looking for contributors, reviewers and testers.

I got tired of babysitting coding agents on big features, so I built this project. It's a fork / cloud version of the Aperant (formerly Auto Claude) project with some power-ups.

v2.2.0 was just released.
Run it on Linux (for example Ubuntu) on a VPS, in a container, or on bare metal.

About: MagesticAI is a web-based AI task management and autonomous agent orchestration platform that builds software through coordinated AI agent sessions. It uses primarily the Claude Agent SDK to run agents in isolated workspaces with security controls, coordinating multiple AI agents through a structured pipeline to build software autonomously with human oversight.

The core pipeline consists of four specialized agents: the Planner Agent creates implementation plans with subtasks, the Coder Agent implements individual subtasks (and can spawn subagents for parallel work), the QA Reviewer validates acceptance criteria, and the QA Fixer resolves issues in a feedback loop. Each agent operates with role-specific tool permissions and security controls.
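The four-agent flow described above can be sketched roughly like this in Python. It is an illustration of the control flow only, not MagesticAI's code: the four callables stand in for the agents, and `run_pipeline`, `PipelineResult`, and the `max_fix_rounds` cap are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    status: str            # "approved" or "needs_human_review"
    artifacts: List[str]
    open_issues: List[str]
    fix_rounds: int


def run_pipeline(
    spec: str,
    plan: Callable[[str], List[str]],                # Planner: spec -> subtasks
    code: Callable[[str], str],                      # Coder: subtask -> artifact
    review: Callable[[str, List[str]], List[str]],   # QA Reviewer: -> issues
    fix: Callable[[List[str], List[str]], List[str]],  # QA Fixer: -> artifacts
    max_fix_rounds: int = 3,
) -> PipelineResult:
    """Plan, implement each subtask, then loop review -> fix until clean."""
    subtasks = plan(spec)
    artifacts = [code(task) for task in subtasks]
    issues = review(spec, artifacts)
    rounds = 0
    while issues and rounds < max_fix_rounds:
        artifacts = fix(artifacts, issues)
        issues = review(spec, artifacts)
        rounds += 1
    # Anything still failing after the cap is handed to a human.
    status = "approved" if not issues else "needs_human_review"
    return PipelineResult(status, artifacts, issues, rounds)
```

Capping the feedback loop is one way to keep the "human oversight" part meaningful: the agents get a bounded number of tries, and unresolved issues surface instead of looping forever.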

Repo: https://github.com/dataseeek/MagesticAI

u/Famous_Move_3591 — 1 day ago
▲ 2 r/OpenSourceeAI+1 crossposts

Open-source an app that’s hidden in screen sharing apps like Zoom / AnyDesk / MS Teams

I recently created an app that runs in the background on Windows and stays visually hidden.

It has a browser, notes, a local chat AI (any LLM that your system can hold), and a local transcription AI that answers your machine's audio output.

Should I open-source it?

I have seen lots of apps that do far less but charge $80 a month.

u/Rough-Obvious — 2 days ago