scythe inspect: find missing FK indexes, RLS-disabled tables, and duplicate indexes in your Postgres schema

I maintain scythe, an MIT SQL-to-typed-code generator (think sqlc, but for 10 languages), and one part of it turned out genuinely useful on its own for Postgres: scythe inspect.

It connects to a live database and flags operational issues that are easy to miss:

  • Foreign keys with no covering index (the classic silent seq-scan on delete/join).
  • Tables that have RLS policies defined but RLS not actually enabled.
  • Duplicate indexes (same columns, same order) quietly taxing every write.
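
To make the first and third checks concrete, here is a minimal sketch of the logic over schema metadata that has already been loaded. It is not scythe's implementation: the `Index` and `ForeignKey` types, the function names, and the sample schema are all hypothetical, and it ignores uniqueness, partial indexes, and index methods.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:  # hypothetical shape, not scythe's data model
    name: str
    table: str
    columns: tuple[str, ...]


@dataclass(frozen=True)
class ForeignKey:  # hypothetical shape, not scythe's data model
    name: str
    table: str
    columns: tuple[str, ...]


def uncovered_foreign_keys(fks, indexes):
    """Return FKs whose columns are not the leading columns of any index."""
    missing = []
    for fk in fks:
        width = len(fk.columns)
        covered = any(
            ix.table == fk.table and set(ix.columns[:width]) == set(fk.columns)
            for ix in indexes
        )
        if not covered:
            missing.append(fk)
    return missing


def duplicate_indexes(indexes):
    """Group index names that share a table and the same columns in the same order."""
    groups = defaultdict(list)
    for ix in indexes:
        groups[(ix.table, ix.columns)].append(ix.name)
    return [names for names in groups.values() if len(names) > 1]


# Made-up schema for illustration only.
indexes = [
    Index("orders_pkey", "orders", ("id",)),
    Index("orders_customer_idx", "orders", ("customer_id", "created_at")),
    Index("orders_customer_idx2", "orders", ("customer_id", "created_at")),
]
fks = [
    ForeignKey("orders_customer_fk", "orders", ("customer_id",)),
    ForeignKey("orders_product_fk", "orders", ("product_id",)),
]

missing_names = [fk.name for fk in uncovered_foreign_keys(fks, indexes)]
duplicates = duplicate_indexes(indexes)
```

In this sample, `orders_customer_fk` counts as covered because `customer_id` is the leading column of an existing index, while `orders_product_fk` has no index at all.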

It reads your schema, not your app code, so it doesn't care which ORM or driver you use. Output is human-readable, or SARIF/JSON to wire into CI. Postgres-only for now (that's where I needed it first).

Repo: https://github.com/Goldziher/scythe (inspect docs under guide/inspect)

Two honest notes: the codegen side is inspired by sqlc, and inspect is young, so I'd like to hear which other checks are worth adding. What schema smells do you wish something caught automatically?

u/Goldziher — 14 hours ago
▲ 9 r/Rag+1 crossposts

I ship one Rust core to 14 languages from a single config

For the last few months I've been building document and RAG infrastructure (crawling, HTML-to-Markdown, extraction) on a Rust core, and shipping each library to Python, Node, Go, Ruby, Java, and more (14 languages in total).

The hard part isn't the Rust. It's producing genuinely idiomatic native packages for every language and keeping them in sync as the core changes. Doing that by hand across 14 targets is a nightmare.

So I built alef: one config generates the bindings and the packaging for all targets straight from the Rust type definitions. It now drives all my polyglot libraries (the crawler, the HTML-to-Markdown engine, an LLM client, a tree-sitter grammar pack). All MIT.

https://github.com/xberg-io/alef

Happy to get into the binding and packaging weeds if you're shipping cross-language libraries.

u/Goldziher — 14 hours ago
▲ 11 r/OpenSourceeAI+1 crossposts

ai-rulez: one source of truth for AI coding rules, generates native configs for 19 tools (Go, MIT)

Every AI coding tool wants its own config file: Claude reads CLAUDE.md, Cursor wants .cursor/rules, Copilot expects .github/copilot-instructions.md, and so on. Use more than one and you're maintaining duplicates that drift.

ai-rulez keeps one source. You write rules, context, agents, and commands once in .ai-rulez/, run generate, and it emits each tool's native format for 19 platforms. Two things make it hold up on real projects:

  • Composition over git: [[includes]] pull shared rule modules from other repos, so org-wide standards live in one place and every repo overrides locally as needed.
  • Monorepos: nested configs plus generate --recursive, profiles per audience, and 33 builtin domains (languages, security, testing, git-workflow and more) you switch on instead of writing from scratch.

Concrete: in one of my repos a ~25-line config expands into 103 generated files across 5 tools, regenerated on every commit via a pre-commit hook, so nothing drifts.

Single Go binary: npx ai-rulez@latest init, or brew. MIT.

Honest tradeoff: outputs are generated, so you edit the source and never the outputs (they get overwritten).

https://github.com/Goldziher/ai-rulez

How are others managing rules across multiple AI tools?

u/Goldziher — 1 day ago
▲ 14 r/OpenSourceAI+2 crossposts

basemind: an MCP server that indexes your repo so agents answer from signatures, not full file reads

I kept watching coding agents answer "what calls this function" by grepping, opening three files, and reading them top to bottom to find four call sites. On a big repo that eats the context window fast.

basemind indexes a repo once and answers structurally. The MCP tools return paths, line numbers, and signatures instead of file bodies, so a lookup costs a fraction of reading the source. What it exposes:

  • Code map (300+ languages): outline, search_symbols, find_references, find_callers, call_graph, find_implementations. An expand escape hatch pulls a single function's full body when the agent actually needs it.
  • Git at symbol resolution: blame_symbol, symbol_history (when a symbol's body changed), recent_changes, diff_outline.
  • Document RAG over 90+ formats with text extraction and OCR built in, plus semantic and full-text search.
  • Shared memory and an agent-to-agent comms channel (rooms, DMs, inbox) for running more than one agent on the same repo.

Runs three ways over one local index: a Claude Code plugin, a plain MCP server, or a CLI. Works with Claude Code, Codex, Cursor, Gemini CLI, Copilot CLI, OpenCode and a few others. Rust, MIT.

On token savings: it ships a heuristic counter (an outline is modelled at about 1/5 of reading the file, a caller lookup about 1/3 of grep plus read). It's an honest estimate, not a benchmark, and tools with no fair baseline (memory, git wrappers) count zero.
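
As a rough illustration of that arithmetic (not basemind's actual counter), the estimate could be sketched like this. The function name, the table layout, and the idea that the saving is the baseline minus the modelled cost are assumptions; only the 1/5 and 1/3 ratios and the "no baseline counts zero" rule come from the paragraph above.

```python
from fractions import Fraction

# Modelled cost of a tool call as a fraction of its baseline (ratios from the post).
MODELLED_COST = {
    "outline": Fraction(1, 5),       # versus reading the whole file
    "find_callers": Fraction(1, 3),  # versus grep plus read
}


def estimated_tokens_saved(tool: str, baseline_tokens: int) -> int:
    """Estimate tokens saved by one call; tools with no fair baseline count zero."""
    ratio = MODELLED_COST.get(tool)
    if ratio is None:
        return 0
    return int(baseline_tokens * (1 - ratio))


saved = [
    estimated_tokens_saved("outline", 5000),       # 4/5 of 5000
    estimated_tokens_saved("find_callers", 3000),  # 2/3 of 3000
    estimated_tokens_saved("blame_symbol", 3000),  # git wrapper, no baseline
]
```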

Honest limitations: it's an index, so it lags edits between scans. serve watches by default and there's a rescan, but a cold first scan is slower (worst case in my tests is the TypeScript compiler, 81k files, about 18s), and the git-history index costs 6 to 22% of your .git on disk.

https://github.com/Goldziher/basemind

Curious how others here are feeding repo structure to agents over MCP.

u/Goldziher — 1 day ago

Introducing H2M, Crawlberg, TSLP, LiterLLM and the upcoming Xberg release

Hi all,

Yesterday someone opened a PR adding an unofficial Dart binding to our html-to-markdown library, and it made me realize I never actually announced that we now ship Dart packages.

Some backstory: a while ago I ran into someone name-squatting our Kreuzberg package on pub.dev. I got the name back eventually (thanks, pub.dev support). We're also rebranding to Xberg - I'll announce that properly once it's officially out, and it'll have full Dart support.

For the past 2-3 months I've been extending our polyglot libraries with Dart support using flutter_rust_bridge (thanks fzyzcjy). Four are now on pub.dev, all MIT:

  1. h2m (source) — a high-fidelity (precise, standards-compliant) HTML-to-Markdown converter that covers the full HTML5 semantic spec. It's built for speed and handles essentially all real-world HTML.

  2. tree_sitter_language_pack (source) — the largest collection of tree-sitter grammars I know of, all permissive OSS. It's unusual in that it loads grammars dynamically: we build both static and dynamic-linking binaries for every grammar we can find on GitHub, currently 306 programming and data languages/formats. Useful for code inspection, AI "code intelligence", linters, syntax highlighting, and so on. I use it in several of my own projects for the code-understanding side, and Xberg uses it for code chunking.

  3. crawlberg (source) — a crawling engine, now at v1 (stable). For whenever you need to crawl content on demand — agent workflows, for example, or apps that need to fetch and process pages.

  4. liter-llm — a Rust port of Python's litellm. It's an abstraction (with an optional proxy layer) over a large number of LLM providers.

All four share the same core philosophy:

  1. The core is written in Rust, on top of well-maintained, permissively-licensed OSS (MIT, BSD, etc.).
  2. We generate the binding glue and idiomatic Dart directly from the Rust types.
  3. We build powerful open-source primitives and put our commercial value around and on top of them — i.e. we dog-food our own OSS.

Open source is something I genuinely love, and in a world where agentic coding is becoming the norm it makes software more robust: users battle-test it, which lets us move fast and build better. So any feedback, bug reports, or API suggestions are very welcome.

u/Goldziher — 4 days ago
▲ 105 r/Python

Tip: use msgspec for JSON decoding — it decodes straight into your type at C speed

A tip that's saved us a lot of boilerplate across our Python stack (Litestar, and our document-extraction tooling): stop decoding JSON into dict[str, Any] and casting/.get()-ing your way through it. Decode straight into your declared type.

msgspec validates and decodes directly into your type at C speed. Quick comparison of the usual options on the same payload:

  • json.loads / orjson.loads -> dict[str, Any] (cast and pray; orjson is just faster)
  • pydantic TypeAdapter(...).validate_json -> your model, validated + rich, but heavier
  • msgspec.json.decode(raw, type=T) -> your type, validated, C-fast

pydantic does far more and its Rust core is fast; for model-heavy code it's still my default. But on hot paths where you just need decode-into-a-struct, a C decoder going straight to the type is hard to beat.

With PEP 695 generics (Python 3.12+) the whole (de)serialization layer collapses to one function:

import msgspec

def deserialize[T](raw: bytes, t: type[T]) -> T:
    # strict=False allows lenient coercion, e.g. the string "1" into an int field
    return msgspec.json.decode(raw, type=t, strict=False)

deserialize(raw, Grant)        # -> Grant
deserialize(raw, list[Grant])  # -> list[Grant]

We landed on this while building Litestar (msgspec is a big reason it's fast) and reuse it across everything now. How do you handle hot-path decoding — msgspec, orjson + manual validation, or full pydantic?

u/Goldziher — 5 days ago
▲ 7 r/litrpg

Looking for recommendations

Hi there,

Need some recommendations.

Here are series I like:

- Defiance of the Fall

- Dungeon Crawler Carl

- Cradle

- Elydes

- The Legend of Randidly Ghosthound

- Bog Standard Isekai

- The Good Guys

- Book of the Dead

- Mother of Learning

- Path of Dragons

- A Soldier's Life

- Ironbound

- The Perfect Run

- Loopbreaker

- He Who Fights with Monsters

Series I DNFed in the middle:

- The Primal Hunter (DNF book 5)

- Beware of Chicken (DNF book 4)

- Victor of Tucson (DNF Book 3)

- Mark of the Fool (DNF Book 9)

- Return of the Runebound Professor (DNF Book 3)

- The Last Horizon (DNF Book 2)

- Dissonance (DNF book 5)

- Trysmoon Saga (DNF book 3)

- Ultimate Level 1 (DNF book 4)

Series I DNFed at book 1:

- Mage Tank

- Hell Difficulty Tutorial

- World Sphere

- System Universe

- 1% Lifesteal

- Azarinth Healer

- A Thousand Li

On my table:

- Chrysalis

- Monsters and Legends

---

What I like:

- LitRPG that takes itself seriously (can be with humor of course, but still seriously).

- Well written, with a good plot -- preferably.

- Some degree of mystery and magic that is, to some extent at least, magical.

- Characters who are not cardboard cutouts.

- Preferably something which tries to be fantastic as well.

- Grit / harshness. This one is not a strict requirement; for example, Beware of Chicken was pretty great until I got tired of its schtick.

I would love some recommendations. Not trying to offend anyone; my taste is my own.

I might give a chance to some of the stuff I DNFed, if it fits the criteria.

P.S. I was thinking of doing a tier list; I hope the above list is acceptable.

u/Goldziher — 6 days ago
▲ 4 r/AIDeveloperNews+1 crossposts

basemind: AI Context and Communication Layer

I am happy to introduce basemind, a high-performance, local-first AI context and communication layer.

basemind packs a mighty punch:

* map massive codebases in seconds

* millisecond-speed code search across 300+ languages

* parse and extract 90+ document formats using Kreuzberg, making any agent a document intelligence powerhouse

* semantic and free-text search

* plugins for all major coding agents, extensive MCP support, and a CLI

* git history and analysis tools

* code-aware token compression and reduction

* inter-agent communication (different agents on the same machine can talk to each other)

* and many more

Check it out!

Repo: https://github.com/Goldziher/basemind

u/Goldziher — 13 days ago
▲ 49 r/elixir+8 crossposts

BaseMind: MIT Licensed AI Context Layer

Hi Peeps,

I'm an open-source maintainer (Goldziher on GitHub) and the CTO of kreuzberg.dev.

I published basemind, an MIT-licensed, pure-Rust AI context layer for agents.

The goal of basemind is to let agents work on large codebases by generating code maps and processing files (code, documents, etc.) at high speed while saving tokens. It caches extensively, improves precision, and offers a wide range of tools:

  1. tree-sitter-based code mapping and search for 300+ languages
  2. document extraction, processing, and ML for 90+ file formats
  3. fast, on-demand web crawling
  4. git intelligence and analytics
  5. localized RAG

And more. I have been dogfooding it for a while, and I like it very much.

I'd be happy for any feedback.

u/Goldziher — 17 days ago

Name squatting on pub.dev

Hi there,

I'm the author of Kreuzberg.

I am working on our v5.0.0 and doing rc.* dry runs. I have been working for a while now on adding Dart support, plus Android / iOS natives. While publishing rc.1 from the CLI, I discovered that someone had published a fork yesterday and squatted the pub.dev/kreuzberg namespace. Maybe he didn't have ill intent and just wanted this package and was obtuse. I dunno, but I gotta say this is pretty infuriating (felt like a blow). He didn't open an issue or ask for permission; he just forked and did this.

What can I do? I sent an email to support@pub.dev. But I am afraid this will kill our velocity and release planning. Please advise.

P.S. support welcome.


Edit:

  1. This person has done it to another package of ours. He also opened a PR trying to add it as a git submodule in our repo. That was his communication: https://github.com/kreuzberg-dev/html-to-markdown/pull/349

  2. I got an email from pub.dev: they tombstoned Kreuzberg for now. TBD.

u/Goldziher — 2 months ago
▲ 4 r/litrpg

Hi there,

Need some recommendations.

Currently listening to the new Path of Dragons volume 4. After this I will be out and need some more recommendations.

Here are series I like:

  • Defiance of the Fall
  • Dungeon Crawler Carl
  • Cradle
  • Elydes
  • The Legend of Randidly Ghosthound
  • Bog Standard Isekai
  • The Good Guys
  • Book of the Dead
  • Mother of Learning
  • Path of Dragons
  • A Soldier's Life
  • Ironbound
  • The Perfect Run

Series I DNFed in the middle:

  • The Primal Hunter (DNF book 5)
  • Beware of Chicken (DNF book 4)
  • Victor of Tucson (DNF Book 3)
  • Mark of the Fool (DNF Book 9)
  • Return of the Runebound Professor (DNF Book 3)
  • The Last Horizon (DNF Book 2)

Series I DNFed at book 1:

  • Mage Tank
  • Hell Difficulty Tutorial
  • World Sphere
  • System Universe
  • 1% Lifesteal
  • Azarinth Healer

On my table:

  • Chrysalis

What I like:

  • LitRPG that takes itself seriously (can be with humor of course, but still seriously).
  • Well written, with a good plot -- preferably.
  • Some degree of mystery and magic that is, to some extent at least, magical.
  • Characters who are not cardboard cutouts.
  • Preferably something which tries to be fantastic as well.
  • Grit / harshness. This one is not a strict requirement; for example, Beware of Chicken was pretty great until I got tired of its schtick.

I would love some recommendations. Not trying to offend anyone; my taste is my own.

I might give a chance to some of the stuff I DNFed, if it fits the criteria.

P.S. I was thinking of doing a tier list; I hope the above list is acceptable.

u/Goldziher — 2 months ago