u/gimalay — reddlx

iwe update — frontmatter schema inference, djot support, and semantic line breaks

A few weeks back I posted an intro to iwe — a single Rust binary that treats a folder of .md files as a queryable database: Mongo-style filters over frontmatter, graph operators over links, bulk update/delete with --dry-run. No daemon, no database, just files.

Since then a few releases landed, plus there's one command I never showed. The parts this sub might care about:

iwe schema — what's actually in your frontmatter? After a corpus grows for a year, nobody remembers which fields exist, which are always set, and what values they hold. schema scans every file and reports per-field types, coverage, and value distribution:

$ iwe schema
| Field    | Types                    | Coverage   | Distinct | Values                              |
| -------- | ------------------------ | ---------- | -------- | ----------------------------------- |
| created  | date (100%)              | 214 (100%) | 89       | 2026-06-12 (6), 2026-05-30 (5), ... |
| priority | number (94%), null (6%)  | 214 (100%) | 9        | 5 (61), 3 (48), 8 (22), ...         |
| status   | string (100%)            | 187 (87%)  | 4        | done (95), draft (51), Draft (3)... |
| owner    | string (71%), null (29%) | 132 (62%)  | 12       | alice (40), bob (31), ...           |

--field status drills into one field; -f json for scripting; the same --filter flags scope it to a subset. It closes the loop with the query side — schema shows you that status holds both draft and Draft, then:

iwe update --filter 'status: Draft' --set status=draft
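If you want a feel for what that report computes, here is a rough Python sketch. It is not iwe's implementation: `type_name` and `infer_schema` are names made up for illustration, and it assumes the frontmatter has already been parsed into one dict per file.

```python
from collections import Counter, defaultdict
from datetime import date


def type_name(value):
    """Map a parsed frontmatter value to a coarse type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, date):
        return "date"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def infer_schema(docs):
    """Summarize per-field types, coverage, and most common values.

    `docs` is a list of dicts (hypothetical input shape), one per file.
    Coverage counts files where the key is present, even if its value is null.
    """
    types = defaultdict(Counter)
    values = defaultdict(Counter)
    for doc in docs:
        for field, value in doc.items():
            types[field][type_name(value)] += 1
            values[field][str(value)] += 1
    return {
        field: {
            "types": dict(types[field]),
            "coverage": sum(types[field].values()),
            "distinct": len(values[field]),
            "top": values[field].most_common(3),
        }
        for field in types
    }
```

A field holding both draft and Draft shows up here as two distinct values, which is the kind of drift the report is meant to surface.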

djot support. Set format = "djot" in the config and the whole toolchain — normalize, find, export, new — reads and writes djot (.dj files) instead of markdown.

Semantic line breaks survive the formatter. If you write one sentence per line (great for git diff), iwe normalize used to join them into a single paragraph line. The new preserve_newlines config option keeps your line structure while still fixing headers, lists, and link titles.

Wiki-link path rewriting. wiki_link_path = "short" makes normalize rewrite every wiki link to the shortest unambiguous suffix; "full" expands to complete paths; "preserve" (default) leaves them as typed. One corpus-wide pass, consistent links.
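To make "shortest unambiguous suffix" concrete, here is a small Python sketch of one way to compute it. This is an illustration under my own assumptions, not iwe's code: `shortest_unambiguous_suffix` is a hypothetical helper, paths are plain "/"-separated strings without extensions, and `target` is assumed to be one of `all_paths`.

```python
def shortest_unambiguous_suffix(target, all_paths):
    """Return the shortest trailing part of `target` that matches no other path.

    Falls back to the full path when every shorter suffix is shared.
    """
    parts = target.split("/")
    for n in range(1, len(parts) + 1):
        suffix = parts[-n:]
        # Collect every path whose last n components equal this suffix.
        matches = {p for p in all_paths if p.split("/")[-n:] == suffix}
        if matches == {target}:
            return "/".join(suffix)
    return target


paths = ["journal/2026/week-18", "journal/2025/week-18", "tasks/alpha"]
for path in paths:
    print(path, "->", shortest_unambiguous_suffix(path, paths))
```

The two week-18 notes need their year to stay distinguishable, while tasks/alpha can be referenced by its last component alone.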

Formatter hardening. Escaped literals (\*not emphasis\*) survive normalization, [X] task markers lowercase to [x], list items with code blocks or tables render loose instead of glued together, and a config option gives MkDocs-style 4-space list indentation if that's your ecosystem.

Install: cargo install iwe (or brew install iwe-org/iwe/iwe). Repo: <https://github.com/iwe-org/iwe>

Curious what people here would want from the schema side — validation against a declared schema is the obvious next step, but I'd rather hear real use cases first.


u/gimalay — 3 days ago

▲ 1 r/coolgithubprojects

iwe — Rust CLI + LSP that turns a markdown folder into a queryable graph

iwe is a single Rust binary that treats a directory of .md files as a knowledge graph — both a CLI and an LSP server out of the same install.

What it gives you:

A query language for your notes. iwe find --filter 'status: draft, priority: {$gte: 8}' filters on frontmatter, and --references / --included-by walk markdown links as graph edges. The same predicates drive count, update, delete.
Bulk refactoring. iwe normalize cleans link titles, header levels, list numbering across thousands of files in under a second. iwe rename moves a doc and updates every reference.
Editor integration via LSP. Go-to-definition follows links, references show backlinks, completion suggests link targets. Works in VSCode, Neovim, Helix, Zed.
Useful CLI extras. iwe stats, iwe squash (collapse a subtree into one doc), iwe export dot (Graphviz visualization), iwe extract / iwe inline for structural refactoring.

No database, no daemon, no sync — your notes stay plain markdown. Git is the audit log.

Install: cargo install iwe or brew install iwe-org/tap/iwe.

Repo: <https://github.com/iwe-org/iwe> · Docs: <https://iwe.md>

u/gimalay — 2 months ago

▲ 4 r/AI_Agents

I've been giving my coding agent access to a folder of markdown files as its long-term memory. It works surprisingly well for open-ended questions — "why did we choose Postgres over DynamoDB?" or "what's the context behind the auth rewrite?" The agent finds the right document, reads it, gives a solid answer.

Then my teammate asked: "Which of our API decisions are still in draft status?"

The agent read through every decision document. It took 40 seconds. It missed two because the word "draft" didn't appear in the body — I'd just never gotten around to finishing them. It hallucinated one as "draft" because the text said "this approach is still a draft idea" in a different context.

The failure mode was obvious once I saw it: I was asking a structured question against unstructured data. The agent had to parse natural language to extract what was essentially a database query. Of course it got it wrong.

The fix was adding YAML frontmatter to every document:

---
title: "Use Postgres for the event store"
type: decision
status: accepted
domain: infrastructure
created: 2026-01-15
---

Now every document carries its own metadata as machine-readable fields — not buried in prose where the agent has to guess. Status, type, domain, dates, relationships — all queryable.

The query that previously took 40 seconds and got it wrong:

iwe find --filter 'status: draft' --project title,domain,created -f json

Instant. Correct. No token cost.

Once I started modeling metadata this way, a whole class of questions that used to require the agent to "think" became trivial lookups:

iwe find --filter '{type: decision, domain: infrastructure}' --project title,status -f json

iwe count --filter 'status: draft'

iwe find --filter '{status: published, created: { $gte: "2026-04-01" }}' \
  --sort created:-1 --project title,domain -f json

The pattern that emerged: there are two kinds of questions you ask a knowledge base.

Navigational questions — "tell me about X" — where you want the agent to read documents and synthesize an answer. Full-text retrieval works fine for these. The content matters.

Structured questions — "how many X are in state Y" — where the answer is a filter, a count, or a sort. These should never touch the LLM at all. They're database queries. If your knowledge base can't answer them without reading every document, you're missing a layer.

Frontmatter is that layer. It turns each document into a row with typed columns, while keeping the body as freeform prose for the navigational questions. The agent uses CLI queries for structured questions and document retrieval for everything else.

The tradeoffs:

You have to define a schema and maintain it. If you're sloppy about filling in frontmatter, the queries return garbage. Garbage in, garbage out.
There's upfront work to retrofit existing documents. This is where fast, cheap models shine: I ran one over each document with a simple prompt: "read this document and extract these fields: type, status, domain, created date. Return YAML." The whole KB took under a minute and a few cents, and the results were surprisingly accurate for structured extraction. Fast models aren't great at reasoning over your whole knowledge base, but they're perfect at reading one document and pulling out metadata. I spot-checked maybe 10% and fixed a handful of errors. Way faster than tagging everything by hand.
You need a tool that can query frontmatter. I use IWE, which has a CLI with filter, projection, and sort, but you could build something similar with any YAML parser and a bit of scripting.
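For anyone curious about the do-it-yourself route, here is a minimal Python sketch of filter, projection, and sort over frontmatter. To stay dependency-free it uses a toy parser instead of a real YAML library, so it only handles flat `key: value` lines. The names `parse_frontmatter` and `find` are invented for this example and have nothing to do with how IWE works internally.

```python
def parse_frontmatter(text):
    """Parse a flat `key: value` block delimited by `---` lines.

    Assumption: no nesting, lists, or multi-line values. Anything richer
    needs a real YAML parser. All values come back as strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def find(docs, where, project, sort_by=None, descending=False):
    """Return the projected fields of docs matching every pair in `where`."""
    hits = [d for d in docs if all(d.get(k) == v for k, v in where.items())]
    if sort_by:
        # ISO dates such as 2026-01-15 sort correctly as plain strings.
        hits.sort(key=lambda d: d.get(sort_by, ""), reverse=descending)
    return [{k: d.get(k) for k in project} for d in hits]
```

Something like `find(docs, {"type": "decision"}, ["title", "status"], sort_by="created", descending=True)` then plays the role of the filtered, projected, sorted queries above, minus operators like $gte.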

Here's the workflow that actually made this practical:

Design the schema with a smart model. I sat down with a capable model and described my knowledge base — what kinds of documents I have, what questions I want to ask, what dimensions matter. In about ten minutes of back and forth, we landed on a schema: type, status, domain, priority, created date. The smart model is good at this — it asks "do you ever need to filter by X?" and you realize yes, you do. You wouldn't think of half the fields on your own.

Deploy a swarm of fast agents to populate it. Once the schema is locked, you don't need a smart model to fill it in. I pointed a fast model at every document — one doc per call, same prompt: "read this and extract these fields as YAML frontmatter." Under a minute, a few cents total. Fast models are perfect for structured extraction from a single document. They don't need to reason across your whole knowledge base — they just need to read one file and pull out values. I spot-checked maybe 10% and fixed a handful of errors.

Start querying. Now the questions that used to require the agent to read everything and guess become precise, instant lookups:

iwe count --filter 'status: draft'

iwe find --filter '{status: accepted, domain: infrastructure}' \
  --project title,priority,created --sort priority:-1 -f json

iwe find --filter '{priority: { $gte: 3 }, status: draft}' \
  --project title,domain --sort created:-1 -f json

Counts, filters, sorts, projections — all against frontmatter fields, no tokens burned reading document bodies.

The thing I didn't expect: the agent started maintaining the schema better than I did. I give it a system prompt instruction — when you create a new document, always include frontmatter with these fields. It's more consistent about it than I am. And auditing for gaps is just another query:

iwe find --filter '{type: decision, domain: null}'
iwe find --filter '{type: decision, priority: null}'

No reading. No guessing. Just: which documents am I forgetting to tag?

The meta-realization: the expensive model designs the schema, the cheap models populate it, and after that most structured questions don't need an LLM at all — they're just queries. You're paying for intelligence exactly where it matters and using deterministic lookups everywhere else.

Curious if others have landed on a similar split, or if you're handling structured questions differently.


u/gimalay — 2 months ago

▲ 124 r/HelixEditor+4 crossposts

Most of my workflow already lives in Neovim — code, prose, notes, scratchpads. The piece that always lagged was querying the notes. Plenty of tools let me grep them; almost none let me ask things like "all the drafts under tasks/q2 that link to people/alice" without leaving the buffer.

Turns out you can. IWE is a Rust binary (LSP server + CLI) that treats a directory of .md files as a queryable graph. Install once, use it from the editor over LSP and from the shell over :!.

The query language is small and reads like Mongo's:

iwe find --filter 'status: draft, priority: {$gte: 8}'

iwe find --filter 'author.email: {$exists: true}'
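If you haven't used Mongo-style filters before, this Python sketch shows roughly how such a predicate can be evaluated against one document's frontmatter. It is a sketch of the general technique only: `get_path` and `matches` are hypothetical names, it covers just equality, $gte, and $exists, and treating `field: null` as "null or missing" is borrowed from MongoDB's convention rather than checked against iwe.

```python
_MISSING = object()  # sentinel: the field is absent (distinct from null)


def get_path(doc, dotted):
    """Resolve a dotted path such as 'author.email' in nested dicts."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`."""
    for path, condition in query.items():
        value = get_path(doc, path)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$exists":
                    ok = (value is not _MISSING) == operand
                elif op == "$gte":
                    try:
                        ok = value is not _MISSING and value >= operand
                    except TypeError:  # e.g. comparing None or str with int
                        ok = False
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif condition is None:
            # MongoDB convention: null matches both null and missing fields.
            if value is not _MISSING and value is not None:
                return False
        elif value is _MISSING or value != condition:
            return False
    return True
```

In this form the first query above becomes `matches(doc, {"status": "draft", "priority": {"$gte": 8}})`.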

Frontmatter is the schema. Markdown links are the relationships — and there are two kinds, which the engine actually distinguishes:

An inline link in body text is a reference: "see also."
A markdown link alone on its own line is an inclusion link: containment. The linked document becomes a structural child of this one.

Each gets its own pair of operators:

iwe find --references people/alice # docs that link to Alice inline
iwe find --included-by tasks/alpha:0 # everything under alpha's tree (unbounded)
iwe find --included-by tasks/alpha:0 --references people/dmytro --filter 'status: draft'

That last line: drafts under the tasks/alpha subtree that also mention people/dmytro inline. Three relationships, three flags.
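The reference/inclusion split is easy to picture in code. Here is a simplified Python sketch that classifies links by whether they sit alone on a line. It is my own approximation, not the engine's parser: `classify_links` is a hypothetical name, the regex only handles plain `[text](target)` links, and it ignores code blocks and wiki links entirely.

```python
import re

# Matches a basic inline markdown link and captures its text and target.
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def classify_links(markdown):
    """Split link targets into (inclusions, references).

    A link that is the only content on its line counts as an inclusion;
    any link embedded in other text counts as a reference.
    """
    inclusions, references = [], []
    for line in markdown.splitlines():
        stripped = line.strip()
        whole = LINK.fullmatch(stripped)
        if whole:
            inclusions.append(whole.group(2))
        else:
            references.extend(m.group(2) for m in LINK.finditer(stripped))
    return inclusions, references


note = """# Alpha

See [Alice](people/alice) for context.

[Subtask one](tasks/alpha-1)
"""
print(classify_links(note))
```

In the sample note, the link inside the sentence lands in the references list and the standalone link lands in the inclusions list.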

The same predicates drive iwe count, iwe update, iwe delete. Bulk-set frontmatter from the shell:

iwe update --filter 'status: draft, reviewed: true' \
  --set status=published \
  --set published_at=2026-05-02

update and delete require an explicit --filter (no accidental whole-corpus rewrites). --dry-run previews.
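The two safety rails (mandatory filter, dry run) are simple to reason about. A toy Python version over in-memory frontmatter might look like the sketch below. It is illustrative only: `bulk_update` is a made-up function, matching is plain equality, and nothing here touches files or reflects iwe's internals.

```python
def bulk_update(docs, where, changes, dry_run=False):
    """Apply `changes` to every doc whose fields equal all pairs in `where`.

    `docs` maps a file name to its frontmatter dict (hypothetical shape).
    Returns the names of the matching docs. With dry_run=True nothing is
    modified, so the return value doubles as a preview.
    """
    if not where:
        # Mirror the "explicit filter required" rule: never touch everything.
        raise ValueError("refusing to update without a filter")
    planned = []
    for name, fields in docs.items():
        if all(fields.get(k) == v for k, v in where.items()):
            planned.append(name)
            if not dry_run:
                fields.update(changes)
    return planned
```

Calling it once with dry_run=True and reviewing the returned names before the real run follows the same preview-then-apply habit.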

From inside Neovim, this composes two ways.

The same iwe binary is also a markdown LSP server, so the editing side feels like working in code:

gd — jump to linked notes
gr — find references / backlinks
K — hover preview of a linked note without opening it
Code actions for extract section to a new file, inline a referenced note, rename
Auto-complete for link targets as you type
Inlay hints showing parent context and link counts

There's a dedicated plugin — iwe.nvim — that wires the LSP up and adds Telescope integration with hierarchical path search (notes show as Journal ⇒ 2026 ⇒ Week 18 ⇒ Mon notes). Lazy / packer / vim-plug all work.

For querying, you don't need a special integration; the CLI is enough. Output is plain text, so you can pipe it to jq, fzf, telescope, whatever.

Same install handles both: cargo install iwe and you have the LSP server + the CLI. The LSP runs against any folder of .md files; the CLI queries the same folder.

Side note: this also turns out to be a decent shape for AI agents. They use the same CLI you do, see the same files, and git log is your audit trail for whatever they touch.

Repo: https://github.com/iwe-org/iwe · Plugin: https://github.com/iwe-org/iwe.nvim

Curious what the heavy notes-in-Neovim crowd thinks, especially on the inclusion-vs-reference link split.

u/gimalay — 3 days ago