r/semanticweb

▲ 1 r/semanticweb+1 crossposts

Open Album Technical Metadata Standard (OATMS): New open standard proposal

Hi everyone,

I’m proposing a new open standard called the Open Album Technical Metadata Standard (OATMS).

The goal is simple: give listeners, especially audiophiles, clear and standardized technical information about how an album was mastered, such as integrated loudness (LUFS), True Peak, dynamic range, frequency response extension, and basic spectral balance.

Right now there’s no consistent, easy-to-read technical data sheet that travels with albums. Mastering engineers already measure most of this data, but it’s rarely shared with the public in a useful format.

What I’m suggesting:

  • A simple, open standard for a “Technical Data Sheet” that can be included with releases (digital booklets, Bandcamp, hi-res stores, etc.).
  • A free tool to help generate clean, professional-looking versions of these sheets.
  • The core standard itself will remain fully open and free for anyone to use or build upon.
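To make the idea of a "Technical Data Sheet" more concrete, here is a minimal Python sketch of what one record could look like. The draft spec has not been published, so every field name, unit, and the JSON layout below are assumptions for illustration only, and the values are placeholders rather than measurements of a real album.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TechnicalDataSheet:
    # All field names are hypothetical; they are not taken from the OATMS spec.
    album: str
    integrated_loudness_lufs: float  # integrated loudness, in LUFS
    true_peak_dbtp: float            # maximum true peak, in dBTP
    dynamic_range_db: float          # dynamic range figure, in dB
    frequency_extension_hz: tuple    # (lowest, highest) frequency with content
    spectral_balance: str            # short free-text description

    def to_json(self) -> str:
        # Serialize to JSON so the sheet can travel with a digital release.
        return json.dumps(asdict(self), indent=2)


# Placeholder values, not real measurements.
sheet = TechnicalDataSheet(
    album="Example Album",
    integrated_loudness_lufs=-14.0,
    true_peak_dbtp=-1.0,
    dynamic_range_db=10.0,
    frequency_extension_hz=(20, 20000),
    spectral_balance="neutral",
)
print(sheet.to_json())
```

A flat record like this is easy to render into a booklet page and easy for stores to index, which is one reason a plain JSON shape might be a reasonable starting point for discussion.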

This is still in the very early stages. I’m looking for feedback from mastering engineers, artists, labels, and audiophiles on what data would actually be useful and how it should be presented.

If you’re interested in the idea, have thoughts on the spec, or would like to get involved, feel free to reply here or reach out.

More details and the draft spec will be published soon.

Thanks,
Alex D

reddit.com
u/ADDproblem — 7 hours ago
▲ 11 r/semanticweb+1 crossposts

In-process and in-memory graph database for large knowledge graphs - no server needed with TuringDB v1.31

Hey again! Adam from TuringDB, posted here a few months back when we launched the community version.

Quick update on something we just shipped: in-process mode.

You can now embed TuringDB directly in your script or pipeline - no separate server, no socket, no daemon to manage. Just instantiate and query:

In Python:

from turingdb import TuringDB

db = TuringDB()
db.load_graph('my_knowledge_graph')
db.set_graph('my_knowledge_graph')

df = db.query('MATCH (n)-->(m) RETURN n,m')
print(df)

Results back as a DataFrame, zero networking to manage.

Practically this means: if you're running a KG pipeline, a GraphRAG system, or just iterating locally on a large graph - you no longer need to spin up an instance of TuringDB to use it. It runs where your code runs.

Everything else from the previous post still applies - git-style versioning, zero-lock reads, vector search, Cypher. This just removes the last friction point for local and embedded workflows.

Docs at docs.turingdb.ai and source at github.com/turing-db/turingdb

Happy to answer questions 🙂

u/adambio — 3 days ago
▲ 21 r/semanticweb+3 crossposts

Using knowledge graphs to stop searching code and documentation again and again, with the help of Mnemo

This is what your codebase actually looks like.

2032 nodes. 2878 edges. 7 relationship types.

Every service. Every dependency. Every API. Every owner. Every connection your team built over years — visualised in one graph.

Most AI coding assistants see none of this.

They see the file you have open.
Maybe the files you paste in.
Nothing else.

So when they generate code, they generate it blind.
No knowledge of what depends on what.
No knowledge of what breaks if you change something.
No knowledge of the relationships your team spent years building.

This is the real problem with AI in enterprise development.
It's not capability. The models are powerful.

It's context. AI operates on a fraction of the knowledge your senior engineers carry in their heads.

Mnemo builds this knowledge graph automatically from your codebase.

Services and their boundaries.
APIs and their consumers.
Dependencies and their blast radius.
Files and their owners.
Decisions and their history.

And then makes all of it available to your AI assistant — automatically, on every session.

No more blind generation.
No more code that compiles but breaks something downstream.
No more AI that doesn't know why things are the way they are.

This is what AI-assisted development should actually look like.

🔗 github.com/Mnemo-mcp/Mnemo

Drop a comment if you've ever had AI break something it didn't know existed.

u/killerexelon — 9 days ago

Protégé Short Course at Stanford: hands-on OWL ontology development with Protégé

Hi r/semanticweb — I’m part of the Protégé team at Stanford, and I wanted to share that we’re running the Protégé Short Course this June.

It’s a hands-on introduction to ontology development with OWL 2 and Protégé. The course is aimed at beginners as well as intermediate users who want a deeper grounding in OWL ontologies, reasoning, querying, and practical ontology-engineering workflows.

Participants receive course materials, including a 221-page hands-on manual developed by the Protégé team, with walkthroughs, diagrams, quizzes, and more than 100 practical exercises.

Early-bird registration is available until May 23.

Details are here:

https://protege.stanford.edu/shortcourse/

Happy to answer questions about the course, the intended audience, or what topics are covered.

Matthew

u/MatthewH2 — 8 days ago
▲ 6 r/semanticweb+1 crossposts

How to turn a messy SQL schema into a domain ontology — the 4-step process I use

Our schema had 47 tables. Our Confluence had 200 pages. Neither told us what the business actually did.

A column named status appeared in 11 different tables. In 3 of them it meant completely different things. Nobody caught it for 4 years because the documentation was written by whoever built the table, never reconciled, and last updated in 2021.

We fixed it by building a domain ontology directly from the schema. Not a data dictionary. Not an ER diagram. An actual ontology, where every concept has a formal definition, every relationship has a direction, and every uncertainty is explicitly labeled instead of silently papered over.

Here's the process, because I've never seen it written down clearly.

Step 1: Classify what your tables actually are

Before you touch any columns, you need to decide what role each table plays. Four categories cover almost everything:

  • Entity table → a thing that persists (Customer, Order, Product)
  • Event/audit table → something that happened (OrderStatusChange, LoginAttempt)
  • Junction/bridge table → a many-to-many relationship between entities
  • Lookup/code table → a controlled vocabulary (StatusCodes, CountryCodes)

Most schemas are a mix, and the confusion comes from tables that look like entities but are actually event logs — or vice versa. In our case, three tables we'd been treating as entities were actually event logs with no primary entity attached. That was hiding half our business process from our data model.
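A rough first pass over Step 1 could be sketched in Python as below. The `Table` record and all three heuristics are assumptions made for illustration, not the process used in the post. They are deliberately crude: an entity table that happens to carry a foreign key and a timestamp would be labeled an event here, so the output is a starting guess for manual review, not a verdict.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # Hypothetical minimal description of a table, filled in by hand or from the catalog.
    name: str
    columns: list
    primary_key: list
    foreign_keys: list = field(default_factory=list)  # names of FK columns


def classify_table(t: Table) -> str:
    fks = set(t.foreign_keys)
    non_key = [c for c in t.columns if c not in t.primary_key and c not in fks]
    # Junction: two or more FKs and no columns besides keys.
    if len(fks) >= 2 and not non_key:
        return "junction"
    # Lookup: no FKs and at most a code plus a label.
    if not fks and len(t.columns) <= 2:
        return "lookup"
    # Event: points at another table and carries a timestamp-like column.
    if fks and any(c.endswith("_at") for c in t.columns):
        return "event"
    # Everything else is treated as an entity until reviewed.
    return "entity"


tables = [
    Table("Customer", ["id", "name", "email"], ["id"]),
    Table("OrderStatusChange",
          ["id", "order_id", "old_status", "new_status", "changed_at"],
          ["id"], ["order_id"]),
    Table("OrderProduct", ["order_id", "product_id"],
          ["order_id", "product_id"], ["order_id", "product_id"]),
    Table("StatusCodes", ["code", "label"], ["code"]),
]
roles = {t.name: classify_table(t) for t in tables}
print(roles)
```

The value of a script like this is mostly in the disagreements: any table where the guessed role differs from what the team assumed is worth a closer look.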

Step 2: Classify your columns as properties or relations

Two types:

  • Data property: a value attached to the entity (name, amount, timestamp)
  • Object property: a link to another entity (foreign key)

The interesting column is status. If status is a FK into a lookup table, it's an object property — your entity has a relationship to a state. If it's a plain string like 'active'/'cancelled', you now need to decide: is that a value partition (enum) or are these actually instances of a State class with their own logic? That distinction changes your downstream queries, your event modeling, and whether your ML features are leaking state information they shouldn't have.
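The column split in Step 2 can be sketched as a small Python function. The function name, its parameters, and the cardinality cutoff of 10 are assumptions chosen for illustration. The sketch only flags low-cardinality string columns for a modeling decision; it does not make that decision.

```python
def classify_column(column, foreign_keys, sample_values=None, max_enum_size=10):
    """Return (kind, note) for one column.

    foreign_keys maps FK column names to the table they reference.
    sample_values is an optional list of values observed in the column.
    """
    if column in foreign_keys:
        # A foreign key is a link to another entity.
        return "object_property", f"links to {foreign_keys[column]}"
    if (
        sample_values
        and all(isinstance(v, str) for v in sample_values)
        and len(set(sample_values)) <= max_enum_size
    ):
        # Plain strings with few distinct values: a human has to decide
        # between a value partition (enum) and a State class.
        return "data_property", "review: value partition or State class?"
    return "data_property", ""


fks = {"customer_id": "customers", "status_id": "status_codes"}
print(classify_column("customer_id", fks))
print(classify_column("status", fks, ["active", "cancelled", "active"]))
print(classify_column("amount", fks, [10.5, 20.0]))
```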

Step 3: Tag everything as Evidence, Hypothesis, or Gap

This is the step nobody does and the reason data models drift.

  • Evidence: directly confirmed from the schema or from code (orders.customer_id is a FK → confirmed relation)
  • Hypothesis: inferred but not confirmed ("the cancelled_at timestamp implies a Cancellation event class")
  • Gap: explicitly missing ("no timestamp exists for the Approval transition — we cannot reconstruct approval history")

The Gaps are the most valuable output. They tell you exactly what your schema can't answer. Before we ran this process, we thought our schema had full order lifecycle coverage. After: we found 6 state transitions with no timestamp, meaning we had been silently reporting incorrect cycle times for 2 years.
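One lightweight way to keep these tags attached to the model is a small record type, sketched below in Python. The `Status` enum and `Assertion` class are hypothetical names introduced for illustration. The three example statements are the ones quoted in Step 3.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    EVIDENCE = "evidence"      # confirmed from schema or code
    HYPOTHESIS = "hypothesis"  # inferred, not yet confirmed
    GAP = "gap"                # known to be missing


@dataclass
class Assertion:
    statement: str
    status: Status
    source: str  # where the claim comes from, or what it refers to


assertions = [
    Assertion("orders.customer_id is a FK to customers",
              Status.EVIDENCE, "schema"),
    Assertion("cancelled_at implies a Cancellation event class",
              Status.HYPOTHESIS, "orders.cancelled_at"),
    Assertion("no timestamp exists for the Approval transition",
              Status.GAP, "orders"),
]

# Listing the gaps shows which questions the schema cannot answer.
gaps = [a for a in assertions if a.status is Status.GAP]
for gap in gaps:
    print(f"GAP ({gap.source}): {gap.statement}")
```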

Step 4: Reconcile the inconsistencies explicitly

The status problem I mentioned? Once you've typed every table and classified every column, you run a simple check: any column with the same name that maps to a different primitive type across tables is an inconsistency that needs a formal resolution. In our case:

  • orders.status → State (current condition of an entity)
  • payments.status → Event outcome (result of a completed process)
  • users.status → Role flag (operational classification, not a state machine)

Three different semantic meanings. Same column name. One fix: rename them and add the reconciliation note to the ontology as a documented decision, not a silent rename in a migration script.
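The check described in Step 4 is small enough to sketch in a few lines of Python. The function name and the input shape (a mapping from `table.column` to the type assigned in the earlier steps) are assumptions for illustration. The `status` entries mirror the example above, and the `customer_id` entries are made up to show a column that does not conflict.

```python
from collections import defaultdict


def find_conflicts(column_types):
    """Group columns by name and keep names that map to more than one type.

    column_types maps 'table.column' to the type assigned during classification.
    """
    by_name = defaultdict(dict)
    for qualified, sem_type in column_types.items():
        table, column = qualified.split(".", 1)
        by_name[column][table] = sem_type
    return {
        column: tables
        for column, tables in by_name.items()
        if len(set(tables.values())) > 1
    }


column_types = {
    "orders.status": "State",
    "payments.status": "Event outcome",
    "users.status": "Role flag",
    "orders.customer_id": "Relation",    # illustrative, consistent across tables
    "invoices.customer_id": "Relation",  # illustrative, consistent across tables
}
conflicts = find_conflicts(column_types)
print(conflicts)
```

Each entry in the result is a candidate for a documented reconciliation decision rather than a silent rename.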

What changed after doing this

Our data contracts got sharper because the ontology is the schema documentation, not a separate artifact that drifts. New engineers onboard to the domain model, not 200 Confluence pages. And when we get a question like "how long does an order stay in approval?" we can immediately tell them whether our schema can answer it or not, rather than spending a week on a query that returns wrong data.

The process takes longer upfront. It's worth it.

What's the worst case of documentation-reality drift you've hit in a schema you inherited?

u/Critical-Elephant630 — 9 days ago

I've been working on a semantic architecture called the Concept Library.

The core idea is simple: meaning and intelligence should be structurally separated.

- Concept layer = what something is. Immutable definition + multimodal signatures (acoustic, visual, signal, haptic, chemical, EM). No logic, no thresholds, no inter-concept references.

- Control layer = decides what an input matches, using concepts as anchors. Fully auditable. All reasoning lives here.

A CLF (Concept Library File) is the atomic unit: one concept, defined once, never changed.

Whether something qualifies as an instance is never encoded in the concept file — only in the control layer.
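To illustrate the separation, here is a minimal Python sketch. It is not the code from the repo: the `Concept` class, the vector signatures, the cosine scoring, and the 0.8 threshold are all assumptions made up for this example, and the real CLF file format is not shown in this post. The point is only that the concept holds data while every score and threshold lives in the matching function.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    # Hypothetical in-memory stand-in for one CLF: definition and signatures only.
    name: str
    definition: str
    signatures: dict  # modality name -> reference feature vector


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(query, concepts, threshold=0.8):
    """Control layer: scoring, threshold, and decision all live here."""
    audit = []
    for concept in concepts:
        shared = sorted(set(query) & set(concept.signatures))
        scores = {m: cosine(query[m], concept.signatures[m]) for m in shared}
        overall = sum(scores.values()) / len(scores) if scores else 0.0
        audit.append({"concept": concept.name,
                      "per_modality": scores,
                      "score": overall})
    # The sorted audit list records why each concept did or did not win.
    audit.sort(key=lambda row: row["score"], reverse=True)
    best = audit[0]["concept"] if audit and audit[0]["score"] >= threshold else None
    return best, audit


concepts = [
    Concept("siren", "A device producing a loud wailing warning sound.",
            {"acoustic": (0.9, 0.1, 0.0)}),
    Concept("birdsong", "Vocalization produced by a bird.",
            {"acoustic": (0.1, 0.2, 0.9)}),
]
best, audit = match({"acoustic": (0.8, 0.2, 0.1)}, concepts)
print(best)
for row in audit:
    print(row)
```

Changing the threshold or the scoring rule here touches only `match`, never a `Concept`, which is the property the architecture is after.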

I just published a reference implementation of the control layer (clfcontrollayer_v1.py) with a runnable demo.

It loads any CLF concept folder, accepts multimodal queries, and returns the best match with a full semantic audit trail.

No external dependencies.

```
git clone https://github.com/pekkalepola/colibri-clf
```

The white paper is in the repo if you want the full theoretical foundation, architectural consequences, and EU AI Act implications.

u/Colibri-Standard — 13 days ago