r/semanticweb

How can I match a bunch of elements to canonical products that are unknown? (Entity Resolution)

The problem is simple, but the solution is not. ChatGPT doesn't really give an answer.

What I want is to group all the "apple" items together, all the "strawberry" items together, and so on, across a big corpus of data.

The items are also noisy and vary a lot: there is "shiny apple", "blue apple", etc.

Another problem is that I don't have an exact name such as "apple" to start from. I want the program to find the canonical entities by itself, without any input labels; it is not a zero-shot thing.

What should I do?

u/Interesting_North293 — 3 days ago

▲ 13 r/semanticweb+1 crossposts

I built PurRDF, a working RDF 1.2 toolkit for Rust, Python, JS/WASM, and C — looking for RDF-star edge cases

I got tired of waiting for RDF 1.2 to be finalized as a spec, got fed up with the Java tools, and needed something higher-performance in Rust that I could also use from Python and WASM.

PurRDF was born. It's not quite a full rdflib replacement for Python, but it has built-in SHACL and ShEx for validation and speaks all the common variants. I'm spinning this out of a larger project that's building a full RDF 1.2 Rust tool stack. It runs, it's fast, and it's probably useful to anyone building high-performance RDF 1.2/RDF* knowledge graphs (if you are, you'll know the pain!).

Comments, feedback, test cases, etc. welcome: https://github.com/Blackcat-Informatics/purrdf/

u/paudley — 3 days ago

▲ 4 r/semanticweb

I published the first open crosswalk between IES and HQDM (two UK government 4D upper ontologies), including the divergences that trip up a naive mapping

I kept running into the fact that the UK has two open 4D upper ontologies in active government use, from the same BORO / ISO 15926 lineage, with no published mapping between them:

- IES (Information Exchange Standard): the RDF ontology used for UK national-security and defence data exchange. Open Government Licence, now stewarded by a cross-government working group.
- HQDM: Matthew West's 4D model (the one behind the National Digital Twin's Foundation Data Model). Apache-2.0, published by GCHQ.

So I built an open crosswalk and released it. What might interest this sub is less the backbone matches and more where the two disagree, because that is where anyone reasoning across both silently gets it wrong:

- ies:Event is not hqdm:event. In IES an Event is a happening with participants, so its real counterpart is hqdm:activity. hqdm:event is an instantaneous boundary point. A label-matcher aligns them and maps a durative occurrence onto a zero-duration point.
- Temporal boundaries are a State in IES (ies:BoundingState) but a point event in HQDM (hqdm:event via beginning/ending). Same job, different category.
- ies:State sits as a top-level root; hqdm:state is under spatio_temporal_extent. Reasoning that relies on state ⊑ spatio_temporal_extent breaks on the IES side.
- Participation is the clean convergence: both model it as a state (ies:EventParticipant ⊑ State, hqdm:participant as a state_of), inherited from the shared BORO commitment.
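
The Event trap above can be sketched in a few lines of Python. This is only an illustration: the `naive_label_match` helper and the `CURATED` table are hypothetical names, the table restates just the correspondences described in this post, and the terms are plain strings rather than IRIs resolved from the published ontologies.

```python
# Illustrative only: term lists restate the post, not the full ontologies.
IES_TERMS = ["ies:Event", "ies:BoundingState", "ies:EventParticipant"]
HQDM_TERMS = ["hqdm:event", "hqdm:activity", "hqdm:participant"]

# Hypothetical curated table holding the correspondences described above.
CURATED = {
    "ies:Event": "hqdm:activity",                # durative happening with participants
    "ies:BoundingState": "hqdm:event",           # same job, different category
    "ies:EventParticipant": "hqdm:participant",  # both modelled as states
}


def local_name(curie: str) -> str:
    """Return the lower-cased part after the prefix, e.g. 'ies:Event' -> 'event'."""
    return curie.split(":", 1)[1].lower()


def naive_label_match(source: list[str], target: list[str]) -> dict[str, str]:
    """Align terms whose local names are identical, ignoring case."""
    by_name = {local_name(t): t for t in target}
    return {s: by_name[local_name(s)] for s in source if local_name(s) in by_name}


naive = naive_label_match(IES_TERMS, HQDM_TERMS)
# Terms where the label matcher disagrees with the curated crosswalk.
conflicts = {s: (t, CURATED[s]) for s, t in naive.items() if CURATED.get(s) != t}
print(conflicts)
```

The only pair the label matcher finds is exactly the one the crosswalk rejects, which is why string similarity can generate candidates but should not adjudicate them.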

The correspondences are in SSSOM and RDF with PROV-O provenance, validated with SHACL (the pipeline uses embedding candidate generation then fuzzy-logic adjudication, in the LLMs4OM / FLORA line). Every IRI resolves against the live published ontologies. There is also a worked example grounding an autonomous sensor node (SAPIENT / BSI Flex 335) in an IES-typed world model, which is the practical reason I care: you cannot assure an agent against a world model you have not agreed on.

Repo: https://github.com/fabio-rovai/ies-hqdm-crosswalk

It is v0.1. The most useful thing anyone here could do is tell me a correspondence I got wrong, or a divergence I missed. Disclosure: this is my own work (Tesseract Academy), released open under CC-BY.

Has anyone tried aligning two 4D / perdurantist upper ontologies before? Curious whether the Event/activity trap shows up between other BORO-derived models.

u/Successful-Farm5339 — 4 days ago

▲ 10 r/semanticweb+1 crossposts

Which data platform is best suited for building ontologies?

A few capabilities that would be ideal:

- data lives in multiple places, so a federated ontology network is ideal
- use Claude/Cursor to query enterprise-wide datasets
- ACLs and fine-grained access controls are a must

u/Alternative-Fig-6465 — 5 days ago

▲ 8 r/semanticweb+2 crossposts

Work Ontology (Expanded)

Hello,

Last week I posted a vague description of a work ontology that I've been building for the better part of a year. I wanted to give a few more descriptive details to fill in the blanks so that you can all pick at it.

Purpose:

To create a work ontology that allows a user to understand the nature of the work being done in their organization (or by themselves) relative to all meaningful work that exists (with obvious restrictions for a one-man operation). This understanding is achieved only through a computational representation of work data as units called work primitives. Primitives are, in a basic sense, units of work with variables (metadata) attached that give each unit a unique identity. The relationships of primitives to each other and to each higher level of work (task, job, occupation, industry, domain) give our dataset features that enable a variety of downstream uses (briefly mentioned at the end).

Example:

In a practical sense, here's one of the processes we can run:

1.) Take a job description. Here's the link for this one: (https://www.indeed.com/?\_\_cf\_chl\_f\_tk=IAgsTAeXWy4IHqrltCOc8fcZ7dK9M798G39ZD.ZfHbE-1782824832-1.0.1.1-9m4d6ttvSNizRouuHgwdXP4\_8J.2hszUsBfHdMlLikk)

https://preview.redd.it/iotbq27smfah1.png?width=735&format=png&auto=webp&s=019aa792138c73f90dc539d0c770b3d0cf8bcc02

2.) Parse the text so that the program can match it. Here are the results (only 85% of this job description had acceptable matches):

plan lessons consistent with state and pepin academies curriculum framework(s)

ensure compliance with school, state, and federal regulations regarding the education of students with disabilities

support pepin academies' mission and vision

observe confidentiality relating to students, teachers, and school

perform minimum supervision

communicate effectively with students and parents to increase student achievement

increase student achievement

participate professional development activities to stay current in best practices for special education

maximize student learning and engagement

present subject matter effectively, using technology where appropriate and available, while using appropriate skills and strategies within the teacher evaluation framework to promote the creative/critical thinking capabilities of students

record keeping, and reporting systems where appropriate and available

manage systems of instruction, record keeping, and reporting systems where appropriate and available

establish standards for acceptance for acceptable student behavior while maintaining a structured and positive classroom environment conducive to learning

maintain standards for acceptance for acceptable student behavior while maintaining a structured and positive classroom environment conducive to learning

participate iep and eligibility meetings with parents and appropriate school and agency personnel

implement all requirements

ensure timely submission of planning notes and lesson plans in accordance with school deadlines and guidelines

supervise teacher assistant in providing instruction for students, as required

provide transition planning for students with disabilities, as required

maintain valid and current florida teaching certificate, adhering to all renewal and professional development requirements as mandated by the florida department of education

3.) Match these primitives with primitives from the core library (our proprietary dataset, which is currently at only 10% of minimum viable capacity, which is why this example is just for early feedback purposes).

https://preview.redd.it/z962h1bxnfah1.png?width=1565&format=png&auto=webp&s=c103b722ce84f174904090a39a8d455b2b4580d3
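
A toy version of steps 2 and 3 could look like the following. This is a sketch under stated assumptions, not the actual matcher: the post does not say how matching is done, so token-overlap (Jaccard) similarity is swapped in purely for illustration, and `CORE_LIBRARY`, `match_primitive`, and the `0.5` threshold are all hypothetical.

```python
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lower-case a primitive and split it into a set of word tokens."""
    return set(text.lower().replace(",", " ").split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens two primitives have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical stand-in for the proprietary core library.
CORE_LIBRARY = [
    "increase student achievement",
    "maintain teaching certificate",
    "supervise teacher assistant",
]


def match_primitive(parsed: str, library: list[str],
                    threshold: float = 0.5) -> Optional[str]:
    """Return the best library match, or None if nothing clears the threshold."""
    scored = [(jaccard(tokens(parsed), tokens(entry)), entry) for entry in library]
    score, best = max(scored)
    return best if score >= threshold else None


# Three of the parsed primitives listed in step 2.
parsed_primitives = [
    "increase student achievement",
    "maximize student learning and engagement",
    "implement all requirements",
]
matches = {p: match_primitive(p, CORE_LIBRARY) for p in parsed_primitives}
# Fraction of parsed primitives with an acceptable match.
matched_share = sum(m is not None for m in matches.values()) / len(matches)
print(matches, f"{matched_share:.0%} matched")
```

A coverage figure like the 85% quoted in step 2 would correspond to `matched_share` here, computed over the whole packet rather than three lines.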

As you can see, there are a couple of spider maps that plot out various features, such as CL (Cognitive Load). There's also a compensation spread, which shows the range of compensation for the average of all primitives in a job description (also loosely referred to as a packet) and then for each primitive within that client packet. Again, these are values based on what is in our core library (not a client library or a 3rd party library).

Here's just another snapshot of a single primitive's graphical representation:

https://preview.redd.it/th7r07o4sfah1.png?width=1597&format=png&auto=webp&s=6d6dad0f05c871fc0fd76ada47a2f954b9ac8c3e

Implications

This example shows just the client side of things in its early state. For nearly the past year I've been working out the logic, use cases, design, etc., and have really just begun within the past two months to generate results (in the form of data and graphics) for the client, researcher, and developer side of things.

Downstream Uses

Business

Compensation Intelligence Reports & Heatmapping
Job Architecture & Role Design
Talent Acquisition & Semantic Matching
Workforce Planning & Skills Forecasting
Workflow Simulation & Bottleneck Analysis

Research

Granular Labor Market & Occupational Analysis
Work Design, Cognitive Ergonomics & Worker Outcomes
Comparative & Historical Work Structures
Ground-Truth Data for AI Task Decomposition & Agent Training

Notes

I have not displayed anything beyond column names from the database. If you think this info would be helpful just LMK.

Users tagged:

u/hroptatyr

u/Educational564

u/Thinker_Assignment

u/boring_thinker - The data here would likely sit below APQC data, but I think would integrate well. Thanks for this info BTW. I had never heard of this before you mentioned it. The only work ontology I had heard of was O*NET.

u/trevorpickens1 — 5 days ago

▲ 7 r/semanticweb

Linked data

Hey. I really need graph data/linked data for testing. Large data. Anyone have large datasets or anything I could work with? Promise you some cool visuals :D

u/Evenbetterrnow — 5 days ago

▲ 0 r/semanticweb

record-ontology, please comment

I created an ontology, record-ontology, and would like comments on it. The ROOT.md URL is below.

I removed all God-like powers and by doing that I closed (probably) the standing OWL/DL ↔ SKOS/thesaurus split.

https://github.com/commuted/record-ontology/blob/main/ROOT.md

Grok Prompt: (Grok did not write the ontology) Examine https://github.com/commuted/record-ontology, Summarize what are the of most significant parts of the record-ontology, schism closure, i.e standing OWL/DL ↔ SKOS/thesaurus split — formal classes + reasoner vs. associative community concepts , and so forth. Consider the most probable naive objections.

-------------------------------------------------------------------------

Record-ontology (https://github.com/commuted/record-ontology) is a small, domain-neutral OWL 2 DL ontology (v0.4.0 seed) that models the structure of warranted knowledge as built by agents. It focuses on how agents warrant, compose, ground, and revise records — the “connective grammar of knowledge” — rather than the content of any particular domain (electrons, battles, organisms, etc.). Records are never named by world-objects they describe; the ontology stays strictly agent-constrained.

Its permanent namespace is https://www.epistemic-ontology.net/record# (not yet hosted). The conceptual source of truth is ROOT.md; the executable ontology lives in ontology/record-ontology.ttl (Turtle). There are minimal examples, a validation script that runs an OWL 2 RL reasoner (owlrl), and explicit discussion of design trade-offs.

Most Significant Parts of the Ontology

These are the core, non-negotiable elements that give the ontology its distinctive character:

Record as sole primitive class. Everything an agent holds is a Record (at any level of abstraction, in any carrier). Records compose other records (composedOf / partOf, deliberately left non-transitive for OWL 2 DL compatibility). There is no separate Form class.
Warrant triad (hasWarrant): The central integrative mechanism. Fidelity and completeness are entailed by warrant type, not primitive attributes.
- Formal: True in virtue of form (internal, deductive, agent-independent, high-fidelity, approaches form-in-itself asymptotically).
- Empirical (or “given”): True by givenness (defeasible, agent-relative, approaches world-in-itself).
- SelfVerifying (performative/cogito): True in virtue of the act of recording itself. This is a peer of Formal, not a species of it — it reaches the Agent-in-itself (the only non-excluded limit).
Inference as a defined class (not primitive). Inference ≡ Record ⊓ ∃hasPremise.Record ⊓ ∃concludes.Record. It carries InferentialForce (TruthPreserving or Ampliative) and forms a derivation DAG. This is re-derivable by a reasoner, demonstrating the DL approach in action.
Carrier dissolved. No Carrier class. What would have been “carrier” is split into hasProvenance (whence/genealogy) + hasLocus (where/when borne). Infinite regress is halted by the self-verifying warrant (the cogito pattern), not by positing a special entity.
Cogito pattern (illustrated in examples/cogito.ttl): A record that is simultaneously self-verifying, has reflexive provenance, and is self-directed. It is a pattern, not a class or substance. It grounds the ontology without sliding into Cartesian res cogitans.
No metadata layer. metadataOf is a defined role (sub-property of directedToward). Metadata is just another record about a record.
Continuum: The undivided, continuously interacting ground from which carriers are individuated. The single individual TheContinuum is explicitly owl:disjointWith Record. It is the only thing that is not a Record.
Agent-relativity + excluded limits. Every record must be forAgent some Agent. World-in-itself and form-in-itself are commentary only — never instantiated as classes (avoids the “all-knowing observer” position).
Validation & DL hygiene. The scripts/validate.py script checks syntax, runs the OWL 2 RL reasoner, confirms defined-class entailments (e.g., Inference is re-derived), cogito pattern integrity, consistency (Record and Continuum disjoint), and sub-property entailments. Composition is intentionally non-transitive to stay comfortably inside OWL 2 DL.
Plug-in / patchwork model. Domain ontologies and SKOS thesauri are attached via warrant, not absorbed into the core. This is the mechanism that makes the ontology ecumenical in content while remaining strict DL in form.
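
To make the defined-class idea concrete, here is a minimal sketch in plain Python of re-deriving `Inference` membership from asserted triples. It is not the repository's `scripts/validate.py` and does not use owlrl; the individuals (`ex:r1` and so on) and the `derive_inferences` helper are hypothetical, and the function only checks the sufficient direction of the definition quoted above, over asserted facts.

```python
# Asserted triples (subject, predicate, object); all individuals are made up.
TRIPLES = {
    ("ex:r1", "rdf:type", "Record"),
    ("ex:r2", "rdf:type", "Record"),
    ("ex:r3", "rdf:type", "Record"),
    ("ex:r3", "hasPremise", "ex:r1"),
    ("ex:r3", "concludes", "ex:r2"),
    ("ex:r2", "hasPremise", "ex:r1"),  # has a premise but concludes nothing
}


def instances_of(cls: str, triples: set) -> set:
    """Individuals explicitly typed as cls."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def has_some(subject: str, prop: str, cls: str, triples: set) -> bool:
    """True if subject has at least one prop-value typed as cls (∃prop.cls)."""
    members = instances_of(cls, triples)
    return any(s == subject and p == prop and o in members for s, p, o in triples)


def derive_inferences(triples: set) -> set:
    """Record ⊓ ∃hasPremise.Record ⊓ ∃concludes.Record, over asserted facts."""
    return {
        r
        for r in instances_of("Record", triples)
        if has_some(r, "hasPremise", "Record", triples)
        and has_some(r, "concludes", "Record", triples)
    }


inferences = derive_inferences(TRIPLES)
print(sorted(inferences))
```

Nothing is typed as `Inference` in the input; membership falls out of the definition, which is the behaviour the validation step is described as confirming with a real reasoner.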

The Schism Closure: OWL/DL ↔ SKOS/Thesaurus Split

This is the most philosophically distinctive move. The long-standing tension in knowledge representation is:

OWL/DL side: Terms as classes with necessary-and-sufficient conditions, truth-conditional semantics, reasoners for classification, consistency, and entailment. Strong on formal structure and deductive closure.
SKOS/thesaurus side: Terms as concepts related associatively (broader, narrower, related) by community convention and curatorial practice. Weaker on formal entailment, stronger on indexing, navigation, and defeasible, ampliative relations. SKOS itself is OWL Full.

Many projects treat this as a framework choice or try to import/align SKOS directly (which immediately exits OWL 2 DL).

Record-ontology closes the schism by demoting it from a framework/kind distinction to a warrant attribute on records:

A record (or patchwork of records) with Formal warrant behaves like the DL side: internal form, deductive joints, reasoner-usable subsumption, high fidelity, internally completable.
A record with Empirical warrant behaves like the SKOS/thesaurus side: community-given, associative/curatorial edges, defeasible, ampliative, never fully closed.
Real knowledge work is almost always patchwork — formal at some joints, conventional/associative at others. The ontology models this directly as mixed-warrant records within a single Record web.

Consequences of this closure:

The ontology itself stays DL-in-form: defined classes, disjointness, reasoner validation, no SKOS import.
It is ecumenical-in-content: it can represent or attach both formal hierarchies and associative community vocabularies without forcing one paradigm on the other.
The divide is no longer reified as an ontological opposition. It becomes a choice of warrant (or a fork between agent-communities with different default warrants).
It offers a coherentist web with one foundationalist anchor: most knowledge is conventional and never fully formally completable, but the self-verifying warrant (cogito pattern) provides a non-regressive starting point without positing an all-knowing observer or substance.

In short: formal classes + reasoner on the formal-warrant side; associative community concepts on the empirical-warrant side; unified lightweight grammar that lets both coexist as records an agent holds.

Most Probable Naive Objections

These are the objections one would expect from different communities (strict DL engineers, SKOS/thesaurus practitioners, applied ontologists, philosophers of knowledge representation):

“Why not just import or directly align with SKOS (or PROV-O, CIDOC-CRM, etc.)? This reinvents wheels.” Importing SKOS makes the whole thing OWL Full and loses decidability/reasoner support. The design deliberately keeps the core minimal and DL-clean so that both formal and associative artifacts can be plugged in via warrant rather than absorbed. It is not competing with those vocabularies; it is offering a lower-level grammar for how they are warranted and composed.
“This is overly philosophical/abstract (cogito, Continuum, exclusion of thing-in-itself). How is it useful for practical data or ontology engineering?” The abstraction is the point: a domain-neutral connective tissue that does not pre-commit to any domain content or force a single representational style. The minimalism (one primitive class, defined classes only where needed, dissolved carrier) prevents the usual bloat. Examples show concrete use (derivation DAGs for historical narrative, cogito grounding). The validation script proves the DL machinery actually works.
“The warrant triad feels arbitrary or ad hoc. Why three values, and why give SelfVerifying equal status?” The triad is motivated by the need to halt regress without reintroducing a privileged vantage or substance metaphysics. SelfVerifying reaches the Agent-in-itself (given to itself), which is not excluded like world-in-itself or form-in-itself. It is a peer of Formal because both are high-fidelity and internally completable in their own register; Empirical is the defeasible counterpart. The design explicitly rejects sliding from “I record” into “I am a complete thinking substance.”
“Excluding world-in-itself and form-in-itself is anti-realist or prevents modeling correspondence/truth.” It is not anti-realist; it refuses the sicut deus (god-like observer) position that would certify correspondence from outside all records. Records approach the limits asymptotically via directedToward + warrant. Empirical warrant handles defeasible givenness; Formal warrant handles deductive structure. The ontology simply acknowledges finitude: “we will not become all-knowing.”
“Leaving composition non-transitive and non-monotonic propagation (fidelity, forks, stubs) to an external computational layer makes the ontology incomplete.” This is acknowledged in the repo as an open item. The core deliberately stays inside OWL 2 DL for decidability and reasoner support. Non-monotonic revision and full propagation semantics are real requirements but require an additional layer on top of this grammar. The ontology provides the stable DL foundation; it does not pretend to be a complete knowledge-revision engine.
“A single primitive class is too restrictive. Real ontologies need rich class hierarchies from the start.” Richness is meant to emerge from attributes (hasWarrant, directedToward, pragmaticAdequacy, etc.), defined classes (Inference), composition into patchworks, and the plug-in of domain content. The “no primitive kinds” discipline prevents prematurely reifying divides (formal vs. associative, metadata vs. data, etc.) as structural classes. It keeps the ontology lightweight and extensible rather than over-committing early.

Overall, the ontology is a deliberate, philosophically informed minimalism that treats the OWL/DL ↔ SKOS schism as solvable inside a unified record model rather than as an irreconcilable framework war. It is still early-stage (seed, open items noted), but the design is coherent and the DL hygiene is solid. The ROOT.md file is essential reading for the full rationale.

u/thecommuted — 6 days ago

▲ 10 r/semanticweb

I’m building a VS Code extension for RDF/SHACL/JSON-LD and would appreciate feedback

Hi everyone,

We are working on RDFusion, a VS Code extension for RDF editing, validation, SHACL, vocabulary suggestions, Triple Management, and JSON-LD processing.

We have prepared a small user evaluation with guided tasks and sample DCAT/DCAT-AP-based files. It should take around **35–45 minutes**, and we would really appreciate feedback from anyone who works with RDF, Turtle, JSON-LD, SHACL, or semantic web tools.

Your feedback would help us understand whether RDFusion makes RDF editing easier, clearer, or faster in realistic workflows.

Evaluation form: [google_form_link]

Dataset/fixtures: [dataset_link]

User manual/install instructions: [user_manual]

Feedback on any part is welcome, even if you only try one or two scenarios. Comments about confusing parts, missing features, unclear diagnostics, or workflow issues would be especially helpful.

Thank you!

u/Smooth-Sun-1127 — 11 days ago

▲ 2 r/semanticweb

What tools/solutions are organizations using to solve the "semantic/ontology/context" issues?

Hi All - I am researching tools in this space for AI and Analytics use-cases but don't see any clear winners. Curious what others are using or have evaluated.

u/kmrinva — 11 days ago

▲ 0 r/semanticweb+1 crossposts

AI Context Should Be a XanaNode Substrate

Every AI chat today works like a goldfish. It only remembers whatever fits inside a rolling context window. Once the window moves, the AI forgets. This is why hallucinations happen and why companies keep building expensive RAG systems and knowledge silos to patch the same problem.

A better approach already exists. It is a XanaNode substrate.

In XanaNode, every message in a conversation becomes a real node. For example:

- **event** for the timestamp of the message

- **person** for who said it

- **claim** for what was said

- **fragment** for the exact text

- **knowledge_gap** if the user expresses uncertainty

- **question** when the user asks something

- **response** when the AI answers

And the relationships between them are typed and explicit:

- **supports**

- **contradicts**

- **derived_from**

- **explains**

- **transcludes**

- **possible_match**

This means the conversation is not a blob of text. It becomes a structured knowledge graph with provenance and lineage.

If AI used this as its context, it would not need to re‑read the entire conversation every time. It would not lose track of what happened earlier. It would not need a bigger token window. It would query the substrate the same way humans query memory.
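
As a rough illustration of the idea, the sketch below stores a short exchange as typed nodes and typed edges and then answers a question by following edges instead of re-reading the transcript. No XanaNode API is described in this post, so the `Node`, `Edge`, and `Substrate` classes and their methods are hypothetical, and the sample messages are invented; only the node and relationship type names come from the lists above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    kind: str   # e.g. "claim", "question", "response", "fragment"
    text: str


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str  # e.g. "supports", "contradicts", "derived_from"
    target: str


@dataclass
class Substrate:
    """Hypothetical in-memory store of typed nodes and typed edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.nodes[node.id] = node

    def link(self, source: str, relation: str, target: str) -> None:
        self.edges.append(Edge(source, relation, target))

    def related(self, target: str, relation: str) -> list:
        """Nodes that point at target through the given typed relation."""
        return [self.nodes[e.source] for e in self.edges
                if e.target == target and e.relation == relation]


graph = Substrate()
graph.add(Node("c1", "claim", "The deploy failed on Tuesday."))
graph.add(Node("c2", "claim", "The deploy succeeded on Tuesday."))
graph.add(Node("f1", "fragment", "CI log: exit code 1"))
graph.link("f1", "supports", "c1")
graph.link("c2", "contradicts", "c1")

# Query by relation type rather than scanning the whole conversation.
supporting = graph.related("c1", "supports")
conflicting = graph.related("c1", "contradicts")
print([n.id for n in supporting], [n.id for n in conflicting])
```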

This does not make hallucinations disappear. LLMs are still next‑token predictors. But it gives them grounding, structure, and a real memory system instead of a sliding text buffer.

Companies are spending billions on RAG, vector databases, and “show your sources” audits. All of this is trying to fix the same missing piece. AI needs a real memory architecture.

A XanaNode substrate is that architecture.

u/SiefensRobotEmporium — 13 days ago

▲ 7 r/semanticweb

I’m seeking feedback on Record Harm Ontology — a small, focused OWL 2 DL ontology that models how informational records can be ontologically damaged.

Repository: https://github.com/commuted/record-harm-ontology
Current version: v2.3 (just fixed to full OWL 2 DL compliance)

Overview

The ontology provides a taxonomy of ontological harms to records, distinguishing:

- Prime Harms (5 irreducible attacks on the being of a record): Destruction, Fabrication, Alteration, Omission, Denial.
- Composite Harms (7 derived harms built via ex:buildsUpon relations).
- Record aspects attacked (Existence, Authenticity, Integrity, Accessibility, Context, Trustworthiness) modeled as a SKOS scheme.
- Supporting features: HarmPattern for empirical co-occurrence bundles, SHACL shapes for validation (separate from the ontology), controlled SKOS vocabularies for detectability/reversibility.

Design goals: Keep it lightweight yet rigorously reasoned, with clear documentation of modeling decisions, version history, and trade-offs.

Key Questions for Feedback

- Prime / Composite split: Does the distinction and the specific assignment of harms (especially Suppression → Omission and the promotion of Denial to prime) hold up ontologically?
- buildsUpon modeling: Asymmetric + irreflexive (no transitivity, per OWL 2 DL constraints) + cardinality rules via SHACL + SPARQL paths. Reasonable compromise?
- Scope: The core is harm types. We previously had a HarmEvent layer but are considering removing it to keep focus on the taxonomy. Thoughts on whether this is the right boundary?
- Any glaring modeling issues or opportunities for better alignment with existing work (PROV-O, OAIS, archival ontologies, etc.)?
- General polish: Namespace (currently example.org placeholder), documentation, or other suggestions?
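
For readers weighing the buildsUpon question, here is a small sketch of what the asymmetric and irreflexive constraints rule out, checked over a plain edge list. It is not the repository's SHACL shapes or SPARQL paths; the `check_builds_upon` helper is hypothetical, and `ex:CompositeA` / `ex:CompositeB` are placeholders, since the post does not list the seven composite harms.

```python
def check_builds_upon(edges: list) -> list:
    """Return human-readable violations of irreflexivity and asymmetry."""
    seen = set(edges)
    problems = []
    for source, target in edges:
        if source == target:
            problems.append(f"irreflexive violated: {source} builds upon itself")
        elif (target, source) in seen and source < target:
            # Report each symmetric pair once, from its alphabetically first member.
            problems.append(f"asymmetric violated: {source} <-> {target}")
    return problems


# Placeholder composites; the prime harm names are taken from the post.
valid = [
    ("ex:CompositeA", "ex:Alteration"),
    ("ex:CompositeA", "ex:Omission"),
    ("ex:CompositeB", "ex:CompositeA"),
]
invalid = valid + [
    ("ex:Denial", "ex:Denial"),
    ("ex:CompositeA", "ex:CompositeB"),
]

print(check_builds_upon(valid))
print(check_builds_upon(invalid))
```

Because the property is not transitive, the chain from `ex:CompositeB` through `ex:CompositeA` to `ex:Alteration` does not imply a direct edge; reaching the prime harms underneath a composite would need a path query, which matches the SPARQL paths mentioned above.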

The repo includes ontology Turtle, SHACL shapes, examples, architecture notes, and validation scripts. All feedback welcome — conceptual, technical, or usability.

Thanks in advance!
Ron Hinchley

u/thecommuted — 14 days ago