u/Apart_Sprinkles_8504

I revamped an old project of mine to evaluate hallucination in notable deployed LLMs, and oh my god.
▲ 2 r/LLM

When I was an undergrad, I built a boolean algebra project. The logic was simple: take the variables (distinct letters) as inputs, map them to boolean values, and generate the corresponding truth table.
It then used that truth table to solve a boolean expression given in simple string notation.

I was browsing my GitHub repos at random, found my old projects, and noticed that only this one has 2 stars, which made me kinda proud (because no other repo of mine has any).

So I used Claude Code to see what we could do with this project.

And randomly I had a brainwave: plug it into LLMs to check whether the slop is real (since boolean logic needs to work at scale).

So I ported the base code from Java to Python, experimented with a few use cases, and built a deterministic boolean algebra engine that evaluates expressions by exhaustive truth table enumeration, cross-verified with z3. Then I used it as an oracle to benchmark tinyllama and llama3.2:3b on satisfiability questions: can two rules ever be true simultaneously?
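For anyone who wants to see the idea in miniature, here is a rough sketch of exhaustive truth table enumeration used as a satisfiability oracle. It is not the code from boolean-algebra-engine: the function names (`variables`, `truth_table`, `jointly_satisfiable`) are made up for illustration, and it assumes rules are written with Python's own `and` / `or` / `not` over single-letter variables, which may differ from the package's string notation.

```python
from itertools import product
import re


def variables(expr: str) -> list[str]:
    """Collect the distinct single-letter variable names, sorted."""
    return sorted(set(re.findall(r"\b[A-Za-z]\b", expr)))


def truth_table(expr: str, names: list[str]):
    """Yield (assignment, result) for every combination of truth values."""
    # Assumption: expr uses Python's and/or/not, so eval can score it.
    code = compile(expr, "<expr>", "eval")
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        # Builtins are emptied so only the variables are visible.
        yield env, bool(eval(code, {"__builtins__": {}}, env))


def jointly_satisfiable(rule_a: str, rule_b: str) -> bool:
    """True if at least one assignment makes both rules true at once."""
    names = variables(rule_a + " " + rule_b)
    combined = f"({rule_a}) and ({rule_b})"
    return any(result for _, result in truth_table(combined, names))


print(jointly_satisfiable("a or b", "not a"))  # compatible pair
print(jointly_satisfiable("a", "not a"))       # direct conflict
```

The table has 2^n rows for n variables, so this only stays practical for small rule sets, but for those the answer is exact rather than estimated.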

Both scored 50%. Coin flip. But the failure mode is the interesting part:

- tinyllama always answered "yes" — constant output

- llama3.2:3b always answered "no" — constant output

Neither model is reasoning case by case. They're outputting a prior. The per-case strips in the chart make it obvious — uniform colour across every case, no variation.
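A quick way to check for this failure mode in code, as a sketch only: the `summarize` helper and the six labels below are invented for illustration and are not the benchmark data. On a balanced yes/no set, any constant answer lands on exactly 50%, which is why the score alone looks like a coin flip while the answers show no variation at all.

```python
from collections import Counter


def summarize(truth: list[bool], answers: list[bool]) -> dict:
    """Report accuracy and whether the model gave the same answer every time."""
    correct = sum(t == a for t, a in zip(truth, answers))
    return {
        "accuracy": correct / len(truth),
        "constant": len(set(answers)) == 1,
        "answer_counts": dict(Counter(answers)),
    }


# Made-up balanced ground truth: half satisfiable, half not.
truth = [True, False, True, False, True, False]

always_yes = summarize(truth, [True] * len(truth))
always_no = summarize(truth, [False] * len(truth))

print(always_yes)
print(always_no)
```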

This isn't a small-model problem. It's an architectural one. Transformers aren't built to enumerate truth tables. The right fix isn't a bigger model; it's a deterministic layer that does the computation the model can't.

!pip install boolean-algebra-engine

https://preview.redd.it/f53wiahvmw2h1.png?width=2161&format=png&auto=webp&s=261c3f743a8c5f593a16ea0c0fb0c515a62914ad

Repo + benchmark: Check out here

https://preview.redd.it/1jv2woa3bx2h1.png?width=2292&format=png&auto=webp&s=9a9819db0a5c7a72aa9ebbbbe8c06227bd035dee

Update 1: Since I got bashed on another sub, here's a test on Gemma.

u/Apart_Sprinkles_8504 — 4 hours ago
▲ 5 r/ollama+1 crossposts

I benchmarked tinyllama and llama3.2:3b on boolean logic. Both scored 50% — coin flip. Here's the proof.

I was curious how well local models handle boolean logic — not code, just pure logical reasoning. Can two rules conflict? Is this expression satisfiable?

I built a deterministic engine that evaluates boolean expressions by exhaustive truth table enumeration. It's the oracle: ground truth that is computed, not guessed, and cross-verified with z3. Then I asked tinyllama and llama3.2:3b the same questions.

Both scored 50%. Coin flip. But in opposite directions:

- tinyllama always answered "yes" — missed every conflict

- llama3.2:3b always answered "no" — missed every compatible pair

Neither model is reasoning. Both are outputting a constant.

The engine column is ground truth. Every mismatch with the LLM column is a provable hallucination: not an opinion, not a benchmark score, a logical proof.
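To make the "proof" part concrete, here is a small sketch of how a mismatch can be backed by evidence. The helper names (`find_witness`, `check_claim`) are hypothetical and this is not the package's API; it again assumes rules written with Python's `and` / `or` / `not`. If the model says two rules can never both hold, a single satisfying assignment refutes it; if the model says they can, an exhausted table with no such assignment refutes it.

```python
from itertools import product


def find_witness(rule: str, names: list[str]):
    """Return the first assignment that makes the rule true, or None."""
    code = compile(rule, "<rule>", "eval")
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if eval(code, {"__builtins__": {}}, env):
            return env
    return None


def check_claim(rule_a: str, rule_b: str, names: list[str],
                llm_says_compatible: bool) -> dict:
    """Compare the model's yes/no answer against the enumerated table."""
    witness = find_witness(f"({rule_a}) and ({rule_b})", names)
    engine_says_compatible = witness is not None
    return {
        "mismatch": engine_says_compatible != llm_says_compatible,
        # A witness disproves a "no"; None means every row was checked.
        "witness": witness,
    }


# Model answered "no", but one assignment satisfies both rules.
wrong_no = check_claim("a or b", "not a", ["a", "b"], llm_says_compatible=False)
# Model answered "yes", but no row of the table satisfies both rules.
wrong_yes = check_claim("a", "not a", ["a"], llm_says_compatible=True)

print(wrong_no)
print(wrong_yes)
```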

!pip install boolean-algebra-engine

Repo and benchmark script: Check Here

https://preview.redd.it/ftzyrz4pdw2h1.png?width=2161&format=png&auto=webp&s=cc253cb875efa2538eee859aec003d555f95fa81

Update 1: Since I was getting beaten up, I also tested Gemma 😭. It came out with a 36.4% hallucination rate. Guys, go check your favorite models.

u/Apart_Sprinkles_8504 — 5 hours ago
▲ 375 r/HomeLabPorn+3 crossposts

Posting what I built here since I don't know what else to do. Genuinely proud of myself for building it, though.

I simulated a basic enterprise connection between two sites.
Site 1 has one switch, to demonstrate a single point of failure, and site 2 has two switches plus an additional server for DNS.
The whole point of the exercise is to simulate a controlled environment for multiple concepts. So far I've learnt:

- Configuring DHCP
- Routing, and debugging the routing table
- NAT
- Routing between switch <-> router <-> firewall <-> end hosts

The only things I still need to learn are VPN tunnelling to connect the two separate subnets; then I will also learn OSPF and VLANs. There's just so much to learn, and I couldn't be more excited.

u/Apart_Sprinkles_8504 — 35 minutes ago