u/Boydbme — reddlx

▲ 970 r/artificial+1 crossposts

I made an agentic "Daily Brief" for my kids with a receipt printer

What it does: Agents gather and curate data and send it to a Wi-Fi-enabled receipt printer (phenol-free paper).

At 1:00am a cron triggers generation of data for all 3 kids (unique data sources per kid where applicable).
A sidecar web service renders the data to templates, screenshots it, converts it to 1-bit with dithering and saves it back to the agent’s thread filesystem.
Button presses (one per kid) then find a matching report for today's date (triggering a generation if it's missing for some reason) and send it to the printer. The delay between button press and print is 2 to 5 seconds.
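
For anyone curious about the "1-bit with dithering" step, here is a minimal sketch of one common approach, Floyd-Steinberg error diffusion. This is an assumption for illustration: the post does not say which dithering algorithm the sidecar uses, and the function name `floyd_steinberg` and the list-of-rows image format are hypothetical.

```python
def floyd_steinberg(gray):
    """Dither a grayscale image (rows of 0-255 values) down to 1-bit.

    Returns rows of 0 (black) or 1 (white). Hypothetical helper, not the
    author's actual code.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    # Work on a float copy so the diffused error can accumulate.
    buf = [[float(v) for v in row] for row in gray]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255.0 if old >= 128 else 0.0
            out[y][x] = 1 if new else 0
            err = old - new
            # Push the quantization error onto unvisited neighbours.
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out
```

Thermal printers can only print a dot or no dot, so spreading the rounding error to neighbouring pixels is what lets photos and gradients survive the conversion.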

Morning daily briefs per kid at the press of a button! Fun, and the kids love it!

(This demo print is using mock child data — not real information).

u/Boydbme — 9 days ago

▲ 957 r/aiagents

I made an agentic "Daily Brief" for my kids with a receipt printer

What it does: Agents gather and curate data and send it to a Wi-Fi-enabled receipt printer (phenol-free paper).

At 1:00am a cron triggers generation of data for all 3 kids (unique data sources per kid where applicable).
A sidecar web service renders the data to templates, screenshots it, converts it to 1-bit with dithering and saves it back to the agent’s thread filesystem.
Button presses (one per kid) then find a matching report for today's date (triggering a generation if it's missing for some reason) and send it to the printer. The delay between button press and print is 2 to 5 seconds.
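
The button-press flow described above (look up today's report, generate it if missing, then print) could be sketched roughly like this. Everything here is a hypothetical stand-in: the `<root>/<kid>/<date>.png` layout, the function names, and the `generate` and `send_to_printer` callables are assumptions, not the real agent or printer APIs.

```python
from datetime import date
from pathlib import Path


def report_path(root, kid, day):
    """Assumed layout: <root>/<kid>/<YYYY-MM-DD>.png (hypothetical)."""
    return Path(root) / kid / f"{day.isoformat()}.png"


def get_or_generate(root, kid, day, generate):
    """Return the report for `day`, generating it if the nightly run missed it."""
    path = report_path(root, kid, day)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # `generate` stands in for the agent + render pipeline.
        path.write_bytes(generate(kid, day))
    return path


def on_button_press(root, kid, generate, send_to_printer, today=None):
    """Handle one kid's button: find or build today's report, then print it."""
    day = today or date.today()
    path = get_or_generate(root, kid, day, generate)
    # `send_to_printer` stands in for whatever transport the printer uses.
    send_to_printer(path.read_bytes())
    return path
```

Because the nightly cron does the slow work ahead of time, the common path is just a file read plus a print job, which fits the short delay mentioned above.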

Morning daily briefs per kid at the press of a button! Fun, and the kids love it!

u/Boydbme — 11 days ago

▲ 51 r/aiagents

I made an agentic "Daily Agenda" receipt printer for my kids.

Weekend project. AgentBuilder + wifi receipt printer (phenol-free paper).

1:00am cron triggers data generation.
Sidecar service renders and saves to agent’s thread filesystem.
1 button trigger per kid.

Daily briefs for my kids at the press of a button!

u/Boydbme — 11 days ago

▲ 2 r/codex

My coworkers and I have been long-time users of Codex and Claude Code and are getting a ton of exposure to other models due to our deep dives into authoring agents. One thing we've learned is that the current benchmarks that exist are terribly unreliable as barometers of model quality. Every model that comes out ships with bar charts claiming that it's the best that ever was within its category.

In response to this we're launching VibeBench, a benchmark that relies on feedback from real engineers doing real work with new model releases to create a relative strengths and weaknesses report based on — vibes.

We have unironically found vibes to be the best indicator of real-world usefulness in models. As an example, despite GPT-5.5 scoring very well across benchmarks, I don't feel it's the best option for front-end UI. However, I wouldn't dare use Opus-4.7 over GPT-5.5 to produce complex system architecture. The current benchmarks don't surface this nuance; you have to rely on the vibes you get from reading endless social media posts or from your own scars.

Our hope is that together we can crystallize the amorphous group consensus that naturally emerges over time into a new type of benchmark that answers "How will this model actually feel to use for ______".

To do it, we need your help. Here's how it works:

We need to gather an initial cohort of 1000 qualified software engineers.
Groups of 250 are assigned new models and evaluate them for 2 days on real-world workloads.
Participants subjectively rank the new model relative to other models they have experience with.
On day 4, a report is released with objective results derived from the subjective tests.
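
As a rough illustration of the steps above, here is one way a cohort could be split into groups of 250 and the subjective rankings combined into a single ordering. The post does not say how the report is computed, so the Borda count used here is purely an assumption, and the function names and model names are hypothetical.

```python
from collections import defaultdict


def split_into_groups(participants, group_size=250):
    """Chunk the cohort into consecutive groups of `group_size`."""
    return [
        participants[i:i + group_size]
        for i in range(0, len(participants), group_size)
    ]


def borda_scores(rankings):
    """Aggregate best-to-worst rankings with a Borda count.

    In a ranking of n models, the top model earns n - 1 points and the
    last earns 0. Returns (model, score) pairs, highest score first,
    with ties broken alphabetically.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, model in enumerate(ranking):
            scores[model] += n - 1 - position
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Three participants rank three hypothetical models, best first.
rankings = [
    ["model_a", "model_b", "model_c"],
    ["model_b", "model_a", "model_c"],
    ["model_a", "model_c", "model_b"],
]
print(borda_scores(rankings))
# [('model_a', 5), ('model_b', 3), ('model_c', 1)]
```

A per-task breakdown (front-end UI vs. system architecture, for example) would just mean running the same aggregation once per task category.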

How can you help:

We all need this benchmark to exist, but for it to become reality, we need an initial cohort of 1000 qualified software engineers. If that’s you, please join: https://vibebench.standardagents.ai
Share this initiative with everyone on your engineering teams. Together we can make this benchmark a reality for all of us.

u/Boydbme — 24 days ago