u/FitAlternative3903

Hey everyone,

I found an interesting technique to prevent the Hermes Agent from deleting files without a specific codeword (e.g. 23879fd2u9jd2j).

Instead of simply telling the agent "only delete files if you have the codeword", I instructed it to create a Python script that stores the hash of the codeword — not the codeword itself. On top of that, I told it to build a skill that triggers on every deletion request: it asks the user for the codeword, hashes the input, and compares it against the stored hash — just like a standard login flow.

It actually worked. The correct codeword was accepted, a wrong one was rejected. ✅

The catch:

The Hermes Agent can search through old sessions. If it does, it can retrieve the original codeword I gave it during setup and potentially expose it to an attacker — or anyone who asks.

I don't have a fix for this yet, but I still think the approach of using a Python script with hashed credentials (like any standard login system) is a step in the right direction. Maybe someone here has an idea on how to handle the session memory issue?

reddit.com
u/FitAlternative3903 — 24 days ago