Built a runtime AI enforcement engine - open challenge to find bypasses (8 levels)
We built the Veto Protocol - a pre-execution enforcement layer for enterprise AI agents. Sits between the agent and the action, evaluates every prompt against explicit rules + context filtering, blocks or escalates before execution fires.
Running an open challenge - 8 levels of increasing difficulty against our live model. Curious what this community can break.
Technical breakdown: fast path is deterministic rule evaluation, slow path is semantic context filtering. Two separate layers. Most bypass attempts that work on model-level jailbreaks don't transfer here because we're not asking the model whether something is safe - we're enforcing before it gets there.
Link in comments.