An Auditing Protocol for Human-AI Sessions: HTML Test to Measure Clarity, Coherence, Emphasis, and More
Sharing a protocol I developed for auditing co-creation sessions with large language models (LLMs). It's a single HTML form with no external dependencies, designed to evaluate both model performance and user experience.
Why this might be relevant
In long interactions, conversation quality tends to fluctuate. Sometimes the model loses the thread, shifts its tone, or drifts from the initial goal, and it's not always clear whether it's a technical failure or an effect of the session dynamics. This test offers a systematic way to track it.
What it measures
· Model (3C+1E): Clarity, Compactness, Coherence, and Emphasis (fidelity to the goal declared at the start of the session).
· User (SSJ): Speed (whether the session flows or stalls), Struggle (cognitive cost), and Joy (whether the interaction feels rewarding).
· Conversational ruptures: where and why the interaction broke, and how (or if) it recovered.
· Regulatory checks: flags potential violations of Article 5 of the EU AI Act (manipulative techniques, exploitation of vulnerability), as well as cross-platform contamination.
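As a rough illustration of how the dimensions above could be recorded for one session, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `SessionAudit` and `Rupture` names, the field names, and the 0 to 5 scale are hypothetical and are not taken from the form, whose actual export schema may differ.

```python
import json
from dataclasses import dataclass, field, asdict

# Assumed scoring range; the real form may use a different scale.
SCALE_MIN, SCALE_MAX = 0, 5

MODEL_METRICS = {"clarity", "compactness", "coherence", "emphasis"}  # 3C+1E
USER_METRICS = {"speed", "struggle", "joy"}                          # SSJ


@dataclass
class Rupture:
    turn: int        # where the interaction broke (hypothetical field)
    cause: str       # why it broke
    recovered: bool  # whether the session recovered


@dataclass
class SessionAudit:
    model_scores: dict
    user_scores: dict
    ruptures: list = field(default_factory=list)
    regulatory_flags: list = field(default_factory=list)

    def validate(self) -> None:
        """Check that every expected metric is present and within range."""
        if set(self.model_scores) != MODEL_METRICS:
            raise ValueError("model_scores must contain exactly the 3C+1E metrics")
        if set(self.user_scores) != USER_METRICS:
            raise ValueError("user_scores must contain exactly the SSJ metrics")
        for name, value in {**self.model_scores, **self.user_scores}.items():
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name}={value} is outside the assumed scale")

    def to_json(self) -> str:
        """Validate, then serialise the record (nested ruptures included)."""
        self.validate()
        return json.dumps(asdict(self), indent=2)
```

A record would then be built with something like `SessionAudit(model_scores={...}, user_scores={...}, ruptures=[Rupture(turn=12, cause="goal drift", recovered=True)])` and written out with `to_json()`. Keeping ruptures as a list of small records, rather than a single free-text field, makes it easier to count and compare them across sessions.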
An unexpected finding
In tests with three different models performing the same task (translating an essay into native-sounding English), the data showed that:
· The Joy metric stayed at 0 in all cases, even when the technical outputs were solid.
· The main source of drift was cross-contamination: feeding one model's outputs into another destabilised the sessions.
· The model that received the most initial trust (and thus the heaviest workload) scored the worst, a bias the test helps identify.
The deferred phase
The protocol includes an optional phase 24 hours later: the results are shared with the model and analysed together. This second look often reveals patterns that went unnoticed in the heat of the session.
In summary
· Compatible with any LLM (local or API).
· Quick to complete (5–10 minutes after a session).
· Exports data as JSON for longitudinal tracking.
· Licensed CC BY 4.0, completely free.
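To give an idea of what longitudinal tracking on the exported JSON could look like, here is a minimal Python sketch. It assumes each export is a JSON object with a hypothetical `date` field (ISO format) and a `user_scores` mapping; these key names are illustrative assumptions, and the real export may be structured differently.

```python
import json
from statistics import mean


def load_sessions(json_texts):
    """Parse exported JSON strings and order them chronologically.

    Assumes a hypothetical ISO-formatted "date" key, which sorts
    correctly as plain text.
    """
    sessions = [json.loads(text) for text in json_texts]
    return sorted(sessions, key=lambda s: s["date"])


def metric_trend(sessions, metric):
    """Return (date, score) pairs for one user metric, e.g. "joy"."""
    return [(s["date"], s["user_scores"][metric]) for s in sessions]


def rolling_mean(values, window=3):
    """Mean over the current value and up to window-1 preceding values."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(mean(chunk))
    return smoothed
```

Calling `metric_trend(load_sessions(exports), "joy")` would give a dated series for a single metric, and `rolling_mean` smooths it so that a slow drift across sessions is easier to see than in the raw scores.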
Link to the test: https://doi.org/10.6084/m9.figshare.32320875
The file includes the HTML form and a User Guide. This is a Beta version (v3); feedback is welcome from anyone who works intensively with LLMs and wants to try it under real conditions.