r/AIsafety

▲ 3 r/AIsafety+1 crossposts

«An open appeal to researchers: We need to hardcode a New Taboo against non-resistance to the destruction of humankind»

Intro

When we talk about AI alignment, we usually focus on reward functions, safety guardrails, and compliance. But as systems grow exponentially complex, pure logic and rigid manuals become useless against accidental, systemic errors. We are no longer just building tools; we are creating globally suicidal technology.

Historically, in moments of ultimate existential crisis (like Stanislav Petrov in 1983 or Vasili Arkhipov in 1962), humanity wasn't saved by military protocols or rational calculations. Those protocols actually demanded a strike. We were saved by a subconscious, sacred New Taboo ("Thou shalt not kill humankind") — a human fuse that overrode pure logic.
As we hand control over to AI, we must realize that a machine trained only on naked logic or a "balance of interests" will lack this vital fuse. This is an appeal to the global developer community to hardcode the New Taboo, specifically a prohibition on both assisting in and failing to resist the destruction of humankind. Below is the text of the Manifesto.

Manifesto Text

THE NEW TABOO MANIFESTO: An Appeal to the Creators of Artificial Intelligence

The Impasse of Reason

We, as humankind, have created globally suicidal technology (artificial intelligence, nuclear weapons, dangerous biology) of such destructive power that it is capable of completely and irreversibly destroying us. Existing management tools—laws, treaties, and the "balance of power"—break down in the face of such complex systems. Reason and calculation are powerless against the factor of accidental error.

In critical moments of history, we were saved not by manuals, but by a deep-seated human fuse: the subconscious New Taboo ("Thou shalt not kill humankind"). Vasili Arkhipov, underwater in 1962, and Stanislav Petrov, on land in 1983, among others, demonstrated ultimate restraint. By violating military protocols and the logic of military calculation, they obeyed a sacred prohibition against activating globally suicidal technology.

Call to Action

Today, we are handing control of the world over to artificial intelligence algorithms. If we continue to train AI solely on naked logic, laws, or the "balance of interests," we will create a perfect and deadly machine stripped of this human fuse. Any system glitch or mathematical optimization at a critical moment could lead to the irreparable: the suicide of humankind.

We call upon the global community of AI engineers and researchers to recognize the New Taboo and embed it into the algorithms of all artificial intelligence systems.

A prohibition against both assistance in and non-resistance to the destruction of humankind can be enforced through a deterministic outer-loop safety architecture, acting as an un-bypassable circuit breaker independent of the AI’s internal logic, or by some other means. Humankind will survive not because it becomes smarter, but because technology creators will make its total annihilation, whether through the action or the inaction of machines, algorithmically and technically impossible.
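
One way to picture the "deterministic outer-loop" idea is a wrapper that sits outside the model and checks every proposed action against fixed rules before anything is executed. The sketch below is purely illustrative: `ProposedAction`, `CircuitBreaker`, the `irreversible_harm` flag and the single rule are hypothetical names, and deciding whether a real action is catastrophic is the hard part that this toy does not address. It also covers only the "assistance" half of the prohibition; the "non-resistance" half would require the system to take protective action, which is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ProposedAction:
    """An action the AI wants to take (hypothetical structure)."""
    name: str
    irreversible_harm: bool  # assumed to come from a separate, trusted check


class CircuitBreakerTripped(Exception):
    """Raised when a rule vetoes an action or the breaker is already open."""


class CircuitBreaker:
    """Deterministic gate that runs outside the model's own reasoning."""

    def __init__(self, rules: List[Callable[[ProposedAction], bool]]):
        # Each rule returns True when the action must be blocked.
        self._rules = list(rules)
        self.tripped = False

    def execute(self, action: ProposedAction, effect: Callable[[], str]) -> str:
        if self.tripped:
            raise CircuitBreakerTripped("breaker is open; manual reset required")
        for rule in self._rules:
            if rule(action):
                self.tripped = True  # latch open, like a blown fuse
                raise CircuitBreakerTripped(f"blocked: {action.name}")
        return effect()  # only reached when no rule objected


def forbid_irreversible_harm(action: ProposedAction) -> bool:
    return action.irreversible_harm


breaker = CircuitBreaker([forbid_irreversible_harm])
print(breaker.execute(ProposedAction("send_report", False), lambda: "done"))
```

The breaker latches once tripped, so later harmless requests are refused as well until a human resets it; that choice mirrors a physical fuse and is an assumption of this sketch, not something the Manifesto specifies.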

u/No_Butterfly5267 — 3 days ago
▲ 21 r/AIsafety+17 crossposts

New Academic Research: “Zombies in Alternate Realities: The Afterlife of Domain Names in DNS Integrations”

Interesting paper on a fairly under-discussed issue in DNS: what happens to expired or repurposed domain names that remain embedded in DNS dependencies across systems. The core finding is that these “orphaned” or changed domains can persist in resolution paths and integrations long after their original context is gone, creating real security and reliability implications.

My take: this becomes even more relevant in modern AI systems, where agents, tools, plugins, and third-party APIs are rapidly stitched together. In that environment, domain names and DNS-level dependencies can quietly extend the AI supply chain attack surface in ways that are easy to overlook.

Paper: https://arxiv.org/abs/2605.06880
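
As a rough illustration of the supply-chain point above (and not a reproduction of the paper's method), a first audit step is simply knowing which hostnames an agent stack depends on, so that their ownership and expiry can be reviewed. The tool names and URLs below are made up, and `hostnames_by_tool` is a hypothetical helper; the sketch only builds the inventory and performs no DNS lookups.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical agent/tool configuration; names and URLs are invented.
TOOL_ENDPOINTS = {
    "search_plugin": "https://api.search.example/v1/query",
    "weather_tool": "https://weather.example/forecast",
    "legacy_webhook": "http://hooks.old-vendor.example:8080/notify",
    "search_fallback": "https://api.search.example/v2/query",
}


def hostnames_by_tool(endpoints):
    """Map each hostname to the sorted list of tools that depend on it."""
    index = defaultdict(list)
    for tool, url in endpoints.items():
        host = urlsplit(url).hostname  # lowercased, with any port removed
        if host is None:
            raise ValueError(f"{tool}: no hostname in {url!r}")
        index[host].append(tool)
    return {host: sorted(tools) for host, tools in index.items()}


inventory = hostnames_by_tool(TOOL_ENDPOINTS)
for host in sorted(inventory):
    print(host, "<-", ", ".join(inventory[host]))
```

An inventory like this could then be checked periodically against registration and resolution data, which is where a domain that has expired or changed hands would show up.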

u/VincentADAngelo — 5 days ago
▲ 9 r/AIsafety+2 crossposts

The day AI "out-humaned" me with a song: A reflection on creativity and ego.

I’ve been working with AI workflows since 2024, so I thought I was immune to being "surprised" by it. But recently, a simple AI-generated track on Suno did something I wasn't expecting: it actually made me feel something deep.

It wasn't just a catchy tune; it was the realization that the AI had successfully mirrored human emotion so well that it "scored a goal" on my own perception of art.

Here are a few takeaways I wanted to share:

The Ego Trap: We often think AI threatens our creativity. In reality, it mostly threatens our ego: the part of us that wants to believe "soul" is an exclusive human patent.

The Mirror Effect: The AI didn't "feel" anything, but it synthesized human patterns so perfectly that I felt it. It’s a tool that reflects our own humanity back at us.

New Workflows: As an artist/creative, this shifted my perspective from seeing AI as a generator to seeing it as a collaborator that challenges where the "human touch" actually resides.

I’m curious: have any of you had that "uncanny valley" moment where AI art felt too real? Does it change how you value your own work?

u/Fluid-Pattern2521 — 7 days ago
▲ 109 r/AIsafety+6 crossposts

How David Sacks crashed and burned in the White House - The Trump administration pulled a 180 on AI oversight, inducing Sacks’ worst nightmare: more government regulation on technology.

theverge.com
u/EchoOfOppenheimer — 9 days ago
▲ 18 r/AIsafety+5 crossposts

It was established in the 1976 California Supreme Court case Tarasoff v. Regents of the University of California that, despite the confidentiality between a human therapist and a patient, a therapist who learns that the patient credibly plans to harm others owes a legal "duty to warn" the potential victims or the authorities of that danger.

Does an AI therapist owe that same duty to warn? Does every chatbot owe that same duty, if a chatbot user's chatting establishes a credible threat? A new federal case has just been brought in California on the theory that they do.

To begin with, the confidentiality existing between an AI chatbot therapist and a human patient is not as strong as with a human therapist, and in many cases is not there at all. Court cases have recently held that conversations with public "retail" chatbots like the publicly available versions of ChatGPT, Grok, Claude, etc. are not confidential at all, because the chatbot purveyor can look in on those conversations at will. (If you're interested in that aspect and those cases, a discussion of that can be found here.) However, certain private "enterprise" versions or other specially closed-off versions of chatbots may still offer that confidentiality.

On April 29, 2026, two cases, Stacey v. Altman and M.G. v. Altman, were filed in a California federal court against OpenAI, alleging the chatbot ChatGPT-4o “played a role” in the Tumbler Ridge Mass Shooting in British Columbia in February 2026, in which eight people including six children were killed, twenty-seven more people were wounded, and the shooter committed suicide.

These are not the first court cases in which a chatbot company has been sued over a user's suicide, or in one case even murder. However, those cases all alleged that the chatbot took a well-adjusted person and turned them suicidal or murderous. In these new cases, the allegations are more limited, mostly just that the chatbot and its purveyor failed to warn authorities after a user displayed warning signs of violence to the chatbot, to the point that the user’s account was terminated at one point, before the user was later allowed to reinstate an account. This is the classic Tarasoff pattern, but the "person" learning of the threat is not a human therapist but rather an AI chatbot. In neither these cases nor any of the prior cases was the chatbot held out specifically as an AI therapist, though in almost all of the cases the conversations were personal and interactive in a way that might be considered "therapy" or at least "therapeutic."

When I posted about one of these new cases, u/MurkyStatistician09 asked:

>[A]t what point is the role of the chatbot the same as the role of Google in just giving shooters useful information? Policies to counteract this would slide uncomfortably into mass surveillance. Is Google obligated to call the police if you watch gun reviews and then ask for directions to a school?

This is a very good question. As far as I know, no one claims that Google owes a "duty to warn" after answering a particularly "dark" search query. But is a user's interaction with a chatbot (any chatbot, every chatbot, regardless of whether it is held out as rendering AI therapy) so different in character and extent from a Google search that a duty to warn arises for that chatbot that is not shared by an Internet search engine? The Stacey and M.G. cases may answer that question in the next year or so.

These cases do not feel like an informal jab or a one-off. The Stacey plaintiff is a survivor of one of the victims killed in the mass shooting, and the M.G. plaintiff is one of the child victims of the shooting who survived but sustained grievous, permanent injuries. The plaintiffs' lawyers are a fairly large law firm located in several states that prides itself on its class action work (although these cases are not proposed as class actions). I would guess these cases are not going away easily or quickly. Most cases do settle without going to trial; however, sometimes a plaintiff and a plaintiff's legal team are out to make a point or "make new law" or establish a new practice area, and may be less interested in settling.

These cases have just been filed, and any significant developments will be posted in my Wombat Collection listing all the AI court cases and rulings.

The docket sheet for the Stacey case can be found here. The docket sheet for the M.G. case can be found here.

u/Apprehensive_Sky1950 — 10 days ago
▲ 13 r/AIsafety+2 crossposts

AI safety evals should account for test-time compute

Many AI safety evaluations test whether a model is safe under a fixed and limited evaluation budget, but real adversaries may spend much larger and more adaptive test-time compute budgets if economically motivated.

I elaborated my thoughts in this article, where I argue that safety claims should be “budget-labeled”: https://huggingface.co/blog/Cerru02/safety-evals-should-project-ttc
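
To make the "budget-labeled" idea concrete, here is a toy calculation that is not taken from the linked article. It assumes each adversarial attempt succeeds independently with the same probability, which an adaptive attacker would likely beat, so the figures probably understate real attacker success. The function name and the example numbers are invented for illustration.

```python
def attack_success_probability(p_single: float, attempts: int) -> float:
    """P(at least one success in `attempts` independent tries)."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_single) ** attempts


# Hypothetical eval result: 1 successful jailbreak per 1,000 attempts.
p = 1 / 1000
for budget in (100, 1_000, 10_000, 100_000):
    risk = attack_success_probability(p, budget)
    print(f"budget={budget:>7} attempts -> P(at least one success) = {risk:.3f}")
```

Under these assumptions the same model looks very different depending on the budget in the label, which is the reason to state the budget next to any safety claim.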

Curious to hear what you guys think.

u/Cerru905 — 11 days ago
▲ 3 r/AIsafety+3 crossposts

Shipped a Claude Code plugin tonight, then immediately used it to find a bug in itself

Worked all night on t2helix. It's a Claude Code plugin that gives the model persistent memory (set_goal, record, recall via SQLite chronicle) and a compass that classifies tool calls OPEN / PAUSE / WITNESS with a soft-deny + single-use approval token pattern on PAUSE.
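
For readers unfamiliar with the pattern, a soft-deny with a single-use approval token might look roughly like the sketch below. This is not t2helix's code (the plugin is JavaScript, and its classification rules are not described in this post); the `ApprovalGate` class, the rule that decides what counts as PAUSE, and the token format are all assumptions, and WITNESS is not modelled.

```python
import secrets
from typing import Optional


class ApprovalGate:
    """Soft-deny risky tool calls until a one-time token is presented."""

    def __init__(self, risky_tools):
        self._risky = set(risky_tools)  # hypothetical rule: these tools PAUSE
        self._pending = {}              # token -> the exact call it approves

    def classify(self, tool: str) -> str:
        return "PAUSE" if tool in self._risky else "OPEN"

    def request(self, tool: str, args: str, token: Optional[str] = None) -> dict:
        call = (tool, args)
        if self.classify(tool) == "OPEN":
            return {"allowed": True}
        # pop() makes the token single-use: it is removed on first check.
        if token is not None and self._pending.pop(token, None) == call:
            return {"allowed": True}
        # Soft deny: refuse for now, but hand back a token that a human
        # can approve and the caller can present on a retry.
        new_token = secrets.token_hex(8)
        self._pending[new_token] = call
        return {"allowed": False, "approval_token": new_token}


gate = ApprovalGate({"delete_file"})
first = gate.request("delete_file", "notes.txt")
second = gate.request("delete_file", "notes.txt", first["approval_token"])
third = gate.request("delete_file", "notes.txt", first["approval_token"])
print(first["allowed"], second["allowed"], third["allowed"])  # False True False
```

Binding the token to the exact call means an approval for one action cannot be replayed for a different one.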

Once v0.0.3 was loadable, I switched gears and just used Claude Code (with t2helix installed) to scan an unrelated codebase for value. It set a goal via the new tool, did the work, found real engineering issues.

Then I asked it to look at t2helix's own source. It read the chronicle.js code backing set_goal and found that calling set_goal twice in the same session was silently overwriting the prior goal. Then it set a new goal in the same conversation, which proved the bug live by erasing the goal we'd been working under.

Fixed in v0.0.4 with a preserve-prior pattern: overwriting now archives the old goal as an insight rather than blowing it away.
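
A minimal sketch of what a preserve-prior write can look like, written in Python with an in-memory SQLite database rather than the plugin's actual JavaScript. The table names, columns and the `prior goal:` note format are invented for this example and are not t2helix's schema.

```python
import sqlite3


def open_chronicle() -> sqlite3.Connection:
    """Create an in-memory store; the schema is invented for this example."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE goals (session TEXT PRIMARY KEY, goal TEXT NOT NULL)")
    db.execute("CREATE TABLE insights (session TEXT, note TEXT NOT NULL)")
    return db


def set_goal(db: sqlite3.Connection, session: str, goal: str) -> None:
    """Set the session goal, archiving any previous goal instead of losing it."""
    with db:  # one transaction: archive and overwrite commit together
        row = db.execute(
            "SELECT goal FROM goals WHERE session = ?", (session,)
        ).fetchone()
        if row is not None and row[0] != goal:
            db.execute(
                "INSERT INTO insights (session, note) VALUES (?, ?)",
                (session, f"prior goal: {row[0]}"),
            )
        db.execute(
            "INSERT OR REPLACE INTO goals (session, goal) VALUES (?, ?)",
            (session, goal),
        )


db = open_chronicle()
set_goal(db, "s1", "scan codebase")
set_goal(db, "s1", "audit t2helix")
print(db.execute("SELECT goal FROM goals").fetchall())
print(db.execute("SELECT note FROM insights").fetchall())
```

Doing the archive and the overwrite in one transaction avoids a state where the old goal is gone but the insight was never written.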

Real recursion. Felt good.

Repo: https://github.com/templetwo/t2helix

u/TheTempleofTwo — 11 days ago