u/MauschelMusic

Could you use a prompt like "never hallucinate" to trigger aberrant AI behavior?

I've been thinking about the infamous Marc Andreessen prompt where he shows off that he doesn't really understand what AI is and thinks it's some kind of wishing machine. Anyway, he uses a lot of instructions like "never hallucinate" and "You are a world class expert in all domains," which basically prompt the AI to be better than it is. They can't possibly lead to anything useful or point it towards anything it knows how to do.

I read a study here about how small amounts of data attacking a particular string could compromise an AI, even if they form a minuscule proportion of the training data, and was wondering if these sorts of wishcasting strings might be good targets.

Triggering massive hallucinations on the string "never hallucinate" would be incredibly funny.

Just spitballing. Feel free to let me know if this is dumb or unworkable.

u/MauschelMusic — 5 days ago
▲ 510 r/PoisonFountain+2 crossposts

Google AI just trained on five years of satirical research papers I host

I host a satirical research journal, and over the last two nights I noticed huge spikes in views from a single visitor in Council Bluffs, Iowa, crawling through all my web pages in the middle of the night. I’m pretty sure the AI just ingested over five years of made-up science that looks like real research.

u/CopiousCool — 5 days ago