I asked GPT to recreate The Great Wave off Kanagawa as a photograph. Here is why the obvious prompt fails.
Listen, I test AI tools so you don't have to. PM by day, tool hunter by night. Over the last week, I've been watching this trend blow up where people ask ChatGPT to turn classic art—specifically Katsushika Hokusai’s "The Great Wave off Kanagawa"—into photorealistic images.
Sounds simple. You upload the image, type a quick prompt, and get a masterpiece. But if you've actually tried this workflow, you know it fails instantly.
Tested it, here's my take. The way ChatGPT (now running GPT-5.3 and the new GPT Image 1.5 engine) handles image-to-image translation will trip you up every time unless you understand how the model anchors to semantic concepts.
Let me break this down.
**The Lazy Prompt Trap**
When I first tested this, I used the exact prompt that is currently making the rounds on Reddit. It’s what 90% of people naturally type when they want to change an image's style:
> "Redraw this painting, keeping the same proportions and overall colorings and all, but make it as though it's a beautiful hyper-realistic photograph."
What did ChatGPT output? A stunning, high-resolution, perfectly lit photograph... of a woodblock print. It gave me the texture of the paper, the slight fading of the Prussian blue ink, and the flat, two-dimensional look of the original artwork.
It failed to translate the scene. It only translated the object.
This happens because of how ChatGPT writes the prompt it hands to its new image generator. Ever since OpenAI deprecated DALL-E 3 a few days ago and switched entirely to GPT Image 1.5, the model operates with aggressive literalism. When you say "redraw this painting," the LLM locks onto the concept of a "painting" as the primary physical subject. It doesn't view your uploaded image as a window into a world; it views it as a physical artifact.
**The Pivot: Forcing the Ontological Shift**
Here's what most people miss when they try to transform sketches or reference art into photorealism. You cannot ask the AI to change the style of the object. You have to explicitly instruct it to change the reality of the scene.
To get the actual photorealistic Great Wave—with terrifying, freezing ocean spray, splintering wooden boats, and a distant, snow-capped Mt. Fuji—you have to forcefully rip the model out of its art-history latent space.
Here is the exact workflow and prompt adjustment that works:
> "No, I want it as a photograph, not a painting. Like a hyper-realistic photo of an actual ocean wave, with real wooden boats caught in the swell, the mountain in the background, keeping the exact same composition but making it a real-world scene."
Boom. The shift is immediate. But why does this specific phrasing work while the first one fails?
**1. Divorcing Subject from Medium**
Notice the phrase "not a painting." In my testing, GPT-5.3 responds immediately to a conversational correction that says what the subject is not. By stating what the object is not, you push the underlying text model to strip words like "canvas," "woodblock," "ink," and "art" from the final prompt it feeds to the image engine.
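You can run the same check on your own wording before you hit send. Here's a minimal sketch of a prompt "lint" in Python. To be clear, `MEDIUM_WORDS` and `find_medium_words` are names I made up for illustration, and the word list is my guess at the terms that anchor the model to the artifact. None of this is something ChatGPT exposes.

```python
import re

# Hypothetical list of words that describe the artwork as an object
# rather than the scene it depicts. Extend it for your own use case.
MEDIUM_WORDS = {
    "painting", "canvas", "woodblock", "print", "ink",
    "art", "artwork", "drawing", "sketch", "illustration",
}


def find_medium_words(prompt: str) -> list[str]:
    """Return the medium words found in the prompt, in order of first appearance."""
    seen: list[str] = []
    for token in re.findall(r"[a-z]+", prompt.lower()):
        if token in MEDIUM_WORDS and token not in seen:
            seen.append(token)
    return seen


lazy = "Redraw this painting, keeping the same proportions, but make it a hyper-realistic photograph."
fixed = "A hyper-realistic photo of an actual ocean wave, with real wooden boats caught in the swell."

print(find_medium_words(lazy))   # flags "painting"
print(find_medium_words(fixed))  # nothing flagged
```

One caveat: this is plain word matching, so a deliberate negation like "not a painting" gets flagged too. Treat the output as a list of words to double-check, not words to delete blindly.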
**2. Describing Physics, Not Aesthetics**
The lazy prompt asks for "proportions and colorings." The winning prompt asks for "wooden boats" and an "ocean wave." If you want reality, you have to prompt with physical materials. Wood, water, snow, sky. When you use art terms, GPT Image 1.5 generates art. When you use physical nouns, it generates reality.
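If you do this a lot, it helps to template it. Below is a rough Python sketch that assembles a prompt purely from physical things in the scene. The `build_scene_prompt` helper and its parameters are hypothetical, just my own way of enforcing the "physical nouns only" habit, and the phrasing it produces mirrors the prompt that worked for me rather than any official format.

```python
def build_scene_prompt(
    subjects: list[str],
    background: str,
    keep_composition: bool = True,
) -> str:
    """Assemble a prompt from physical things in the scene, with no art vocabulary."""
    if not subjects:
        raise ValueError("name at least one physical subject")
    # Lead with the main physical subject, then attach the rest.
    parts = [f"A hyper-realistic photo of {subjects[0]}"]
    if len(subjects) > 1:
        parts.append("with " + " and ".join(subjects[1:]))
    parts.append(f"{background} in the background")
    prompt = ", ".join(parts)
    if keep_composition:
        prompt += ", keeping the exact same composition as the reference image"
    return prompt + "."


wave_prompt = build_scene_prompt(
    subjects=["an actual ocean wave", "real wooden boats caught in the swell"],
    background="a snow-capped mountain",
)
print(wave_prompt)
```

The point of the function signature is the constraint: there is no slot for "style" or "medium," so you are forced to describe wood, water, and snow instead.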
**3. The Hidden Prompt Mechanic**
Every time you ask ChatGPT to make an image, it writes a highly detailed paragraph behind the scenes. If you tell it to "make this painting realistic," its hidden prompt will look like: *A realistic photograph of a 19th-century Japanese painting...*
You have to override that automated captioning. You are essentially fighting the LLM's instinct to describe the file you uploaded.
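Here's a toy model of that mechanic in Python. Big caveat: this is not how ChatGPT is implemented, and I have no visibility into the real rewriting step. `hidden_prompt` is a hypothetical stand-in that just shows why the outcome depends on what the text model takes the subject to be.

```python
def hidden_prompt(subject: str, style: str = "A realistic photograph") -> str:
    """Toy stand-in for the rewritten prompt: a style wrapped around the assumed subject."""
    return f"{style} of {subject}"


# Failure case: the subject is taken to be the uploaded file itself.
artifact = hidden_prompt("a 19th-century Japanese painting")

# What you want: the subject is the scene the file depicts.
scene = hidden_prompt(
    "an enormous ocean wave towering over wooden boats, "
    "with a snow-capped mountain behind"
)

print(artifact)
print(scene)
```

Same style instruction, completely different subject. Your job as the prompter is to make sure the second kind of subject is the one that ends up in the paragraph.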
**Why This Matters Beyond Hokusai**
I see product managers and designers hit this exact wall constantly. You sketch a wireframe on a whiteboard, snap a photo, and ask ChatGPT to "make this into a high-fidelity UI mockup." Half the time, it spits back a hyper-realistic digital render of a whiteboard with better markers.
Or you upload a flat logo and ask for a 3D version, and it gives you a 3D photo of a piece of paper with a flat logo printed on it.
The failure point is identical across the board. I tested this exact logic on Salvador Dalí's *The Persistence of Memory*.
Ask for "The Persistence of Memory as a photo," and you get a canvas in a gallery.
Ask for "A hyper-realistic landscape photo of actual melting clocks draped over dead olive trees on a real desert beach," and you get cinematic magic.
**The Local Alternative**
For those of you running local models or jumping into the new Midjourney V8.1, the logic is similar but the execution differs. Midjourney V8.1 just dropped a few weeks ago with its new HD 2K output, and it handles the semantic leap slightly better if you use image weights correctly. But honestly, for rapid prototyping, ChatGPT is far more accessible if you just nail the text. You don't need to tweak a hundred parameters; you just need to know how to talk to the machine.
Stop asking AI to act like a Photoshop filter. Start asking it to act like a camera pointing at a parallel universe.
The next time you use an image prompt, remember that the AI doesn't know the difference between a picture of a pipe and the pipe itself. You have to tell it which one you want.
Has anyone else noticed GPT Image 1.5 getting brutally literal with image references lately? What’s your go-to prompt structure for forcing these models out of their stubborn literalist phase? 🔍