OpenAI is turning ChatGPT into a full-blown creative suite. Starting today, users can generate images directly inside ChatGPT—no plugins, no external tools. Just type your prompt, and boom, visuals. The feature, aptly dubbed “Images in ChatGPT,” is now live across all tiers: Plus, Pro, Team—and yes, even free.
The new image model is built on GPT-4o, OpenAI’s “omnimodal” model that can handle text, images, audio, and video. This isn’t a minor upgrade. It’s a foundational shift. And while it won’t replace DALL·E just yet, it’s definitely gunning for its spot.
So What’s the Big Deal?
Accuracy, for starters. One of the biggest complaints about AI image generators? They’re great—until they aren’t. Mix up a few objects or throw in some text, and most models fall apart. Not this one.
OpenAI’s research lead Gabriel Goh says the model dramatically improves “binding”—that’s AI-speak for not messing up basic relationships between objects. Ask for a red triangle next to a blue star, and you’ll get exactly that—not a purple mess or a geometric guessing game. Goh claims the system can juggle up to 20 distinct objects while keeping things accurate. That’s a big leap.
And yes, text rendering finally works. You know how AI-generated text on images usually looks like your keyboard glitched mid-sneeze? This system fixes that—mostly. Small fonts still trip it up, but overall, you can now get usable posters, labels, menus, and comics without cross-checking every word.
Under the Hood
Unlike diffusion models (like DALL·E), which start from noise and refine the whole image at once over many steps, this new system is autoregressive: it builds the image piece by piece, left to right and top to bottom, like writing a sentence. It’s slower, sure, but that structure seems to be the secret sauce for getting text and object relationships right.
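To make the "like writing a sentence" idea concrete, here is a toy Python sketch of raster-order generation. OpenAI hasn't published implementation details, so nothing here reflects its actual code: the function names, the tiny token vocabulary, and the `toy_next_token` stand-in for a trained model are all assumptions made purely for illustration.

```python
import random


def raster_order(height, width):
    """Yield (row, col) positions top-to-bottom, left-to-right."""
    for row in range(height):
        for col in range(width):
            yield row, col


def toy_next_token(context, vocab_size, rng):
    """Hypothetical stand-in for a learned model.

    A real system would predict the next image token from everything
    generated so far. This toy version just tends to repeat the previous
    token, so the dependence on earlier output is visible.
    """
    if context and rng.random() < 0.7:
        return context[-1]
    return rng.randrange(vocab_size)


def generate_autoregressive(height, width, vocab_size=4, seed=0):
    """Fill a grid one cell at a time, each cell conditioned on prior cells."""
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    context = []  # every token produced so far, in generation order
    for row, col in raster_order(height, width):
        token = toy_next_token(context, vocab_size, rng)
        grid[row][col] = token
        context.append(token)
    return grid


if __name__ == "__main__":
    for line in generate_autoregressive(3, 8):
        print(" ".join(str(token) for token in line))
```

The point of the sketch is the loop: every cell is chosen only after all earlier cells exist, which is also why this style of generation takes longer than refining the whole canvas in parallel.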
In demos, OpenAI showed off scientific diagrams, multi-panel comics, sticker designs, and even restaurant menus—complete with coherent text and clean layouts. One standout: a fully labeled diagram of Newton’s prism experiment, created from a single prompt.
Latency vs. Quality
The tradeoff? It’s slower. Image generation takes a few seconds longer than DALL·E. But OpenAI thinks the boost in quality and reliability is worth the wait.
“The capability, the world knowledge, really makes up for the additional seconds,” said Jackie Shannon, product lead for ChatGPT’s multimodal features.
Guardrails Included (Mostly)
With AI-generated visuals under increasing scrutiny, OpenAI says it’s built in safeguards. No deepfakes. No watermark removal. No explicit content. Images don’t carry visible watermarks, but OpenAI adds C2PA metadata for traceability. There’s also internal tooling to help track where images come from.
Ownership? It’s yours. As long as you follow the usage policy, you can do what you want with the images—print them, sell them, meme them into the metaverse.
Bottom Line
This isn’t just a toy upgrade. Images in ChatGPT signals OpenAI’s broader ambition: make ChatGPT the only tool you need for ideation, creation, and execution. Whether you’re building pitch decks, comics, or classroom slides, your AI co-pilot just leveled up.
Next up: video? Probably. But for now, image generation inside ChatGPT feels like a glimpse of what real multimodal AI should look like—useful, usable, and just a little bit magical.