GPT-4o Can Now Generate Images with Almost Flawless Text

OpenAI’s GPT-4o, launched around a year ago, just got a major upgrade: image generation with stunningly accurate text rendering. The new feature lets users create detailed, high-quality images from text prompts and refine them over the course of a conversation until they match what the user had in mind, avoiding the nonsensical symbols and wavy characters common in previous AI models.

The text in the images shared by OpenAI is perfectly legible

In contrast to conventional image generation, which typically involves refining a single prompt, GPT-4o adopts a more interactive method. You begin with a simple request—like asking for a cat—and then engage in a conversation to fine-tune your vision, whether that includes a detective hat, a monocle, or any other detail you desire.

OpenAI provides examples that illustrate this process: users can construct and adjust scenes incrementally, combining elements from various images into a unified outcome. The model excels at producing clear text on signs or objects, a significant improvement over the jumbled results from earlier AI image generation systems.

It’s worth mentioning that OpenAI acknowledges some selective showcasing: many of the images are labeled “best of 2” or “best of 8.” The outcomes remain impressive all the same, particularly given the user-friendly interface. GPT-4o can even start from your own photo and apply modifications, and it manages 10-20 objects in a scene where competitors struggle with just 5-8. Recently, I attempted to recreate the final scene from The Count of Monte Cristo, which proved quite challenging. With GPT-4o’s image generation, the resulting images should feature readable text, and turning creative ideas into reality should be significantly easier.

That said, it’s not without its flaws. OpenAI points out issues like bottom cropping, persistent hallucinations, difficulties with non-Latin text, and challenges when exceeding 20 objects. Nevertheless, the capability to create intricate, text-rich images using straightforward English distinguishes GPT-4o from its predecessors. If you’re working on a poster, this tool offers a level of accuracy and adaptability that older models could only aspire to achieve.
