OpenAI has announced a major update to ChatGPT’s Voice mode, changing how users interact with the feature on both the web and the mobile app. The new experience allows users to engage in voice conversations directly within their ongoing chat, rather than switching to a separate interface.
With this update, users can now see a live transcript of their voice interactions, alongside visual elements that illustrate ChatGPT’s responses. For example, in OpenAI’s demo, the assistant displayed a map of popular bakeries and photos of pastries from Tartine, complementing its spoken recommendations.
How to Start a Voice Chat
Initiating a voice conversation is simple:
- Tap or click the waveform icon next to ChatGPT’s text input field.
- The voice chat opens inline, preserving the context of your existing conversation.
For those who prefer the original orb-style Voice interface, OpenAI has included an option to switch back: enable Separate Mode under the Voice Mode settings.
Why This Matters
Combining voice responses with visuals is a natural extension of ChatGPT’s multimodal capabilities. Users can already prompt the model with voice, images, and videos, so adding visual context to voice replies enhances clarity and engagement.
While Google’s Gemini Live has explored similar features—such as overlaying highlights on live video—OpenAI’s approach focuses on making voice conversations more informative and interactive, even if not fully reactive in real time.
