OpenAI is rolling out new voice and image capabilities in ChatGPT

OpenAI is rolling out new voice and image capabilities in ChatGPT. These offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.

With voice and image capabilities, you have more ways to use ChatGPT in your everyday life.

To begin using the voice feature, go to Settings in the mobile app, navigate to New Features, and opt in to voice conversations. Then tap the headphone icon in the upper-right corner of the home screen and choose one of five voices.

The voice capability is driven by an advanced text-to-speech model that can generate lifelike audio from just text and a brief audio sample. OpenAI says it worked with skilled voice actors to create each of the voices. OpenAI also uses Whisper, its open-source speech recognition system, to transcribe spoken words into text.

You can chat about images by taking a picture and asking ChatGPT anything about it. For instance, you can take a picture of the contents of your refrigerator and ask ChatGPT to come up with possible menu combinations.

To get started, select the photo button to either take a picture or pick an existing image. If you’re using iOS or Android, you may need to tap the plus button first. You can also discuss multiple images or use the drawing tool to give the assistant instructions.

Image understanding is made possible by multimodal models, including GPT-3.5 and GPT-4. These models apply their language comprehension abilities to a range of images, including photographs, screenshots, and documents that combine text and visuals.

OpenAI says that the new voice and image capabilities will be available to ChatGPT Plus and Enterprise users in the next two weeks. It will roll out these capabilities to other groups of users, including developers, soon after. It also says that voice is coming to iOS and Android (you opt in through your settings), while images will be available on all platforms.

Last week, OpenAI introduced DALL-E 3, featuring full integration with ChatGPT.
