
ChatGPT is rarely out of the news at the moment, and the engineers behind it are regularly pushing out new features and improvements to the generative AI chatbot—including, recently, new voice and image capabilities.


In simple terms, as per the OpenAI blog post, these new capabilities mean ChatGPT can now see, hear, and speak. You’re no longer restricted to text prompts when interacting with the bot, although it’s worth noting that most of these features remain exclusive to paying ChatGPT Plus users for the time being. At first, only a limited number of users got the features as they were rolled out, but now every ChatGPT Plus user should have access. (On November 21, ChatGPT’s voice chat feature was rolled out to all free users as well.)

As well as changing how you interact with ChatGPT, these new features also widen the scope of what it can do—read you a bedtime story, for instance. Here’s what’s new, and how to make the best use of it.

Chatting with ChatGPT

You’ve got five voice options for conversing with ChatGPT. Credit: OpenAI/David Nield

If you’re a ChatGPT Plus user and you want to talk to ChatGPT, you need to use the mobile app for Android and iOS (this functionality hasn’t yet been added to ChatGPT on the web). Once you’ve signed into your account and reached the main prompt screen, tap the headphones icon (lower right) to start a voice conversation with the bot.

You’ll get a splash screen explaining what the feature does, then you can tap Choose a voice to do just that. There are five to pick from, and if you select any of them you’ll hear a short preview. Tap Confirm when you’ve decided which one you want to converse with, and you’re then ready to start talking.

Speaking with ChatGPT is as simple as just talking to your phone. When you stop talking, the app will process what you’ve said and generate a response. You’ll often find that when it’s speaking, ChatGPT will end its response with a related question, to keep the conversation going—but you can always ask to talk about something else, or tap the pause button in the lower left corner to start a new chat.

If ChatGPT isn’t quite catching what you’re saying or recognizing your pauses as you talk, you can manually give it voice inputs, walkie-talkie style, by tapping and holding the screen. Say what you need to say, then release your finger and the chat will be processed. It’s a more deliberate way of talking that you might find easier.

Think about ways in which spoken responses are better: You can get ChatGPT to tell you a bedtime story, for example, or recite a poem on a topic of your choice. As with text prompts, you can be as specific as you like about subjects or the tone. When you’re ready to go back to the main ChatGPT interface, tap the red and white cross icon, and you’ll see the responses you’ve been given in text format.

Image inputs and outputs

ChatGPT can identify the contents of images for you. Credit: David Nield

You can now prompt ChatGPT using images, whether it’s on the web or via the apps for Android or iOS. On the web, click the paperclip icon to the left of the input box, then pick the image from your computer; in the apps, tap the picture icon to choose an image from your gallery or the camera icon to take a new photo (if you can’t see these icons, tap the + button to the left of the input box).

You’ll be invited to add a prompt alongside your image, and your options here are virtually unlimited. You can ask ChatGPT about what’s inside the image, for example. You can also take a photo of a problem—like a leaky faucet—and ask about the best way to fix it, or show ChatGPT the contents of your fridge and ask for suggestions on what meal to cook.

If you’re in the mobile apps, you can tap on the image before you add the accompanying prompt and scribble around a particular part of it. This focuses ChatGPT’s attention on that area, which can be useful for troubleshooting problems or getting clarity about something specific.

The image generator DALL-E (also developed by OpenAI) is now integrated inside ChatGPT as well. That means you can ask for new images to be generated, as well as using your own as prompts: Ask it to produce a landscape of rolling hills, or a grimy street scene at night, or a cartoon-style rendering of an interior location. You can also ask it to modify or build on an image you provide.

As with text prompts, the more specific you can be, the better: You can be really precise about what’s in your picture, what style is used, and how color and shade are applied. So, you might say you want to see a cartoon-style picture of fields with a well in the foreground. Or, you might want a photorealistic portrait of a CEO-type figure, rendered in black and white. If you’re not happy with the first attempt at something, you can ask ChatGPT to make changes with further prompts. To save your creations, click or tap on the generated images to find the download option.