ChatGPT’s new upgrade finally breaks the text barrier

OpenAI is rolling out new ChatGPT features that let users prompt the chatbot with images and voice commands in addition to text.

The AI brand announced on Monday that it will make these features available over the next two weeks to ChatGPT Plus and Enterprise users. The voice feature is available on iOS and Android on an opt-in basis, while the image feature is available on all ChatGPT platforms. OpenAI says it plans to expand the image and voice features beyond paid users after the staggered rollout.

OpenAI image prompt.
Twitter/X

The voice chat works as a spoken conversation between the user and ChatGPT. You press a button and ask your question aloud. After processing the request, the chatbot answers in speech instead of text. The process is similar to using virtual assistants such as Alexa or Google Assistant and could be the prelude to a complete revamp of virtual assistants as a whole. OpenAI’s announcement comes just days after Amazon revealed a similar feature coming to Alexa.

To enable voice conversations with ChatGPT, OpenAI uses a new text-to-speech model that can generate “human-like audio from just text and a few seconds of sample speech.” Additionally, its Whisper model can “transcribe your spoken words into text.”

OpenAI says it’s aware of the issues that could arise from the power behind this feature, including “the potential for malicious actors to impersonate public figures or commit fraud.”

This is one of the main reasons the company plans to limit the use of its new features to “specific use cases and partnerships.” Even when the features are more widely available, they will be accessible mainly to more privileged users, such as developers.

ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb

— OpenAI (@OpenAI) September 25, 2023

The image feature allows you to capture an image and submit it to ChatGPT along with your question or prompt. You can use the drawing tool in the app to help clarify your question and have a back-and-forth conversation with the chatbot until your issue is resolved. This is similar to Microsoft’s new Copilot feature in Windows, which is built on OpenAI’s model.

OpenAI has also acknowledged ChatGPT’s limitations, such as its ongoing hallucination issue. For the image feature, the brand decided to limit certain functionalities, such as the chatbot’s “ability to analyze and make direct statements about people.”

ChatGPT was first introduced as a text-based chatbot late last year; however, OpenAI has quickly expanded its capabilities. The original chatbot, based on the GPT-3.5 language model, has since been updated to GPT-4, which is the model receiving the new features.

When GPT-4 first launched in March, OpenAI announced various enterprise collaborations, such as one with Duolingo, which used the AI model to improve the accuracy of the listening and speech-based lessons on its language learning app. OpenAI has also collaborated with Spotify to translate podcasts into other languages while preserving the sound of the podcaster’s voice. The company also spoke of its work with the mobile app Be My Eyes, which aids blind and low-vision people. Many of these apps and services were available ahead of the image and voice update.

Fionna Agomuoh
Fionna Agomuoh is a technology journalist with over a decade of experience writing about various consumer electronics topics…