Everything You Need to Know About ChatGPT’s Latest Update — GPT-4o

GPT-4o is the latest AI model released by OpenAI, the company that disrupted the entire tech scene a little over a year ago when its ChatGPT models kickstarted the AI boom. GPT-4o is touted as its flagship multimodal AI, which means it can handle text, audio and images, something that seemed far-fetched as recently as two years ago.

 

According to Mira Murati, CTO of OpenAI, the latest ChatGPT update brings “GPT-4-level intelligence to everything, including our free users.” Through this update, OpenAI has for the first time made a GPT-4-level model freely accessible to ChatGPT users worldwide; until now, free users only had access to GPT-3.5.


During the release, OpenAI also announced that the GPT-4o model delivers GPT-4-level performance (and better in some areas) at considerably faster speeds and lower costs.

 

What’s New in the ChatGPT-4o Model?


(Image source: OpenAI)

 

One of the main benefits, according to the company, is that GPT-4o’s powerful reasoning, processing and natural language capabilities are being made available in the free version of ChatGPT for the first time. OpenAI said that it wants to make the best AI available to everyone free of cost as part of its Spring Update.

The company isn’t going to make all the features available instantly, preferring instead to roll them out in phases.

According to OpenAI, the “o” in 4o stands for “Omni,” which highlights the model’s multimodal capabilities.

OpenAI CEO Sam Altman recently posted on X that GPT-4o is “natively multimodal,” which means it can easily work with voice, text and vision (images and video). During the live-streamed ChatGPT-4o launch, OpenAI CTO Mira Murati explained that GPT-4o will be freely available and that users will be able to do things like convert text into image outputs.

A blog post on the company’s official website mentioned that GPT-4o’s features “will be rolled out iteratively,” while noting that its text and image features will begin to roll out today.

The blog also explained how the latest update improves on previous versions. According to the company, users could previously use voice mode to talk to ChatGPT with average latencies of 2.8 seconds and 5.4 seconds for the GPT-3.5 and GPT-4 versions respectively.

The earlier versions used a pipeline of three separate models: one to transcribe audio to text, a second (GPT-3.5 or GPT-4) to take text in and produce text out, and a third to convert that text back to audio. This meant that GPT-4 was not involved in the first and third steps.

Because the text model couldn’t directly perceive tone, multiple speakers (if any), or background noise, and couldn’t express emotion in its output, the main source of intelligence (GPT-4) lost out on vital information.
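The three-stage pipeline described above can be sketched as follows. All function names and return values here are hypothetical placeholders for illustration, not OpenAI’s actual components; the point is simply that the text model in the middle never sees anything but a transcript:

```python
# Illustrative sketch of the pre-GPT-4o voice pipeline: three separate
# models chained together, so the text model never receives tone,
# speaker identity, or background noise. All names are hypothetical.

def transcribe(audio: bytes) -> str:
    """Stage 1: speech-to-text. Tone and speaker cues are lost here."""
    return "hello there"  # placeholder transcript

def generate_reply(text: str) -> str:
    """Stage 2: the text-only LLM (GPT-3.5/GPT-4) sees only the transcript."""
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    """Stage 3: text-to-speech, working from plain text alone."""
    return text.encode()

def voice_mode(audio: bytes) -> bytes:
    # Everything stage 2 never receives (tone, multiple speakers,
    # background noise) is exactly the information loss described above.
    return synthesize(generate_reply(transcribe(audio)))
```

GPT-4o collapses these three stages into a single model, which is why the lost information can now reach the reasoning step.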

With GPT-4o, OpenAI trained a single model to understand text, vision and audio, which means that the same neural network processes all the information. OpenAI further explained that since this is the first time they’re combining all these modalities, there’s a lot of research still to be done to understand what the model can and can’t do.

 

GPT-4o’s Voice Mode Capabilities

OpenAI’s latest update includes a new “Voice Mode” for easy back-and-forth communication, much like having a conversation with another person. OpenAI is expected to strengthen this further by adding video capabilities in the near future.

OpenAI showcased GPT-4o’s phenomenal multimodal capabilities in a series of videos in which people can be seen having normal conversations with the AI in voice mode. In one striking video, two people simultaneously hold a conversation with GPT-4o, asking it for ideas for games they can play.

To this, GPT-4o suggests playing a “classic game of rock-paper-scissors.” In the video, the model not only holds a conversation with the two individuals but also plays referee, identifying the winner of the rock-paper-scissors game. This indicates that it can process audio and visually identify the individuals and their actions at the same time, a mind-blowing feat.

OpenAI claims that GPT-4o responds to audio input in as little as 232 milliseconds, with an average response time of 320 milliseconds, which is similar to human response time in conversation.


(Image source: OpenAI)

 

Other Important ChatGPT-4o Updates

In addition to the multimodal capabilities, OpenAI has included several new features that could completely transform the way you use AI in the coming days. Note that not all these features will be immediately available.

  • Users accessing the free version of ChatGPT will see the biggest changes, because GPT-4o is not only faster than the previous models but is also touted to outperform the models previously reserved for paid users.
  • OpenAI’s latest offering allows users to interact with ChatGPT using video, which means you can share live footage of a math equation you’re stuck on and ask ChatGPT to solve it for you. GPT-4o will either give you the final solution or offer options you can choose from to work through the equation.
  • GPT-4o also allows you to share images or documents that contain both text and images. You can ask ChatGPT about earlier conversations and check real-time information in a conversation.
  • OpenAI revealed additional data analysis features, such as uploading files from Google Drive and Microsoft OneDrive, which let you work with graphs and charts. You can also carry out complex data analysis by uploading graphs.
  • ChatGPT-4o will be available in 50 languages.
  • For now, users get access to the text chat version of GPT-4o without the more advanced voice and video functionalities; those features are slated to roll out at a later date, starting with Plus and Team accounts.
  • You can now run code snippets, evaluate photos and text files and get access to custom GPT chatbots.
  • In a bid to make AI more accessible, the company also announced a fresh UI that lets you use ChatGPT without providing your email.
  • You can interact with ChatGPT at a more conversational level and share videos as a starting point.

 

Controversies Surrounding the Latest GPT-4o “Sky Voice”

One of the new features coming to GPT-4o is a voice mode in which the app can act as a voice assistant, using a voice option called “Sky,” similar to the AI assistant in the movie “Her.” Reports said that Sam Altman was keen on using Hollywood actress Scarlett Johansson’s voice for the assistant, an offer the actress had reportedly turned down several times.

The actress stated that GPT-4o’s voice assistant sounds “eerily similar” to hers even though she had declined the offer. She further stated that she was “shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference.”

In response to the actress’s statement, Sam Altman said in a written statement that the voice of Sky “was not meant to sound like Johansson’s” and was chosen before reaching out to her.

While the actress has asked her legal team to look into the process OpenAI used to choose the voice, OpenAI has reluctantly agreed to take down the “Sky” voice for the time being.

 

How to Try ChatGPT-4o?

OpenAI has already made the text and vision features of GPT-4o available to several paid users, and they will soon be made available to free users too. Developers can access GPT-4o through the API right away.
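For developers, a minimal sketch of what a GPT-4o API call might look like, assuming the official `openai` Python package and an `OPENAI_API_KEY` environment variable for the real call. The helper below only constructs the request body, so nothing is sent over the network; the commented-out lines show the actual call:

```python
# Minimal sketch of calling GPT-4o through the OpenAI API.
# Only the request body is built here, so this runs offline;
# the real call (commented out below) requires an API key.

def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Build the body of a chat-completions request for the given prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_chat_request("Summarize GPT-4o's new features in one line.")

# To actually send the request (requires the `openai` package and a key):
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(**request)
#   print(response.choices[0].message.content)
```

Passing `model="gpt-4o"` is all that distinguishes this from a GPT-4 or GPT-3.5 call, which is why existing integrations can switch over with a one-line change.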

OpenAI is also rolling out a ChatGPT desktop app with GPT-4o functionality, making the model instantly accessible from your computer. GPT-4o can also tell you what’s happening on your screen using its vision capabilities.

You may not be able to use all the fancy features immediately as OpenAI plans on releasing them gradually — so be patient!
