New Delhi, May 8, 2026 09:22 AM IST: OpenAI on Thursday, May 7, introduced its new generation of voice models, which can reason, translate, and transcribe as users speak. The company said the latest models in its API open the door to a wide range of voice apps for developers. In simple terms, developers will now be able to build apps that can talk, transcribe, and translate conversations with users in real time.
The first of the new models is GPT-Realtime-2, which brings GPT-5-class reasoning, letting it handle harder requests and carry on conversations naturally. The second, GPT-Realtime-Translate, is OpenAI’s new live translation model: it translates speech from over 70 input languages into about 13 output languages in real time while keeping pace with the speaker. The third, GPT-Realtime-Whisper, is a new streaming speech-to-text model that transcribes speech live as the speaker talks.
Voice is among the most preferred ways for millions of people to use software. However, OpenAI said that building practical voice products is far more complex, because an AI agent needs to understand the context of a conversation, adjust when a request changes, use tools as the conversation continues and, most importantly, respond in a way that feels right in the moment.
“Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI said in its official blog.
When it comes to deployment, these models are likely to benefit organisations that want to expand their customer service offerings. That said, the Sam Altman-led company has stated that the new features will be useful in far more areas, including media, events, education, and creator platforms.
For India in particular, realtime translation could enable multilingual voice services. The model lets developers build live experiences in which multiple people speak in their preferred languages and hear the conversation translated in real time, while reading transcriptions at the same time.
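A developer setting up such a session might configure it along the lines of the sketch below. This is only an illustration: the model identifier follows the article's naming, and the session fields (`output_language`, `modalities`, and so on) are assumptions for the sake of the example, not a documented API contract.

```python
import json

def build_translation_session(output_language: str, voice: str = "alloy") -> str:
    """Build a hypothetical session-configuration event for a live-translation
    session. Field names are illustrative assumptions, not a documented schema."""
    event = {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",   # assumed API identifier for the model
            "output_language": output_language,  # one of the ~13 supported output languages
            "voice": voice,
            # Speak the translation aloud and stream a text transcript alongside it,
            # matching the simultaneous translate-plus-transcribe experience described.
            "modalities": ["audio", "text"],
        },
    }
    return json.dumps(event)

# Example: translate incoming speech into Hindi over a realtime connection.
payload = build_translation_session("hi")
print(payload)
```

In a real integration this payload would be sent over the API's realtime (WebSocket) connection after which audio is streamed in both directions; the sketch stops at constructing the configuration.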
“Building voice AI for India means handling diverse regional phonetics. In our evals across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered 12.5% lower Word Error Rates than any other model we tested, along with lower fallback rates, higher task completion, and latency that sustained natural conversation,” said Prateek Sachan, Co-founder & CTO at BolnaAI, adding that the model sets a new standard for multilingual voice AI.
© IE Online Media Services Pvt Ltd
