Voice & AI audio

OpenAI ships three next-gen voice models in its API

Reasoning, translation, transcription — one stack. The brief.

The FeaturedDaily Desk · 7 May 2026 · Verified May 2026

The answer

OpenAI launched GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper in its Realtime API on 7 May 2026.

What happened

On 7 May 2026, OpenAI added three audio models to its Realtime API. GPT-Realtime-2 brings GPT-5-class reasoning into a live voice model, the first time OpenAI has offered frontier reasoning in-band during a voice conversation. GPT-Realtime-Translate handles live speech translation from 70+ input languages into 13 output languages, keeping pace with the speaker. GPT-Realtime-Whisper streams transcription in real time as audio comes in. All three are accessible through a single Realtime API connection.

Pricing and what it signals

Translate and Whisper are billed per minute of audio — standard pricing for commodity streaming tasks. GPT-Realtime-2 is billed by token. That difference matters: token billing reflects variable reasoning depth, and it's OpenAI's signal that this is where the expensive, defensible work sits. Per-minute billing would undercharge for complex requests; token billing prices the actual compute. The pricing architecture maps to where each model is genuinely differentiated.
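To make the billing difference concrete, here is a minimal Python sketch of the two pricing shapes. The rates, token counts and function names are hypothetical placeholders chosen for illustration. They are not OpenAI's published prices or part of any OpenAI SDK.

```python
# Hypothetical rates for illustration only; not OpenAI's published prices.
HYPOTHETICAL_RATE_PER_MINUTE_USD = 0.01
HYPOTHETICAL_RATE_PER_1K_TOKENS_USD = 0.05


def per_minute_cost(audio_minutes: float, rate_per_minute: float) -> float:
    """Cost under per-minute billing: depends only on audio duration."""
    return audio_minutes * rate_per_minute


def per_token_cost(tokens: int, rate_per_1k_tokens: float) -> float:
    """Cost under token billing: depends on how many tokens were processed."""
    return tokens / 1000 * rate_per_1k_tokens


if __name__ == "__main__":
    # Two made-up five-minute calls that differ only in reasoning load.
    calls = {"simple call": 2_000, "complex call": 20_000}
    for label, tokens in calls.items():
        minute_price = per_minute_cost(5, HYPOTHETICAL_RATE_PER_MINUTE_USD)
        token_price = per_token_cost(tokens, HYPOTHETICAL_RATE_PER_1K_TOKENS_USD)
        print(f"{label}: per-minute ${minute_price:.2f}, per-token ${token_price:.2f}")
```

With these made-up numbers, both calls cost the same under per-minute billing because they have the same duration, while the token-billed cost rises with the amount of reasoning the call consumed.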

OpenAI described the update as advancing voice intelligence — GPT-Realtime-2 is the first voice model in its Realtime API with GPT-5-class reasoning, alongside per-minute-billed translation and transcription models.

Source: OpenAI · 7 May 2026

GPT-Realtime-2 uses token-based billing, reflecting the variable compute cost of real-time reasoning; Translate and Whisper are billed by the minute.

Source: TechCrunch · 7 May 2026

What's next. The real test is whether GPT-Realtime-2's reasoning quality holds in production apps. Real-time reasoning trades some depth for latency, and the launch post is not the proof. Watch for the first wave of consumer and enterprise voice apps shipping on this stack in the months ahead — that's when the capability claim becomes verifiable.

Frequently asked questions

What are the three new OpenAI voice models?

GPT-Realtime-2 (reasoning-capable voice, billed per token), GPT-Realtime-Translate (live speech translation from 70+ input languages into 13 output languages, billed per minute), and GPT-Realtime-Whisper (streaming transcription, billed per minute). All three were released on 7 May 2026 in OpenAI's Realtime API.

Who are they for?

Developers building voice features into apps — assistants, customer service, translation and transcription. End users feel the benefit indirectly through smarter voice experiences in the apps they already use.

Why does billing differ between the models?

Translate and Whisper are commodity streaming tasks, priced per minute like most audio services. GPT-Realtime-2's reasoning workload is variable: complex requests consume more tokens than simple ones, so token billing tracks the actual compute used. Source: TechCrunch · 7 May 2026.

How does this fit into the broader voice-AI race?

This landed days before Sesame's iOS voice app and weeks before Apple's Siri AI reveal at WWDC. OpenAI is positioning the Realtime API as the developer infrastructure layer for voice agents before Apple and startups can lock in that ecosystem.

Sources

Advancing voice intelligence with new models in the API — OpenAI, 7 May 2026
Realtime API guide — voice agents, translation, transcription and speech models — OpenAI Platform Docs, 7 May 2026
OpenAI launches new voice intelligence features in its API — TechCrunch, 7 May 2026
