Mira Murati's Thinking Machines: The Rise of Interaction Models

May 18, 2026

Former OpenAI CTO Mira Murati's new startup introduces "interaction models": AI designed for continuous, full-duplex conversation that could reshape how we work with artificial intelligence.

In September 2024, Mira Murati made headlines when she stepped down as OpenAI's Chief Technology Officer, the executive who had led the development and release of ChatGPT, GPT-4, and Advanced Voice Mode. Now, roughly twenty months later, she's back with something that might fundamentally change how we think about AI interfaces.

On May 11, 2026, Thinking Machines Lab, Murati's new venture, released a research preview of what they call "interaction models." The core concept is deceptively simple: instead of building AI that takes turns with humans (you speak, it responds, you speak again), these models engage in continuous, full-duplex conversation—listening, watching, reasoning, and speaking simultaneously, just like humans do.

What Makes Interaction Models Different

The AI industry has been dominated by turn-based models for years. You type a prompt, the model processes it, and returns a response. Even voice assistants like Siri or Alexa follow this pattern: listen, stop listening, process, respond. It works, but it's not how humans communicate.

Real conversation is messier. We interrupt each other. We say "uh-huh" and "right" while someone else is talking. We pick up on visual cues—a raised eyebrow, a confused look—and adjust mid-sentence. Traditional AI architectures fundamentally can't do this because they're built on turn-taking.

Thinking Machines' approach is different. Their model, TML-Interaction-Small, processes audio, video, and text in parallel through 200-millisecond "micro-turns." It can backchannel (those little "got it" and "okay" sounds humans make), interrupt when it detects hesitation, or jump in when it notices something relevant in what you're showing it.
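The toy Python loop below makes the 200-millisecond micro-turn concrete. It is a minimal sketch under stated assumptions and does not describe Thinking Machines' method. The 16 kHz sample rate, the energy-based speech detector, and the `decide` policy are all hypothetical. A real interaction model would learn when to listen, backchannel, or speak rather than follow hand-written rules.

```python
SAMPLE_RATE = 16_000   # assumed sample rate in Hz (not stated in the article)
MICRO_TURN_MS = 200    # micro-turn length quoted in the article
SAMPLES_PER_TURN = SAMPLE_RATE * MICRO_TURN_MS // 1000  # 3,200 samples per micro-turn


def micro_turns(samples):
    """Yield consecutive 200 ms chunks of an audio stream (the last one may be shorter)."""
    for start in range(0, len(samples), SAMPLES_PER_TURN):
        yield samples[start:start + SAMPLES_PER_TURN]


def is_speech(chunk, threshold=0.1):
    """Hypothetical voice-activity check: mean signal energy above a fixed threshold."""
    return sum(s * s for s in chunk) / len(chunk) > threshold


def decide(speech_flags):
    """Hypothetical policy: 'listen', 'backchannel', or 'speak' for the latest micro-turn."""
    if not speech_flags[-1]:
        # Silence now: take the floor only if the user was talking in the previous micro-turn.
        return "speak" if len(speech_flags) > 1 and speech_flags[-2] else "listen"
    run = 0  # length of the current stretch of continuous speech, in micro-turns
    for flag in reversed(speech_flags):
        if not flag:
            break
        run += 1
    # Acknowledge roughly once per second (every 5 micro-turns) of continuous speech.
    return "backchannel" if run % 5 == 0 else "listen"


def run(samples):
    """Classify each micro-turn, then let the policy pick an action based on the history so far."""
    flags, actions = [], []
    for chunk in micro_turns(samples):
        flags.append(is_speech(chunk))
        actions.append(decide(flags))
    return actions


# 1.2 s of "speech" (constant amplitude 0.5) followed by 0.2 s of silence.
audio = [0.5] * (6 * SAMPLES_PER_TURN) + [0.0] * SAMPLES_PER_TURN
print(run(audio))
# → ['listen', 'listen', 'listen', 'listen', 'backchannel', 'listen', 'speak']
```

The fixed chunk size matters because the model can act at every micro-turn boundary instead of waiting for the user to finish. That is why it can backchannel in the middle of an utterance.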

The technical specs are impressive: 276 billion parameters in a Mixture-of-Experts architecture that activates only 12 billion per token, keeping computational costs manageable. The average turn-taking latency is 0.40 seconds, close to the pace of natural human conversation and significantly faster than most current voice AI.
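Mixture-of-Experts explains how a 276-billion-parameter model can run with only 12 billion parameters active. A small router picks a few experts for each token and skips the rest. The article does not say how TML-Interaction-Small routes tokens, so the Python sketch below assumes top-k softmax gating, which is a common MoE scheme. The four scalar "experts", k = 2, and the router scores are illustrative stand-ins.

```python
import math

ACTIVE_PARAMS_B = 12    # active parameters per token, in billions (from the article)
TOTAL_PARAMS_B = 276    # total parameters, in billions (from the article)


def top_k_route(router_logits, k):
    """Select the k highest-scoring experts and softmax-normalize their scores into weights."""
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in chosen)   # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Run only the routed experts and mix their outputs; unselected experts do no work."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(router_logits, k))


# Four toy "experts", each just scaling its input by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, 0.5, 1.0]   # hypothetical router scores for one token

print(top_k_route(logits, k=2))           # experts 1 and 3 are selected
print(moe_forward(1.0, experts, logits))
print(f"active share: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")   # → active share: 4.3%
```

This toy only models the expert layer. In typical MoE designs, the "active" count also includes layers that every token passes through, such as attention.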

The Full-Duplex Revolution

The central innovation here is "full-duplex" communication. In telecommunications, full-duplex means both parties can transmit and receive simultaneously—like a phone call where both people can talk at once. Half-duplex is more like a walkie-talkie: one person speaks while the other listens, then they switch.

Current AI assistants are essentially sophisticated walkie-talkies. They might stream responses or handle interruptions, but the underlying architecture still processes in discrete turns. Thinking Machines built their model from the ground up to handle bidirectional streaming natively.
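The half-duplex/full-duplex distinction maps neatly onto code. In the Python sketch below, the half-duplex function must finish receiving before it transmits. The full-duplex version runs receiving and transmitting as concurrent asyncio tasks, so output begins while input is still arriving. This is a toy illustration, not how Thinking Machines implements streaming. The chunk names and the 10 ms step are arbitrary.

```python
import asyncio


def half_duplex(incoming, reply):
    """Walkie-talkie style: receive the whole turn, then transmit the whole reply."""
    events = [f"heard:{chunk}" for chunk in incoming]
    events += [f"said:{word}" for word in reply]
    return events


async def full_duplex(incoming, reply, step_s=0.01):
    """Phone-call style: receiving and transmitting run as concurrent tasks."""
    events = []

    async def receive():
        for chunk in incoming:
            events.append(f"heard:{chunk}")
            await asyncio.sleep(step_s)   # simulated arrival time of the next input chunk

    async def transmit():
        for word in reply:
            events.append(f"said:{word}")
            await asyncio.sleep(step_s)   # simulated playback time of one output chunk

    await asyncio.gather(receive(), transmit())
    return events


incoming = ["c0", "c1", "c2"]
reply = ["w0", "w1", "w2"]
print(half_duplex(incoming, reply))
print(asyncio.run(full_duplex(incoming, reply)))
```

asyncio interleaves the two tasks on a single thread. That is enough to show the channel semantics. A production system would stream real audio frames over a bidirectional connection.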

This isn't just a technical curiosity. It has profound implications for how natural AI collaboration feels. When you're working with a colleague, you don't wait for them to finish every thought before responding. You build on each other's ideas in real-time. You catch misunderstandings immediately. You maintain a flow state that turn-based AI inherently disrupts.

The Architecture Behind the Magic

What's particularly clever about Thinking Machines' approach is their dual-model architecture. The system isn't one monolithic model—it's two working in parallel:

The Interaction Model handles the real-time conversation: dialogue, presence detection, immediate responses, backchanneling, and frame-by-frame video analysis. It's optimized for low latency and natural engagement.

The Background Model runs asynchronously, handling extended reasoning, web searches, tool calls, and complex calculations. When it finishes a task, it streams results back to the Interaction Model, which weaves them into the conversation without perceptible pauses.

This solves a fundamental problem in real-time AI: how do you let the model "think" without the conversation grinding to a halt? By separating the concerns, Thinking Machines keeps the interaction fluid while still enabling deep reasoning.
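Splitting work between a fast foreground loop and a slow background worker is a familiar concurrency pattern. The Python sketch below shows the general shape of that pattern using asyncio. Both "models" are hypothetical stubs with made-up timings, and the article does not describe how the two real models communicate.

```python
import asyncio


async def background_model(query, work_s=0.05):
    """Stand-in for slow asynchronous work: a web search, tool call, or long reasoning chain."""
    await asyncio.sleep(work_s)
    return f"answer to {query!r}"


async def interaction_model(query, micro_turn_s=0.01):
    """Keep the conversation moving every micro-turn while the background task runs."""
    transcript = []
    job = asyncio.create_task(background_model(query))   # starts running concurrently
    while not job.done():
        transcript.append("(keeps talking, backchannels, watches video)")
        # Wait at most one micro-turn for the background result, then loop again.
        await asyncio.wait({job}, timeout=micro_turn_s)
    transcript.append(job.result())                      # weave the finished result back in
    return transcript


for line in asyncio.run(interaction_model("flight options to Lisbon")):
    print(line)
```

The foreground loop never blocks on the slow task for longer than one micro-turn. That is the property that keeps the conversation fluid.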

Another technical innovation worth noting is encoder-free early fusion. Traditional multimodal models pass audio and video through separate pre-trained encoders before combining them. Thinking Machines instead feeds minimally processed inputs straight into the model: dMel (discretized mel) spectrograms for audio and 40x40 pixel patches for video. This reduces latency and lets the model learn more coherent multimodal representations during pre-training.
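The Python sketch below shows what having no encoder in front of the model can look like. It illustrates the idea under stated assumptions and is not Thinking Machines' preprocessing. It cuts a grayscale frame into flattened 40x40 patches and quantizes a log-mel frame into integer levels in the spirit of dMel; the clipping range and the 16 bins are assumed values. It then concatenates both into one modality-tagged sequence. A real model would then project each element into its input space with learned weights.

```python
PATCH = 40  # 40x40 pixel patches, as described in the article


def patchify(frame):
    """Cut a grayscale frame (list of rows) into flattened 40x40 patches, in row-major order."""
    h, w = len(frame), len(frame[0])
    assert h % PATCH == 0 and w % PATCH == 0, "frame must tile evenly into patches"
    patches = []
    for top in range(0, h, PATCH):
        for left in range(0, w, PATCH):
            patches.append([frame[r][c]
                            for r in range(top, top + PATCH)
                            for c in range(left, left + PATCH)])
    return patches


def discretize_mel(log_mel_frame, lo=-10.0, hi=2.0, bins=16):
    """dMel-style step: map each log-mel channel value to one of `bins` integer levels."""
    out = []
    for v in log_mel_frame:
        v = min(max(v, lo), hi)                                   # clip into the assumed range
        out.append(min(int((v - lo) / (hi - lo) * bins), bins - 1))
    return out


def early_fuse(audio_frames, video_frame):
    """Build one token sequence with no pretrained encoder in front of either modality."""
    seq = [("audio", discretize_mel(f)) for f in audio_frames]
    seq += [("video", p) for p in patchify(video_frame)]
    return seq


frame = [[r * 1000 + c for c in range(120)] for r in range(80)]   # 80x120 test frame
fused = early_fuse([[-10.0, 2.0, -4.0]], frame)
print(len(fused), fused[0])   # → 7 ('audio', [0, 15, 8])
```

Both modalities reach the model as the same kind of object: a tagged element in one sequence. That is the sense in which the fusion happens "early".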

The Bitter Lesson

Thinking Machines explicitly cites Rich Sutton's famous essay "The Bitter Lesson" in their technical blog post. Sutton observed in 2019 that hand-built AI systems (with rules, heuristics, and engineered components) tend, in the long run, to be outperformed by general methods that scale with compute and data.

Applied to voice AI: the current approach of orchestrating separate speech recognition, language models, and text-to-speech systems will likely be surpassed by models that learn everything end-to-end. The bigger the model gets, the more its native training in interactive mode should produce emergent capabilities that external pipelines can't replicate.

It's a bet on architecture over engineering, on learning over crafting. Given Murati's track record at OpenAI, it's a bet worth taking seriously.

What This Means for the Industry

The strategic implications are significant. Murati took substantial real-time voice expertise with her when she left OpenAI, where she led the development of Advanced Voice Mode. Now she's publishing previews that claim lower latencies than her former employer's products.

For OpenAI, Google, and Anthropic, this creates competitive pressure on a critical frontier. The race for the first true mass-market AI interface beyond text chat is intensifying. Whoever solves natural, fluid AI collaboration first may define the next era of human-computer interaction.

For developers and businesses building on AI APIs, the message is mixed. In the short term, nothing changes—this is a closed research preview, not a commercial product. In the medium term (12-18 months), the voice-first architectures built by orchestrating separate ASR, LLM, and TTS components may become obsolete, replaced by models that natively handle bidirectional streaming.

The Road Ahead

Thinking Machines has the resources to execute. They raised a $2 billion seed round led by Andreessen Horowitz at a $12 billion valuation—possibly the largest seed round in AI history. They've signed a strategic partnership with NVIDIA for a gigawatt of Vera Rubin compute. They've recruited Soumith Chintala, co-creator of PyTorch, as CTO.

But challenges remain. Two co-founders left for OpenAI in January 2026. The preview is closed, with no public API or pricing. Commercial availability is vaguely "later this year." And running a full-duplex model with parallel architectures consumes significantly more compute than traditional text-based LLMs.

The gap between a curated demo and a reliable consumer product is notoriously wide in AI. We've seen this before, from Google's edited Gemini demo video in December 2023 to various OpenAI launches. Independent benchmarks don't exist yet. The technology is promising but unproven at scale.

Why This Matters

Interaction models represent a philosophical shift in how we think about AI interfaces. Instead of optimizing for task completion (get the right answer), they optimize for collaboration (maintain a productive partnership). Instead of minimizing latency per response, they minimize friction in the overall interaction.

If Thinking Machines succeeds, AI becomes less like a tool you query and more like a colleague you work alongside—present, attentive, and genuinely collaborative. That's a future worth building toward.

The race for natural AI interaction just got a lot more interesting.