OpenAI Open-Sources MRC: A New Networking Protocol for Supercomputer-Scale AI Training
OpenAI and five industry partners release MRC through the Open Compute Project to reduce congestion and hardware-fault disruptions in large GPU clusters.
OpenAI, together with AMD, Broadcom, Intel, Microsoft, and NVIDIA, has released MRC (Multipath Reliable Connection) — a new GPU networking protocol designed to keep Stargate-scale training runs alive through congestion events and hardware faults. Published through the Open Compute Project on May 5, 2026, MRC is now available to the broader industry as an open specification.
Preempting Failure: Static Source Routing
Most networking protocols respond to failures reactively, triggering routing updates that can cascade into disruptions of their own. MRC takes the opposite approach. According to the OpenAI blog, its static source routing precomputes traffic paths so that when a link or switch fails, packets are deterministically rerouted without spawning a routing-protocol storm, eliminating whole categories of failure rather than patching them one by one.
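To make the idea concrete, here is a minimal Python sketch of static source routing as a general technique, not MRC's actual specification: candidate paths are computed offline, and on a link failure the sender falls back deterministically to the next healthy precomputed path without any routing-protocol exchange. The class, topology names, and path format are illustrative assumptions.

```python
# Minimal sketch of static source routing (illustrative only, not the MRC spec).
# All candidate paths are computed before training starts; on a link failure the
# sender deterministically falls back to the next healthy precomputed path
# instead of flooding routing-protocol updates.

from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]  # ordered hops, e.g. ("leaf0", "spine1", "leaf3", "gpu7")

class StaticSourceRouter:
    def __init__(self, precomputed: Dict[Tuple[str, str], List[Path]]):
        self.precomputed = precomputed          # (src, dst) -> ranked candidate paths
        self.failed_links: Set[Tuple[str, str]] = set()

    def mark_link_failed(self, a: str, b: str) -> None:
        # Record the fault locally; no routing messages are exchanged.
        self.failed_links.update({(a, b), (b, a)})

    def _healthy(self, src: str, path: Path) -> bool:
        hops = (src,) + path
        return all((hops[i], hops[i + 1]) not in self.failed_links
                   for i in range(len(hops) - 1))

    def select_path(self, src: str, dst: str) -> Path:
        # Deterministic fallback: first precomputed path that avoids failed links.
        for path in self.precomputed[(src, dst)]:
            if self._healthy(src, path):
                return path
        raise RuntimeError(f"no healthy precomputed path from {src} to {dst}")

# Two precomputed paths between gpu0 and gpu7; when the spine1 link fails,
# traffic shifts to the spine2 path with no routing-protocol convergence.
router = StaticSourceRouter({
    ("gpu0", "gpu7"): [("leaf0", "spine1", "leaf3", "gpu7"),
                       ("leaf0", "spine2", "leaf3", "gpu7")],
})
router.mark_link_failed("leaf0", "spine1")
print(router.select_path("gpu0", "gpu7"))  # ('leaf0', 'spine2', 'leaf3', 'gpu7')
```

Because every fallback is precomputed, the failure response is a local lookup rather than a fabric-wide convergence event, which is the property the article attributes to MRC.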
Clearing the Bottlenecks: Adaptive Packet Spraying
At the traffic level, MRC deploys adaptive packet spraying to distribute transfers across all available paths in real time. This prevents the hot-spot congestion that forms when multiple GPUs simultaneously target a single destination — the kind of jitter that causes one late-arriving transfer to cascade idle time across thousands of downstream processors.
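As a rough illustration of the spraying idea (a sketch of the general technique, not the MRC wire format), the snippet below assigns each packet of a transfer to whichever path currently looks least loaded, feeding each choice back into the load estimate so the spread adapts as packets are enqueued. The function and path names are assumptions made for the example.

```python
# Illustrative sketch of adaptive packet spraying (not the MRC wire format).
# Each packet is sent over whichever path currently looks least congested,
# so no single path turns into a hot spot.

import heapq
from typing import Dict, List

def spray_packets(num_packets: int, paths: List[str],
                  load: Dict[str, float]) -> Dict[str, int]:
    """Assign packets to paths, always picking the currently least-loaded path."""
    # Min-heap keyed by the estimated outstanding load on each path.
    heap = [(load.get(p, 0.0), p) for p in paths]
    heapq.heapify(heap)
    assignment = {p: 0 for p in paths}

    for _ in range(num_packets):
        current_load, path = heapq.heappop(heap)
        assignment[path] += 1
        # Feed the choice back into the load estimate so the next packet
        # adapts to what has already been enqueued.
        heapq.heappush(heap, (current_load + 1.0, path))
    return assignment

# Example: path "p2" starts out congested, so packets skew toward p0 and p1.
print(spray_packets(12, ["p0", "p1", "p2"], {"p0": 0.0, "p1": 0.0, "p2": 6.0}))
```

The point of the sketch is the feedback loop: because each packet's placement updates the load estimate, a path that is already backed up naturally receives less new traffic.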
Layered Redundant Network Fabric
Structurally, MRC is built on redundant parallel network planes that provide independent connectivity paths while using fewer physical components and drawing less power than conventional redundancy designs. OpenAI notes that architectural simplicity becomes increasingly valuable at Stargate's scale, where the sheer number of interconnects turns complexity into a compounding liability.
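The sketch below shows one simple way redundant planes can be used, under the assumption that a new flow is placed on the least-loaded healthy plane and that an unhealthy plane is simply skipped. The Plane and PlaneScheduler classes are hypothetical and not taken from the MRC specification.

```python
# Minimal sketch of flow placement across redundant parallel network planes
# (a hypothetical scheduler, not MRC's actual design). Each plane is an
# independent copy of the fabric; new flows land on the least-loaded healthy
# plane, so losing one plane removes capacity but not connectivity.

from dataclasses import dataclass
from typing import Dict

@dataclass
class Plane:
    name: str
    healthy: bool = True
    flows: int = 0

class PlaneScheduler:
    def __init__(self, planes: Dict[str, Plane]):
        self.planes = planes

    def place_flow(self) -> str:
        # Prefer the healthy plane carrying the fewest flows.
        candidates = [p for p in self.planes.values() if p.healthy]
        if not candidates:
            raise RuntimeError("no healthy network plane available")
        chosen = min(candidates, key=lambda p: p.flows)
        chosen.flows += 1
        return chosen.name

# With plane B marked unhealthy, new flows land only on planes A and C.
planes = {name: Plane(name) for name in ("A", "B", "C")}
planes["B"].healthy = False
scheduler = PlaneScheduler(planes)
print([scheduler.place_flow() for _ in range(4)])  # ['A', 'C', 'A', 'C']
```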
A Five-Partner Coalition Behind an Open Standard
What makes this release unusual is who signed on. The MRC coalition spans competing hardware vendors and a hyperscaler in a joint bet on open standards — routing the specification through the Open Compute Project rather than holding it proprietary. That choice reflects a calculation that shared infrastructure norms raise the ceiling for everyone faster than competitive hoarding would.
Why This Matters
With over 900 million weekly ChatGPT users, OpenAI increasingly operates network infrastructure at a scale where it has more in common with telecommunications carriers than software companies. Open-sourcing MRC shifts networking from a potential competitive moat into shared industry plumbing — a signal that OpenAI believes its advantage lies in what it builds on top of the network, not in the network itself. For AI infrastructure teams, the OCP release offers a concrete, vendor-backed blueprint for improving cluster resilience at scale.
Frequently Asked Questions
What is MRC (Multipath Reliable Connection) and who developed it?
MRC is an open networking protocol co-developed by OpenAI, AMD, Broadcom, Intel, Microsoft, and NVIDIA to improve GPU cluster reliability at supercomputer scale. It was released through the Open Compute Project in May 2026.
How does MRC improve AI training reliability compared to existing approaches?
MRC combines static source routing, adaptive packet spraying, and redundant parallel network planes to preempt congestion and reroute around hardware faults — eliminating entire failure categories rather than reacting to them after the fact.