NVIDIA at CES 2026: Vera Rubin Dominates the Keynote Address
CES keynotes can be a parade of logos and vibes. NVIDIA’s CES 2026 keynote still had plenty of AI buzzwords, but the useful part was the numbers. Jensen Huang used the stage to turn the Vera Rubin platform from a name on a roadmap into something closer to a spec sheet, with concrete transistor counts, memory bandwidth, and scaling claims.
This is the point where NVIDIA’s strategy becomes blunt. The company is not really selling you a GPU. It is selling you a rack, and by extension the cluster that the rack plugs into. If you have followed our data center coverage, this is the through-line that keeps showing up: AI infrastructure is being productized as systems, not parts.
The rack pitch: Rubin is a platform, not a chip
NVIDIA’s headline claims for Rubin are deliberately framed as system outcomes. The company says Rubin delivers roughly 5x inference and 3.5x training uplift versus Blackwell, and then it immediately pivots to what hyperscalers actually care about: fewer GPUs to reach the same training target, and lower cost per token for inference. That is a subtle admission that “fastest chip” is not the only game anymore. The economics are the moat.
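The "fewer GPUs" framing is just throughput arithmetic. Here is a toy sketch of it in Python, with an assumed FLOP budget, schedule, and utilization rather than anything NVIDIA has published; only the 3.5x training uplift comes from the keynote claim.

```python
# Toy arithmetic behind "fewer GPUs for the same training target": for a fixed
# FLOP budget finished in a fixed wall-clock window, GPU count scales inversely
# with effective per-GPU throughput. All inputs below are assumptions.
total_flops   = 1e25                  # assumed training budget, FLOPs
days          = 30                    # assumed schedule
blackwell_eff = 5e15 * 0.40           # assumed sustained FLOP/s per Blackwell GPU (40% MFU)
rubin_eff     = blackwell_eff * 3.5   # NVIDIA's quoted ~3.5x training uplift

seconds = days * 24 * 3600
for name, eff in [("Blackwell", blackwell_eff), ("Rubin", rubin_eff)]:
    gpus = total_flops / (eff * seconds)
    print(f"{name}: ~{gpus:,.0f} GPUs for the same run")
```

Plug in different budgets and utilization figures and the ratio between the two GPU counts stays pinned to the uplift claim, which is exactly why NVIDIA leads with it.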
NVIDIA also made a point of saying Rubin is not one product announcement. It is a platform consisting of multiple chips designed to land together, and to scale together. That matters because at this point, the failure mode is not designing one impressive GPU. The failure mode is breaking the supply chain across packaging, memory, optics, and networking so badly that the “platform” arrives as a staggered mess.
NVIDIA Rubin GPU: transistor budget, FP4 throughput, and bandwidth as the limiter
Rubin’s GPU pitch starts with the kind of transistor number NVIDIA likes putting on slides: 336 billion transistors. Transistors are not performance by themselves, but they are a decent proxy for intent. Rubin is not framed as a polite tweak. It is framed as more compute, more data movement, and more internal plumbing to keep utilization high when you scale out.
The key point is that Rubin’s top-line throughput is anchored on NVFP4. NVIDIA is leaning hard into 4-bit as the throughput and cost lever, because that is where the economics start to bend in NVIDIA’s favor at scale. The company’s own figures put Rubin at 50 PFLOPS NVFP4 inference and 35 PFLOPS NVFP4 training. Whether developers are comfortable living in FP4 everywhere is a separate question. NVIDIA’s approach is clear: build the hardware first, then pressure the ecosystem to make it normal.
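For readers who have not lived in 4-bit, here is a minimal, hypothetical sketch of block-scaled 4-bit quantization. It is not NVIDIA's NVFP4 specification, which defines its own block-scale format; it only shows the general shape of the technique: a shared per-block scale plus values snapped to a small set of representable FP4 (E2M1) levels.

```python
import numpy as np

# Rough illustration of block-scaled 4-bit quantization. NOT NVIDIA's exact
# NVFP4 definition -- just the generic trick: one scale per block of weights,
# each value snapped to the nearest representable FP4 (E2M1) magnitude.
FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_fp4_blockwise(x, block=32):
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / FP4_LEVELS[-1]  # per-block scale
    scale = np.where(scale == 0.0, 1.0, scale)
    mags = np.abs(xb) / scale
    idx = np.abs(mags[..., None] - FP4_LEVELS).argmin(axis=-1)      # nearest level
    return (np.sign(xb) * FP4_LEVELS[idx] * scale).reshape(x.shape)

w = np.random.randn(4096).astype(np.float32)
print("mean abs quantization error:", np.abs(w - quantize_fp4_blockwise(w)).mean())
```

The point of the block scale is that outliers only poison their own block, which is what keeps accuracy loss tolerable while the weight footprint drops to half a byte per parameter.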
Rubin also tries to remove one of the most common bottlenecks in large-model inference and training: memory bandwidth. NVIDIA is pairing Rubin with HBM4 and quotes 22 TB/s of HBM bandwidth. In practice, bandwidth is what keeps “more compute” from turning into “more idle compute.” As context windows grow and MoE traffic patterns stay ugly, starving the GPU is still the easiest way to waste silicon.
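A back-of-envelope roofline check shows why the bandwidth figure carries so much weight. This sketch uses NVIDIA's quoted 50 PFLOPS and 22 TB/s numbers; the decode-phase model parameters are purely illustrative assumptions.

```python
# Roofline sanity check with quoted figures. The "ridge" is how many FLOPs you
# must perform per byte fetched from HBM before compute, not memory, limits you.
peak_flops = 50e15   # NVFP4 FLOP/s (quoted)
hbm_bw     = 22e12   # bytes/s (quoted)

ridge = peak_flops / hbm_bw
print(f"arithmetic intensity needed to be compute-bound: ~{ridge:.0f} FLOP/byte")

# Illustrative batch-1 decode estimate (assumed model, not an NVIDIA figure):
# every active weight is read once per token, ~2 FLOPs per 0.5-byte FP4 weight,
# i.e. ~4 FLOP/byte -- far below the ridge, so tokens/s tracks bandwidth.
active_params = 40e9                  # assumed active parameters
bytes_per_tok = active_params * 0.5   # FP4 weights, 0.5 byte each
print(f"bandwidth-bound ceiling: ~{hbm_bw / bytes_per_tok:.0f} tokens/s per GPU (batch 1)")
```

Batching and prefill push the intensity up, but the gap between 4 FLOP/byte and a ridge north of 2,000 FLOP/byte is why the HBM number matters as much as the PFLOPS number.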
Then there is NVLink. Rubin is not being sold as “a GPU you install.” It is being sold as “a GPU that behaves as part of a coherent domain.” NVIDIA quotes 3.6 TB/s NVLink bandwidth per GPU, and that is the correct emphasis. If you want rack-scale hardware to behave like one machine, you buy headroom in interconnect.
NVIDIA Vera: Olympus cores and the real job of the CPU in an AI rack
Vera is NVIDIA acknowledging something it used to pretend was boring. The CPU inside these racks is not just there to boot Linux and get out of the way. Vera is designed as part of the fabric.
NVIDIA says Vera uses 88 custom Olympus CPU cores, and ties the headline to the numbers that matter for the platform: NVLink-C2C at 1.8 TB/s and up to 1.5 TB of system memory (LPDDR5X). This is not “CPU performance” in the desktop review sense. It is determinism, orchestration, and keeping the GPU domain fed without the CPU turning into a choke point.
At the platform level, the story is simple. NVIDIA wants the unit of sale to be the rack. That requires the CPU to behave like a component of the rack-scale machine, not a separate host with its own island of memory and latency. NVLink-C2C is the hook: coherence, control paths, and a tighter CPU-GPU boundary.
NVLink 6 Switch and 400G SerDes: this is the bit that separates “chip” from “system”
NVLink 6 is where NVIDIA reminds everyone that the hard part is not making a fast GPU. The hard part is moving bits around at scale without collapsing efficiency. The company’s pitch is 3.6 TB/s per GPU all-to-all bandwidth inside the NVLink domain, enabled by 400G SerDes.
SerDes is not a marketing flourish. It is the physical layer reality of modern AI systems. Higher per-lane bandwidth means you can move more data off-package without exploding lane counts, connector complexity, routing congestion, and power overhead. The tradeoff is brutal signal integrity, tighter packaging constraints, and more expensive design validation. NVIDIA is effectively saying it is comfortable living in that pain, because the payoff is that NVLink scaling stays ahead of GPU compute scaling.
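The lane arithmetic makes the point concrete. One assumption in the sketch below: the quoted 3.6 TB/s per GPU is treated as a combined bidirectional figure, which is how NVLink bandwidth has typically been stated; the lane counts double if it is per direction.

```python
# Lane-count arithmetic behind the 400G SerDes pitch. Assumes 3.6 TB/s is total
# bidirectional bandwidth (1.8 TB/s each way) per GPU.
nvlink_total_bytes_per_s = 3.6e12
per_direction_bits_per_s = (nvlink_total_bytes_per_s / 2) * 8   # 14.4 Tb/s each way

for lane_gbps in (200, 400):
    lanes = per_direction_bits_per_s / (lane_gbps * 1e9)
    print(f"{lane_gbps}G lanes: ~{lanes:.0f} per direction, per GPU")
```

Halving the lane count per GPU is the difference between a backplane you can actually route and power, and one you cannot.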
This is also why NVIDIA keeps talking like a networking company now. The differentiator at rack scale is not just ALUs and HBM stacks. It is whether your fabric lets the rack behave like one machine instead of a collection of very fast components that spend half their time waiting.
ConnectX-9 and Spectrum-X: scale-out is where AI systems get ugly
NVLink is the scale-up story. Ethernet is the scale-out story, and scale-out is where AI factories get messy. NVIDIA introduced ConnectX-9 as a next-generation SuperNIC with 200G PAM4 SerDes, aimed at pushing Ethernet throughput while reducing the lane and routing tax of building dense systems.
The significance of 200G per lane is not just speed. It is system practicality. Fewer lanes per port means less board complexity, fewer signal integrity nightmares, and less PHY overhead fighting for power and thermal budget next to multi-kilowatt GPU trays. That matters because networking is now competing with GPUs for the same mechanical and electrical budget inside a rack.
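The arithmetic is simple enough to do in a few lines. The port speeds below are illustrative, not a confirmed ConnectX-9 spec; the point is only how the lane count per port moves.

```python
# Why 200G per lane is a practicality story: lane count per port halves versus
# 100G lanes. Port speeds here are assumptions for illustration.
for port_gbps in (800, 1600):
    for lane_gbps in (100, 200):
        print(f"{port_gbps}G port @ {lane_gbps}G/lane: {port_gbps // lane_gbps} lanes")
```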
On the switching side, NVIDIA’s Spectrum-X pitch scales into photonics switches, including a configuration quoted at 102.4 Tb/s. The more interesting angle is not the big number. It is the workload assumption. NVIDIA is explicitly building for modern AI traffic patterns, especially the bursty all-to-all phases that show up in MoE training and inference. If your fabric collapses under congestion, your GPUs do not matter.
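For scale, the radix arithmetic on a 102.4 Tb/s switch looks like this; the port speeds are assumed for illustration.

```python
# Port-count (radix) arithmetic: total switching capacity divided by port speed.
# Radix determines how flat, and how congestion-tolerant, the scale-out fabric can be.
switch_tbps = 102.4
for port_gbps in (800, 1600):
    print(f"{port_gbps}G ports: {round(switch_tbps * 1000 / port_gbps)}")
```

Higher radix means fewer switch tiers between any two GPUs, which is exactly what those bursty all-to-all phases reward.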
Serviceability: NVIDIA finally admits that operators are part of the performance story
One of the most telling parts of the Rubin platform briefing was not a PFLOPS number. It was the serviceability claims. NVIDIA is pitching redesigned, modular, cable-free assemblies and quotes up to 18x reduction in service time for certain maintenance operations. That is an operator-driven reality check. At this scale, downtime is a cost model input, not an inconvenience.
This is also where the rack-first strategy becomes self-reinforcing. If the rack is the product, then manufacturability and serviceability become product features. It is not glamorous, but it is the difference between “paper launch platform” and “something customers can deploy without rage.”
What to watch: timeline, supply, and how far “up to” stretches in real deployments
NVIDIA says Rubin is real silicon, with the platform built around multiple new chips and targeted for availability in H2 2026. That is close enough that customers will plan around it, and close enough that any slip becomes someone else’s capacity problem. At NVIDIA’s current scale, being late is no longer a product cycle annoyance. It is a supply continuity event.
The other thing to watch is how the economics claims land in the real world. NVIDIA is talking about up to 10x lower cost per token and up to 4x fewer GPUs for certain training scenarios versus Blackwell. Those are big claims, and they are very likely configuration-dependent. The direction is plausible, but the spread between “up to” and “typical” is where the market will decide whether Rubin is a true platform reset or just a very expensive upgrade path dressed as inevitability.
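A rough cost-per-token sketch, with every input assumed rather than taken from NVIDIA, shows how far the answer moves with utilization and achieved throughput alone, before the hardware changes at all.

```python
# Illustrative cost-per-token model. All inputs are assumptions; the point is
# that the result swings with utilization and achieved tokens/s, which is why
# "up to" and "typical" can land far apart on identical hardware.
def cost_per_million_tokens(gpu_cost, lifetime_yrs, power_kw, kwh_price,
                            tokens_per_sec, utilization):
    secs_per_year = 365 * 24 * 3600
    capex_per_sec = gpu_cost / (lifetime_yrs * secs_per_year)   # amortized $/s
    power_per_sec = power_kw * kwh_price / 3600                 # energy $/s
    effective_tps = tokens_per_sec * utilization
    return (capex_per_sec + power_per_sec) / effective_tps * 1e6

# Same assumed hardware, two operating points:
print(f"${cost_per_million_tokens(50_000, 4, 2.0, 0.08, 20_000, 0.9):.3f} per 1M tokens")
print(f"${cost_per_million_tokens(50_000, 4, 2.0, 0.08,  5_000, 0.4):.3f} per 1M tokens")
```

That roughly 9x spread comes entirely from operating assumptions, which is why the "up to" numbers deserve scrutiny before anyone re-plans a fleet around them.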
CES 2026 did not reveal a surprise new NVIDIA strategy. It confirmed the one the company has been executing for years, just with less ambiguity: the product is the rack, the moat is the fabric, and the scoreboard is cost-per-token. Rubin and Vera are the next step in that trajectory. The only real question is whether supply, packaging, and memory volumes can keep up with the demand curve that NVIDIA itself is encouraging.