Micron’s HBM4 clears 2.8 TB/s with samples shipping — HBM4E custom base die points to bespoke AI accelerators

Micron says its next-gen HBM4 already exceeds 2.8 TB/s per stack with >11 Gb/s pin speeds, and that it’s begun shipping customer samples. The company also teased HBM4E with a customizable base logic die, a move that could let hyperscalers tailor memory behavior to their own accelerator silicon.

What happened

During its latest results update, Micron said HBM4 stacks are now sampling to key customers with headline bandwidth above 2.8 TB/s per stack and pin speeds north of 11 Gb/s. That is well beyond the JEDEC HBM4 baseline (2.0 TB/s per stack at 8 Gb/s per pin) and suggests Micron is comfortable running the 2048-bit interface at higher pin rates earlier than expected. The company also reiterated a roadmap to HBM4E with customer-specific tuning of the base logic die, effectively turning memory from a fixed commodity into a tunable component of the accelerator.

Why it matters

  • Tokens per joule go up: Higher pin speeds raise per-stack bandwidth, and better power efficiency raises tokens per joule; together they cut time-to-first-token and cost-per-token for LLM serving, two KPIs hyperscalers obsess over.
  • HBM4E = differentiation lever: A customizable base die allows routing tweaks, prefetch behavior, and queueing that better match an accelerator’s on-package fabric. That’s new—and it moves some “secret sauce” into the memory stack.
  • Packaging pressure eases: If a single 12-high HBM4 stack clears ~2.8 TB/s, board designers can sometimes hit bandwidth targets with fewer stacks, freeing up area and improving yields on interposers or bridges; a rough stack-count comparison follows this list.
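
For scale, here is a minimal back-of-envelope comparison, assuming roughly 1.2 TB/s per HBM3E stack and an illustrative 8 TB/s per-package target (both are assumptions for illustration, not figures from Micron's announcement):

```python
# Back-of-envelope: stacks needed to hit an aggregate bandwidth target.
# The 1.2 TB/s HBM3E figure and the 8 TB/s target are illustrative
# assumptions, not numbers from Micron's announcement.
from math import ceil

TARGET_TBPS = 8.0            # hypothetical per-package bandwidth target
PER_STACK_TBPS = {
    "HBM3E": 1.2,            # roughly where fast HBM3E stacks land today
    "HBM4":  2.8,            # Micron's claimed HBM4 sample figure
}

for name, per_stack in PER_STACK_TBPS.items():
    stacks = ceil(TARGET_TBPS / per_stack)
    print(f"{name}: {stacks} stacks for {TARGET_TBPS} TB/s "
          f"({stacks * per_stack:.1f} TB/s delivered)")
```

Dropping from seven stacks to three for the same aggregate target is where the interposer-area and yield argument comes from.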

Speeds, feeds, and where the gains come from

  • Interface width: HBM4 doubles the I/O to a 2048-bit interface (versus 1024 bits for HBM3E), which combined with elevated pin rates is how you reach 2.8 TB/s+ per stack; the arithmetic is worked through after this list.
  • Stack height: Early samples are 12-high (36 GB). Expect capacity options as 1-gamma process DRAM ramps and as thermals allow taller stacks.
  • Power efficiency: Micron is leaning on an in-house CMOS base die and packaging tweaks to keep watts in check at higher speeds.
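
The headline figures fall straight out of the interface math. A minimal sketch, assuming the 2048-bit HBM4 interface, the quoted pin rates, and a 24 Gb per-die density (inferred from 36 GB across 12 dies, not stated by Micron):

```python
# Per-stack bandwidth from interface width and pin rate, plus the capacity
# math behind the 36 GB figure. The 24 Gb per-die density is inferred from
# 36 GB / 12 dies, not stated by Micron.

IO_WIDTH_BITS = 2048           # HBM4 interface width per stack

def stack_bandwidth_tbps(pin_rate_gbps: float) -> float:
    """Per-stack bandwidth in TB/s for a given per-pin data rate in Gb/s."""
    return IO_WIDTH_BITS * pin_rate_gbps / 8 / 1000   # bits -> bytes -> TB

print(f"JEDEC baseline, 8 Gb/s/pin:  {stack_bandwidth_tbps(8.0):.2f} TB/s")
print(f"Micron sample, 11 Gb/s/pin:  {stack_bandwidth_tbps(11.0):.2f} TB/s")

DIES_PER_STACK = 12
DIE_DENSITY_GBIT = 24          # implied: 36 GB * 8 bits/byte / 12 dies
print(f"Capacity: {DIES_PER_STACK * DIE_DENSITY_GBIT // 8} GB per 12-high stack")
```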

Market impact: NVIDIA, AMD, and everyone else

This shifts the near-term memory conversation from “Who has HBM3E?” to “Who can feed a Rubin-class or CDNA-next GPU at HBM4 speeds without blowing the power budget?” Micron’s early sampling plus a path to custom base dies gives cloud buyers leverage—especially if they want to co-design memory behavior around sparsity, MoE routing, or giant KV caches.

Design notes for accelerator teams

  1. Thermals first: 2.8 TB/s from a 12-high stack demands aggressive heat-spreader design. If higher bandwidth per stack lets you cut the stack count, you may also shrink interposer size and cost.
  2. NUMA-aware memory maps: With more bandwidth per stack, where attention blocks and KV cache land relative to each stack's locality matters more than ever (see the placement sketch after this list).
  3. HBM4E experiments: If Micron’s custom base die options are real, pilot them for traffic shaping and queue management that reflect your model’s token flow rather than generic patterns.
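
On point 2, here is a minimal sketch of what stack-aware KV-cache placement could look like, assuming a hypothetical four-stack package and a simple round-robin mapping of layers to stacks; the HbmStack class and place_kv_shards helper are illustrative, not any vendor's or framework's API:

```python
# Minimal sketch of stack-aware placement for KV-cache shards.
# "Stack" stands in for whatever locality domain the accelerator exposes
# (an HBM site, a memory-controller channel group, etc.); the class and the
# round-robin policy are illustrative, not any vendor's API.
from dataclasses import dataclass, field

@dataclass
class HbmStack:
    stack_id: int
    capacity_gb: float = 36.0          # 12-high HBM4 stack
    allocations: list = field(default_factory=list)

    def free_gb(self) -> float:
        return self.capacity_gb - sum(size for _, size in self.allocations)

def place_kv_shards(stacks, num_layers: int, shard_gb: float):
    """Assign each layer's KV-cache shard to a stack.

    Policy: layer i goes to stack i % len(stacks), mirroring a fabric where
    compute tiles are interleaved across HBM sites.
    """
    placement = {}
    for layer in range(num_layers):
        stack = stacks[layer % len(stacks)]
        if stack.free_gb() < shard_gb:
            raise MemoryError(f"stack {stack.stack_id} cannot hold layer {layer}")
        stack.allocations.append((f"kv_layer_{layer}", shard_gb))
        placement[layer] = stack.stack_id
    return placement

stacks = [HbmStack(i) for i in range(4)]   # hypothetical 4-stack package
print(place_kv_shards(stacks, num_layers=8, shard_gb=2.0))
```

A real policy would read the package's actual topology rather than round-robin, but the bookkeeping, tracking free capacity per locality domain and failing loudly on overflow, keeps the same shape.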

What to watch next

  • HBM4E disclosures: What knobs—reorder buffers, ECC, refresh policy, link training—will be exposed to customers?
  • Customer ramps: Which accelerators (NVIDIA Rubin, AMD's next Instinct) confirm HBM4 SKUs first, and with how many stacks per card?
  • Power targets: Whether real-world joules per TB moved actually fall at 2.8 TB/s will decide data-center TCO more than the spec sheet alone; a quick worked example of the metric follows this list.
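
On that last point, joules per TB is simply stack power divided by sustained bandwidth. The wattages below are placeholders for illustration, since Micron has not published per-stack power at these speeds:

```python
# What "joules per TB moved" means: stack power divided by sustained
# bandwidth. Both wattage figures are placeholders for illustration,
# not published Micron numbers.

def joules_per_tb(stack_power_w: float, bandwidth_tbps: float) -> float:
    """Energy to move 1 TB, in joules: watts divided by TB/s."""
    return stack_power_w / bandwidth_tbps

print(f"HBM3E-class (assumed 30 W, 1.2 TB/s): {joules_per_tb(30.0, 1.2):.1f} J/TB")
print(f"HBM4 sample (assumed 45 W, 2.8 TB/s): {joules_per_tb(45.0, 2.8):.1f} J/TB")
```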
