NextSilicon claims its Maverick-2 accelerator outperforms leading Nvidia and Intel GPUs on graph and sparse workloads, while using less power. The company also teased a new RISC-V test chip called Arbel. DataCenterDynamics first reported the latest claims, and a Business Wire release adds new benchmark bullets and system details.
What Maverick-2 actually is
Maverick-2 is a dataflow accelerator that reconfigures at runtime and targets the hot code paths where your application spends its time. The pitch is simple. Keep your existing code and toolchains. Let the runtime steer hot spots into units that behave more like a custom engine than a fixed GPU pipeline. That is how the company frames the speed and power wins.
Form factors, memory, and power
- PCIe card. Single-die board with HBM3e. NextSilicon lists 96 GB of capacity and up to 400 W of power. DCD quotes 96 GB at 300 W. The gap likely reflects binning or different firmware limits.
- OAM module. Dual-die Open Accelerator Module with 192 GB of HBM3e. NextSilicon lists a 750 W ceiling. DCD reports 600 W. The same SKU caveat applies.
- Process and clocks. TSMC 5 nm, around 1.5 GHz, 2.5D package.
What the company is claiming on performance
The latest release highlights three areas: PageRank on large graphs, GUPS for random memory updates, and HPCG for sparse linear algebra. NextSilicon quotes up to a 10x advantage on small to mid-size graph analytics versus leading GPUs, at lower power, and says it completed graphs larger than 25 GB that comparable GPUs could not finish. On GUPS, it cites 32.6 GUPS at 460 W. On HPCG, 600 GFLOPS at 750 W, matching top GPUs at roughly half the power. These are vendor numbers, not peer-reviewed results, so treat them as a starting point, not the last word.
How it differs from a GPU
A GPU gives you massive SIMD throughput and hides memory stalls with parallelism. That model suffers on irregular code paths with poor locality, branchy kernels, or pointer chasing. Maverick-2 leans on a dataflow fabric and a runtime that rearranges blocks around the work. Think of it as exposing more of the schedule to hardware that can shape itself in flight. That is why graph analytics and sparse math show the biggest wins in the vendor slides.
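As a rough illustration of the difference, here is a minimal, hypothetical C++ sketch. The first loop streams memory contiguously and maps cleanly onto wide SIMD; the second makes data-dependent, scattered updates in the spirit of GUPS and graph traversals, the pattern that stalls a fixed GPU pipeline. Names and sizes are illustrative, not NextSilicon code.

```cpp
// Illustrative only: contrasting a GPU-friendly streaming loop with the
// irregular, data-dependent access pattern that GUPS and graph kernels stress.
#include <cstddef>
#include <cstdint>
#include <vector>

// Contiguous, predictable accesses: easy to vectorize and to hide latency for.
void streaming_axpy(std::vector<float>& y, const std::vector<float>& x, float a) {
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] += a * x[i];
}

// Random, data-dependent addresses: poor locality, little reuse, hard to coalesce.
void scattered_updates(std::vector<std::uint64_t>& table,
                       const std::vector<std::uint64_t>& keys) {
    const std::uint64_t mask = table.size() - 1;  // assumes table size is a power of two
    for (std::uint64_t k : keys)
        table[k & mask] ^= k;
}

int main() {
    std::vector<float> y(1 << 20, 1.0f), x(1 << 20, 2.0f);
    streaming_axpy(y, x, 0.5f);

    std::vector<std::uint64_t> table(1 << 20, 0);
    std::vector<std::uint64_t> keys = {12345, 67890, 13579, 24680};
    scattered_updates(table, keys);
    return 0;
}
```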
Software stack and portability
The bold claim is that you bring unmodified code and see gains. NextSilicon says C, C++, Fortran, OpenMP, and Kokkos are supported now, with CUDA and HIP integrations coming next. There are also references to running common AI frameworks. If that holds in practice, it reduces the pain of adoption for HPC codes that were never written with a GPU in mind. The devil is always in the runtime and the compiler. That deserves third-party testing.
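For a sense of what "unmodified code" means here, a minimal sketch follows: a plain OpenMP CSR sparse matrix-vector product of the kind HPCG-style codes are built from. This is an assumption about the workload class, not code taken from NextSilicon's documentation, and whether such a loop actually sees gains is exactly what third-party testing should establish.

```cpp
// Illustrative only: an existing-style OpenMP kernel (CSR sparse matrix-vector
// product) representative of the sparse loops HPCG exercises.
#include <cstddef>
#include <vector>

void spmv_csr(const std::vector<std::size_t>& row_ptr,   // CSR offsets, size n+1
              const std::vector<std::size_t>& col_idx,
              const std::vector<double>& vals,
              const std::vector<double>& x,
              std::vector<double>& y) {
    const std::size_t n = row_ptr.size() - 1;
    #pragma omp parallel for schedule(dynamic, 64)
    for (std::size_t r = 0; r < n; ++r) {
        double sum = 0.0;
        for (std::size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            sum += vals[k] * x[col_idx[k]];   // indirect loads drive the irregular traffic
        y[r] = sum;
    }
}
```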
RISC-V test chip called Arbel
Alongside the accelerator, NextSilicon announced Arbel, a 5 nm RISC-V test chip. The company positions it as an enterprise-grade core that can stand with Intel Lion Cove and AMD Zen 5. That is a strong statement for a test vehicle. The read here is strategic. NextSilicon wants an in-house general purpose core for tighter coupling and control over the host side. It also gives the company an option to license IP or build more of the stack over time.
What I like
- Targeting the right pain points. Graph analytics, sparse linear algebra, and random update patterns are where GPUs look weakest. If Maverick-2 really lifts those without code surgery, that is useful.
- Form factor sanity. PCIe and OAM align with how data centers integrate accelerators. Memory at 96 GB or 192 GB of HBM3e is a practical choice for graph sizes that blow past on-board caches.
- Signals of real deployments. There are references to Sandia’s Vanguard-II system and deployments at dozens of sites. If those customers vouch for out-of-the-box gains, confidence grows.
What we need to see
- Independent results. PageRank and HPCG should be easy for labs to replicate; a minimal PageRank reference kernel is sketched after this list. Put head-to-head runs next to H200, B200, and Gaudi-class parts with power logs and wall-clock times.
- Software friction. Show a real code path, like an existing CFD preconditioner or a production graph traversal, that moves over without months of hand-tuning. Publish the steps and the patches, if any.
- The power story. Clarify the 300 W versus 400 W PCIe figure and the 600 W versus 750 W OAM figure. Buyers plan racks around nameplate numbers, not marketing ranges.
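On the replication point above, here is a minimal PageRank power-iteration kernel in plain C++, the sort of reference code a lab could time against GPU runs with power logs attached. Graph layout, damping factor, and iteration count are illustrative choices, and dangling nodes are ignored for brevity; this is not NextSilicon's benchmark harness.

```cpp
// A minimal PageRank power iteration over a CSR graph of incoming edges.
// Sketch for replication purposes only; parameters are illustrative.
#include <cstddef>
#include <vector>

std::vector<double> pagerank(const std::vector<std::size_t>& in_ptr,   // CSR offsets, size n+1
                             const std::vector<std::size_t>& in_src,   // source vertex of each in-edge
                             const std::vector<std::size_t>& out_deg,  // out-degree per vertex
                             double damping = 0.85, int iters = 50) {
    const std::size_t n = in_ptr.size() - 1;
    std::vector<double> rank(n, 1.0 / n), next(n, 0.0);
    for (int it = 0; it < iters; ++it) {
        for (std::size_t v = 0; v < n; ++v) {
            double sum = 0.0;
            for (std::size_t e = in_ptr[v]; e < in_ptr[v + 1]; ++e)
                sum += rank[in_src[e]] / out_deg[in_src[e]];  // pull rank from in-neighbors
            next[v] = (1.0 - damping) / n + damping * sum;
        }
        rank.swap(next);
    }
    return rank;
}
```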
Context versus Nvidia and Intel
Nvidia still owns dense GEMM and transformer training with tuned libraries. Intel’s Gaudi line competes on cost and bandwidth and has seen traction on inference clusters. NextSilicon is not trying to win the same charts. The focus is on the ugly corner cases where SIMD starves and traffic patterns dominate. If the company keeps scope discipline, it can live beside GPUs, not replace them. That hybrid is how most data centers will look anyway.
Bottom line
There is a credible idea here. A dataflow accelerator that adapts around irregular code and gives you speedups without a six-month port is exactly the kind of tool HPC buyers ask for. The claims are bold, and the right next step is replication in independent labs with real codes and power logs. If those results hold, Maverick-2 will earn a slot next to GPUs in mixed racks. If they do not, it will be another interesting slide deck. Either way, it is a space worth testing rather than arguing about on paper.
Sources
- DataCenterDynamics: Maverick-2 claims and RISC-V test chip
- Business Wire: performance bullets, image, Arbel details
- NextSilicon product page: specs, software support, PCIe and OAM details
- EE Times: runtime reconfiguration and graph performance numbers
- InsideHPC: up to 10x claim and power angle
- Jon Peddie Research: block-level description and 5 nm process

