NVIDIA is winning now, but is it already peaking?

Nvidia is on top of the AI world right now. It owns the accelerator market, prints margins that would make a drug cartel blush, and has everyone from startups to the biggest cloud providers begging for more silicon. It is also, in my view, already at its peak. Not because the technology suddenly becomes bad, but because physics, power, and economics do not care about one company’s run. The rest of the industry has finally taken Nvidia seriously. The next few years will be about eroding that lead from every angle.

How Nvidia won the current cycle

First, it is worth being honest about why Nvidia is winning right now. It is not a fluke. Nvidia read the room earlier than almost anyone. It built CUDA when everyone else was still treating GPUs as dumb frame pushers. It invested in libraries, frameworks, and devrel when competitors barely had a working compiler. It designed data center GPUs with serious matrix hardware and put them into systems that actually shipped at scale. The result is a portfolio that looks coherent to buyers who have to ship real products rather than presentations.

H100 and its cousins are not perfect, but they hit the timing of the generative AI boom essentially dead on. They offer massive compute throughput, high bandwidth HBM, and a software stack that everyone from a graduate student to a trillion-dollar cloud provider already knows how to use. Nvidia then did what every smart vendor would do in the same position. It priced to the pain. When demand is infinite and supply is constrained, you do not discount. You capture as much of that curve as your customers will tolerate.

Along the way, CUDA became the gravitational field of accelerated computing. Frameworks assumed CUDA first. Tooling assumed CUDA first. When people say “GPU” in the context of training and inference clusters today, they usually mean “Nvidia accelerator with CUDA underneath.” That is power. Real, practical power. It is also a trap for anyone who believes that current conditions are permanent.

The physics and power story has a ceiling

Every generation, high-performance computing has a phase where you can buy your way forward with more power, more cooling, and more silicon. Eventually, physics gets an opinion. Nvidia’s data center parts already live in a world of multi-kilowatt servers and racks that draw hundreds of kilowatts. The B200 and its successors are not magically going to slide back to modest power levels. They will push up against what facilities can reasonably deliver and what operators can afford to cool.

That matters because hyperscalers are not just looking at how fast a model trains. They are looking at the bill for megawatts, for outside plant, for liquid distribution, for replacement pumps, and for the engineers who have to keep all of that running. Energy per token matters. Cooling cost per token matters. Rack density matters. HBM capacity per package matters. Once those numbers become the constraints, performance per watt and performance per rack matter more than peak FLOPs per chip.
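To make "energy per token" concrete, here is a back-of-envelope sketch of the calculation a buyer might run. Every number in it is an illustrative assumption, not a measurement from any real chip or deployment:

```python
# Back-of-envelope electricity cost per token. All figures below are
# hypothetical assumptions for illustration, not vendor data.

def energy_cost_per_million_tokens(
    accelerator_watts: float,          # assumed average draw per accelerator
    tokens_per_second: float,          # assumed sustained throughput
    pue: float,                        # facility overhead (cooling, plant)
    electricity_usd_per_kwh: float,
) -> float:
    """USD of electricity to generate one million tokens."""
    facility_watts = accelerator_watts * pue
    seconds = 1_000_000 / tokens_per_second
    kwh = facility_watts * seconds / 3_600_000
    return kwh * electricity_usd_per_kwh

# A fast, power-hungry part vs a slower part that sips power
# in a better-cooled facility (both fleets are made up).
hot_chip = energy_cost_per_million_tokens(1000, 5000, 1.3, 0.08)
cool_chip = energy_cost_per_million_tokens(500, 3500, 1.2, 0.08)
print(f"hot: ${hot_chip:.4f}  cool: ${cool_chip:.4f} per 1M tokens")
```

With these made-up inputs, the slower chip wins on energy cost per token despite losing on raw throughput, which is exactly why peak FLOPs per chip stops being the deciding metric once power and cooling become the constraints.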

Nvidia can keep improving its architectures, but it does not escape the basic trade-offs. Higher performance usually means more power and more current density. More HBM means more complex packages, more stress, and more expensive substrates. Interconnects get more demanding as speeds rise. You cannot run away from thermals or electromigration with branding. Everyone in the game now understands this. That is why Nvidia’s peak looks like a wide plateau, not a cliff, and why others finally have space to catch up.

CUDA is a moat, but the water level is changing

CUDA has been Nvidia’s unfair advantage for more than a decade. It is a programming model, a set of libraries, and a huge collection of tuned kernels that let engineers get work done without thinking about every vector unit manually. You can argue that CUDA lock-in is bad for the market, but you cannot argue that it has not been effective. Most companies could not afford to fight that head-on in the early years.

That story is changing slowly, and it will keep changing. Frameworks are drifting toward higher-level abstractions where hardware back ends can be swapped more easily. Compilers like Triton and graph optimisers are increasingly able to target multiple architectures. Vendors who are not Nvidia are funding the work needed to port kernels to their stacks, because they finally have no choice. The big cloud providers are also investing in tools that treat accelerators as back ends rather than as single vendor worlds.

None of this kills CUDA overnight. It does weaken the total lock. Once more code paths are tested and maintained on non-Nvidia hardware, the moat becomes shallower. If you are a cloud provider with your own silicon and a room full of Nvidia gear, your incentive is to move as much steady state load as possible to your own chips and keep Nvidia gear for the high mix, high variance jobs where developer familiarity still matters most. That is how erosion starts. Quietly, job by job.

Hyperscalers cannot afford to be captive forever

Right now, the big buyers of accelerators are mostly hyperscalers and venture-funded AI labs. They are happy to pay Nvidia for a first-mover advantage. They are less happy about being dependent on a single supplier long-term. That is why every serious cloud has one or more internal accelerator projects. Google has TPUs. Amazon has Trainium and Inferentia. Microsoft is working with multiple partners and playing with its own silicon ideas. Meta is not sitting still either.

None of those projects needs to match Nvidia on every metric. They just need to be “good enough” for a large fraction of the workloads each cloud cares about. The bar is often lower than people think. If you can run inference at similar quality with slightly worse latency but significantly lower total cost of ownership, you move that workload. If you can move training for specific models onto your own hardware with a predictable schedule and known internal cost, you do that too. Every workload that moves off Nvidia reduces the leverage Nvidia has for pricing and allocation.

This is where the idea of peak starts to take shape. Nvidia is winning now because it is the default for almost everything. Over time, it becomes the default for fewer things. That does not mean Nvidia collapses. It means the growth curve flattens, and the ability to push margins as hard as it has in the last few years becomes weaker.

AMD is finally using its strengths properly

AMD spent a decade cleaning up its own mess and building Zen into a CPU core that does not just compete but often leads on performance per watt. While all of that was happening, Nvidia quietly ate the accelerator market. AMD’s data center GPUs spent years living behind driver issues, immature software stacks, and a general lack of focus. That situation is changing, slowly, and it matters more than it did even two years ago.

Recent MI series accelerators show that AMD can pack serious HBM bandwidth and compute into packages that do not look like science projects, and that it can ship them in real systems rather than just dev kits. The software stack still has catching up to do, but large customers are now invested in making that happen because the alternative is paying Nvidia for absolutely everything. Once again, “good enough” is powerful. If MI parts can take chunks of training or inference spend in specific verticals, Nvidia’s peak looks flatter.

There is also the simple fact that AMD has become very good at balancing power, thermals, and yield in a way that scales across many SKUs. Zen has shown that AMD can deliver consistent efficiency at a time when everyone cares about electricity and cooling. If the same discipline keeps bleeding into AMD’s accelerator designs, buyers will notice. They will not swap wholesale overnight, but they will keep a second option alive on purpose.

Intel is behind, but not out

Intel’s attempts at data center GPUs have not exactly set the world on fire, but the company understands packaging, HBM, and power delivery at a level that should not be dismissed. It is already shipping Gaudi accelerators that some customers quietly use for specific workloads, particularly where pricing from Nvidia has been painful. The advantage Intel has is a long history of working with the same buyers on Xeon. The disadvantage is obvious. It waited too long to take this market seriously.

In 2026 and beyond, Intel’s accelerator play will likely lean on its packaging story, its ability to bring 18A and advanced HBM together, and its desire to be a foundry for other people’s designs. It does not have to win the whole market for Nvidia to be past its peak. It just has to be credible enough that large buyers can threaten to move workloads if Nvidia is too aggressive. That bargaining power is what erodes a monopoly position slowly.

Custom silicon is a permanent feature now

The biggest threat to Nvidia’s long-term dominance is not another GPU vendor. It is the quiet spread of custom accelerators designed by the same companies that currently write the biggest cheques to Nvidia. TPUs at Google are a clear example. They have gone from quirky side project to serious training and inference engines that carry a significant slice of Google’s own workloads. You do not see TPUs marketed for everyone, because they are not meant for everyone. They exist to serve Google’s own software stack and services.

Every hyperscaler is walking the same path in its own way. They will still buy Nvidia for some workloads, especially at the bleeding edge, where you need a mature, flexible stack. They will also keep trying to move repeatable, predictable jobs to their own hardware. That keeps more value inside the company and reduces reliance on any one supplier. Once those custom chips have been paid for, their owners are very motivated to keep them fed.

This does not replace Nvidia. It does box Nvidia in. The more silicon that buyers build for themselves, the narrower the space Nvidia owns. That space can still be enormous. It will not be infinite.

The economics of tokens, not FLOPs

Once your favourite model pipeline is built, the economics usually boil down to cost per token or cost per query at a given latency and quality. That is the metric that matters to businesses. It does not care which logo sits on the heatsink. Nvidia has been able to win on that front because its accelerators are fast, well understood, and available at scale to at least some customers. As alternatives mature, that story changes.

If a combination of in-house accelerators, AMD cards, and slightly older Nvidia gear can deliver similar cost per token at acceptable latency for a given workload, buyers will start spreading their budgets around. They will reserve the latest Nvidia parts for workloads that are either too big, too sensitive, or too awkward to move yet. That is not hypothetical. It is the natural outcome for anyone who takes budgeting seriously, and hyperscalers are very good at spreadsheets.
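The budgeting logic above can be sketched as a tiny routing rule: send each workload to the cheapest fleet that still meets its latency target. The fleet names, costs, and latencies here are all invented for illustration:

```python
# Toy workload-placement sketch: pick the cheapest fleet that meets a
# latency budget. All fleets and figures are hypothetical.

def pick_fleet(fleets: dict[str, tuple[float, float]],
               latency_budget_ms: float) -> str:
    """fleets maps name -> (usd_per_million_tokens, p99_latency_ms)."""
    eligible = {name: cost for name, (cost, lat) in fleets.items()
                if lat <= latency_budget_ms}
    if not eligible:
        raise ValueError("no fleet meets the latency budget")
    return min(eligible, key=eligible.get)

fleets = {
    "latest_gpu": (6.00, 40.0),     # fastest, most expensive (assumed)
    "prev_gen_gpu": (4.00, 70.0),
    "in_house_asic": (2.50, 90.0),  # slowest, cheapest (assumed)
}

# A latency-tolerant batch job lands on the cheap in-house silicon,
# while a tight interactive budget still forces the newest GPUs.
print(pick_fleet(fleets, latency_budget_ms=100.0))
print(pick_fleet(fleets, latency_budget_ms=50.0))
```

Nothing about this rule cares which vendor made the silicon, which is precisely the point: once several fleets clear the quality and latency bar, the spreadsheet, not the logo, decides where the tokens get generated.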

Once the market stops being “Nvidia or nothing,” Nvidia’s ability to charge whatever it likes weakens. It will still charge plenty, but the slope of the line changes. At some point, you have already extracted most of the available economic surplus from your customers. After that, you are just arguing about how the growth curve flattens.

Packaging and HBM are not exclusive anymore

One of Nvidia’s big strengths in this cycle has been its ability to put lots of compute right next to lots of HBM in packages that do not immediately fall apart. That is not trivial work. It involves large interposers or bridges, careful HBM stacking, and power delivery that keeps everything inside spec. For a while, Nvidia looked almost alone at the top of that hill. That period is ending.

AMD, Intel, and the memory vendors themselves are all more capable now. HBM3E and future HBM generations are not Nvidia only. Advanced packaging techniques such as hybrid bonding, silicon interposers, and high-density organic bridges are being invested in across the industry. The learning curve is still steep, but Nvidia does not get to bank on packaging as a permanent moat. It gets to bank on having a head start. The difference between those two positions is the difference between a monopoly and a strong competitor.

Interconnects and networking are another front

Nvidia has done good work with NVLink and its networking stack, especially since the Mellanox acquisition. It understands that fast accelerators are pointless if the fabric is slow. That is another area where it is ahead today. It will not be alone forever, either. Ethernet-based fabrics are evolving, InfiniBand continues to move forward, and other vendors are building their own coherent interconnects.

This matters because large AI jobs care about the entire path from storage through network to accelerator and back again. If a non-Nvidia stack can offer slightly slower raw per-chip performance but a solid networking story at a lower cost, that becomes attractive. As with everything else, “good enough” is powerful. Buyers will take a five or ten percent hit on raw training time if they save much more than that on the total cost of ownership and avoid single vendor lock-in.

Regulators are not asleep forever

There is also a non-technical angle. When one company becomes synonymous with an entire market, regulators start to pay attention. Nvidia sits at a point where it influences how easily new players can get into accelerated computing. CUDA’s dominance, its control over key networking products, and its pricing power make it an obvious target for scrutiny.

I am not saying regulators will suddenly break Nvidia up or force a rewrite of CUDA. I am saying that Nvidia has to be careful not to overplay its hand. If buyers complain loudly enough or some markets feel frozen out, pressure builds. That can show up as merger blocks, as conditions on acquisitions, or as quieter nudges that favour standards and portability over proprietary control. None of that helps Nvidia’s peak last longer.

Why this looks like a peak, not a fall

It is easy to misread this argument as “Nvidia is doomed.” That is not my point. Nvidia has earned its current position. It will remain a major player in AI hardware for many years. What I am saying is that the combination of physics, power, economics, competition, and buyer behaviour means the period of unchallenged, explosive growth and extreme pricing power will not last forever.

Today, Nvidia sells nearly everything it can make into a market that treats it as the default choice. That is peak behaviour. From here, more capacity comes online, more competitors become credible, more internal chips hit production at hyperscalers, and more software stacks become portable. The result is not a cliff. It is a long, slow flattening where Nvidia still sells a lot of chips but has to work harder to justify pricing and keep mindshare.

What Nvidia can do to avoid wasting its peak

If I were inside Nvidia looking at this from the other side, I would treat the current peak as an opportunity to strengthen the parts of the business that survive even if accelerator margins come down. That means doubling down on software and services that are genuinely valuable across hardware generations. It means building platforms where Nvidia provides more than silicon, such as turnkey systems, orchestration, and managed services that integrate with customer workflows deeply enough that ripping them out has a real cost.

It also means being careful with pricing. There is a point at which pushing customers too hard triggers a coordinated response. Hyperscalers will tolerate high margins for a while if they believe Nvidia is enabling them to capture even more value themselves. They will not tolerate feeling like permanent hostages. Finding that line is not easy, but ignoring it is reckless.

Why this matters to everyone else

If you are a cloud provider, a startup, or even an end user who just wants AI services to be affordable and reliable, Nvidia’s peak matters. It sets the tone for pricing, capacity, and innovation. An Nvidia that has to compete a little harder is good for the rest of the ecosystem. It encourages AMD and Intel to keep investing. It pushes hyperscalers to polish their own silicon rather than treating it as a distraction. It pushes software stacks toward portability rather than monogamy.

If you are a hardware fan, the next few years will be interesting. We will see more diversity in accelerator designs, more experiments with HBM and packaging, and more creative ways of splitting workloads across GPUs, CPUs, NPUs, and custom chips. Some of those experiments will fail loudly. Some will quietly work and chip away at Nvidia’s share.

My take

Nvidia deserves credit for seeing this wave early and building the hardware and software to ride it. It also deserves criticism for how aggressively it has priced and how comfortable it has become as the default. The market is finally pushing back, not through noise on social media, but through silicon roadmaps inside the biggest buyers.

Nvidia is winning now. That is not in dispute. What I am saying is that this looks a lot like the top of the curve. From here, the company either uses its position to build a broader, more resilient business that can live with lower margins and more competition, or it clings too hard to the current model and finds itself fighting both customers and rivals at the same time. Physics will keep tightening the screws on power and thermals. Economics will keep tightening the screws on costs. Hyperscalers will keep tightening the screws on dependency.

The peak is not the end. It is the point where you decide what kind of company you are on the way down from it. Nvidia has the talent and the cash to get that decision right. Whether it does or not will define the next decade of accelerated computing more than any single benchmark ever will.
