Nvidia Strategy 2026
The One Line That Nobody Is Talking About
Let me start with a sentence that almost no one is discussing.
Jensen Huang, speaking to Stratechery's Ben Thompson right after his GTC 2026 keynote, said this: “It’s not just gonna be people banging on our SQL database now, it’s gonna be a whole bunch of agents banging on it. They’re gonna need to do it way faster.”
That is not a hardware announcement. That is a product warning.
Here is why I think that.
Every piece of software you have ever used was designed with one assumption baked in at the foundation. A human is on the other side.
Not just any human. A slow human. A human who reads, pauses, gets distracted, reopens the tab, waits for the page to load, and tolerates 400 milliseconds of latency without complaint.
Software was optimised for this human. The entire architecture of modern software products (rate limiting, pricing tiers, API design, query performance budgets) was calibrated to serve this specific use case.
That creature is no longer the primary user. And most products have not noticed yet.
Why Tools Break When Agents Show Up
Think about what actually happens when you plug an AI agent into an existing software tool.
The agent does not browse. It does not think about whether to click. It does not go to lunch.
It executes at machine speed, in a loop, with no downtime. A human sales rep might open your CRM 30 times in a workday.
An agent doing the same research task will hit your CRM API 30,000 times in an hour. The tool was never tested for this. The rate limit was never designed for this. The pricing model was never imagined for this.
The interesting thing is that this does not break gradually. It breaks at a threshold. Below it, everything works fine, and nobody notices. Cross it, and your authentication server is overwhelmed, your database connection pool is exhausted, your cost per customer quadruples overnight, and your support queue fills up with confused enterprise customers who say “we did not change anything.”
They did not change anything visible. They shipped an agent. That is the invisible change that breaks everything at once.
Call it Tool Inversion: the moment when an AI agent becomes the dominant user of a system that was designed for humans, and all the original design assumptions fail simultaneously. It is not a gradual migration. It is a cliff.
The reason it is a cliff and not a slope comes down to something simple. Human tools were built on human time. Agents operate on machine time. These are not just different speeds. They are different operating philosophies.
When you build for human time, you tolerate latency because the human tolerates it. You accept occasional errors because the human catches them. You design for sessions because humans use things in sessions. You price per seat because humans use things individually.
None of that works in machine time. Latency compounds in a chained agent pipeline. A 10-step workflow where each step waits half a second takes 5 seconds end-to-end. Run 1,000 instances of that workflow for an enterprise customer and you are delivering 1,000 simultaneous calls that each need a sub-500ms response. The tool was tested for 50 concurrent users, not 1,000 parallel agents.
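The arithmetic above is worth making concrete. A minimal sketch, using the same illustrative numbers (10 steps, half a second per step, 1,000 parallel workflows):

```python
# Hypothetical numbers from the example above: a chained agent workflow
# where per-step latencies add up, and parallel instances multiply load.

STEPS = 10                    # tool calls chained in one workflow
LATENCY_PER_STEP_S = 0.5      # each step waits half a second
PARALLEL_WORKFLOWS = 1_000    # instances running for one enterprise customer

# Sequential chaining: latencies add, they do not average out.
end_to_end_s = STEPS * LATENCY_PER_STEP_S
print(f"End-to-end latency per workflow: {end_to_end_s:.1f}s")

# Each workflow makes STEPS calls over end_to_end_s seconds, so the
# backend sees a sustained machine-speed request rate, not a human trickle.
calls_per_second = PARALLEL_WORKFLOWS * STEPS / end_to_end_s
print(f"Concurrent workflows: {PARALLEL_WORKFLOWS}")
print(f"Sustained request rate: {calls_per_second:.0f} req/s")
```

Two thousand requests per second, sustained, from a single customer. That is the gap between "tested for 50 concurrent users" and what an agent deployment actually delivers.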
This is not a scalability problem in the traditional sense. You cannot just throw more servers at it. The fundamental design logic of the product is wrong for the new user.
The CPU Comeback Explains Everything
Here is something that most people found confusing at GTC. Jensen Huang, the man who spent years arguing that CPUs are bottlenecks and everything should be GPU-accelerated, is now selling CPUs. He announced the Vera CPU architecture, specifically designed for AI agent workloads. Why?
The answer reveals something deep about how agents actually work.
An agent does not run on a GPU the whole time. An agent reasons, then calls a tool, then waits for the result, then reasons again.
The reasoning part wants a GPU. The orchestration, the sequential decision making, the “what should I do next” part, runs on a CPU. One step at a time. Single-threaded. You cannot parallelise “what should I do next” because each decision depends on the previous one.
So here is what happens without a fast CPU. The agent reasons on the GPU, calls a tool, and then has to coordinate the response through a slow CPU before the next GPU operation starts.
The GPU sits idle.
You have bought millions of dollars of inference hardware that is waiting for a single-threaded CPU bottleneck.
Huang puts it directly: if the CPU is not fast, it holds back the GPU. The most expensive accelerators in the world are bottlenecked by the cheapest component in the chain.
The data centre was designed for a workload profile that no longer exists. Now it needs to be rebuilt around the actual agent workflow, which is not “run fast in parallel always” but “reason sequentially, then execute in parallel, then reason again.”
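The reason-then-orchestrate loop can be sketched as a simple utilisation calculation. The timings below are illustrative assumptions, not measured numbers from any real system:

```python
# A rough utilisation sketch of the agent loop: GPU reasoning bursts
# interleaved with single-threaded CPU orchestration. Timings are invented
# for illustration only.

gpu_reason_ms = 40       # one GPU reasoning burst
cpu_orchestrate_ms = 60  # "what should I do next" + tool coordination

cycle_ms = gpu_reason_ms + cpu_orchestrate_ms
gpu_utilisation = gpu_reason_ms / cycle_ms
print(f"GPU busy {gpu_utilisation:.0%} of each agent step")

# Speeding up only the CPU half raises effective GPU throughput,
# without touching the GPU at all.
fast_cpu_ms = cpu_orchestrate_ms / 3
fast_util = gpu_reason_ms / (gpu_reason_ms + fast_cpu_ms)
print(f"With a 3x faster CPU: GPU busy {fast_util:.0%}")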
The product lesson here is not about CPUs. The lesson is that agent workloads expose hidden dependencies throughout the stack. Components that looked fine under the old workload become the binding constraint under the new one. This is true for hardware, and it is equally true for your product’s architecture.
The Inference Pricing Split That Nobody Has Built Yet
The Groq acquisition is more interesting than it looks.
Groq’s hardware is built for one thing: generating tokens extremely fast with very low latency. It trades raw throughput for speed. More tokens per second per user, fewer users served simultaneously. This is the opposite of what most inference providers optimise for.
Huang’s explanation of why Nvidia bought them is worth sitting with. He says there are two fundamentally different products in inference. One is throughput. The other is intelligence per token, and the willingness to pay for speed.
Let me translate that into product terms.
If your AI product is running document summaries overnight, processing invoices in batch, or generating reports on a schedule, you want cheap tokens. You are not time-sensitive. You will happily queue your requests. Cost per token is the only metric that matters. This is the utility pricing model.
If your AI product is powering a coding agent that a senior engineer is actively waiting on, the economics flip completely. That engineer costs the company somewhere between INR 3000 and INR 4000 per hour of productive time.
If your AI responds in 2 seconds instead of 8 seconds, you are not just providing a better experience. You are directly multiplying the productivity of the most expensive person in the building. At that point, cost per token is irrelevant. You will pay a significant premium for speed, because the alternative is wasting an engineer’s time.
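A back-of-the-envelope on that speed premium, using the figures above. The request volume is an assumption for a heavy coding-agent user, and the worst-case assumption is that the engineer is blocked while waiting:

```python
# Illustrative waiting-cost arithmetic for the latency-sensitive case.
# engineer_cost uses the midpoint of the INR 3000-4000 range cited above;
# requests_per_day is an assumed figure for a heavy coding-agent user.

engineer_cost_inr_per_hour = 3500
requests_per_day = 200
slow_s, fast_s = 8.0, 2.0   # the 8-second vs 2-second response times

wasted_s_per_day = requests_per_day * (slow_s - fast_s)
wasted_cost_inr = wasted_s_per_day / 3600 * engineer_cost_inr_per_hour
print(f"Waiting cost of the slow tier: INR {wasted_cost_inr:.0f} per engineer per day")
```

Even at these modest assumptions, the slow tier burns over a thousand rupees of engineering time per engineer per day. Against that, a per-token premium for the fast tier is trivially justified.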
These are not two tiers of the same product. These are two different products that happen to use the same underlying model.
Most AI companies today sell one product. Maybe they have a “basic” and “pro” tier differentiated by model quality or rate limits. Almost none of them have structured pricing around the latency-throughput tradeoff in a way that captures the actual economic value being delivered.
This is a large gap. The teams that close it first will find that their best customers are willing to pay multiples of what they are paying today for the fast tier, while their cost structure for the slow tier can be dramatically reduced through batching and scheduling.
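What would latency-tiered pricing look like in practice? A hedged sketch, where every price and SLO below is an invented example, not a recommendation:

```python
# A toy model of splitting one flat token price into two latency tiers.
# All prices and SLOs are invented placeholders for illustration.

FLAT_PRICE_PER_1M_TOKENS = 10.00   # today's single price, hypothetically

tiers = {
    "batch":    {"price_per_1m": 4.00,  "slo": "results within hours, queued"},
    "realtime": {"price_per_1m": 25.00, "slo": "sub-500ms first token"},
}

def quote(tokens_m: float, tier: str) -> float:
    """Price for tokens_m million tokens at the given latency tier."""
    return tokens_m * tiers[tier]["price_per_1m"]

# The overnight-summaries customer pays less than the flat rate; the
# coding-agent customer pays a premium that tracks the value delivered.
print(quote(100, "batch"), quote(100, "realtime"))
```

The point of the split is that the batch tier's lower price is funded by batching and scheduling on the cost side, while the realtime tier's premium is funded by the engineer-productivity economics above.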
Groq gives Nvidia the architecture to serve both ends of this spectrum simultaneously. The strategic insight is not that Groq is better hardware. It is that Nvidia is acknowledging that inference is not one product and building a system to serve both.
The Five-Layer Problem: Where Your Moat Actually Lives
There is a framing that Huang uses, which I want to push back on slightly, because I think the standard interpretation leads PMs in the wrong direction.
He describes AI as a five-layer stack: power, chips, infrastructure, models, and applications. The usual takeaway from this framing is “figure out which layer you are in and defend your position in it.” That is fine as far as it goes. But it misses the more important question.
Where does your customer’s willingness to pay land, and who captures it?
Consider what is actually happening in AI applications right now. An enterprise pays 50 dollars per user per month for an AI writing tool. The tool spends 30 dollars of that on inference costs, 10 on engineering and hosting, and makes 10 dollars of margin. Now the model provider raises prices because the new model is smarter. Inference cost goes to 40 dollars. Margin goes to zero. The application layer captured the customer, but the model layer captured the economics.
This is not a hypothetical. This is the P&L reality for a large number of AI startups right now.
The moat in the application layer is not the AI. I want to say that again because it is not obvious enough.
The moat in the application layer is not the AI. The AI is available to everyone. The moat is everything that makes your AI better than anyone else’s in a specific context: proprietary training data, deep workflow integration, switching costs built from user behaviour over time, regulatory or compliance advantages, and domain expertise baked into your prompting and evaluation infrastructure.
Strip the AI out of your product and ask what is left. If the answer is “not much,” you do not have a moat. You have a feature built on someone else’s moat.
The model layer is also not as safe as it looks from the outside. The frontier models compete on capability. The efficient models compete on cost. Everything in the middle is getting squeezed from both directions.
A mid-tier model that costs 3x a commodity open source model and is 1.2x smarter is not a sustainable business unless that 1.2x maps directly to a specific high-value task that customers cannot accomplish with the cheaper option.
The infrastructure layer has the most durable economics, and also the highest capital requirements to enter.
The companies winning here are not winning on features. They are winning on integration depth. The harder it is to extract your data from their system, the more your workflows depend on their primitives, the stickier the business. This is not glamorous. It is the most important thing.
What You Should Actually Do
If you are a PM reading this, here is where first principles land for each situation.
If you are building a tool with an API, audit it right now for Tool Inversion risk. Not in theory. Actually simulate what happens when an AI agent hits your API at 100 times your current peak load. Your rate limits, your database connection pooling, your authentication flow, your pricing model, all of it was calibrated for humans. Most of it will not survive an enterprise customer deploying a serious agent workflow. Find the cliff before your customers fall off it.
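Before running a real load test, the cliff can be located on paper. A toy model, with illustrative numbers standing in for your own measurements:

```python
# A toy model of the Tool Inversion cliff: a database connection pool
# sized for human peak load, hit by agent-scale concurrency.
# All numbers are illustrative placeholders for your own measurements.

POOL_SIZE = 50     # connections, tuned for ~50 concurrent human users
HOLD_MS = 100      # how long each query holds a connection

def max_sustainable_rps(pool_size: int, hold_ms: float) -> float:
    """Throughput ceiling before requests start queueing or timing out."""
    return pool_size * (1000 / hold_ms)

human_rps = 20                 # measured human-era peak, say
agent_rps = human_rps * 100    # one enterprise customer ships an agent

ceiling = max_sustainable_rps(POOL_SIZE, HOLD_MS)
print(f"Pool ceiling: {ceiling:.0f} req/s")
print(f"Human peak:   {human_rps} req/s -> fine")
print(f"Agent load:   {agent_rps} req/s -> {'fine' if agent_rps <= ceiling else 'cliff'}")
```

Note the shape of the failure: at 20 req/s there is no sign of trouble, and at 2,000 req/s the pool is saturated four times over. Nothing in between warns you, which is exactly why this needs to be simulated rather than waited for.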
If you are pricing an AI product, separate your latency-sensitive and throughput-sensitive use cases. They have different economics and different willingness to pay. If you are charging a flat per-seat fee for a product that some customers use for async batch work and others use for real-time agent workflows, you are simultaneously undercharging your highest-value users and overcharging your lowest. That is not a pricing model. That is a cross-subsidy that your competitors will eventually arbitrage.
If you are defining your moat, be honest. Not to your investors. To yourself. Which layer does your advantage actually sit in? Domain data and workflow lock-in are durable. Model quality is not. UI differentiation is not. Feature parity is obviously not. If your honest answer is “our moat is that we were early,” that is not a moat. That is a head start, and head starts expire.
The SQL database was not broken. It worked perfectly for everything it was designed for.
The problem was that it was designed for the wrong future.
About Author
Shailesh Sharma. I help PMs and business leaders excel in Product, Strategy, and AI using First Principles Thinking. For more, check out my AI Product Management Course, PM Interview Mastery Course, Cracking Strategy, and other Resources.



