The CPU Strikes Back
Why NVIDIA’s Vera CPU Is Powering the Next Wave of Agentic AI - Including in Healthcare
GPUs were supposed to make the CPU irrelevant for AI inference. Then agents showed up and changed the rules.
For years the narrative was clean and confident: GPUs had won. They crushed the massive parallel matrix operations that power large language models, delivering the throughput that made generative AI practical. CPUs were demoted to supporting roles - data movers, schedulers, and hosts. Inference became a GPU story. The future, we were told, ran on accelerators.
Then AI stopped being content to just generate. It started trying to do things.
Agentic systems - models that reason step by step, maintain state, call tools, execute code in sandboxes, branch on results, and loop until a goal is achieved - exposed something important. GPUs are extraordinary at what they were designed for. They are not built for the sequential, branching, stateful, and often lightly threaded work that actually drives an agent. That work is landing back on the CPU, and NVIDIA just shipped a processor built from the ground up for exactly this new reality.
The GPU Era’s Blind Spot
In traditional generative AI, the heavy lifting is the forward pass - token prediction. A relatively small amount of CPU work (preprocessing, batching, post-processing) sits alongside it. CPU-to-GPU ratios in many clusters settled around 1:4 to 1:8.
Agentic AI flips the script. An agent doesn’t just answer once. It runs loops: call model, use tool or execute code, observe result, reason, and repeat. Much of that “use tool or execute code” and the orchestration around it happens on the CPU. Sandboxed code execution, API calls, state management, planning logic, and coordination across multiple agents are CPU-bound tasks.
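The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's agent framework: `fake_model`, `word_count`, and `TOOLS` are hypothetical stand-ins, and the model call is stubbed out where a real system would hand off to a GPU-backed inference service. Everything else in the loop is the CPU-side work the paragraph refers to.

```python
from typing import Callable


def word_count(text: str) -> str:
    """Hypothetical tool: ordinary Python that runs on the CPU."""
    return str(len(text.split()))


# Hypothetical tool registry, for illustration only.
TOOLS: dict[str, Callable[[str], str]] = {"word_count": word_count}


def fake_model(history: list[str]) -> dict:
    """Stand-in for the GPU-side forward pass (an assumption, not a real API).

    Requests one tool call, then finishes once an observation is in the history.
    """
    if any(entry.startswith("observation:") for entry in history):
        return {"action": "finish", "answer": f"final answer based on {history[-1]}"}
    return {"action": "tool", "name": "word_count", "arg": "the cpu strikes back"}


def run_agent(goal: str, model=fake_model, max_steps: int = 5) -> tuple[str, list[str]]:
    history = [f"goal: {goal}"]                   # state management (CPU)
    for _ in range(max_steps):
        decision = model(history)                 # call model (GPU in a real system)
        if decision["action"] == "finish":        # branch on the result (CPU)
            return decision["answer"], history
        tool = TOOLS[decision["name"]]            # tool dispatch (CPU)
        result = tool(decision["arg"])            # use tool or execute code (CPU)
        history.append(f"observation: {result}")  # observe, then loop (CPU)
    return "step limit reached", history


if __name__ == "__main__":
    answer, trace = run_agent("count the words")
    print(answer)
    print(trace)
```

Only one line in the loop body touches the model. The dispatch, execution, bookkeeping, and step limit around it are sequential, branching work, which is why the time an agent spends outside the forward pass grows with every extra tool call.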
The result? The industry is seeing CPU:GPU ratios move toward 1:1 or even more CPU-heavy in agentic deployments. The CPU has become a first-class bottleneck - and opportunity - in AI factories.
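To make the ratio shift concrete, here is a small arithmetic sketch. The ratios (1:8, 1:4, 1:1) come from the text above; the cluster size of 64 GPUs and the helper name `cpus_for` are assumptions chosen purely for illustration.

```python
import math


def cpus_for(gpus: int, gpus_per_cpu: float) -> int:
    """CPUs needed to serve `gpus` at a CPU:GPU ratio of 1:gpus_per_cpu."""
    return math.ceil(gpus / gpus_per_cpu)


GPUS = 64  # hypothetical cluster size, for illustration only

for label, gpus_per_cpu in [("1:8", 8), ("1:4", 4), ("1:1", 1)]:
    print(f"CPU:GPU {label} -> {cpus_for(GPUS, gpus_per_cpu)} CPUs for {GPUS} GPUs")
```

For the same 64 GPUs, moving from 1:8 to 1:1 takes the CPU count from 8 to 64, an eightfold increase with no change on the accelerator side.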
NVIDIA Vera: Built for the New Workload
NVIDIA’s response is the Vera CPU - the company’s first standalone processor explicitly purpose-built for agentic AI and reinforcement learning.
It features 88 custom NVIDIA-designed “Olympus” cores with Spatial Multithreading for predictable performance under load. It delivers up to 1.2 TB/s of memory bandwidth (roughly 3x the per-core bandwidth of traditional data center CPUs) at roughly half the power of conventional DDR setups. Early benchmarks show about 50% faster performance and 2x the efficiency of traditional rack-scale CPUs on agentic sandbox and orchestration workloads.
NVIDIA CEO Jensen Huang put it bluntly:
“The CPU is no longer simply supporting the model; it’s driving it.”
Vera isn’t meant to run the big neural net forward passes - that’s still the job of GPUs such as Rubin, to which Vera is tightly coupled via high-bandwidth NVLink-C2C. Vera excels at the control plane: running the agent loops, executing tools, managing sandboxes for reinforcement learning, and orchestrating the messy real-world work that turns model outputs into actions.
NVIDIA is already shipping early systems to partners including OpenAI, Anthropic, and SpaceX (the latter testing it for reinforcement learning and agent-based simulations). A full Vera CPU rack can sustain more than 22,500 concurrent CPU environments - exactly the kind of dense, efficient capacity needed for large-scale agent swarms and RL training.
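As a rough sizing aid, the per-rack figure above can be turned into a back-of-the-envelope estimate. This is a sketch under assumptions: the 22,500 value is the lower bound quoted in the text ("more than 22,500"), and the function name and the 100,000-environment target are hypothetical.

```python
import math

ENVS_PER_RACK = 22_500  # lower bound quoted in the text; real capacity may be higher


def racks_needed(concurrent_envs: int, envs_per_rack: int = ENVS_PER_RACK) -> int:
    """Racks required to host `concurrent_envs` CPU environments at once."""
    return math.ceil(concurrent_envs / envs_per_rack)


if __name__ == "__main__":
    target = 100_000  # hypothetical target, for illustration only
    print(f"{target} environments -> at most {racks_needed(target)} racks")
```

Because the per-rack number is a floor, the result is best read as an upper bound on the rack count, before accounting for how heavy each environment is.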
The Hybrid AI Factory
This isn’t CPUs versus GPUs. It’s the recognition that modern AI systems are heterogeneous by nature. The best architectures will use:
GPUs (or future accelerators) for the heavy parallel compute inside the model
Specialized CPUs like Vera for orchestration, tool use, state, and agent control loops
Tight high-bandwidth interconnects so the two don’t become each other’s bottleneck
The Vera Rubin platform embodies this: racks that combine both, designed as a complete AI factory rather than a GPU-only cluster with CPUs as an afterthought.
Implications for Healthcare
Healthcare is one of the most promising - and demanding - domains for agentic AI. Clinical and administrative workflows are inherently multi-step, tool-heavy, and stateful.
An agent might review a patient’s full history, cross-reference guidelines, order labs or imaging, interpret results, coordinate with specialists, generate documentation, and handle follow-up. Administrative agents could manage prior authorizations, coding, billing, and scheduling across disparate legacy systems.
These tasks require exactly what optimized CPUs excel at: reliable tool calling, secure code execution in sandboxes, long-context state management, and predictable orchestration - while the heavy diagnostic reasoning or summarization can still leverage GPUs for the model inference steps.
NVIDIA’s Vera CPU, with its strong single-thread performance, high memory bandwidth, and power efficiency, could enable more agents to run reliably and cost-effectively in hospital data centers or secure hybrid clouds. This matters because healthcare organizations often face strict requirements around data residency, low latency for real-time clinical decisions, comprehensive audit trails, and total cost of ownership.
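The audit-trail requirement is a good example of orchestration work that lives entirely on the CPU. A minimal sketch, assuming an in-memory log and a hypothetical `check_prior_auth` tool with a made-up rule: each tool call is wrapped so that its name, timestamp, outcome, and duration are recorded. Arguments are deliberately left out of the record so that patient data does not leak into logs; a real deployment would need durable, append-only storage and its own policy on what to capture.

```python
import functools
import json
import time
from datetime import datetime, timezone
from typing import Any, Callable

# In-memory stand-in for an audit store (an assumption for illustration).
AUDIT_LOG: list[str] = []


def audited(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call appends one JSON record to AUDIT_LOG."""

    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        started = time.perf_counter()
        record = {
            "tool": tool.__name__,
            "at": datetime.now(timezone.utc).isoformat(),
            "status": "ok",
        }
        try:
            return tool(*args, **kwargs)
        except Exception as exc:
            record["status"] = f"error: {type(exc).__name__}"
            raise  # the failure is logged, never swallowed
        finally:
            record["elapsed_ms"] = round((time.perf_counter() - started) * 1000, 3)
            AUDIT_LOG.append(json.dumps(record))

    return wrapper


@audited
def check_prior_auth(procedure_code: str) -> bool:
    # Hypothetical rule, for illustration only.
    return procedure_code.startswith("A")


if __name__ == "__main__":
    check_prior_auth("A100")
    print(AUDIT_LOG[-1])
```

The wrapper adds a small, fixed amount of CPU work to every tool call, the kind of overhead that multiplies quickly once many agents run concurrently.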
The shift toward hybrid CPU+GPU architectures may ultimately help health systems deploy sophisticated agentic tools - from clinical decision support to care coordination and revenue cycle management - without requiring massive GPU over-provisioning for every workflow step. It also highlights the need for healthcare IT and AI teams to plan for increased high-performance CPU capacity alongside accelerators as these applications move from pilots into regulated production environments.
What This Means
For infrastructure teams, it means rethinking procurement and ratios. More high-performance CPU capacity will be needed alongside GPUs as agentic workloads grow.
For developers building agents (including in healthcare), it means the performance of tool calling, code execution, and orchestration layers now matters as much as raw model speed.
NVIDIA didn’t go back to CPUs. They built the CPU the age of agents actually needs.
The era of treating the CPU as an afterthought in AI infrastructure is ending. The question is no longer whether GPUs or CPUs will dominate. It’s how intelligently we combine them to build reliable, efficient, and useful agentic systems - especially in high-stakes domains like healthcare.
Sources & Additional Information
NVIDIA Launches Vera CPU, Purpose-Built for Agentic AI
https://nvidianews.nvidia.com/news/nvidia-launches-vera-cpu-purpose-built-for-agentic-ai
NVIDIA Vera CPU Delivers High Performance, Bandwidth, and Efficiency for AI Factories (Technical Blog)
https://developer.nvidia.com/blog/nvidia-vera-cpu-delivers-high-performance-bandwidth-and-efficiency-for-ai-factories/
Additional context from NVIDIA GTC 2026 announcements and partner deployments (OpenAI, Anthropic, SpaceX, Oracle Cloud) plus industry analysis on shifting CPU:GPU ratios in agentic AI.
Paul J. Swider is CEO & Chief AI Officer at RealActivity, a Microsoft Partner specializing in mission-critical AI for healthcare systems. He has 30+ years in healthcare technology, has trained over 3,000 engineers across GE, IDX, and Microsoft, and is the founder of BOSHUG, the Boston Healthcare Cloud & AI Community spanning 50+ countries.


