
GTC 2026 Live: Jensen Huang's Trillion-Dollar Token Vision

2026-03-17

San Jose, California – March 17, 2026 – When NVIDIA founder and CEO Jensen Huang stepped into the spotlight at the SAP Center in his signature leather jacket, he wasn't merely holding up a new chip. He unveiled a sweeping blueprint stretching from ground-based data centers to orbital platforms. GTC 2026 was not just a hardware launch; it was a re-evaluation of "AI economics." Huang presented a core thesis to the world: tokens are the new currency, and AI factories are the infrastructure that produces them.

The Trillion-Dollar Bet: Evolution from Chip Company to "Token Factory"

Huang opened his keynote with a startling prediction: by 2027, market demand for Blackwell and Vera Rubin systems will generate at least $1 trillion in revenue, double last year's forecast. AI, he argued, is rapidly moving from the "model training era" to the "model inference era": where the focus used to be the compute needed to train models, now, as models reach real-world applications, every interaction and every generated image consumes substantial inference compute.

"Every data center, every factory, by definition, is power-limited. A 1GW factory can never become 2GW; that's the law of physics and atoms." For the first time, Jensen Huang systematically articulated "Token Factory Economics": within a fixed power envelope, the entity with the highest token throughput per watt achieves the lowest production cost. He even outlined five commercial tiers for future AI services, from a free tier to a hyper-speed tier, arguing that a company's competitiveness will directly depend on the cost and efficiency of producing these intelligence tokens.
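The tier logic above can be sketched numerically. The following is a minimal sketch of "token factory" arithmetic under a fixed power envelope; every input (tokens per joule, electricity price) is an illustrative assumption, not a figure from the keynote:

```python
# Toy model of token-factory economics: within a fixed power budget,
# higher tokens-per-watt directly means lower cost per token.
# All numbers below are made-up assumptions for illustration.

def cost_per_million_tokens(power_mw: float,
                            tokens_per_joule: float,
                            usd_per_mwh: float) -> float:
    """USD cost to produce one million tokens in a power-limited plant."""
    # Watts are joules per second, so throughput = watts * tokens/joule.
    tokens_per_hour = power_mw * 1e6 * tokens_per_joule * 3600
    usd_per_hour = power_mw * usd_per_mwh  # MW * $/MWh = $/hour
    return usd_per_hour / tokens_per_hour * 1e6

# A fixed 1 GW envelope: doubling efficiency halves the cost per token.
base = cost_per_million_tokens(1000, 5.0, 60.0)
better = cost_per_million_tokens(1000, 10.0, 60.0)
print(f"${base:.4f} vs ${better:.4f} per million tokens")
```

Under this toy model, the only lever a power-capped factory has is tokens per joule, which is exactly the competitive axis Huang describes.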

The Vera Rubin Universe: A Seven-Chip Coordinated Compute Powerhouse

The most significant announcement at this GTC was undoubtedly the next-generation AI platform, "Vera Rubin." This is no longer a simple GPU but a massive system comprising seven distinct chips, covering compute, networking, and storage, designed to function as a single, coordinated supercomputer.

The spotlight fell on the newly unveiled, in-house Vera CPU. This data center processor, built around 88 custom "Olympus" cores, is billed as the world's first CPU purpose-built for agentic AI and reinforcement learning. It uses LPDDR5X memory, boasting 1.2TB/s of bandwidth and double the energy efficiency. Huang emphasized its "unmatched single-thread performance and efficiency." The Vera CPU connects to the Rubin GPU via the NVLink-C2C interconnect, providing a massive 1.8TB/s of coherent bandwidth, serving as the core that drives AI "thinking."

The Rubin GPU is equally breathtaking: manufactured on TSMC's 3nm process, it integrates 336 billion transistors, features 288GB of HBM4 memory, and reaches a memory bandwidth of 22TB/s. Its inference compute at FP4 precision hits an astounding 50 PFLOPS, five times Blackwell's; its training compute stands at 35 PFLOPS, 3.5 times Blackwell's. The complete Vera Rubin NVL72 rack delivers 260TB/s of NVLink 6 bandwidth, which NVIDIA says exceeds the total bandwidth of the entire internet.
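One way to read these numbers is as a roofline balance point: dividing peak FP4 throughput by memory bandwidth gives the arithmetic intensity a kernel needs before it stops being memory-bound. A back-of-envelope check using only the figures quoted above:

```python
# Roofline balance point from the quoted Rubin figures.
peak_flops = 50e15       # 50 PFLOPS FP4 inference
bandwidth = 22e12        # 22 TB/s HBM4
balance_point = peak_flops / bandwidth
print(round(balance_point), "FLOPs per byte")
```

Autoregressive decode does far fewer operations per byte of weights fetched than this, which is the "memory wall" the decoupled-inference section below is built to address.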

| Chip/Platform | Core Specifications | Performance Gain |
| --- | --- | --- |
| Vera CPU | 88-core Arm, LPDDR5X memory, 1.2TB/s bandwidth | 2x efficiency vs. traditional CPUs |
| Rubin GPU | 3nm process, 336B transistors, 288GB HBM4 | FP4 inference at 50 PFLOPS, 5x Blackwell |
| Vera Rubin NVL72 | 72 Rubin GPUs + 36 Vera CPUs | 350x increase in token generation rate |

Decoupled Inference: The 35x Speed Leap from Groq LPU

To break the bandwidth bottleneck in extreme inference workloads, NVIDIA unveiled its answer, built on technology from the Groq acquisition: asymmetric, decoupled inference.

Jensen Huang explained that using the Dynamo software system, they decompose the inference process: the "prefill" stage, which requires massive computation and memory, is handled by Vera Rubin, while the "decode" stage, which is extremely latency-sensitive, is handled by Groq's LPU (Language Processing Unit). The Groq chip features 500MB of SRAM, allowing it to read model weights with extremely low latency, thus overcoming the "memory wall" problem that causes slow token generation. This combination achieves a staggering 350x leap in token generation speed within a gigawatt-scale factory, compared to Moore's Law which would only deliver about a 1.5x improvement over the same period.

Huang even offered enterprise configuration advice: "If your work is primarily high-throughput, use 100% Vera Rubin; if you have a high volume of high-value, programming-level token generation needs, allocate 25% of your data center scale to Groq."
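The prefill/decode split described above can be sketched as a toy scheduler. The class names and trivial routing policy here are invented for illustration and are not NVIDIA's Dynamo API:

```python
# Toy sketch of disaggregated inference: the compute-heavy prefill stage
# and the latency-bound decode stage run on separate hardware pools.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int

class PrefillPool:
    """Stands in for a bandwidth-rich GPU tier (Vera Rubin in the talk)."""
    def prefill(self, prompt_tokens: int) -> dict:
        # Process the whole prompt in parallel; hand back a KV-cache handle.
        return {"kv_len": prompt_tokens}

class DecodePool:
    """Stands in for a low-latency, SRAM-heavy tier (the LPU in the talk)."""
    def decode(self, kv_cache: dict, max_new_tokens: int) -> list:
        out = []
        for i in range(max_new_tokens):
            out.append(f"tok{i}")    # one token per step, latency-sensitive
            kv_cache["kv_len"] += 1  # cache grows as tokens are emitted
        return out

def serve(req: Request) -> list:
    kv = PrefillPool().prefill(req.prompt_tokens)       # compute-heavy stage
    return DecodePool().decode(kv, req.max_new_tokens)  # latency-bound stage

print(len(serve(Request(prompt_tokens=512, max_new_tokens=8))))
```

The design point is that each stage can then be provisioned independently, which is what makes allocation advice like "25% of the data center to Groq" meaningful.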

Rubin Ultra and Feynman: The Future is Coming

During the keynote, Huang also showcased a longer-term roadmap. Rubin Ultra will feature a new "Kyber" rack architecture, vertically integrating 144 GPUs and pushing FP4 inference compute to 15 ExaFLOPS, with mass production and delivery expected in the second half of 2027.

The next-generation Feynman architecture was also unveiled: it will utilize TSMC's 1.6nm process, feature a new CPU codenamed "Rosa," and introduce customized HBM technology and co-packaged optics solutions. Huang stated that the Feynman era marks NVIDIA's deep coupling of compute, memory, and packaging, evolving the data center into a highly integrated "mega-supercomputer."

OpenClaw: The Operating System for the Agentic Era, Ending SaaS

The most profound software impact of the entire keynote came from Jensen Huang's commentary on the open-source project OpenClaw. He described OpenClaw as "the most popular open-source project in human history," claiming it achieved in a few weeks what took Linux more than 30 years.

"Technically, OpenClaw can be understood as an operating system for agentic computers," Huang said. It can connect large language models, manage computing resources, access file systems, and decompose complex problems for sub-agents to solve collaboratively.

Huang made a striking assertion: "All SaaS companies will disappear." In the agentic era, traditional enterprise software will shift towards service platforms centered on intelligent agents, i.e., AaaS (Agent as a Service). He warned that every company in the world today needs to formulate its own OpenClaw strategy immediately.

To this end, NVIDIA launched the enterprise-grade security solution NemoClaw, providing network guardrails and privacy routers through the OpenShell security layer to ensure agents operate securely within corporate networks.

Physical AI: From Autonomous Driving to Disney's Olaf

AI is stepping out of screens and beginning to interact with the physical world. Progress in autonomous driving and robotics was particularly noteworthy at this GTC.

BYD, Geely, and Hyundai Motor announced they are joining NVIDIA's "robotaxi ready" platform, indicating these brands will adopt NVIDIA's full-stack solutions to develop autonomous robotaxis. Uber also plans to deploy autonomous vehicle fleets based on NVIDIA's Drive AV software starting next year.

A highlight of the keynote was the stunning appearance of the Disney robot "Olaf." This robot, developed collaboratively by NVIDIA and Disney, could walk, wave, and engage in simple conversational interactions with Jensen Huang, demonstrating its flexible interactive capabilities in the physical world. Behind this is NVIDIA's long-term investment in robotics: from the Isaac Sim simulation platform used for training to the Jetson Thor compute module deployed on the robots themselves.

Compute in Space: AI's Reach Extends to the Cosmos

If AI factories on the ground weren't impressive enough, NVIDIA's release of the Space-1 Vera Rubin module extends AI's territory into space.

This AI compute module is designed for the extreme environment of space, with a radiation-hardened design, and can be deployed on satellites or space stations. Future satellites will no longer be mere "signal relay stations"; they can become "intelligent nodes" operating in orbit, processing captured images and analyzing sensor data in real time without transmitting massive amounts of raw data back to Earth. Huang called it the first step in "building a complete compute architecture from space to the ground."
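The downlink argument can be made concrete with a back-of-envelope sketch; every figure below is an illustrative assumption, not a number from the keynote:

```python
# Why on-orbit inference matters: compare downlinking raw captures
# against downlinking only the inference results. All figures assumed.
raw_mb_per_image = 500        # hypothetical raw multispectral capture
images_per_orbit = 200
summary_kb_per_image = 20     # detections + metadata after on-board inference

raw_gb = raw_mb_per_image * images_per_orbit / 1024
processed_gb = summary_kb_per_image * images_per_orbit / (1024 * 1024)
print(f"raw: {raw_gb:.1f} GB/orbit vs processed: {processed_gb:.4f} GB/orbit")
```

Even with these rough assumptions, on-board processing cuts the downlink requirement by four orders of magnitude, which is the economic case for compute in orbit.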

Market Reaction and Future Challenges

Boosted by Jensen Huang's optimistic guidance, NVIDIA's stock price rose more than 4.3% in intraday trading. Goldman Sachs released a report stating that this clear long-term revenue visibility significantly exceeded Wall Street expectations, directly easing investor concerns that AI capital expenditures might peak in 2026.

However, some analysts pointed out that market expectations were already high and that the conference offered limited incremental information about NVIDIA itself. Any further improvement in NVIDIA's price-to-earnings multiple will require not only faster AI application adoption but also new growth vectors such as "Physical AI" and "Space Computing."

Computing Power Democratization: Embracing the Infrastructure Dividends of the Token Economy Era

As the "Token Factory" era Huang describes accelerates, enterprises face a growing challenge: Vera Rubin superchip clusters costing tens of millions of dollars are out of reach for most developers and small-to-medium-sized businesses. Outside the hardware arms race led by the largest corporations, access to flexible, cost-effective compute will determine whether countless AI innovators can keep pace with "Token Economics."

As GTC demonstrated, AI workloads are becoming unprecedentedly diverse, from complex agentic inference and physical AI simulation to traditional graphics rendering and scientific computing, each imposing different demands on compute. In this context, compute rental platforms such as Omniyq (www.omniyq.com) position themselves as bridges between cutting-edge hardware and everyday applications. The platform aggregates resources ranging from cost-effective inference cards like the RTX 4090 to cluster infrastructure for large-scale training, such as the A100, H100, H200, B200, and B300, giving users broad freedom of choice through elastic, pay-as-you-go rental.

By lowering the barrier to experimenting with the latest AI technologies and building large-scale inference applications, such platforms let more enterprises participate in Huang's "trillion-dollar Token Economy" with a light-asset model, focused on creating value in the AI factory era rather than carrying heavy hardware investments.
