By 2026, the rapid iteration of AI is pushing the world into an unprecedented compute supply crisis. The comfort zone of falling cloud service prices has completely vanished, replaced by intensifying supply-demand imbalances, delivery delays caused by geopolitical conflicts and infrastructure bottlenecks, and a full-blown competition in compute power as a new dimension of global tech rivalry. Amid this turbulent landscape, China’s domestic compute industry, leveraging a full‑industry‑chain layout and unique power cost advantages, is rapidly forging an independent path – from chip substitution to system architecture innovation, and then to global export of compute services. This report starts with the deep causes of the global compute shortage, analyzes the breakthroughs and technological turning points of China’s domestic compute, discusses the pricing logic and profitability prospects of the compute rental market, and deeply examines the strategic significance, impact, and cost advantages of global compute‑as‑a‑service export. The goal is to present a multi‑dimensional, panoramic view of the new paradigm in the compute industry.
In the first quarter of 2026, global AI compute rental prices experienced a landmark surge: the spot rental price of NVIDIA’s flagship Blackwell chip soared 48% in just two months to $4.08 per GPU hour, implying a starting price of about $2.76; meanwhile, the one‑year contract rental price of the previous‑generation workhorse H100 has risen 40% since October 2025, reaching $2.35 per GPU hour. This price surge is no short‑term fluctuation but the inevitable result of intertwined supply and demand pressures.
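As a quick sanity check on these figures, the starting prices can be backed out from the quoted end prices and percentage increases. The sketch below assumes that $4.08 and $2.35 are the prices after the rise; the function name is illustrative and not taken from any source.

```python
def implied_start_price(end_price: float, pct_increase: float) -> float:
    """Back out the starting price from an end price and a percentage rise."""
    return end_price / (1 + pct_increase / 100)


# Assumption: both quoted prices are the levels reached AFTER the increase.
blackwell_start = implied_start_price(4.08, 48)  # spot price two months earlier
h100_start = implied_start_price(2.35, 40)       # one-year contract, October 2025

print(f"Blackwell implied start: ${blackwell_start:.2f} per GPU hour")
print(f"H100 implied start:      ${h100_start:.2f} per GPU hour")
```

Under that assumption, Blackwell spot rentals started from roughly $2.76 and H100 one‑year contracts from roughly $1.68 per GPU hour.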
First, the AI industry is undergoing a critical structural shift from “training‑first” to “inference deployment.” Since the start of 2026, token consumption by leading large models has spiked sharply, driven by AI agents, multimodal interaction, and multi‑agent collaboration, making inference compute a growth engine that now outpaces training. ByteDance’s Doubao large model doubled its average daily token volume within three months, exceeding 120 trillion by March 2026. Meanwhile, OpenRouter data shows that in February 2026, token calls to Chinese vendors’ models surpassed those to US vendors’ models for the first time, with several vendors growing in tandem, and reached 7.359 trillion in March, a significant share of calls to the top global models. Token consumption has leaped from hundreds per conversation to millions or tens of millions, turning compute from a one‑time expense into a continuous operating cost and greatly increasing reliance on cloud compute rentals.
The supply side is even more constrained. In chip manufacturing, ASML, the world’s only supplier of EUV lithography machines, ships only about 70 units per year, and building 1 GW of AI compute requires about 3.5 of them, which keeps capacity tight for logic wafers, HBM memory, and advanced process nodes. NVIDIA and others face delivery lead times of 6‑7 months on average. Major cloud providers are increasing capex (the top four US cloud vendors are expected to exceed $700 billion in 2026), but converting that spending into effective compute is bottlenecked on several fronts. Top customers that have received cards rarely release idle resources, so spot market liquidity dries up. One‑third to one‑half of the US data centers planned for 2026 face delays or cancellations, not because of chip prices but because of severe shortages in power infrastructure: lead times for large transformers and switchgear run to several years, and supply relies heavily on imports. JPMorgan estimates that the global data center capacity gap was 2 GW in 2024 and 3 GW in 2025, and will reach 7 GW in 2026.
Overall, the global AI compute supply‑demand gap was about 11.6% in 2024, widened sharply to 22% in 2025, and is expected to narrow slightly to 18.2% in 2026. Even so, the absolute gap (200 EFLOPS) remains larger than in 2025, and tight conditions will persist throughout the forecast period. The core reason compute rental prices are expected to keep rising for at least two years is that the supply‑demand mismatch has shifted from a short‑term disturbance to a structural norm.
Against the backdrop of an extremely tight global compute supply chain, China’s domestic compute industry has entered a historic “counterattack window.”
According to Bernstein Research and other sources, NVIDIA’s share of China’s AI chip market has plummeted from 95% three years ago to just 8%, while the market share of domestic AI accelerator cards has surpassed 60% for the first time. IDC’s shipment data for 2025 gives a lower figure: total AI accelerator card shipments in China were about 4 million units, of which domestic vendors delivered 1.65 million, or 41%. Huawei Ascend alone shipped about 810,000 units, a market share of about 20%, followed by Cambricon, Hygon, Alibaba’s T‑Head, and others.
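The IDC shares quoted above follow directly from the shipment counts. The short sketch below reproduces them; the variable names are illustrative, and the unit counts are the approximate figures given in the text.

```python
# Approximate 2025 shipment figures quoted from IDC in the text (units)
total_shipments = 4_000_000
domestic_shipments = 1_650_000
ascend_shipments = 810_000

domestic_share = domestic_shipments / total_shipments        # share of all cards
ascend_share = ascend_shipments / total_shipments            # share of all cards
ascend_within_domestic = ascend_shipments / domestic_shipments

print(f"Domestic share of shipments: {domestic_share:.1%}")
print(f"Huawei Ascend share:         {ascend_share:.1%}")
print(f"Ascend share of domestic:    {ascend_within_domestic:.1%}")
```

On these figures, Ascend accounts for roughly half of all domestically supplied cards.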
Performance breakthroughs by domestic vendors are especially critical. In March 2026, Huawei officially launched the Atlas 350 AI training/inference accelerator card equipped with the new Ascend 950PR, whose inference performance is three times that of NVIDIA H20, completely breaking the previous narrative of “insufficient performance” of domestic chips. Cambricon’s Q1 2026 revenue reached RMB 2.885 billion, up 159.56% year‑on‑year, with net profit attributable to parent of RMB 1.013 billion, up 185.04%; its Siyuan 370 chip, at the same compute power, costs only one‑third of NVIDIA’s A10. Hygon’s “Shensuan No.2” achieves 80% of NVIDIA A100’s training efficiency, and its Chiplet architecture plus CUDA‑like ecosystem is accelerating enterprise migration. Multiple domestic chips achieved “Day‑0” simultaneous adaptation upon the release of DeepSeek V4, marking the rapid formation of a collaborative ecosystem in China’s domestic compute industry chain – chip vendors no longer need lengthy debugging cycles.
It is worth emphasizing that this breakthrough is not an isolated case of technological catch‑up but the result of full‑stack system innovation. The Huawei Ascend 950PR comes with an enhanced CANN software stack with improved CUDA ecosystem compatibility, allowing developers to smoothly migrate models previously built on NVIDIA’s CUDA. Internet giants such as ByteDance and Alibaba are planning batch orders; the standard edition is priced at about RMB 50,000 per chip and the high‑end HBM version at about RMB 70,000. This marks the transition of domestic chips from “usable” to “good to use,” a qualitative shift from hardware autonomy to ecosystem synergy.
By the end of June 2025, China had 10.85 million standard racks in active data centers, intelligent computing scale reaching 788 EFLOPS, deployment of 400G high‑speed ports rising to 14,060, and average PUE optimized to 1.42. The “East‑West Computing Transfer” project has formed 8 hub nodes and 10 data center clusters, covering 14 provinces across eastern, central, and western regions, driving over RMB 1 trillion in social investment, and providing about 80% of the nation’s intelligent computing power.
Under the explicit guidance of the 15th Five‑Year Plan, “moderately advanced construction of new infrastructure” has become a national top‑level design. Domestic compute clusters are no longer relying solely on single‑card performance comparisons, but moving toward system architecture innovations such as “cable‑less supernodes” – through Scale‑Up interconnection architecture, vendors like Huawei are shifting from single‑card catch‑up to rack‑level solution competition, directly benchmarking against NVIDIA’s GB200 supernode. The Beijing Compute Interconnection and Interoperability Platform aggregates intelligent computing resources from Tianjin, Hebei, Inner Mongolia, and other regions, totaling over 60 EFLOPS, enabling “cross‑region, cross‑entity, cross‑architecture” unified scheduling.
The monthly fee for an 8‑card H100 bare‑metal server stabilized around RMB 75,000 in the first half of 2026. Leading domestic cloud providers – Alibaba Cloud, Baidu AI Cloud, Tencent Cloud – have successively raised AI compute product prices in the first half of 2026, with increases ranging from 5% to over 400%. Looking at the global pricing system, the fair monthly rental for a three‑year locked contract for an NVIDIA H100 server is RMB 65,000‑70,000, with the scarcity of same‑generation compute power providing solid support for this price center.
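To compare these monthly server fees with per‑GPU‑hour quotes, the conversion can be sketched as follows. Two assumptions are not in the source: a month is taken as 730 hours, and the three‑year locked contract is assumed to cover an 8‑card server like the bare‑metal quote.

```python
HOURS_PER_MONTH = 730  # assumption: 365 days * 24 hours / 12 months


def per_gpu_hour(monthly_fee_rmb: float, gpus: int = 8) -> float:
    """Convert a monthly server fee into an RMB price per GPU hour."""
    return monthly_fee_rmb / (gpus * HOURS_PER_MONTH)


bare_metal = per_gpu_hour(75_000)   # 8-card H100 bare-metal server, H1 2026
locked_low = per_gpu_hour(65_000)   # three-year lock, lower bound
locked_high = per_gpu_hour(70_000)  # three-year lock, upper bound

print(f"Bare metal:      RMB {bare_metal:.2f} per GPU hour")
print(f"Three-year lock: RMB {locked_low:.2f} to {locked_high:.2f} per GPU hour")
```

Under these assumptions, the bare‑metal fee works out to about RMB 12.8 per GPU hour, and the locked contract to roughly RMB 11.1 to 12.0.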
More significantly, the business model of compute rentals is upgrading. The traditional bare‑compute rental model is rapidly shifting toward Model‑as‑a‑Service (MaaS) or token revenue sharing – i.e., from “selling compute” to “selling tokens.” This upgrade is expected to greatly improve the profitability and valuation of compute rental companies, driving their valuation from P/E to P/S multiples. By 2026, China’s compute rental market size is expected to reach RMB 260 billion, with intelligent computing demand growing at 43% annually, of which the internet industry accounts for 62%, government departments 14%, and finance and healthcare 6% and 5% respectively.
Currently, China’s total demand for intelligent computing has reached 4423 EFLOPS, while effective supply is only 1590 EFLOPS, a gap exceeding 2800 EFLOPS. NVIDIA has locked in over $1 trillion in orders through 2027, GPU delivery lead times are generally extended to 6‑7 months, top vendors release few idle resources due to hoarding, and shortages of core components like HBM memory and CPUs reach 30%‑40%.
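The size of the gap follows from the demand and supply figures above. The sketch below also expresses it as a share of demand, which the text does not state explicitly; variable names are illustrative.

```python
demand_eflops = 4423  # total intelligent computing demand quoted in the text
supply_eflops = 1590  # effective supply quoted in the text

gap_eflops = demand_eflops - supply_eflops
coverage = supply_eflops / demand_eflops  # share of demand that supply can meet
gap_share = gap_eflops / demand_eflops    # share of demand left unmet

print(f"Gap: {gap_eflops} EFLOPS")
print(f"Supply covers {coverage:.0%} of demand; {gap_share:.0%} is unmet")
```

On these figures, effective supply covers a little over one‑third of demand.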
Against this backdrop, leading cloud vendors such as Alibaba Cloud and Tencent Cloud have adopted highly differentiated pricing strategies for different customer segments. For medium‑sized users with annual spending under RMB 10 million who are unwilling to build their own GPU clusters, they push through price increases in exchange for priority access to compute resources. For new users, discounts have been tightened: customers who paid 40% of list price last year now pay 50%. For core large customers with high annual spending, prices are held steady. The Ministry of Industry and Information Technology has guided the industry away from irrational price wars and toward rational pricing, further strengthening sellers’ pricing power.
This seller’s market is attracting a flood of capital. Xiechuang Data has applied for RMB 50 billion in credit lines and is seeking a Hong Kong listing, while Hongjing Technology is seeking RMB 60 billion in credit to build intelligent computing clusters – the industry is entering a phase of capital‑intensive expansion.
Notably, while major vendors are raising prices, high‑value compute rental platforms such as ominyq.com offer another option for SMEs and developers. Through fine‑grained scheduling and idle resource integration, the platform consistently maintains stable rental prices more than 10% below market average, providing a rare cost oasis in the tight compute environment.
The breadth and depth of China’s compute export far exceed expectations. According to the Cloudpilot report, in Q1 2026, the total duration of overseas enterprise customers calling Chinese cloud and AI compute on the platform exceeded 1.2 billion core hours, a surge of 280% quarter‑on‑quarter. In six Middle Eastern countries, Chinese top cloud vendors for the first time captured over 50% of compute service contract wins in government and state‑owned enterprise digitalization projects. In new enterprise compute procurement in the Middle East and Southeast Asia, Chinese vendors’ shares reached 35% and 42%, respectively. In emerging markets such as Asia‑Pacific, the Middle East, and Africa, Chinese top cloud vendors’ combined market share exceeded 50% for the first time, displacing traditional giants like AWS and Azure.
The underlying logic of token export is not simply compute export, but a closed‑loop three‑dimensional export of “domestic large models + compute + Chinese electricity” – the full chain of “offshore data center → token production → overseas invocation → compliant data transmission” is already in place. The price per million tokens of Chinese models is only one‑tenth that of overseas peers, forming an extremely significant cost‑performance gap while achieving near‑top‑tier performance, becoming a global passport for Chinese token exports.
Alibaba Cloud, Tencent Cloud, and Huawei Cloud, China’s three major cloud vendors, are expanding globally along different routes. Alibaba Cloud activated a data center in Dubai in 2016, established a joint venture with Saudi Telecom Company to build two data centers in Riyadh, and obtained Saudi Arabia’s highest‑level Class C certification, which allows it to host top‑secret government data; it has over 40 partners in the Middle East and Africa and a […] 150 million to precisely serve key customers such as Meituan Keeta and gaming companies. Huawei Cloud’s approach is closer to “full‑stack export”: it brings the entire AI infrastructure package, including efficient power supply, liquid cooling technology, and talent training, to the Middle East, where it has trained about 500,000 students and established 330 ICT academy partnerships. These three strategies show the range of China’s compute export: from deep local cultivation to precise targeting, and from heavy‑asset construction to ecosystem building.
Meanwhile, China Telecom’s “One Network for Going Global” service plan builds a “2+16+X” overall architecture, covering seven service packages including data processing for overseas, cross‑border e‑commerce, token export, and overseas IDC deployment, providing strong underlying support for the comprehensiveness of compute export.
The ultimate moat of compute export comes from China’s unmatched structural advantage in the “energy‑compute” dual base. Western China green electricity prices are only RMB 0.1‑0.3 per kWh, about one‑fourth to one‑fifth of Europe and the US, while electricity accounts for over 70% of total token inference costs. This electricity price difference directly reduces China’s unit token inference cost to one‑third to one‑fifth of that overseas. While European data centers are forced to shut down due to soaring energy prices, data centers in Guizhou and Inner Mongolia enjoy green electricity at less than RMB 0.3 per kWh. The “East‑West Computing Transfer” strategy places China’s intelligent computing centers mainly in western energy‑rich regions, with ample green power supply, PUE reduced to below 1.1, and some advanced clusters even reaching 1.05.
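The link between the electricity price gap and the unit token cost can be made explicit with a small calculation. This is only a sketch under two assumptions that are not in the source: the 70% electricity share is read as applying to China’s cost base, and all non‑electricity costs are taken to be identical in China and overseas. The function name is illustrative.

```python
def relative_token_cost(electricity_share: float, price_multiple: float) -> float:
    """Return China's unit token cost divided by the overseas unit cost.

    electricity_share: electricity as a share of China's unit cost (0 to 1)
    price_multiple:    overseas electricity price / China's electricity price
    Assumption: non-electricity costs are the same in both locations.
    """
    # Normalise China's unit cost to 1, then scale only the electricity part.
    overseas_cost = electricity_share * price_multiple + (1 - electricity_share)
    return 1 / overseas_cost


for multiple in (4, 5):
    ratio = relative_token_cost(0.70, multiple)
    print(f"Overseas power at {multiple}x: China's cost is {ratio:.2f} of overseas")
```

With a 70% electricity share, a fourfold to fivefold price gap gives a cost ratio of roughly 0.32 to 0.26, that is, about one‑third to one‑quarter. Reaching the one‑fifth end of the quoted range would, in this simple model, require electricity to make up nearly all of the cost or other cost advantages beyond power.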
At the same time, domestic intelligent computing centers, through liquid cooling technology upgrades and high‑density cluster construction, have achieved dual improvements in energy efficiency and utilization. Domestic servers, optical modules, cold‑plate liquid cooling systems, and other digital supply chains are bundled into export services. This determines the underlying competitive ability of China’s “compute‑as‑a‑service” exports – it is not just hardware export, but the integrated output of “Chinese electricity + Chinese chips + Chinese models + Chinese operations and maintenance,” building a cost‑performance barrier that is globally difficult to replicate.
China’s leap in compute capability and its output are profoundly reshaping the global compute map. When developers, enterprises, and government agencies around the world call compute from data centers in Hangzhou, Beijing, and Shenzhen to train Arabic large models and run city transportation systems, a decentralized form of compute trade is quietly taking shape. This “compute‑as‑a‑service” export model circumvents traditional hardware tariffs while “packaging and exporting” upstream industrial chains such as servers, optical modules, and liquid cooling systems. Saudi officials have explicitly stated that China’s strong technical strength in AI makes it an important partner for Saudi Arabia’s AI and digital economy development – this state‑to‑state trust provides a foundation for China’s compute export that goes beyond mere commerce.
From “world factory” to “global digital infrastructure,” China is completing a key transition from labor‑ and resource‑intensive exports to technology‑, data‑, and intelligence‑intensive service exports. The “compute‑as‑a‑service” export model will further drive the coordinated development of the entire compute industry chain, including data centers, green electricity, liquid cooling technology, optical modules, and network design. As compute settlement services become active, it may increase the use of RMB in digital trade and enhance China’s voice in setting global digital trade rules.
Several key challenges facing China’s compute development also need to be seen clearly. NVIDIA and other foreign vendors still dominate high‑end training in both market influence and ecosystem stickiness, and domestic GPUs may struggle to fully cover commercial needs in large‑scale substitution scenarios within two years. Global AI agent applications are accelerating rapidly, so compute demand could spike again at any time, and the extreme rigidity of supply will constrain the elasticity of compute services for a long time. On the geopolitical front, building overseas data centers faces multiple tests such as data sovereignty, cross‑border compliance, and grid carrying capacity; power grids in parts of Central Asia and Southeast Asia are already near their limits. Under the capital‑intensive model, the compute rental market must also guard against structural risks from over‑expansion.
Overall, the structural shortage of global compute supply opens a historic window for China’s domestic compute to transition from a follower to a leader. The China solution – “independent chip innovation + system architecture breakthroughs + green power cost moat + globalized compute services” – is reshaping the competitive rules of global AI compute. China is not only becoming a major supplier of global compute but also beginning to dominate the standards, culture, and value orientation of global digital infrastructure.
This is a moment of multi‑dimensional change. The global compute market has fallen into deep supply‑demand mismatch due to demand explosion, manufacturing bottlenecks, and infrastructure lag, profoundly rewriting the pricing logic of the NVIDIA‑dominated era. At this industrial inflection point, where crisis and opportunity coexist, China’s domestic compute has achieved a breakthrough from point‑technology efforts to full‑stack system innovation, moving from chip substitution to system architecture iteration, and from domestic consumption to global export. The “compute‑as‑a‑service” export model is becoming China’s second globally competitive core strength after manufacturing. From low‑cost green electricity in the western deserts to dense footprints of hyperscale digital hubs in the Middle East, from full‑stack hardware‑software autonomy to the strategic high ground of token export – China is completing, at unprecedented speed and depth, a historic turnaround from “compute consumer” to “global compute foundation.”