News China Wants 300 ExaFLOPS of Compute Power by 2025

Status
Not open for further replies.

bit_user

Titan
Ambassador
Okay, so a single H100 delivers between 25.6 TFLOPS (PCIe; 350 W) and 33.5 TFLOPS (SXM; 700 W) at fp64. So, 39 of the PCIe cards or 30 of the SXM cards would get you to 1 PFLOPS. Multiply that by 100k to reach 100 EFLOPS. I don't know Nvidia's annual production volume of H100s, but I'm pretty sure it's not more than 3M! You could also look at what proportion of the global production volume of HBM or GDDR6 it would consume.

The situation with the A100 is even worse, since it delivers only 5.2 TFLOPS or 9.7 TFLOPS. So, you'd need either 192 PCIe cards or 103 SXM cards per PFLOPS.

Of course, I'm taking a leap by assuming they're talking about fp64 compute, but that's the standard for scientific & technical computing (i.e. HPC).

Another interesting angle is to consider how much power that would consume: 3.9M × 350 W = 1.37 gigawatts, best case (assuming they mean fp64). If it's using their own processors, made on something like 12 nm, then maybe 10x as much?
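A quick Python sanity check of the card-count and power arithmetic above (the TFLOPS and wattage figures are the ones quoted in this post, not official spec sheets, and the 100 EFLOPS fp64 target is the one used here):

```python
import math

# Figures quoted in this post (fp64 throughput and board power)
TARGET_TFLOPS = 100e6  # 100 EFLOPS, expressed in TFLOPS

h100_pcie_tflops, h100_pcie_watts = 25.6, 350
h100_sxm_tflops, h100_sxm_watts = 33.5, 700

cards_pcie = math.ceil(TARGET_TFLOPS / h100_pcie_tflops)
cards_sxm = math.ceil(TARGET_TFLOPS / h100_sxm_tflops)
gpu_power_gw = cards_pcie * h100_pcie_watts / 1e9  # GPUs only, PCIe case

print(f"{cards_pcie:,} PCIe cards or {cards_sxm:,} SXM cards")
print(f"~{gpu_power_gw:.2f} GW for the PCIe GPUs alone")
```

That reproduces the ~3.9M-card and ~1.37 GW figures above.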
 
  • Like
Reactions: helper800

jp7189

Distinguished
Feb 21, 2012
395
221
19,060
Okay, so a single H100 delivers between 25.6 TFLOPS (PCIe; 350 W) and 33.5 TFLOPS (SXM; 700 W) at fp64. So, 39 of the PCIe cards or 30 of the SXM cards would get you to 1 PFLOPS. Multiply that by 100k to reach 100 EFLOPS. I don't know Nvidia's annual production volume of H100s, but I'm pretty sure it's not more than 3M! You could also look at what proportion of the global production volume of HBM or GDDR6 it would consume.

The situation with the A100 is even worse, since it delivers only 5.2 TFLOPS or 9.7 TFLOPS. So, you'd need either 192 PCIe cards or 103 SXM cards per PFLOPS.

Of course, I'm taking a leap by assuming they're talking about fp64 compute, but that's the standard for scientific & technical computing (i.e. HPC).

Another interesting angle is to consider how much power that would consume: 3.9M × 350 W = 1.37 gigawatts, best case (assuming they mean fp64). If it's using their own processors, made on something like 12 nm, then maybe 10x as much?
Considering the focus was on AI, I think it's unlikely FP64 performance will be the benchmark. The H100 is 8x faster at FP16, so 100 FP16 EFLOPS looks a lot more attainable.
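Taking the 8x-at-FP16 claim above at face value (it's this thread's assumption, not a spec-sheet number), the card count shrinks proportionally:

```python
fp64_tflops_sxm = 33.5                 # H100 SXM fp64 figure used earlier in the thread
fp16_tflops_sxm = fp64_tflops_sxm * 8  # the "8x faster @ FP16" claim
cards_fp16 = 100e6 / fp16_tflops_sxm   # cards needed for 100 FP16 EFLOPS

print(f"~{round(cards_fp16):,} cards instead of ~3M")
```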
 
  • Like
Reactions: bit_user

zsydeepsky

Great
Oct 12, 2023
35
30
60
Another interesting angle is to consider how much power that would consume: 3.9M × 350 W = 1.37 gigawatts, best case (assuming they mean fp64). If it's using their own processors, made on something like 12 nm, then maybe 10x as much?
Just considering the electricity requirements...
On average, China increases its electricity generation by ~200,000 GWh per year. A 1.3-gigawatt system running 24×365 consumes 11,388 GWh of energy per year, so China could add around 17.5 of them per year.
So it's not a problem at all.
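A minimal check of the numbers above, taking the ~200,000 GWh/year of added generation as given (that's this post's figure, not something I've verified):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

system_gw = 1.3                 # GPU-only power estimate from the quoted post
annual_gwh = system_gw * HOURS_PER_YEAR
added_gwh_per_year = 200_000    # claimed average yearly addition to China's generation

systems_per_year = added_gwh_per_year / annual_gwh
print(f"{annual_gwh:.0f} GWh/year per system; ~{systems_per_year:.1f} systems per year of added generation")
```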
 
Last edited:

bit_user

Titan
Ambassador
China could add around 17.5 of them per year.
So it's not a problem at all.
You seem to assume China is able to keep ahead of demand for electricity; however, demand will keep growing as they continue to push for electrification of their transportation infrastructure.

Also, I just computed the GPU power. I didn't factor in the host machine, cooling, or infrastructure. So, probably multiply my estimate by at least 1.5.

The other thing is that you took my best-case estimate of using the H100, ignoring the part where I floated the notion of a 10x less-efficient accelerator on a node they could mass-produce domestically. Taken together, that would be something like 15x the estimate you used. Do you still think it's not a problem at all?

Furthermore, since China's power generation is predominantly based on fossil fuels, they'd have to keep scaling up their fuel imports/production to match.
 

zsydeepsky

Great
Oct 12, 2023
35
30
60
You seem to assume China is able to keep ahead of demand for electricity; however, demand will keep growing as they continue to push for electrification of their transportation infrastructure.

Also, I just computed the GPU power. I didn't factor in the host machine, cooling, or infrastructure. So, probably multiply my estimate by at least 1.5.

The other thing is that you took my best-case estimate of using the H100, ignoring the part where I floated the notion of a 10x less-efficient accelerator on a node they could mass-produce domestically. Taken together, that would be something like 15x the estimate you used. Do you still think it's not a problem at all?

Furthermore, since China's power generation is predominantly based on fossil fuels, they'd have to keep scaling up their fuel imports/production to match.
Read carefully: I said China can afford *17.5* of your new systems per year.
Since we still have two years to go, China can afford ~35 of your systems.
Even if you multiply the energy cost by 10, it's still feasible.

Also, China has its own version of the H100/A100: Huawei's Ascend card, which was reported on Tom's Hardware as well.
And Huawei has access to at least 7 nm tech, as proven by their new cell-phone release, so the energy efficiency will be much better than your estimates.

Besides, about 50% of the new electricity output China added in 2022 was non-fossil. So just with the newly added renewable energy, China can easily handle at least 35 / 2 (energy efficiency) / 2 (renewables share) / 2 (heat management) = 4.4 of these computation systems.
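The chain of discounts above works out like this (each factor of 2 is this post's rough assumption):

```python
systems_affordable = 35  # ~17.5 per year x 2 years, from earlier in the thread
feasible = systems_affordable / 2 / 2 / 2  # efficiency penalty, renewables share, heat management
print(round(feasible, 1))  # 4.4
```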

So, again: an easy task.
 

bit_user

Titan
Ambassador
Read carefully: I said China can afford *17.5* of your new systems per year.
Since we still have two years to go, China can afford ~35 of your systems.
Even if you multiply the energy cost by 10, it's still feasible.
Assuming they either have the ability to scale up much faster than their current rate, or their current rate of construction is creating plenty of spare capacity, sure. I expect neither is exactly true.

Also, China has its own version of the H100/A100: Huawei's Ascend card, which was reported on Tom's Hardware as well.
It's not "H100/A100". The two are definitely not interchangeable, as should be quite clear from my first post. The article said A100.

If it's exactly as powerful and efficient as the A100 (which seems unlikely, but let's assume so), then the total would be about 4.8 gigawatts for just the GPUs. If we add 50% for hosts, infrastructure, and cooling (TBH, I think it's probably more like 70%, since IIRC cooling is typically a 30% multiplier on everything else), then the figure would be more like 7.2 GW.

Not as extreme as 17.5, but still big enough that you probably can't just assume that much excess capacity will be available; it would at least have to be planned for and budgeted.
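For what it's worth, here's the arithmetic behind the 4.8 GW / 7.2 GW figures. The thread doesn't state a board power for the A100; 250 W (which I believe is the A100 PCIe TDP, but treat it as an assumption) reproduces the numbers:

```python
a100_tflops_fp64 = 5.2   # A100 PCIe fp64 figure used earlier in the thread
a100_watts = 250         # assumed A100 PCIe board power

cards = 100e6 / a100_tflops_fp64  # cards needed for 100 EFLOPS fp64 (~19.2M)
gpu_gw = cards * a100_watts / 1e9
total_gw = gpu_gw * 1.5           # +50% for hosts, infrastructure, cooling

print(f"~{gpu_gw:.1f} GW GPUs only, ~{total_gw:.1f} GW with overhead")
```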

And Huawei has access to at least 7 nm tech, as proven by their new cell-phone release, so the energy efficiency will be much better than your estimates.
The A100 was made on TSMC N7. Do we have good information on how those nodes compare?

Besides, about 50% of the new electricity output China added in 2022 was non-fossil. So just with the newly added renewable energy, China can easily handle at least 35 / 2 (energy efficiency) / 2 (renewables share) / 2 (heat management) = 4.4 of these computation systems.
Your original figure seemed to indicate total generation capacity added. I think you're double-counting.
 

zsydeepsky

Great
Oct 12, 2023
35
30
60
The A100 was made on TSMC N7. Do we have good information on how those nodes compare?

According to some Chinese analyses, comparing just transistor density, SMIC N+2 (which was used in the Kirin 9000s) is equivalent to TSMC N7P, Intel 7 nm, or Samsung 6LPP.

<Non English Language link removed by moderator>

If it's exactly as powerful and efficient as the A100 (which seems unlikely, but let's assume so), then the total would be about 4.8 gigawatts for just the GPUs. If we add 50% for hosts, infrastructure, and cooling (TBH, I think it's probably more like 70%, since IIRC cooling is typically a 30% multiplier on everything else), then the figure would be more like 7.2 GW.

...

Your original figure seemed to indicate total generation capacity added. I think you're double-counting.

I'll just use the 70% multiplier; then the total energy cost for the system to run 24×365 would be:
4.8 GW × 170% × 24 × 365 = 71,481.6 GWh ≈ 71.5 TWh
That still doesn't exceed China's capacity for adding energy output: on *average*, it adds 200 TWh of output capacity *per year*, and this still falls within the *renewable* energy capacity range.
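Spelling out that calculation:

```python
gpu_gw = 4.8    # GPU-only estimate from earlier in the thread
overhead = 1.7  # the 70% multiplier for hosts, infrastructure, cooling

annual_gwh = gpu_gw * overhead * 24 * 365
annual_twh = annual_gwh / 1000
print(f"{annual_gwh:.1f} GWh/year = {annual_twh:.1f} TWh/year")
```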

Besides, according to historical data, China can add far more capacity in a not-so-"average" year; for example, in 2021 China added 750 TWh of energy generation capacity.


As a comparison, that's equivalent to almost 1/5 of the US's total annual energy generation (4,100 TWh), 127% of Germany's (588 TWh), or 78% of Japan's (967 TWh). So I understand why the energy cost for such a system seems unfeasible to people who aren't familiar with the scale of China's industry.
 
Last edited by a moderator: