News Xbox Series X: 12 Teraflops of GPU Performance Confirmed, More Details Revealed

Admin · Feb 24, 2020

Xbox Series X got some official specs, including 12 teraflops of GPU power, hardware accelerated ray tracing and smart delivery

Xbox Series X: 12 Teraflops of GPU Performance Confirmed, More Details Revealed : Read more

bwana · Feb 24, 2020

10% the flops of a titan. So why don’t PCs have 10x the frame rate in games?

prtskg · Feb 24, 2020

bwana said:
10% the flops of a titan. So why don’t PCs have 10x the frame rate in games?

The article is talking about fp32 (single precision) compute while what you are talking about is tensor cores compute performance. This will give better comparison
https://www.anandtech.com/show/13668/nvidia-unveils-rtx-titan-2500-top-turing

bwana · Feb 24, 2020

Thank you. So it looks like the Xbox is very close to the titan in single precision. But why is the 2080ti so low in tensor performance if its CUDA count is on par w the others?

bit_user · Feb 24, 2020

bwana said:
why is the 2080ti so low in tensor performance if its CUDA count is on par w the others?

Because Nvidia intentionally nerfed it. The same GPU delivers 2x the fp16-multiply/fp32-accumulate performance in the Titan RTX and equivalent Quadro RTX model. They just didn't want people buying gaming cards for AI training workloads, which is the main purpose of that feature. So, they cut the throughput of that particular instruction in half.

However, if you compare the fp16-multiply/fp16-accumulate performance (not shown in that table), they're on par. That's for inference, which is used to accelerate things like global illumination ray tracing. So, they kept it at full performance.

There's a wealth of information, buried in this page: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units

Giroro · Feb 24, 2020

bwana said:
Thank you. So it looks like the Xbox is very close to the titan in single precision. But why is the 2080ti so low in tensor performance if its CUDA count is on par w the others?

Nvidia likes to reduce certain features (especially Double Precision) in their gaming cards, so the customers who need it like professionals and data centers need to upgrade to the far more expensive Quadro line. Tensor performance isn't really that important to gaming right now compared to how important they are to Nvidia's big AI customers.

bit_user · Feb 24, 2020

Giroro said:
Nvidia likes to reduce certain features (especially Double Precision) in their gaming cards, so the customers who need it like professionals and data centers need to upgrade to the far more expensive Quadro line.

Fun fact: they haven't done that since Kepler. Since then, all of the consumer GPUs really don't have the hardware on die for more fp64 performance. In the case of Titan V, their only data center GPU to reach consumers since then, they kept fp64 performance at full speed.

The Titan RTX is not properly built on a datacenter GPU - it's just an uncrippled version of the RTX 2080 Ti, which is a consumer GPU without more than token fp64.

BTW, AMD's Radeon VII is built on a datacenter GPU, and AMD crippled its fp64 to 1/4th of the native capability. Even after that, it's still the fastest fp64 you can get below the $3000 Titan V.

hannibal · Feb 24, 2020

Interesting to see how big part of that 12 teraflops computational power is from raytrasing hardware...
It is possible that this has less rasterisation power than 5700 has but still have more computational power!

Giroro · Feb 24, 2020

bit_user said:
Fun fact: they haven't done that since Kepler. Since then, all of the consumer GPUs really don't have the hardware on die for more fp64 performance. In the case of Titan V, their only data center GPU to reach consumers since then, they kept fp64 performance at full speed.

The Titan RTX is not properly built on a datacenter GPU - it's just an uncrippled version of the RTX 2080 Ti, which is a consumer GPU without more than token fp64.

BTW, AMD's Radeon VII is built on a datacenter GPU, and AMD crippled its fp64 to 1/4th of the native capability. Even after that, it's still the fastest fp64 you can get below the $3000 Titan V.

If it really is the case that the Tensor and RTX cores weren't left over from their datacenter GPUs... then I very much can't explain why they wasted so much of the TU102 die space and power consumption on them.

LordVile · Feb 24, 2020

admin said:
Xbox Series X got some official specs, including 12 teraflops of GPU power, hardware accelerated ray tracing and smart delivery

Xbox Series X: 12 Teraflops of GPU Performance Confirmed, More Details Revealed : Read more

Eh the vega 56 has 10.5 TFLOPs and the 64 had a shade under 13 (the 2070 comes in at 7.5). Not really a measure of a GPUs power in gaming.

bit_user · Feb 24, 2020

Giroro said:
If it really is the case that the Tensor and RTX cores weren't left over from their datacenter GPUs... then I very much can't explain why they wasted so much of the TU102 die space and power consumption on them.

See for yourself, there's no Tesla card with a TU102:

https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#Tesla

Quadro RTX? Yes, it's on the 6000 and 8000 cards. While you can put them in servers, but they're mainly workstation-oriented cards.

If servers were a big market for the TU102, there should be a Tesla model - like the Tesla P40, which featured the GP102 (of GTX 1080 Ti fame).

Giroro · Feb 24, 2020

bit_user said:
See for yourself, there's no Tesla card with a TU102:

https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#Tesla

Quadro RTX? Yes, it's on the 6000 and 8000 cards. While you can put them in servers, but they're mainly workstation-oriented cards.

If servers were a big market for the TU102, there should be a Tesla model - like the Tesla P40, which featured the GP102 (of GTX 1080 Ti fame).

Interesting; I guess I was off base but I'm no expert in server or even workstation GPUs.
To me, NVidia's handling of RTX features has felt like they have been scraping to find ways to sell datacenter features to gamers, not the other way around.
Or maybe render farms and AI don't need DP? That's not in my realm of experience.

Chung Leong · Feb 24, 2020

hannibal said:
Interesting to see how big part of that 12 teraflops computational power is from raytrasing hardware...

The answer is zero. RT hardware accelerates the search for an intersecting triangle. It doesn't do any compute in the usual sense.

alextheblue · Feb 24, 2020

Giroro said:
If it really is the case that the Tensor and RTX cores weren't left over from their datacenter GPUs... then I very much can't explain why they wasted so much of the TU102 die space and power consumption on them.

Like Bit said, they're a repurposed workstation design. They decided they could push those features into the gaming space, especially with the aid of Developer Bucks™. In their estimation this was a better move than putting together a new chip for the high-end gaming market. I think they were right, even if I feel RT is not incredibly useful below the 2080 (due to the performance hit).

bit_user said:
BTW, AMD's Radeon VII is built on a datacenter GPU, and AMD crippled its fp64 to 1/4th of the native capability. Even after that, it's still the fastest fp64 you can get below the $3000 Titan V.

Actually it's half of the native capability. VII FP64 is 1/4 of the FP32 rate, which is in turn 1/2 the native 1/2 FP64 rate of the Vega 20 silicon.

bit_user · Feb 25, 2020

hannibal said:
Interesting to see how big part of that 12 teraflops computational power is from raytrasing hardware...

I could be wrong (companies have done worse), but I really think they're not counting that.

bit_user · Feb 25, 2020

LordVile said:
Eh the vega 56 has 10.5 TFLOPs and the 64 had a shade under 13 (the 2070 comes in at 7.5). Not really a measure of a GPUs power in gaming.

True, but within a product line it is. Also, note that the 2070 Super peaks at 9 TFLOPS.

Anyway, it's probably reasonable to compare a 12 TFLOPS RDNA2 GPU against the 9.8 TFLOPS RDNA RX 5700 XT. That assumes that RDNA2 is at least as efficient as RDNA, and that they roughly scale up memory bandwidth, to match. Two ways they could add bandwidth are by going to a 384-bit bus, like the XBox One X, or adding some in package memory, like the original XBox One.

bit_user · Feb 25, 2020

Giroro said:
To me, NVidia's handling of RTX features has felt like they have been scraping to find ways to sell datacenter features to gamers, not the other way around.

Oh, I totally agree. They were definitely stretching to find justifications to put Tensor Cores in gaming GPUs.

Giroro said:
Or maybe render farms and AI don't need DP? That's not in my realm of experience.

I think the distinction we're tripping over is that the datacenter market has fragmented. For inferencing, AI can use 8-bit, 4-bit, and people (not very successfully, AFAIK) are even trying to use 1-bit. I've never heard of deep learning using fp64, for either inferencing or training. For most inferencing scenarios, 32-bit, and even potentially 16-bit is overkill, although training is a different story.

Meanwhile, traditional HPC continues to need fp64, while also starting to take advantage of AI.

LordVile · Feb 25, 2020

bit_user said:
True, but within a product line it is. Also, note that the 2070 Super peaks at 9 TFLOPS.

Anyway, it's probably reasonable to compare a 12 TFLOPS RDNA2 GPU against the 9.8 TFLOPS RDNA RX 5700 XT. That assumes that RDNA2 is at least as efficient as RDNA, and that they roughly scale up memory bandwidth, to match. Two ways they could add bandwidth are by going to a 384-bit bus, like the XBox One X, or adding some in package memory, like the original XBox One.

Is a semi custom chip though so even though the arc is the same it’s not the same product line and won’t have the same layout.

bit_user · Feb 25, 2020

LordVile said:
Is a semi custom chip though so even though the arc is the same it’s not the same product line and won’t have the same layout.

Uh, probably the compute units and even higher-level blocks are the same as those destined AMD's mainstream GPU line. Note it's a semi-custom chip, not full-custom. Of course, being a different generation will mean differences, at that level, between it and the first-gen RDNA products.

Where you'll see differences vs. RDNA2 dGPUs is in how they're connected to the memory subsystem(s).

Anyway, I stand by my earlier claim that performance should scale relative to first-gen RDNA, if not better (i.e. due to things like variable-rate shading).

LordVile · Feb 25, 2020

bit_user said:
Uh, probably the compute units and even higher-level blocks are the same as those destined AMD's mainstream GPU line. Note it's a semi-custom chip, not full-custom. Of course, being a different generation will mean differences, at that level, between it and the first-gen RDNA products.

Where you'll see differences vs. RDNA2 dGPUs is in how they're connected to the memory subsystem(s).

Anyway, I stand by my earlier claim that performance should scale relative to first-gen RDNA, if not better (i.e. due to things like variable-rate shading).

Well performance should be better than a like for like part anyway due to console optimisation.

News Xbox Series X: 12 Teraflops of GPU Performance Confirmed, More Details Revealed

Administrator

Distinguished

Distinguished

Distinguished

Polypheme

Splendid

Polypheme

Distinguished

Splendid

Splendid

Polypheme

Splendid

Reputable

Distinguished

Polypheme

Polypheme

Polypheme

Splendid

Polypheme

Splendid

Share this page