News Intel's Return to HEDT? Xeon W9-3495X Hits Geekbench

dalek1234

Honorable
Sep 27, 2019
251
113
10,860
Your table with Geekbench results has misaligned headers. It makes it look like AMD lost in almost every benchmark, when in fact those poor results belong to Intel.

Which brings me to the next point: "[Xeon] a strong rival [Threadripper]". What are you smoking? Intel is behind AMD in 7 out of 8 of your benchmarks, sometimes by a big margin. And the only benchmark AMD loses is "crypto". Nobody does crypto on a CPU, so that's a useless benchmark anyway. And what about power consumption and heat generated? I bet that even with the poor Xeon performance, it probably sucks ridiculous amounts of power compared to Threadripper.
 

jp7189

Distinguished
Feb 21, 2012
532
303
19,260
I agree with most of your points, except that "crypto" refers to cryptographic functions, which can be a significant piece of the workload for the target market. It does not refer to cryptocurrency.

Given the low single-core crypto score, the unusually high multi-core crypto score looks like a strange outlier.
 

PCWarrior

Distinguished
May 20, 2013
216
101
18,770
This is not a 56-core CPU. It is a 28-core/56-thread CPU. What is reported in Geekbench is the number of logical cores.
https://browser.geekbench.com/v5/cpu/20093542

This is evidenced by the L1 and L2 Cache numbers where the multiplier is 28 and not 56.
L1 Instruction Cache 32.0 KB x 28
L1 Data Cache 48.0 KB x 28
L2 Cache 2.00 MB x 28

Intel will be entering the HEDT market again with prices up to $2000 or even up to $3000 but not higher. So they won't be offering their 56-core server parts as workstation offerings. It is very likely that the HEDT flagship will be marketed as the successor of the 28-core W3175X, which launched for $3000. In a market where even consumer-grade higher-end boards cost over $600, good motherboards for the HEDT platform will easily cost $1000-$2000, making a $2000-$3000 CPU a reasonable pairing.
 

bit_user

Titan
Ambassador
As in some other articles I've seen, the table column headings are shifted left by one.

Xeon W9-3495X            Ryzen Threadripper Pro 5995WX
General specifications   56C/112T, 1.90 GHz - 3.20 GHz, 105 MB L3   64C/128T, 2.70 GHz - 4.50 GHz, 256 MB L3
Single-Core | Integer    1120                                       1316
Single-Core | Float      1338                                       1719

I think adding a column heading at the left would fix it (maybe call it "Test").

Test                     Xeon W9-3495X                              Ryzen Threadripper Pro 5995WX
General specifications   56C/112T, 1.90 GHz - 3.20 GHz, 105 MB L3   64C/128T, 2.70 GHz - 4.50 GHz, 256 MB L3
Single-Core | Integer    1120                                       1316
Single-Core | Float      1338                                       1719
 

bit_user

Which brings me to the next point: "[Xeon] a strong rival [Threadripper]". What are you smoking?
I generally agree, but there's a possibility this is still just an engineering sample, which typically performs a bit worse than final production examples. I'm rather suspicious that's what we're seeing, especially since the single-core scores don't tally with what we saw when Alder Lake launched against Ryzen 5000.

Regarding multi-threaded, 56 Golden Cove cores should be very competitive against 64 Zen 3 cores. It's really the matchup against the 7000-series Threadripper that will be interesting.
 

bit_user

This is not a 56-core CPU. It is a 28-core/56-thread CPU. What is reported in Geekbench is the number of logical cores.
https://browser.geekbench.com/v5/cpu/20093542
If the model number is being correctly reported and they're not doing anything weird by restricting cores or running without hyperthreading, then it's truly a 56-core/112-thread CPU.

This is evidenced by the L1 and L2 Cache numbers where the multiplier is 28 and not 56.
L1 Instruction Cache 32.0 KB x 28
L1 Data Cache 48.0 KB x 28
L2 Cache 2.00 MB x 28
I wouldn't read too much into that. Especially for an unreleased CPU. There could be some bug in how it's detecting those stats.

Next, let's look at scaling data, to see how that aligns with different hypothetical core & thread counts. Below, I've taken each CPU's multithreaded score and divided it by the respective single-threaded score. What we get tells us how well performance scaled up.
Test      Xeon W9-3495X   TR Pro 5995WX
Integer   30.0            35.0
Float     30.1            28.7
Crypto    19.9            11.7
Overall   31.4            30.1

What this shows is that Geekbench's multithreaded tests apparently don't scale very well, because even the 64-core/128-thread TR 5995WX tends to scale up to only about 30x its single-threaded performance. There could be lots of reasons for this, including poorly-written benchmarks or perhaps encountering memory bottlenecks or excessive contention for L3 cache. Also, the single-core tests will be running at max turbo, while multithreaded tests will tend to run at much lower base clocks.
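To make the comparison concrete, here is a rough Python sketch that divides the "Overall" ratios from the table by a few hypothetical core counts. The 28-core case is only a what-if, and the function and variable names are made up for illustration.

```python
# Overall scaling ratios (multi-core score / single-core score) from the table.
OVERALL_RATIO = {
    "Xeon W9-3495X": 31.4,
    "TR Pro 5995WX": 30.1,
}

# Core counts to test each ratio against. 28 is the what-if case for the Xeon.
HYPOTHETICAL_CORES = {
    "Xeon W9-3495X": (28, 56),
    "TR Pro 5995WX": (64,),
}


def per_core_scaling(ratio: float, cores: int) -> float:
    """Speedup contributed per physical core, relative to one core at single-thread speed."""
    return ratio / cores


def scaling_table() -> dict:
    """Map (cpu, assumed core count) to per-core scaling, rounded to 2 decimals."""
    table = {}
    for cpu, core_counts in HYPOTHETICAL_CORES.items():
        for cores in core_counts:
            table[(cpu, cores)] = round(per_core_scaling(OVERALL_RATIO[cpu], cores), 2)
    return table


if __name__ == "__main__":
    for (cpu, cores), value in scaling_table().items():
        print(f"{cpu} assuming {cores} cores: {value:.2f}x per core")
```

Under the 56-core assumption the Xeon's per-core figure lands near the Threadripper's; under the 28-core assumption it would be more than double. That is suggestive rather than conclusive, since it ignores clock speeds and SMT.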

Intel will be entering the HEDT market again with prices up to $2000 or even up to $3000 but not higher. So they won't be offering their 56-core server parts as workstation offerings.
Not well-researched statements, it seems. Check the source link above, and you'll also find a list of the 2400-series models. I expect the 2400-series CPUs to range from about $800 to $3500, and the 3400-series CPUs to range from about $2k to $7k.
 

PCWarrior

The 3495X is indeed a 56-core/112-thread CPU, but apparently in this reported benchmark it ran with half of it disabled. This is evident not only from the reported cache multipliers (x28 instead of x56) but also from the fact that 4 memory channels are reported instead of 8 (all the leaked slides mention 8 memory channels). After all the 3495X is a two-tile cpu so maybe it was running with one tile disabled?
 

bit_user

After all the 3495X is a two-tile cpu so maybe it was running with one tile disabled?
According to the benchmark analysis I posted (see the table), it's not plausible it was running with only 28 cores.

Again, this silicon is quite likely an engineering sample, and Geekbench could've had some bugs in the code trying to query its stats. I find both of those potential sources of misreporting more plausible than the idea that it was really running on just 28 cores. Anyway, we'll know soon enough how the 3495X truly performs.
 

PCWarrior

Your scaling analysis is not quite complete, however, as it doesn’t take into account the difference in clock speeds between single and all-core/threaded workloads. The 5995WX has a single-core boost of 4.5GHz but its all-core boost is closer to its base clock of 2.7GHz. If on Geekbench the 5995WX runs at 3GHz all-core, you expect (from the frequency difference alone) to get only 2/3 of the performance compared to all cores running at 4.5GHz. For the Intel ES, on the other hand, it is quite plausible that it was locked to the base frequency of 1.9GHz for both single and multithreaded benchmarks. And you can get scaling higher than 28x with 28 cores/56 threads due to hyperthreading.
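As a back-of-the-envelope check on the frequency argument, here is a small Python sketch. The 3GHz all-core clock and the locked 1.9GHz clock are the hypotheticals from this post, not measured values; the observed ratios are the "Overall" figures from the scaling table earlier in the thread; the function names are invented for illustration.

```python
def ideal_ratio(cores: int, all_core_ghz: float, single_core_ghz: float) -> float:
    """Ceiling on (multi-core score / single-core score) if performance scaled
    perfectly with core count and clock speed, ignoring SMT."""
    return cores * (all_core_ghz / single_core_ghz)


def smt_uplift_needed(observed_ratio: float, ceiling: float) -> float:
    """Extra throughput (as a fraction) SMT would have to add to reach observed_ratio."""
    return observed_ratio / ceiling - 1.0


if __name__ == "__main__":
    # Assumption: 5995WX at 3.0 GHz all-core vs 4.5 GHz single-core boost.
    tr_ceiling = ideal_ratio(64, 3.0, 4.5)
    # Assumption: Xeon ES with 28 active cores locked at 1.9 GHz in both tests.
    xeon_ceiling = ideal_ratio(28, 1.9, 1.9)

    print(f"5995WX ceiling: {tr_ceiling:.1f}x (observed overall: 30.1x)")
    print(f"28-core Xeon ceiling: {xeon_ceiling:.1f}x (observed overall: 31.4x)")
    print(f"SMT uplift needed: {smt_uplift_needed(31.4, xeon_ceiling):.1%}")
```

With those assumptions, the Threadripper's ceiling drops from 64x to roughly 43x, and a 28-core Xeon would need hyperthreading to add about 12% to reach the observed ratio.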
 

SunMaster

Respectable
Apr 19, 2022
220
200
1,960
Good thing you know all about how this unreleased, unfinished CPU performs despite being - according to you - misrepresented and misinterpreted by geekbench.

Thank you so much.
 

bit_user

Your scaling analysis is not quite complete, however, as it doesn’t take into account the difference in clock speeds between single and all-core/threaded workloads.
In fact, I did actually say:
"Also, the single-core tests will be running at max turbo, while multithreaded tests will tend to run at much lower base clocks."

I have no visibility into why those benches didn't scale better, without more information than was provided. All I know is that a 64-core/128-thread Threadripper tended to scale similarly to that Xeon, which supports the idea that the Xeon was actually running on all 56 cores.

If on Geekbench the 5995WX runs at 3GHz all-core, you expect (from the frequency difference alone) to get only 2/3 of the performance compared to all cores running at 4.5GHz.
...if you assume linear scaling. It turns out scaling is almost never linear, due to things like:
  • memory contention
  • storage contention
  • synchronization overhead and lock contention
  • poor load-balancing
  • scheduling overhead
And, to the extent that there's either significant contention or poor load-balancing, you could actually have some cores running higher than base clocks. In fact, in low-IPC code, you could also see all-core workloads running at higher than base clocks.
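One textbook way to illustrate how little overhead it takes to cap scaling is Amdahl's law. To be clear, this model is brought in purely for illustration and is not how Geekbench or either CPU was analyzed in this thread. The sketch below treats 64 cores as 64 equal workers and ignores clock differences and SMT; 30.1 is the 5995WX "Overall" ratio from the earlier table.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup when only parallel_fraction of the work uses all workers."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def parallel_fraction_for(speedup: float, workers: int) -> float:
    """Invert Amdahl's law: parallel fraction needed to reach speedup on workers."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    p = parallel_fraction_for(30.1, 64)
    print(f"parallel fraction: {p:.4f}, serial fraction: {1 - p:.4f}")
    print(f"check: {amdahl_speedup(p, 64):.1f}x on 64 workers")
```

Under this simplified model, a serial (or contended) share of under 2% of the work is already enough to hold 64 workers to about 30x.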

So, we really don't have enough information to establish a confident expectation of how it should scale, other than simply comparing it with that Threadripper.

And you can get scaling higher than 28x with 28 cores/56 threads due to hyperthreading.
The Threadripper has it, too. Presumably, it was enabled on both.
 

bit_user

Good thing you know all about how this unreleased, unfinished CPU performs despite being - according to you - misrepresented and misinterpreted by geekbench.

Thank you so much.
Chill out, please. This is just low-stakes, armchair speculation. It's not worth getting snarky.

By working through the data, maybe we get some insights. I believe @PCWarrior is posting in good faith, so I'm happy to walk through the data and my analysis.
 

PCWarrior

In fact, I did actually say:
"Also, the single-core tests will be running at max turbo, while multithreaded tests will tend to run at much lower base clocks."
I was referring to your numbers and your subsequent statement that based on those it was implausible for the 3495X to be running with one tile disabled.

...if you assume linear scaling. It turns out scaling is almost never linear, due to things like:
  • memory contention
  • storage contention
  • synchronization overhead and lock contention
  • poor load-balancing
  • scheduling overhead
And, to the extent that there's either significant contention or poor load-balancing, you could actually have some cores running higher than base clocks. In fact, in low-IPC code, you could also see all-core workloads running at higher than base clocks.
Of course, performance never scales linearly with the number of cores and/or frequency, for the reasons you stated. I never claimed that it did, though. What I said was that, due to the frequency difference alone, you would expect only 2/3 of the single-threaded score times the number of cores. In other words, I was explaining why even the theoretical maximum you would get from such a division (multi-threaded score / single-threaded score) has to be significantly smaller than the number of cores or threads. Also, when we talk about reasons for poor scaling, we really can’t include the difference in frequency: on its own, frequency is naturally expected to change performance linearly, and it is because of other bottlenecks that this doesn’t happen.

I have no visibility into why those benches didn't scale better, without more information than was provided. All I know is that a 64-core/128-thread Threadripper tended to scale similarly to that Xeon, which supports the idea that the Xeon was actually running on all 56 cores.

So, we really don't have enough information to establish a confident expectation of how it should scale, other than simply comparing it with that Threadripper.

The Threadripper has it, too. Presumably, it was enabled on both.
Well, it can work the other way around too, so you can't tell from the performance. If, as you argue, Geekbench can only scale to around 30x regardless of the core/thread count, then a 28-core/56-thread CPU would perform similarly to a 56-core/112-thread one.

As a matter of fact, the MT score of the 64core/128thread 5995WX on Geekbench is 26031 and that of the 32core/64Thread 5975WX is 26768. So the 5995WX and the 5975WX have similar MT performance on Geekbench. By the same logic a 28core Xeon and a 56 core Xeon would have to have indistinguishable performance as well.
https://browser.geekbench.com/processors/amd-ryzen-threadripper-pro-5995wx
AMD Ryzen Threadripper PRO 5975WX Benchmarks - Geekbench Browser
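For anyone who wants the gap as a percentage, here is a tiny Python sketch using the two multi-core scores quoted above. The helper name is made up for illustration.

```python
MT_5995WX = 26031  # 64 cores / 128 threads, multi-core score quoted above
MT_5975WX = 26768  # 32 cores / 64 threads, multi-core score quoted above


def relative_change(new: float, baseline: float) -> float:
    """Fractional change of new relative to baseline."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    change = relative_change(MT_5995WX, MT_5975WX)
    print(f"5995WX vs 5975WX multi-core score: {change:+.2%}")  # prints -2.75%
```

So on these listings, doubling the core count from 32 to 64 comes with a slightly lower multi-core score, not a higher one.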

I guess we will have to wait and see how it performs with final clocks.
 

Cooe

Prominent
Mar 5, 2023
27
23
535
I guess we will have to wait and see how it performs with final clocks.
... Annnnnnnnd it sucks. A lot. Like by an unbelievable amount actually. Sapphire Rapids is an absolute dumpster fire. Wouldn't be surprised if your absolutely brainwashed Intel fanboy self still thinks it's awesome though. I mean, you are literally the exact same person who said that AMD should have cancelled 3D V-Cache... 🤦