News AMD Instinct MI300A data center APU underperforms against mainstream CPUs in Geekbench — MI300A submissions show lower performance than a Core i5-1...

NinoPino

Respectable
May 26, 2022
483
300
2,060
What's the sense of this article?
Maybe it is a prototype.
Maybe it is not well cooled.
The motherboard type is unknown.
Maybe there are software incompatibilities.
The variability of the results indicates serious issues in the benchmarking.
Geekbench is hardly the intended workload for these chips, so the value of Geekbench results is questionable even if the results were very good.
 
About the only use for Geekbench in a data center chip is to test that it completes the benchmark if it is an early silicon.
 

purposelycryptic

Distinguished
Aug 1, 2008
48
57
18,610
A prototype of a new-generation data center APU didn't perform up to par in its first run of a standard CPU benchmark? I'm sorry, but that really doesn't tell us much about how the final product will perform in its intended role.

The integrated AI GPU, NPU, or whatever AMD's brand of AI accelerator is called, provides a significant part of the value of these chips, hence their being an entirely separate processor line from AMD's server CPUs.

Sidenote: the industry really needs to decide on a standardized, non-trademarked generic name for these things - Intel calls them NPUs, my home server runs a couple of discrete Google Coral TPUs, and I'm guessing Qualcomm has its own name for them... It's a mess. The only thing everyone seems to agree on is to advertise their performance in OPS (TOPS, POPS, etc.), the same kind of seemingly-but-not-really-useful comparison metric that FLOPS continues to be for GPUs.

Anyway, these things are trying to balance multi-core processing power against power consumption, *and* they carry an additional powerful integrated AI processor onboard. So comparing benchmark scores against consumer CPUs, or against workstation CPUs like the Threadrippers that are designed around maximizing all-around single-processor performance with minimal consideration for power consumption, is kind of an 'apples to bananas to kumquats' comparison, even if this were a production chip.

These numbers obviously don't look good, but the benchmarking tool they come from hasn't been updated for these processors, nor does it account for them being special-purpose APUs. So these are the numbers you get from running an uncalibrated tool on a prototype of a specialized processor.

We know AMD can make killer processors, the article even compared it to one. I feel like it's more than a little early to call them out on this one based on this information alone.

But, then again, I'm hardly objective when it comes to AMD - I invested fairly heavily in them (as well as less, but still significantly, in Nvidia and Intel... and pretty much the entire industry chain from ASML up to Supermicro) earlier this year, when everything was all sunshine and lollipops for AMD/Nvidia, and it looked like Intel had gone as low as it was going to and was headed for better days. Needless to say, things have looked more than a little rough in the semiconductor industry this latter half of the year, and, with around 10% of my portfolio directly invested there (not counting double-dips through ETFs), it would make me very happy to see AMD's fancy new toy do well. Make of that what you will.
 
Sep 7, 2024
3
2
10
Do you see any Xeons at the top of the Geekbench scores? There are no Epycs there either, nor any of the highest-priced latest-gen processors. I think you're really onto something here, mate!! Totally cracked the case open... or, you know, you could have someone a bit more familiar with enterprise hardware explain why a processor with 24 full cores, a base clock in the 3.x GHz range, and obviously not using all (or even half) of its cores might perform 30-ish percent worse than an unlocked enthusiast processor that can easily hit 5.8 GHz on all 8 performance cores (with the RAM and HDD certainly maxed out as well, since who cares if you have to reboot a system used for having fun)...

As you're filing your next huge break in this case, don't forget to mention how both of those processors are thoroughly trounced in performance per dollar by the i3-10100. And it's not even close. How embarrassing. Why is anyone even buying new hardware, amirite?!?!!?
 

Gururu

Prominent
Jan 4, 2024
301
202
570
The fact is that the performance is horrific. However, it is clear as day to me that the caveats for such preliminary testing are adequately listed. Something to keep an eye on, since there will be a LOT of data coming from chipmakers this month.
 

EasyListening

Distinguished
Mar 14, 2017
14
6
18,525
The fact is that the performance is horrific. However, it is clear as day to me that the caveats for such preliminary testing are adequately listed. Something to keep an eye on, since there will be a LOT of data coming from chipmakers this month.
No, the performance is where it should be for an early engineering sample. You should read others' posts before doubling down on your own ignorance.
 

bit_user

Titan
Ambassador
The fact is that the performance is horrific.
GB6 is so weird, though. It would be informative to spend some time browsing their results database. Like, how on earth is the top multi-core Windows score held by an i7-13700K?

I've also dug through some wildly different multicore scores on the same Xeon W model and concluded that the MT benchmark must be heavily dependent on memory bandwidth & latency. What I think happened, in that case, is that some of those Xeon W workstations had only half of their memory channels populated, because the lower results were about half of the better scores for that same CPU (and this CPU didn't support multi-socket configurations).

Also, note how the results for the same CPU differ significantly, between operating systems.

Before I'm willing to take Geekbench scores seriously, I'd need to see at least two things:
  1. A results browser that shows histograms of data for each CPU + platform combination, so you can easily see what the typical values are by looking for the peaks. This enables outliers (due to overclocking, hacks, etc.) to be easily disregarded - see the sketch after this list.
  2. Some data showing how well GB6 scores correlate to real-world performance metrics. Please, Geekbench, show me how your black box benchmark is at all relevant to anyone, for anything!
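
To illustrate what I mean in #1, here's a rough sketch in Python (the scores are made-up, hypothetical data - Geekbench doesn't expose anything like this today) of how the histogram peak gives you a more robust "typical" score than a plain average once the submissions include misconfigured or overclocked systems:

```python
import numpy as np

# Hypothetical multi-core scores for a single CPU + OS combination:
# a main cluster of stock results, plus some misconfigured and overclocked outliers.
rng = np.random.default_rng(0)
scores = np.concatenate([
    rng.normal(18_000, 500, 200),  # typical, properly configured systems
    rng.normal(9_500, 400, 30),    # e.g. only half the memory channels populated
    rng.normal(21_000, 600, 15),   # overclocked / heavily tuned submissions
])

counts, edges = np.histogram(scores, bins=50)
peak = counts.argmax()
typical = (edges[peak] + edges[peak + 1]) / 2  # score at the histogram peak

print(f"Mean score: {scores.mean():,.0f}")  # skewed by the outliers
print(f"Peak score: {typical:,.0f}")        # what most stock systems actually get
```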

I can't believe I'm telling someone with your username to chill out, but here we are!
; )
 
Last edited:

Quirkz

Prominent
Feb 17, 2023
30
28
560
Data center chip designed for AI workloads.
The only important benchmark is how well the chip performs on those tasks, not a CPU benchmark.
AMD has optimised for a particular task, and I'm betting they're dedicating most of the power budget and die space to it.

Run large neural networks on a computer, and it's the GPU that's being hammered, not the CPU.
 
  • Like
Reactions: NinoPino

Quirkz

Prominent
Feb 17, 2023
30
28
560
No it’s not. All Epyc chips score significantly less than Ryzen in single threaded benchmarks. They’re only running 60% of the clock speed.
Exactly.
Epyc/data centre is all about performance per watt.
And performance per watt is best at the lower-voltage/frequency sweet spot.

There are very few datacenter loads where max single-threaded performance is important (some databases, certain very specific low-latency number crunching).
 

bit_user

Titan
Ambassador
Epyc/data centre is all about performance per watt.
And performance per watt is best at the lower-voltage/frequency sweet spot.
Not always. Some hardware and workloads are so expensive and such high-value that the hardware is often pushed outside its efficiency range. AI is currently one such example.

There are very few datacenter loads where max singlethreaded performance is important (some databases, certain very specific low latency number crunching)
I think it's the MT benchmark score that's most nonsensical. That's the reason a 24-core Threadripper was chosen for comparison, but I think GB6 is too opaque and we don't have enough insight into what the MT benchmark is really measuring.
 

Quirkz

Prominent
Feb 17, 2023
30
28
560
Not always. Some hardware and workloads are so expensive and such high-value that the hardware is often pushed outside its efficiency range.
Agreed, and I gave examples of this - such as some DB loads and low-latency number crunching (specifically finance).
AI is currently one such example.
Respectfully disagree here. Most AI workloads are highly parallel vector workloads that benefit from a very large number of execution units, which usually means you can aim for the sweet spot of the power curve and spend the extra power budget on additional compute units. This is why they've got over three times the number of compute units on the MI300 vs. the 7900 XTX, plus the added 24-core CPU, all for around twice the power budget.
The only reason to push clock speeds is if the silicon gets so large that you hit yield issues before you reach TDP limits.

The cost of the silicon is not much of an issue - most data center costs are dominated by power/cooling over the lifetime of the product.

I think it's the MT benchmark score that's most nonsensical. That's the reason a 24-core Threadripper was chosen for comparison, but I think GB6 is too opaque and we don't have enough insight into what the MT benchmark is really measuring.
Also agreed. GB6 is a completely irrelevant benchmark. This is an AI accelerator. Using a CPU benchmark is ignoring three quarters of the transistors on the silicon.
 
  • Like
Reactions: NinoPino

bit_user

Titan
Ambassador
Respectfully disagree here. Most AI workloads are very parallel vector workloads that benefit from a very large number of execution units: Which usually means you can aim for the power curve sweetspot, and add compute units to the chip using that extra power budget.
Okay, so let's look at the Nvidia H100 as an example (Source: https://www.anandtech.com/show/1878...-memory-server-card-for-large-language-models ):
  • The PCIe version is limited to 350 W and achieves 756 fp16 tensor TFLOPS (2.16 per W).
  • The SXM version is limited to 700 W and achieves 990 fp16 tensor TFLOPS (1.41 per W).

Clearly, the SXM version is pushed well beyond the efficiency sweet spot.
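
A quick sanity check of that in Python, using only the figures quoted above (nothing here is measured by me):

```python
# Perf-per-watt comparison of the two H100 variants, from the figures above.
h100 = {
    "PCIe": {"power_w": 350, "fp16_tensor_tflops": 756},
    "SXM":  {"power_w": 700, "fp16_tensor_tflops": 990},
}

for name, spec in h100.items():
    eff = spec["fp16_tensor_tflops"] / spec["power_w"]  # TFLOPS per watt
    print(f"{name}: {eff:.2f} fp16 tensor TFLOPS/W")

# Doubling the power budget buys only ~31% more throughput,
# i.e. the SXM part runs well past the efficiency sweet spot.
print(f"+{990 / 756 - 1:.0%} perf for +{700 / 350 - 1:.0%} power")
```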

The cost of the silicon is not much of an issue - most data center costs are dominated by power/cooling over the lifetime of the product.
According to this, the average datacenter PUE ratio is 1.58.

According to this, datacenter electricity costs range from $0.047 to $0.15 per kWh.

Let's take the upper end of that range, which works out to about $1.31 per watt-year. So, even if the GPU is running at max power 24/7, and we apply the 1.58 PUE factor to cover cooling and facility overhead, it's only costing about $1,454 per year to power. Now, you've also got the overhead of the server and networking gear, but those stay relatively fixed irrespective of how fast the GPU is running, so factoring them in wouldn't strengthen the case for slowing the GPU down.

So, let's compare that to the cost of the hardware. A DGX H100, containing 8x H100 SXM cards, had a launch price of $482k, which works out to about $60k per GPU. The current street price of an H100 (PCIe) seems to be about $30k. The useful service life of this hardware is only about 3-4 years before it becomes too obsolete.

So, even if we assume a 4-year service life and the lowest price of $30k per H100, we're still talking about hardware that costs at least 5.16x as much to purchase as its lifetime energy costs (even at 100% duty cycle + including cooling), for that entire time!
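
For anyone who wants to check the arithmetic, here's the same estimate as a small Python sketch (all inputs are the figures quoted above; the $30k price and 4-year life are the assumptions already stated):

```python
# Lifetime energy cost vs. purchase price for a single H100, using the numbers above.
power_w = 700          # SXM power limit (worst case for energy)
price_per_kwh = 0.15   # upper end of the quoted datacenter electricity range
pue = 1.58             # average datacenter PUE (covers cooling overhead)
service_years = 4      # assumed useful life
hw_price = 30_000      # assumed street price of an H100 (PCIe)

kwh_per_year = power_w / 1000 * 24 * 365                   # ~6,132 kWh at 100% duty cycle
energy_cost_per_year = kwh_per_year * price_per_kwh * pue  # ~$1,453 per year
lifetime_energy = energy_cost_per_year * service_years     # ~$5,813 over 4 years

print(f"Energy per year:   ${energy_cost_per_year:,.0f}")
print(f"Lifetime energy:   ${lifetime_energy:,.0f}")
print(f"Hardware / energy: {hw_price / lifetime_energy:.2f}x")  # ~5.16x
```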

Amiright, @helper800 ?

Also agreed. GB6 is a completely irrelevant benchmark. This is an AI accelerator. Using a CPU benchmark is ignoring three quarters of the transistors on the silicon.
I'm pretty certain the CCDs in it are the same ones in Genoa or Genoa-X EPYC CPUs. So, it would be weird if there were a real discrepancy between models of equal core count, after accounting for their relative clock speeds.

One wild card is the HBM3. I don't know how its best-case latency compares with DDR5. However, its latency under load should certainly be better, due to the higher bandwidth. And latency under load should be what the MT benchmark is seeing, especially if I'm right about it being memory-heavy.
 
  • Like
Reactions: helper800

bit_user

Titan
Ambassador
I stand chastised and corrected.
I hope that was gentle.

Thanks for the research and links!
I've said some of the same things you did in the past, talking about both GPU efficiency and about datacenters favoring energy costs.

Except, in this case, I had seen some troubling signs of datacenter GPUs being pushed to their frequency or thermal limits, with hardware costs increasing so much and AI demand running so high. So I figured we were probably in somewhat exceptional territory and was curious both whether that was true and to what extent. It was also educational for me to do the work and see for myself.
 
  • Like
Reactions: helper800