News AMD MI300X performance compared with Nvidia H100 — low-level benchmarks testing cache, latency, inference, and more show strong results for single...

Admin · Jun 26, 2024

AMD's MI300X was tested by Chips and Cheese, looking at many low-level performance metrics and comparing the chip with rival Nvidia H100 in compute throughput and cache intensive benchmarks. Scaleout performance unfortunately wasn't tested, which is far more important for data center hardware, and there are plenty of other questions raised.

AMD MI300X performance compared with Nvidia H100 — low-level benchmarks testing cache, latency, inference, and more show strong results for single... : Read more

Makaveli · Jun 26, 2024

Why does it seems like this write up of the Chips and Cheese article has a negative spin on it in the conclusion here.

That whole take these results with a pinch of salt section....

edzieba · Jun 26, 2024

There's a big truckload of salt not mentioned with that pinch: they benchmarked the chips for model inference, but everyone is buying up H100s for model training. Two different workloads.

There are two other frontpage articles that gloss over this fundamental errror ([1] & [2]), which would be like comparing consumer GPUs based on which can play back FMVs with the lowest power - it's nice, but nothing to do with by you bought the GPU in the first place.

bit_user · Jun 26, 2024

edzieba said:
There's a big truckload of salt not mentioned with that pinch: they benchmarked the chips for model inference, but everyone is buying up H100s for model training. Two different workloads.

True, but it's not practical to train a LLM on a single GPU, no matter how powerful. There are lighter-weight models they could use to benchmark training, but then you'd have to consider whether those are a good proxy for LLM training performance.

MLPerf might be a good avenue to explore, though I have no experience with it:

Benchmark MLPerf Training | MLCommons Version 2.0 Results

The MLPerf Benchmark Suites measures how fast machine learning systems can train models to a target quality metric using v2.0 results.

mlcommons.org

Maybe @cheesecake1116 knows more about this decision.

KnightShadey · Jun 26, 2024

This article does seem to have a heavy slant on it due to all the salt & shade being pre-applied.

[tin foil hat]

*OR* since we're throwing around unsubstantiated ulterior motive accusations willy-nilly, could this be a hit-piece to undermine C&C's article in the lead-up to THG's own forthcoming article mentioned in yesterday's Geekbench thread? 🤔🤨 *dramatic conspiracy music plays*

[/remove tin foil]

These parts seem inaccurate at best, and possibly deliberately misleading given the rest of the tone, and statement of possible bias (which is just shy of a flat out accusation of intent to deceive IMO).

"Chips and Cheese also mentions getting specific help from AMD with its testing" ... "so there could be some bias in the benchmark results "

bookended with another accusation not supported by the original write-up..

as a few folks from AMD who helped with making sure our results were reproducible on other MI300X systems." No mention is made of any consultation with any Nvidia folks, and that suggests this is more of an AMD-sponsored look at the MI300X.

The use of the term sponsorship doubles-down on the implied bias, for something not conveyed by the testing or the acknowledgment itself which is more about ensuring external validity and reproducibility regarding the product that is the focus of the testing.

When THG does reviews and asks for the latest drivers for pre-release hardware, to ensure tests reflect shipping final production hardware, and doesn't allow competitors to provide pre-production drivers/software or early access to the review, does that then become a 'sponsored' article because it required assistance from the products mfr/vendor to ensure validity without equal time/efforts/words focused on the competitors hardware?

Sure there should be a bunch of caveats, and you can disagree with the chosen tests, methodology, but this article, and the above in particular goes well beyond that.

Even C&C's testing also shows there is a lot of untapped potential in the MI300X, and it's still on AMD to improve the ecosystem that can provide that potential to customers instead of leaving that potential/value locked behind less than stellar software support, and that shortcoming is clearly also addressed in the article too.

Perhaps in the effort to seem balanced THG's overshot the mark, but it has resulted in the appearance of defensive bias on Nvidia's behalf or worse. 🫤

I could be wrong, as you (we all) could be, but it sure feels like an attempt to undermine their work.

JarredWaltonGPU · Jun 26, 2024

KnightShadey said:
This article does seem to have a heavy slant on it due to all the salt & shade being pre-applied.

[tin foil hat]

*OR* since we're throwing around unsubstantiated ulterior motive accusations willy-nilly, could this be a hit-piece to undermine C&C's article in the lead-up to THG's own forthcoming article mentioned in yesterday's Geekbench thread? 🤔🤨 *dramatic conspiracy music plays*

[/remove tin foil]

These parts seem inaccurate at best, and possibly deliberately misleading given the rest of the tone, and statement of possible bias (which is just shy of a flat out accusation of intent to deceive IMO).

"Chips and Cheese also mentions getting specific help from AMD with its testing" ... "so there could be some bias in the benchmark results "

bookended with another accusation not supported by the original write-up..

as a few folks from AMD who helped with making sure our results were reproducible on other MI300X systems." No mention is made of any consultation with any Nvidia folks, and that suggests this is more of an AMD-sponsored look at the MI300X.

The use of the term sponsorship doubles-down on the implied bias, for something not conveyed by the testing or the acknowledgment itself which is more about ensuring external validity and reproducibility regarding the product that is the focus of the testing.

When THG does reviews and asks for the latest drivers for pre-release hardware, to ensure tests reflect shipping final production hardware, and doesn't allow competitors to provide pre-production drivers/software or early access to the review, does that then become a 'sponsored' article because it required assistance from the products mfr/vendor to ensure validity without equal time/efforts/words focused on the competitors hardware?

Sure there should be a bunch of caveats, and you can disagree with the chosen tests, methodology, but this article, and the above in particular goes well beyond that.

Even C&C's testing also shows there is a lot of untapped potential in the MI300X, and it's still on AMD to improve the ecosystem that can provide that potential to customers instead of leaving that potential/value locked behind less than stellar software support, and that shortcoming is clearly also addressed in the article too.

Perhaps in the effort to seem balanced THG's overshot the mark, but it has resulted in the appearance of defensive bias on Nvidia's behalf or worse. 🫤

I could be wrong, as you (we all) could be, but it sure feels like an attempt to undermine their work.

Whether you like it or not, receiving help to get things running from AMD and not receiving equivalent input from Nvidia is inherently biased. I appreciate what Chips and Cheese has done, I think it's interesting, and we wrote up an article to promote it. But we have to call out potential issues, because those issues are absolutely real.

FWIW, I have reached out to clamchowder and offered to provide my Nvidia contacts. Whether or not Nvidia will repond and offer help is beside the point. It should be allowed to at least give some suggestions. I always do this when looking at new and unusual benchmarks, and likewise when I run tests on a game where results seem odd.

I get accused of being heavily biased in favor of Nvidia on a regular basis — like for example because I have a bunch of ray tracing enabled games that I use in benchmarks and reviews. But not having those would show significant bias in the other direction. It's a catch-22, damned if you do, damned if you don't sort of situation. Chips and Cheese gets to be in the same boat, simply by virtue of writing about MI300X and H100. (What, no Ponte Vecchio benchmarks!? LOL)

What I can tell you is that I'm routinely in contact with representatives from all three GPU companies to field questions and potentially find workarounds to problems I encounter. The Stable Diffusion benchmarks are a great example of this. When I first started testing SD performance, I couldn't get it to work on anything but Nvidia GPUs, because they were the default that most public projects targeted. Over time, I received instructions that allowed me to get things running on AMD and Intel GPUs. It took over six months in some cases... and now we have Stable Diffusion 3 and ComfyUI, and I'm almost back to square one it feels.

bit_user · Jun 26, 2024

KnightShadey said:
This article does seem to have a heavy slant on it due to all the salt & shade being pre-applied.

[tin foil hat]

*OR* since we're throwing around unsubstantiated ulterior motive accusations willy-nilly, could this be a hit-piece to undermine C&C's article in the lead-up to THG's own forthcoming article mentioned in yesterday's Geekbench thread? 🤔🤨 *dramatic conspiracy music plays*

[/remove tin foil]

Nah, I think Toms knows they're not in quite the same market as Chips&Cheese. I've never seen them do anything to actively undermine another tech publication, like you're suggesting.

A lot of these writers and editors have worked for different publications and outlets. I doubt any of them would want to burn bridges like that, especially for such small potential upside.

cheesecake1116 · Jun 26, 2024

bit_user said:
True, but it's not practical to train a LLM on a single GPU, no matter how powerful. There are lighter-weight models they could use to benchmark training, but then you'd have to consider whether those are a good proxy for LLM training performance.

MLPerf might be a good avenue to explore, though I have no experience with it:

Benchmark MLPerf Training | MLCommons Version 2.0 Results

The MLPerf Benchmark Suites measures how fast machine learning systems can train models to a target quality metric using v2.0 results.

mlcommons.org

Maybe @cheesecake1116 knows more about this decision.

So we tried to run MLPerf training but it ended up being a lot more involved then I was thinking it was going to be and it would have taken more time then we had, it's why we didn't have any training data.... we did try....

As for MLPerf Inference, we decided that running actual LLM models on the hardware would be more interesting data.... that was our logic, for better or for worse.....

As for the Tom's article itself, I have major issues with it.....

For example, the paragraph that says that we used H100 PCIe for our inference results is just wrong... We clearly stated that our inference results were run on H100 SXM, even in the charts....
Also, we did not receive any assistance from AMD.... All AMD did was verify that our numbers were reproducible on their systems... that is all....
Frankly it was a judgement call on whether or not to even mention AMD here, but I thought it was more honest to say that we did reach out to AMD then to say nothing......

As for why no GH200 numbers for LLaMA3-70B, it's simple, we didn't have access to that machine anymore.... One of the Co-Authors', Neggles, rented that machine out, from Hydra.ai, for testing and we only had that machine for about 6 hours thanks to a mess up on Hydra's end. And we weren't just interested in the Hopper part of the system but also the Grace part of that system so we were running CPU microbenchmarks on it as well which limited the amount of time we had for testing even more.....

Luckily Neggles had access to a H100 SXM machine where we ran the rest of our inference testing, we didn't have the chance to rerun the rest of our suite hence why we disclaimed at the start why our numbers for the microbenchmarks and the OCL tests were run on H100 PCIe... Would I have liked to rerun all of our benchmarks on H100 SXM or GH200, of course... But we simply didn't have the time or resources to do so.....

That's why the article is as it is..... we are not a large site with a ton of resources.... we take what we can get and we do our best with what we have......

JarredWaltonGPU · Jun 26, 2024

cheesecake1116 said:
So we tried to run MLPerf training but it ended up being a lot more involved then I was thinking it was going to be and it would have taken more time then we had, it's why we didn't have any training data.... we did try....

As for MLPerf Inference, we decided that running actual LLM models on the hardware would be more interesting data.... that was our logic, for better or for worse.....

As for the Tom's article itself, I have major issues with it.....

For example, the paragraph that says that we used H100 PCIe for our inference results is just wrong... We clearly stated that our inference results were run on H100 SXM, even in the charts....
Also, we did not receive any assistance from AMD.... All AMD did was verify that our numbers were reproducible on their systems... that is all....
Frankly it was a judgement call on whether or not to even mention AMD here, but I thought it was more honest to say that we did reach out to AMD then to say nothing......

As for why no GH200 numbers for LLaMA3-70B, it's simple, we didn't have access to that machine anymore.... One of the Co-Authors', Neggles, rented that machine out, from Hydra.ai, for testing and we only had that machine for about 6 hours thanks to a mess up on Hydra's end. And we weren't just interested in the Hopper part of the system but also the Grace part of that system so we were running CPU microbenchmarks on it as well which limited the amount of time we had for testing even more.....

Luckily Neggles had access to a H100 SXM machine where we ran the rest of our inference testing, we didn't have the chance to rerun the rest of our suite hence why we disclaimed at the start why our numbers for the microbenchmarks and the OCL tests were run on H100 PCIe... Would I have liked to rerun all of our benchmarks on H100 SXM or GH200, of course... But we simply didn't have the time or resources to do so.....

That's why the article is as it is..... we are not a large site with a ton of resources.... we take what we can get and we do our best with what we have......

Thanks for coming here, cheesecake, and I really don't want any bad blood between us. News pieces do get funneled through and things slip in that maybe shouldn't be. I know my initial response when reading the writeup on your site was, "Wait, why didn't they ask Nvidia for comment / help / input?"

We've tried to make it clear in the text that the PCIe H100 was used for the low-level testing, while SXM H100 (and GH200) were used for the inference testing. If there's a specific sentence that I've missed that suggests PCIe was used, LMK and I'll fix that.

I've also toned down any rhetoric suggesting AMD sponsorship. That was my bad to begin with, because I've seen stuff over the years where it's obvious one company basically provided a lot of help to do specific testing that would make their products look better. And whether intentional or not, this MI300X piece feels a bit that way (mostly due to a lack of resources, which I totally get).

If there's anything still in the text (refresh it to get the latest updates) that really strikes you as being off base or wrong, let me know.

KnightShadey · Jun 26, 2024

JarredWaltonGPU said:
Whether you like it or not, receiving help to get things running from AMD and not receiving equivalent input from Nvidia is inherently biased.

It's not whether I like it or not, it's accusations towards actions that don't rise to the level of those statements, which at least to me, seems unwarranted and factually incorrect, or at the very least bad form, especially if you have experienced similarly likely baseless claims. 🤨
Again, did AMD get things running, or just help make sure that their results were valid for other MI300X? Those are two different claims. One implies a direct hand, the other implies confirming results or informing of dissimilarities with other MI300s

The criticism as presented seems too harsh for what amounts to the crime of omission or limitations due to time with the hardware which needed to be returned to Hot Aisle.

Additionally, it's not inherently biased, which implies a deliberate act for/against that is unreasonable (you had time but didn't bother) or prejudicial (your obvious hatred of Apple, errr.. ATi, errr.. whatever) to the point of invalidating the content, requiring a willful act. Whereas throughout C&C constantly acknowledged the limitations of their setup and testing, even questioning too favourable AMD results, including (but not limited to);

Starting the 3rd Paragraph;
" Please note that all of our H100 data, except for the inference data, was generated using the PCIe version of H100, which features slower HBM2e memory, fewer CUDA cores, and a reduced TDP of 350 watts. Our inference data was generated on a H100 SXM5 box which has 3.35TB per second of memory bandwidth (compared to the 2.04TB/s of H100 PCIe), 18 more SMs compared to the PCIe version, as well as having an increased TDP of 700 watts.:"

"My INT32 add test also behaves as if MI300X is packing values and performing math at double rate, but something definitely went wrong.."

"NVIDIA doesn’t advertise support for FP16 through OpenCL, so I’m testing on H100 by not doing the check and using the half datatype anyway."

and my personal fav as mentioned in the other thread the CGP test (which also included the Radeon 780M and 610M because... everyone is using those for training, and is Industry Standard hardware... or perhaps for humour & illumination... you decide);
"I spent about a day writing this and did not put effort into optimization. The code should be considered representative of what happens if you throw a cup of coffee at a high school intern and tell them to get cracking."
"The Raphael iGPU included with AMD’s Zen 4 desktop CPUs is another option. It can do the same work in 4.2 hours, which leaves time for a coffee break."

Sure it's not representative of the full picture due to choices made and limitations they expressed, but that is different from bias.

Also for what opened & ended with comments on the lack of CUDA support for AMD hardware and their struggles to address that, I don't see many reviewers saying they refuse to use nV tools as it disadvantages the others, they just say they tried to use equivalent tools... To what extent do you go with what you have in front of you and can easily access? Does everyone get a hotline to every IHV/ISV and delay their investigation?

nVidia has the same right as anyone else, the right of reply. And they will. of that I am sure. And if C&C get the chance I doubt they'd turn down an opportunity for a follow-up investigation.

My issue is that the level of accusations/criticism/skepticism of a well documented test is harsher than the criticism of THG's reporting of nVidia claims of future CopilotAsterixRT++ support that they are supposedly working on with M$, or AMD's IPC slide claims, or intel's claims, all with far FAR Less supporting details or caveats than that opening 3rd paragraph.

The balance of skepticism seems to be focused in the wrong area IMO...

... but as ever that my 2 frames worth, your mileage may vary. 🤷🏻‍♂️

KnightShadey · Jun 26, 2024

Well, guess I shouldn't have wasted time re-reading the article if Cheese was going to reply himself...... Oh well. 🤣

ps, I STILL stand by my Original Claim.. Soup Should Be Eaten WITH A SPORK !!🤪

bit_user · Jun 26, 2024

KnightShadey said:
Also for what opened & ended with comments on the lack of CUDA support for AMD hardware and their struggles to address that,

The Chips & Cheese piece barely mentioned HIP, which is AMD's CUDA compatibility layer. There's only a single, passing mention of one of its tools (HIPify), towards the end of the article.

cheesecake1116 · Jun 26, 2024

bit_user said:
The Chips & Cheese piece barely mentioned HIP, which is AMD's CUDA compatibility layer. There's only a single, passing mention of one of its tools (HIPify), towards the end of the article.

So... not quite...
It's a little confusing but ROCm is AMD's software stack which uses HIP and HIP libraries.
HIPIFY is the tool that can convert CUDA code to HIP code so that it can run on ROCm.

jp7189 · Jun 27, 2024

NVIDIA/CUDA tends to be first implemented and well optimized with ROCm being an afterthought to the point of being implemented and never tested in some projects due devs just simply not having a ROCm card. From my point of view, extra attention is required for AMD products to level the playing field with CUDA, and I wouldn't see a problem if AMD had a chance to provide optimizations whereas NVIDIA doesn't need to.

KnightShadey · Jun 27, 2024

cheesecake1116 said:
As for why no GH200 numbers for LLaMA3-70B, it's simple, we didn't have access to that machine anymore.... One of the Co-Authors', Neggles, rented that machine out, from Hydra.ai, for testing and we only had that machine for about 6 hours thanks to a mess up on Hydra's end. And we weren't just interested in the Hopper part of the system but also the Grace part of that system so we were running CPU microbenchmarks on it as well which limited the amount of time we had for testing even more.....

I have a feeling that that is where this Geekbench number came from.... 🤔

Nvidia 72-core Grace CPU Performance close to AMD 96-core Threadripper 7995WX

Nvidia's 72-core Grace CPU has recently appeared in GeekBench benchmark tests, showing performance levels close to AMD's 96-core Threadripper 7995WX. The Grace CPU's multi-core score of 74,400 not only rivals the 96-core AMD Threadripper but also surpasses Intel's 56-core Xeon W9-3495X.

www.guru3d.com

If Cheese is still poking in the forum/thread, it would be interesting to see if it was them, and also id they just ran it stock (as I would suspect for a coupla reasons) as Geekbench doesn't report Freq for that result.

Now people can talk about the absurdity of running Geekbench in a Grace too for balance. 😉

bit_user · Jun 27, 2024

KnightShadey said:
I have a feeling that that is where this Geekbench number came from.... 🤔

Nvidia 72-core Grace CPU Performance close to AMD 96-core Threadripper 7995WX

Nvidia's 72-core Grace CPU has recently appeared in GeekBench benchmark tests, showing performance levels close to AMD's 96-core Threadripper 7995WX. The Grace CPU's multi-core score of 74,400 not only rivals the 96-core AMD Threadripper but also surpasses Intel's 56-core Xeon W9-3495X.

www.guru3d.com

I haven't known Chips & Cheese to use GeekBench. Perhaps because it's too opaque and their benchmarking seems mostly focused on microbenchmarks that actually reveal details about what's going on inside of the hardware.

There's a company that builds workstations based on Nvidia Grace and Grace Hopper superchips. Maybe it was one of those systems.

GTPshop.ai

GPT shop inference tuning Nvidia GH200 GB200 computer PC workstation desktop server hardware system LLM AI artificial intelligence ML machine learning transformer

gptshop.ai

They even gave Phoronix access to run a battery of tests on it, so it's quite plausibly them.

KnightShadey · Jun 27, 2024

bit_user said:
I haven't known Chips & Cheese to use GeekBench...

So to sum up your answer: You don't know.

OK, thanks. 🤙

bit_user · Jun 27, 2024

KnightShadey said:
So to sum up your answer: You don't know.

I did check and found only 5 articles from their 5 year history where they actually ran (or tried to run) Geekbench. So, it's not impossible, but does suggest other possibilities should be considered.

Speaking of other possibilities, did you consider that it could be GPTshop.ai? If not, then maybe you should've included that in your summary of my post. Given their use of other generic and application-level benchmarks, it seems to me at least as likely. Perhaps even more likely, if Geekbench cannot be run without a GPU, since their workstations can accommodate a GPU, whereas most servers have just BMC graphics.

KnightShadey said:
OK, thanks. 🤙

You're welcome.

KnightShadey · Jun 27, 2024

bit_user said:
..then maybe you should've included that in your summary of my post.

Or, maybe, you missed the point. 🤔

However, in the spirit of fairness of the thread, I will give your editorial suggestion the full consideration it deserves. 😜

Edit: After giving it full consideration, along with the supporting evidence below, I feel confident.. you missed the point.

bit_user · Jun 27, 2024

KnightShadey said:
Or, maybe, you missed the point. 🤔

Summarizing other people's posts isn't something I typically see on here. I think my post was short enough that a summary wasn't warranted.

Also, you never acknowledged whether you were aware of GPTshop.ai or their Grace-based workstations. It seems to me that this information was new to you and a possibility that you had not considered while you were jumping to conclusions.

BTW, I have not found a conclusive answer on whether Geekbench can be run on a system without a GPU. If you look at all of the systems where ChipsAndCheese did run it, they all seem to be on PCs that were probably tested locally and would've had a dGPU or iGPU.

JarredWaltonGPU · Jun 27, 2024

bit_user said:
BTW, I have not found a conclusive answer on whether Geekbench can be run on a system without a GPU. If you look at all of the systems where ChipsAndCheese did run it, they all seem to be on PCs that were probably tested locally and would've had a dGPU or iGPU.

Pretty sure @cheesecake1116 said in the Geekbench MI300X thread that it was C&C that ran those results.

bit_user · Jun 27, 2024

JarredWaltonGPU said:
Pretty sure @cheesecake1116 said in the Geekbench MI300X thread that it was C&C that ran those results.

Thanks, Jarred. I believe the thread you mention is here:

https://forums.tomshardware.com/thr...lve-times-as-expensive.3847857/#post-23288627

In it, Cheese has only two short posts, neither of which mention Grace. I think that only came up in this thread.

Having reviewed the Geekbench database entry cited, a few interesting details jump out:

The entry doesn't list a GPU of any kind, so maybe that confirms one isn't required.
The hardware platform is a Supermicro server board that they don't officially sell separately, but I suppose we shouldn't rule out the possibility that GPTshop.ai worked out an arrangement to source it from them. However, the motherboard reported by Phoronix's tests of GPTshop.ai's system was reported as "Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G", which is also a server board. So, that confirms GPTshop basically just took an existing server board and slapped it in a pretty, windowed tower chassis. Given that, it's conceivable (if unlikely) that they switched board suppliers.
The OS is Rocky Linux, whereas the system Phoronix tested was running Ubuntu 23.10.

All of which is pointing away (though not conclusively) from GPTshop.ai and towards something else.

JarredWaltonGPU · Jun 28, 2024

bit_user said:
Thanks, Jarred. I believe the thread you mention is here:

https://forums.tomshardware.com/threads/amd-mi300x-posts-fastest-ever-geekbench-6-opencl-score-—-19-faster-than-rtx-4090-and-only-twelve-times-as-expensive.3847857/#post-23288627

In it, Cheese has only two short posts, neither of which mention Grace. I think that only came up in this thread.

Sorry, I think I missed the bit that you were wondering about Grace / GH100 / GH200. I thought we were talking about just the GB OpenCL result from the MI300X. 😎

News AMD MI300X performance compared with Nvidia H100 — low-level benchmarks testing cache, latency, inference, and more show strong results for single...

Administrator

Splendid

Distinguished

Titan

Reputable

Splendid

Titan

Distinguished

Splendid

Reputable

Reputable

Titan

Distinguished

Distinguished

Reputable

Titan

Reputable

Titan

Reputable

Titan

Splendid

Titan

Splendid

Share this page