AMD 4th-Gen EPYC Genoa 9654, 9554, and 9374F Review: 96 Cores, Zen 4 and 5nm Disrupt the Data Center


bit_user

Titan
Ambassador
This means that it actually takes two clock cycles to execute an AVX-512 instruction (256b data path), but the architecture executes the issue and retire for AVX-512 in one operation.
No, it can issue one instruction per port, every 2 cycles. With something like 6 vector-FP ports, the peak is more than 1 AVX-512 instruction per cycle (it should be up to 3, given the right mix and no outstanding data dependencies).

"Double-pumped" isn't the right way to describe it, but it sounds good. I think that's why AMD's marketing settled on the term.
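
To make the issue-rate arithmetic concrete, here is a minimal sketch. The port count of 6 is the rough figure from the post ("like 6"), not a verified specification, and the function name is made up for illustration.

```python
from fractions import Fraction


def peak_issue_per_cycle(ports: int, cycles_per_issue: int) -> Fraction:
    """Upper bound on instructions issued per cycle across all ports.

    Assumes every port is kept busy and there are no data dependencies.
    """
    return Fraction(ports, cycles_per_issue)


# Assumption taken from the post: ~6 vector-FP ports, each able to accept
# one AVX-512 instruction every 2 cycles.
print(peak_issue_per_cycle(6, 2))  # 3
```

This is only an upper bound. A real instruction mix rarely spreads evenly over all ports.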
 

domih

Reputable
Jan 31, 2020
205
183
4,760
I just looked at the numbers of OpenSSL and I just laughed... Intel is SO screwed for general purpose machines.

If you want to laugh even more, look at Michael's benchmark results on Phoronix. Oh my, my, oh my. It's murder and bloodbath in the racks. INTEL is going to need to work harder and longer to be able to catch AMD EPYC. INTEL will eventually achieve parity, but they are going to lose a lot more market share over the next few years. EPYC Rome was the first salvo dethroning INTEL in terms of performance, density and TCO. Genoa and its upcoming derivatives are the second, third and fourth salvos.

The commenters who feel there is a bias in TH reporting should not worry too much. TH has zero influence on choices made by Enterprise, Data Center, CAD and Hollywood decision makers.
 

bit_user

Titan
Ambassador
@PaulAlcorn , thanks for the review!

@JarredWaltonGPU , just to chip in my $0.02: I think it would help if the site had a standard definition of what the different numbers of stars mean. Perhaps something like:
  • 5.0 stars: perfectly-executed product that introduces significant new features/functionality at existing price points, or offers leading performance and feature-parity below the established price points.
  • 4.5 stars: perfectly-executed, while not pushing the envelope on features or pricing. Still represents a solid value.
  • 4.0 stars: solid execution, with caveats. Interesting, from a features or pricing perspective.
  • 3.5 stars: solid execution, with caveats. Features & pricing comparable to existing market offerings.
  • 3.0 stars: usable execution, with significant caveats. Features & pricing comparable to existing market offerings. Or solid execution, with caveats + missing significant features or unattractive price point.
  • 2.5 stars: usable execution, with significant caveats. Missing significant features or unattractive price point.
  • 2.0 stars: problematic execution. Otherwise, competitive.
  • 1.5 stars: problematic execution. Otherwise, uncompetitive.
  • 1.0 stars: virtually unusable. Otherwise, competitive.
  • 0.5 stars: virtually unusable. Otherwise, uncompetitive.
  • 0.0 stars: completely unusable.
Obviously, there's some room to trade points between usability, features, pricing, etc. However, by establishing some kind of standard ratings scale, the scores should be less controversial. As for the top of the scale, my thinking is that a 5.0 would be truly rare - almost to the point that they must've erred and priced it too low, if it gets a 5.0.

As for this particular review, I'd probably give it a 4.5. However, my rationale would be that pricing increased too much on the 64-core version, relative to inflation. In the features category, it gets credit for CXL + PCIe 5.0, but I don't consider DDR5 a "feature". Its adoption of AVX-512 only catches up with Intel, so that also doesn't earn any extra credit.
 

bit_user

Titan
Ambassador
I'd have a hard time discussing competitiveness on something already bearing the "virtually unusable" label no matter how much competition there is for virtually unusable stuff :)
I just meant that pricing & features had parity with competitors. This fact can be relevant if/when the manufacturer fixes whatever issues kept it from being usable.

However, I generally agree that once you get to about 2 stars and below, the details are almost irrelevant to most readers.
 

gruffi

Distinguished
Jun 26, 2009
44
32
18,560
Okay, for those more knowledgeable than me (I'm not really into server tech), how is it that Intel is so far behind in terms of core count with these systems? Looking at some of the benches (and I might as well be blind in both eyes and using a magnifying glass to scroll through the data!) it seems to me that if Intel were able to increase core count, that they would be comparable in performance to the AMD counterparts?
Not really. As you can see in the benchmark charts, 24 Genoa cores can easily compete with 40 Ice Lake Xeon cores. It's not just the core count. There might be more reasons for Intel's inability to compete but I see three main problems:

  1. Intel's p-core is very inefficient compared to AMD's Zen core, in power and especially in area. For example, Zen 3 and Golden Cove use process nodes with similar density. While Golden Cove offers ~10% higher IPC and slightly higher clock speed, it needs massively more die area, ~75%. Zen 4 extends that advantage even further. It is only about half the size of Raptor Cove but offers similar IPC and clock speed.
  2. Process node advantage. TSMC's 7nm and Intel's 10nm (Intel 7) are quite comparable. But Zen 4 uses TSMC's more advanced 5nm technology. Intel simply cannot offer something comparable. Their process technology is significantly behind because of their 10nm fiasco.
  3. Chiplets. Intel can only design monolithic Xeons, at least so far. That limits their core count even more. AMD is much more flexible with its chiplet designs. One 8-core Zen 4 chiplet is tiny at only ~72 mm². It's easy to produce, with excellent yields, and it's all you need to make packages with up to 96 cores (up to 12 CPU chiplets). Ice Lake Xeon is a massive 660 mm². Far fewer of those dies come out fully functional. That's why it has some reserve cores: there are 42 cores in total, but only 40 of them are enabled, which wastes some silicon.
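
The yield argument in point 3 can be illustrated with a simple Poisson defect model, where the share of defect-free dies is exp(-area × defect density). This is a rough sketch only: the die areas are the ones quoted above, but the defect density is a made-up placeholder, not a figure for any real TSMC or Intel process.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical defect density, chosen only for illustration.
D0 = 0.1  # defects per cm^2

chiplet_yield = poisson_yield(72, D0)      # ~72 mm^2 Zen 4 chiplet
monolithic_yield = poisson_yield(660, D0)  # 660 mm^2 Ice Lake Xeon die

print(f"defect-free 72 mm^2 dies:  {chiplet_yield:.1%}")
print(f"defect-free 660 mm^2 dies: {monolithic_yield:.1%}")
```

The model ignores harvesting (selling dies with a few cores disabled), which is exactly what the reserve cores mentioned above are for, so it overstates the loss on the big die.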
 

bit_user

Titan
Ambassador
Ice Lake uses a core microarchitecture and manufacturing node that are 2 generations older than Alder Lake's. So, the stuff you said about P-cores and Intel 7 is irrelevant, here. We're seeing 2022's Zen 4 compete against an Intel design & node that first launched back in 2019.

As for chiplets, the Sapphire Rapids Xeons seem to use up to 4 tiles, for the top-tier (non-Max) models. And yet, yield is reportedly still a major issue for them.

In January we will finally see how Genoa compares with a Xeon comprised of Golden Cove P-cores made on Intel 7.
 

gruffi

Ice Lake uses a core microarchitecture and manufacturing node that are 2 generations older than Alder Lake's.
Not really. It's more like one generation. Sunny Cove and Willow Cove aren't that much different. The IPC is almost the same. So, it's Skylake -> Sunny Cove / Willow Cove -> Golden Cove / Raptor Cove. But that's not relevant. Ice Lake is using Intel's 10nm process, just like SPR. That's more of a problem for them.

So, the stuff you said about P-cores and Intel 7 is irrelevant, here.
No, it isn't. It's absolutely relevant and one of the main reasons why Intel is trailing so much behind technologically. Zen 3 and Golden Cove are using similar process nodes. Intel 7 is said to have even somewhat better density. According to WikiChip it's ~91 MTr/mm² for TSMC's N7 vs ~101 MTr/mm² for Intel 7. So, while Golden Cove offers ~10% higher IPC and somewhat higher clock speed, resulting in ~15% higher performance, it also needs massively more die area, ~75% (~4mm² vs ~7mm² per core including L2). The performance per area of Zen 3 already is waayyyyy better. Intel's upcoming SPR features exactly that Golden Cove core. There might be some server specific tweaks. But don't expect any miracles. While AMD already takes the next step with Zen 4. Which improves the significant efficiency advantages of Zen 3 even further.
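
A quick sketch of the performance-per-area arithmetic, using only the approximate figures from this post (~4 mm² vs ~7 mm² per core including L2, ~15% higher performance for Golden Cove). The function name is just for illustration.

```python
def perf_per_area(relative_perf: float, core_area_mm2: float) -> float:
    """Relative performance delivered per mm^2 of core area."""
    return relative_perf / core_area_mm2


# Approximate figures quoted in the post; Zen 3 is the 1.00 baseline.
zen3 = perf_per_area(1.00, 4.0)
golden_cove = perf_per_area(1.15, 7.0)

area_increase = 7.0 / 4.0 - 1.0        # extra area Golden Cove needs
zen3_advantage = zen3 / golden_cove    # Zen 3 perf/area relative to Golden Cove

print(f"Golden Cove area increase: {area_increase:.0%}")
print(f"Zen 3 perf-per-area advantage: {zen3_advantage:.2f}x")
```

With these inputs, Zen 3 comes out at roughly 1.5x the performance per area, which is the gap the post is describing.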

Intel urgently needs a much more efficient p-core. That's what the Royal core is rumored to be, a new ground-up architecture. But it isn't expected before Nova Lake somewhere around 2025. Their "e-core" also won't help them until then because it isn't any better than AMD's p-core efficiency-wise. And AMD will launch their first design based on the new c-core next year. Which will improve efficiency even more, while offering clearly higher performance than Intel's "e-core".

In January we will finally see how Genoa compares with a Xeon comprised of Golden Cove P-cores made on Intel 7.
Genoa completely destroys everything that Intel can offer right now. A 24-core Genoa can already compete with a 40-core Ice Lake Xeon in performance while being much more power efficient. What do you expect from a Xeon with 40% more cores (56 vs 40), 10-20% higher IPC and somewhat higher clock speed? Exactly, not that much. I'm not even sure if that's enough to compete with Genoa at similar core counts. At least performance wise. Efficiency wise likely not. But AMD can offer >70% more cores on a single package, with clearly better power efficiency. And there will be Bergamo in 1H 2023. Which offers up to 128 power and area optimized Zen 4c cores. SPR's HBM might shine in some areas. But that will easily be countered by AMD with V-Cache based SKUs. I rather see the gap widening than narrowing for Intel. With Milan AMD could offer 60% more cores on a similar process node. With Genoa / Bergamo AMD can offer ~70-130% more cores on a better process node. Intel is dead for now in the high performance professional market. And the main reasons are the ones I already mentioned. Intel can hide their deficits in the client market to some degree. Because core count or maximum power consumption aren't much of a problem here. But that won't work in every segment.
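
The core-count percentages above can be checked with a few lines. The 40, 56, 96 and 128 core counts come from the post; the 64-core figure for Milan is the top Milan SKU and is added here as an assumption about what the 60% refers to.

```python
def pct_more(a: int, b: int) -> float:
    """How many percent more cores a has than b."""
    return (a - b) / b * 100


ICE_LAKE, SPR = 40, 56        # Intel, per the post
MILAN = 64                    # assumption: top Milan core count
GENOA, BERGAMO = 96, 128      # AMD, per the post

print(f"SPR vs Ice Lake:  {pct_more(SPR, ICE_LAKE):.0f}% more cores")
print(f"Milan vs Ice Lake: {pct_more(MILAN, ICE_LAKE):.0f}% more cores")
print(f"Genoa vs SPR:     {pct_more(GENOA, SPR):.0f}% more cores")
print(f"Bergamo vs SPR:   {pct_more(BERGAMO, SPR):.0f}% more cores")
```

That gives 40%, 60%, about 71% and about 129%, in line with the ">70%" and "~70-130%" figures in the post.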
 

bit_user

Titan
Ambassador
Not really. It's more like one generation. Sunny Cove and Willow Cove aren't that much different. The IPC is almost the same.
But the process node - and therefore peak clocks & power-efficiency - are that much different. You can't just gloss over such details.

Ice Lake is using Intel's 10nm process, just like SPR.
That's even more misleading than equating Skylake's version of 14 nm with Rocket Lake's 14 nm++++. If Ice Lake had used 10 nm ESF, it wouldn't have sucked and Zen 2 would've gotten smashed by it. Intel wouldn't have had to make Whiskey Lake, either.

What do you expect from a Xeon with 40% more cores (56 vs 40), 10-20% higher IPC and somewhat higher clock speed?
Intel will have several niches, where they can pull ahead. Beyond that, they'll have to compete on pricing and with Xeon Max.

SPR's HBM might shine in some areas. But that will easily be countered by AMD with V-Cache based SKUs.
No, they're different enough that I expect they'll each tend to excel in somewhat different areas.

Intel is dead for now in the high performance professional market.
They'll have a per-core lead in AVX-512 and they have AMX. They also scale to more CPUs (up to 8 sockets, I think). Most importantly, they have more volume manufacturing capacity than AMD has been able to get from TSMC, and I'm guessing this will remain the case.
 

gruffi

But the process node - and therefore peak clocks & power-efficiency - are that much different. You can't just gloss over such details.
No. Both are using Intel's 10nm. But that's irrelevant. The core logic itself hasn't changed much. From Skylake to Sunny Cove it was a big step. From Sunny Cove to Willow Cove it wasn't. 6th gen Skylake also had much lower clock speed than 10th gen Skylake. That doesn't mean anything. It's just the usual incremental process node optimization from gen to gen.

That's even more misleading ...
No, it isn't. It's a simple fact.

Intel will have several niches, where they can pull ahead.
What niches? I don't see much in the 1P/2P market. And serving niches isn't up to Intel's standards, either. All they can do is sell at low prices. But that will ruin their server business in the long run. Being the cheap and inferior brand isn't what professionals prefer.

Yes. The cases where more but slower HBM excels will be very rare compared to AMD's V-Cache.

They'll have a per-core lead in AVX-512 and they have AMX.
Irrelevant. Zen 4 supports AVX-512 as well. And it does so more effectively and more predictably because of the implementation. I doubt Intel will have a per-core lead. Benchmarks of Zen 4 show otherwise. Genoa will also be less affected by throttling and still offers way more cores.

[Image: CcfMcfI.png]

AMX also won't help Intel much. Especially if you have to pay for those extra features. As Intel's On Demand rumors suggest.

They also scale to more CPUs (up to 8 sockets, I think).
Again, irrelevant. 1P and 2P dominate the market. 4P and 8P are niche. No one buys 4P if you can get similar or better performance with 2P from the competition.

Most importantly, they have more volume manufacturing capacity than AMD has been able to get from TSMC, and I'm guessing this will remain the case.
Still no argument if customers don't want your hardware because the competition is so much better. That's just like a creeping death.
 

bit_user

Titan
Ambassador
No. Both are using Intel's 10nm.
That's incorrect. The process nodes they're using are very different. At this point, you seem to be dissembling so I give up.

From Skylake to Sunny Cove it was a big step. From Sunny Cove to Willow Cove it wasn't. 6th gen Skylake also had much lower clock speed than 10th gen Skylake. That doesn't mean anything. It's just the usual incremental process node optimization from gen to gen.
The funny thing is that this whole tangent is about how many generations of skew there were in your comparison. The fact that the number wasn't zero is the key point, and this entire line of debate seems designed to distract from that important and irrefutable fact.

Irrelevant. Zen 4 supports AVX-512 as well. And it does so more effectively and more predictably because of the implementation. I doubt Intel will have a per-core lead. Benchmarks of Zen 4 show otherwise. Genoa will also be less affected by throttling and still offers way more cores.

[Image: CcfMcfI.png]
This is another grossly invalid comparison. Rocket Lake was a single-FMA desktop CPU on 14 nm. The AMD CPU it's being compared against is made on TSMC N5. It has virtually zero predictive value in telling us how Sapphire Rapids' AVX-512 will compare with Epyc Genoa.

My assertion that Sapphire Rapids' AVX-512 will likely out-perform Genoa is based on the fact that the former has 2 FMAs per core, whereas Genoa has a throughput of only one AVX-512 FMA per cycle. Also, Genoa has a latency penalty, due to having to split its AVX-512 ops across 2 cycles. With that said, it's still speculation and there are other factors at play. We'll have to wait and see.
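
As a rough sketch of what that difference means for peak per-core throughput, assuming only the FMA rates stated above (2 per cycle vs 1 per cycle) and single-precision data. Clock speeds, throttling and memory bandwidth are deliberately left out, so this is an upper bound per cycle, not a performance prediction.

```python
def peak_fp32_flops_per_cycle(avx512_fmas_per_cycle: int) -> int:
    """Peak single-precision FLOPs per core per cycle from AVX-512 FMAs."""
    lanes = 512 // 32        # 16 fp32 values per 512-bit vector
    flops_per_fma = 2        # one multiply plus one add per lane
    return avx512_fmas_per_cycle * lanes * flops_per_fma


# FMA rates as stated in the post; not independently verified here.
sapphire_rapids = peak_fp32_flops_per_cycle(2)
genoa = peak_fp32_flops_per_cycle(1)

print(sapphire_rapids, genoa)  # 64 32
```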

Again, irrelevant. 1P and 2P dominate the market. 4P and 8P are niche. No one buys 4P if you can get similar or better performance with 2P from the competition.
There are use cases for it, which is why Intel supports those configurations. That said, the mainstream is indeed 1P and 2P.

Still no argument if customers don't want your hardware because the competition is so much better. That's just like a creeping death.
If the lead time on Epyc hardware is too long or the pricing is too high, customers will continue to buy Intel. It's not a great strategy, but it should give them enough revenue to tide them over until Granite Rapids and Sierra Forest launch. Both of those should be more competitive.