News 6.3 GHz Ryzen 7 9700X beats 7.1 GHz Core i9-14900KF in liquid nitrogen AVX showdown

Some guys just want an HTPC and never want to hear a sound from it.

Other guys think a 360 rad is a good thing.

And then you have the LN2 people, who have the money to make incredible news that no one can use...
 

emike09

Distinguished
Jun 8, 2011
AVX512 will always outperform AVX2 if written properly and used in the right scenario. With IPC being equal, Intel's 13th/14th Gen will obviously lose here due to the lack of AVX512 support on its consumer chips. Benchmark against Intel's latest overclockable workstation Xeon CPUs and the story could be very different. This is a terrible comparison article and reads as fanboyism for AMD.

Intel dropped AVX512 from consumer CPUs for good reason. Consumers and most power users don't use AVX512. AVX-512 was designed to enhance performance for certain high-performance computing and server applications. For consumers, it's a waste of silicon, adds complexity to design, runs hot, and increases development cost.

Intel's decision to drop AVX512 from consumer chips was a strategic one, aiming to balance silicon space, performance, power efficiency, and cost considerations while targeting the needs of the broader consumer market. Given the limited use case in consumer applications, the development costs associated with integrating and supporting AVX-512 outweigh the benefits for the consumer market.
 

bit_user

Titan
Ambassador
Intel dropped AVX512 from consumer CPUs for good reason. Consumers and most power users don't use AVX512. AVX-512 was designed to enhance performance for certain high-performance computing and server applications. For consumers, it's a waste of silicon, adds complexity to design, runs hot, and increases development cost.
The problem with this logic is that AVX10.1 incorporates all of the AVX-512 instructions that were natively implemented in Golden Cove, and that is coming to consumer CPUs. So, it's not as if Intel actually doesn't want to give consumers those capabilities. The problem is with the maximum vector width (i.e. 512-bit), which would seriously undermine their E-core strategy.

Given the limited use case in consumer applications, the development costs associated with integrating and supporting AVX-512 outweigh the benefits for the consumer market.
As for the general applicability of AVX-512, it has features that make it much better for auto-vectorization, such as per-lane predication. Lately, the trend seems to be to use it to accelerate string operations and other sorts of generic computation:
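To make the predication point concrete, here is a sketch (plain C, illustrative only) of the kind of conditional loop that per-lane masking lets a compiler vectorize directly; under AVX-512 the branch becomes a mask-register compare and a masked store, whereas under AVX2 the compiler must emulate the predicate with compares and blends:

```c
/* Per-lane predication example: under AVX-512 the `if` compiles to a
 * k-mask compare plus a masked store; no branch per element is needed. */
void clamp_to_limit(float *x, const float *limit, int n) {
    for (int i = 0; i < n; i++) {
        if (x[i] > limit[i])   /* per-lane predicate */
            x[i] = limit[i];   /* executed only in lanes where it holds */
    }
}
```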

Not surprisingly, it's even good for video processing and encoding:

As mentioned above, the development costs for AVX10 will be the same - and that's definitely coming to client CPUs. So it's not that it doesn't make sense - just that they went too far in making it 512-bit.
 
AVX512 will always outperform AVX2 if written properly and used in the right scenario. With IPC being equal, Intel's 13th/14th Gen will obviously lose here due to the lack of AVX512 support on its consumer chips. Benchmark against Intel's latest overclockable workstation Xeon CPUs and the story could be very different. This is a terrible comparison article and reads as fanboyism for AMD.

Intel dropped AVX512 from consumer CPUs for good reason. Consumers and most power users don't use AVX512. AVX-512 was designed to enhance performance for certain high-performance computing and server applications. For consumers, it's a waste of silicon, adds complexity to design, runs hot, and increases development cost.

Intel's decision to drop AVX512 from consumer chips was a strategic one, aiming to balance silicon space, performance, power efficiency, and cost considerations while targeting the needs of the broader consumer market. Given the limited use case in consumer applications, the development costs associated with integrating and supporting AVX-512 outweigh the benefits for the consumer market.
It seems that Intel's AVX-512 architectural layout had a major fault: it ran so hot that Intel had to apply a downclock offset whenever it was used. AMD's version does not have this problem. Perhaps when Intel reintroduces 512-bit vector compute with AVX10, their redesigned logic will not suffer the same problem.
 

bit_user

It seems that Intel's AVX-512 architectural layout had a major fault: it ran so hot that Intel had to apply a downclock offset whenever it was used. AMD's version does not have this problem.
That's because Intel introduced it at 14 nm. People tested Intel's implementation in Alder Lake (back when you still could) and found neither a significant clock penalty, nor excessive temperatures. 14 nm was just too soon to introduce it into a high-clocking CPU (though fine for Xeon Phi, which only boosted up to 1.7 GHz (1.5 GHz base)).

IMO, the main thing AMD got right was to wait until TSMC N5 to do it. Sure, they also had half the FMA ports, and everything in their back end basically retained the same widths as Zen 3.

In the N4P-based Zen 5 (desktop/server), not only did they double the width of the pipes, but they also doubled the number of FMA ports, to equal Golden Cove Server's AVX-512 pipeline arrangement.
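For a rough sense of what that port doubling buys, here's a back-of-envelope peak-throughput calculation (the `peak_fp32_flops` helper is made up for illustration; real throughput also depends on clocks, offsets, and memory):

```c
/* Peak fp32 FLOPs per core per cycle, given the vector width in bits
 * and the FMA port count; each FMA counts as two FLOPs (mul + add). */
int peak_fp32_flops(int vec_bits, int fma_ports) {
    int lanes = vec_bits / 32;   /* fp32 lanes per vector */
    return fma_ports * lanes * 2;
}
/* Two 512-bit FMA ports (Zen 5 desktop / Golden Cove server):
 *   peak_fp32_flops(512, 2) == 64
 * One effective 512-bit FMA port: peak_fp32_flops(512, 1) == 32 */
```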

[image: Zen 5 FP pipeline diagram]

Compare to:

[image: Golden Cove server FP pipeline diagram]
Perhaps when Intel reintroduces 512-bit vector compute with AVX10, their redesigned logic will not suffer the same problem.
Lion Cove already increased the FP pipelines by 33%. It'd be interesting if their first core to implement AVX10/256 had double the issue ports at half the width of their Golden Cove server cores.
 
Some guys just want an HTPC and never want to hear a sound from it.

Other guys think a 360 rad is a good thing.

And then you have the LN2 people, who have the money to make incredible news that no one can use...
You can use it a bit, since the leaders in LN2 also do best at regular overclocking. At least with the 9700X you can shove 170 W into it, slap a 360 AIO on it, and enjoy really nice performance for a low cost.
The 7700X works even better if you don't OC the hell out of it.
Lion Cove already increased the FP pipelines by 33%. It'd be interesting if their first core to implement AVX10/256 had double the issue ports at half the width of their Golden Cove server cores.
If we ever see them, that is. The release date is always stuck in the classic, eternal "tomorrow" scenario.
 
I didn't see it in the article or while skimming the video: was it mentioned whether the e-cores on the 14900KS were still enabled? Gracemont e-cores have AVX2, and even if they were only at base clock, that would still give the 14900KS an extra 16 cores to crunch AVX2 instructions. That means that if they were enabled, AVX512 on Zen 5 is MUCH faster than AVX2 on Raptor/Gracemont.
 
That's because Intel introduced it at 14 nm. People tested Intel's implementation in Alder Lake (back when you still could) and found neither a significant clock penalty, nor excessive temperatures. 14 nm was just too soon to introduce it into a high-clocking CPU (though fine for Xeon Phi, which only boosted up to 1.7 GHz (1.5 GHz base)).

IMO, the main thing AMD got right was to wait until TSMC N5 to do it. Sure, they also had half the FMA ports, and everything in their back end basically retained the same widths as Zen 3.

In the N4P-based Zen 5 (desktop/server), not only did they double the width of the pipes, but they also doubled the number of FMA ports, to equal Golden Cove Server's AVX-512 pipeline arrangement.

[image: Zen 5 FP pipeline diagram]

Compare to:

[image: Golden Cove server FP pipeline diagram]

Lion Cove already increased the FP pipelines by 33%. It'd be interesting if their first core to implement AVX10/256 had double the issue ports at half the width of their Golden Cove server cores.
Thank you, good sir, I stand humbly corrected!!!! 14 nm being the cause makes perfect sense. And thinking about your double-ports hypothesis, it would help to keep the FP pipelines saturated, would it not? If so, I wonder how this vs AMD's full 512 bit rendition would compare.
 

bit_user

If so, I wonder how this vs AMD’s full 512 bit rendition would compare.
AMD implemented a dual strategy with Zen 5. In the laptop cores (and I don't mean just the C-cores, but also the full-size Zen 5 laptop cores), they opted for the Zen 4 approach of a 256-bit pipeline width. This could be seen as validation of Intel's strategy of sticking with 256-bit for client CPUs.

More:
  1. https://chipsandcheese.com/2024/08/10/amds-strix-point-zen-5-hits-mobile/
  2. https://chipsandcheese.com/2024/08/20/zen-5-variants-and-more-clock-for-clock/

Note that #1 was posted before the review embargo on the Zen 5 desktop products was lifted. So, it lacks any real-world comparative analysis. Thankfully, that's covered in the 9950X article (above) and the "Zen 5 Variants ..." article.
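The 256-bit approach can be modeled crudely: a 512-bit operation issues as two 256-bit halves ("double pumping"), occupying the pipe for two slots. A toy sketch under that assumption (function name invented; this models the scheduling idea, not actual micro-op behavior):

```c
#define LANES_512 16  /* fp32 lanes in a 512-bit vector */
#define LANES_256 8   /* fp32 lanes a 256-bit datapath handles per pass */

/* Execute a 512-bit add on a 256-bit datapath; returns the number of
 * passes ("micro-ops") the operation occupies the pipe for. */
int add512_double_pumped(float *dst, const float *a, const float *b) {
    int uops = 0;
    for (int half = 0; half < LANES_512 / LANES_256; half++) {
        for (int i = 0; i < LANES_256; i++) {
            int j = half * LANES_256 + i;
            dst[j] = a[j] + b[j];
        }
        uops++;  /* each 256-bit pass takes one slot in the pipe */
    }
    return uops; /* 2: same result, half the per-cycle throughput */
}
```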
 
AMD implemented a dual strategy with Zen 5. In the laptop cores (and I don't mean just the C-cores, but also the full-size Zen 5 laptop cores), they opted for the Zen 4 approach of a 256-bit pipeline width. This could be seen as validation of Intel's strategy of sticking with 256-bit for client CPUs.

More:
  1. https://chipsandcheese.com/2024/08/10/amds-strix-point-zen-5-hits-mobile/
  2. https://chipsandcheese.com/2024/08/20/zen-5-variants-and-more-clock-for-clock/

Note that #1 was posted before the review embargo on the Zen 5 desktop products was lifted. So, it lacks any real-world comparative analysis. Thankfully, that's covered in the 9950X article (above) and the "Zen 5 Variants ..." article.
Well, that matches the markets pretty well: 512-bit makes sense for data center and workstation use cases, and no one would realistically use a laptop for serious vector crunching. Knowing that the data center and laptop segments are more lucrative for AMD, this design variance reinforces, to me, the notion that their desktop lineup is a way to sell workstation dies that do not meet power efficiency requirements (i.e., binning server dies for low leakage/voltage stability, with the rest going to desktop).

In a way, I think AMD is smart to spend R&D on its biggest market by optimizing the core design for the data center, then brute-forcing gaming performance by slapping 3D cache on top.
 

bit_user

In a way, I think AMD is smart to spend R&D on its biggest market by optimizing the core design for the data center, then brute-forcing gaming performance by slapping 3D cache on top.
Their solution for "brute forcing" gaming performance was actually to crank up the frequency and power limits.

It somewhat recently came out that the X3D desktop CPUs were basically an accident. When testing Milan-X (the Zen 3-based EPYC that first featured 3D V-cache), they had 7 spare dies with V-cache and someone decided to put them in desktop packages and benchmark them. Once they discovered how much gaming performance benefited from the extra L3 cache, a new product was born!

I'm trying to find the source on this, but not having success. It came directly from an interview with an AMD engineer and they showed off one of the prototypes that had 3D V-cache on both CCDs.
 

bit_user

Is it actually gone in the newer P-cores as opposed to disabled like in Alder Lake?
Excellent question. I hoped someone would do die shot analysis to confirm whether Raptor Lake got rid of it, but I haven't found anything like that. The next question is whether Redwood Cove has it.

I'm going out on a limb and guessing that Lion Cove lacks it, given that Intel says they optimized away all the structures to support Hyperthreading, yet retained these in the server version. With such a big divergence between their client and server P-cores, I'd be surprised if they wasted die area on AVX-512 with no intention to use it. Especially when paying even more to fab the dies on TSMC, which they've said is going to really hurt their margins on the Gen 15 products.
 