News AVX-512 Works Surprisingly Well on Ryzen 7040 Series Phoenix CPUs

Jul 14, 2023
3
4
15
AMD implemented AVX-512 differently than Intel to address power issues with the Intel AVX-512 implementation. Consequently, there is negligible power difference using AVX-512 on AMD. I have an Intel Cascade Lake machine where AVX-512 was a bust because continuous use of AVX-512 produced such extreme thermal throttling that it ran no faster than without it.

The Phoronix article has power-use results that show no significant power or CPU frequency degradation when using AVX-512 on AMD, like the graphic below:
onednn-rnni-u8s8f32-cpu-2.svgz
 
  • Like
Reactions: prtskg

Findecanor

Distinguished
Apr 7, 2015
257
175
18,860
AMD CPUs have multiple AVX execution units. For AVX-512, two units operate in lock-step, each operating on different lanes of the same vector.
VIA/Centaur has a CPU that does the same thing.

AVX-512 doesn't just increase the register width to 512 bits, however. It is overall a more complete, more modern instruction set than AVX2, and it can also operate on 128-bit and 256-bit vectors.
You could probably get modest speed improvements over AVX2 just by using AVX-512 with 256-bit vectors (sometimes called "AVX-256").
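
For anyone who wants to check whether their own CPU could do this: using AVX-512 instructions on 128-bit and 256-bit vectors needs the AVX512VL extension in addition to the AVX512F foundation. Here's a rough sketch that looks for the corresponding flag names in a Linux /proc/cpuinfo "flags" line. The function name and the sample line are made up for illustration; on a real machine you'd pass in the line read from /proc/cpuinfo.

```python
def avx512_support(flags_line: str) -> dict:
    """Report AVX-512 foundation/VL support from a cpuinfo-style flags line.

    Hypothetical helper for illustration; it only does string matching.
    """
    # Drop the "flags :" prefix if present, then split into individual flags.
    flags = set(flags_line.split(":", 1)[-1].split())
    has_f = "avx512f" in flags    # AVX-512 Foundation
    has_vl = "avx512vl" in flags  # Vector Length extensions (128/256-bit forms)
    return {
        "avx512f": has_f,
        "avx512vl": has_vl,
        "avx512_on_256bit_vectors": has_f and has_vl,
    }


# Made-up sample line, not copied from any particular CPU.
sample = "flags : fpu sse2 avx avx2 avx512f avx512dq avx512bw avx512vl"
print(avx512_support(sample))
```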
 

bit_user

Polypheme
Ambassador
As usual, the devil is in the details.

First, these tests cover only a hand-picked selection of benchmarks that benefit from AVX-512.

Second, the GeoMean got badly skewed by a few OpenVINO tests which greatly benefited from some newer AVX-512 instructions in Zen 4 that the older Intel CPUs lack. If you exclude those benchmarks, then the speedup seen by Phoenix should more closely match the two Intel CPUs.
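
To illustrate how a couple of outliers can drag a geometric mean upward, here's a small sketch. The speedup numbers are invented for the example and are NOT the Phoronix results; only the mechanism is the point.

```python
from math import prod


def geomean(values):
    """Geometric mean of a list of positive ratios."""
    return prod(values) ** (1.0 / len(values))


# Hypothetical per-benchmark speedups from enabling AVX-512 (not real data).
typical = [1.2, 1.3, 1.4, 1.3, 1.2, 1.4]
outliers = [6.0, 8.0]  # stand-ins for a few tests that gain far more

with_outliers = geomean(typical + outliers)
without_outliers = geomean(typical)
print(f"with outliers: {with_outliers:.2f}x, without: {without_outliers:.2f}x")
```

Even though the geometric mean is less sensitive to outliers than an arithmetic mean, two large ratios out of eight still lift the summary figure well above what the typical tests show.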

As for the baseline/absolute performance difference, those tests compare two quad-core Intel CPUs against an 8-core Phoenix. On the other hand, the Tiger Lake is using significantly more power than either of the other two, which is the main reason it's so much faster than Ice Lake (internally, the two have basically the same microarchitecture).
 

bit_user

Polypheme
Ambassador
To me the elephant in the room is how much power does it take up? IIRC, that was a major issue with AVX-512 workloads.
They measured that, too. I'm not sure if this is the same graphic @drhoi tried to post, but:
cpu-power-consumption-monitor-ptssm.svgz

Basically, here are the averages:

Model             Baseline (AVX/AVX2)    AVX-512
Core i7-1065G7    14.57 W                15.11 W
Core i7-1165G7    29.93 W                28.73 W
Ryzen 7 7840U     16.39 W                15.88 W

So, no significant difference between AVX-512 and not. Although, with these being laptops, they're probably just bouncing off their configured power limits in each case.
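
If you want the relative differences rather than eyeballing the watts, here's a quick sketch using the averages from the table above. The function name is just something I made up for the example.

```python
# Average power in watts, copied from the table above: (baseline, AVX-512).
averages = {
    "Core i7-1065G7": (14.57, 15.11),
    "Core i7-1165G7": (29.93, 28.73),
    "Ryzen 7 7840U": (16.39, 15.88),
}


def percent_change(baseline, avx512):
    """Relative change in average power when AVX-512 is enabled."""
    return (avx512 - baseline) / baseline * 100.0


changes = {model: percent_change(b, a) for model, (b, a) in averages.items()}
for model, change in changes.items():
    print(f"{model}: {change:+.1f}%")
```

All three land within roughly 4% of their baseline, in either direction.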
 
So, no significant difference between AVX-512 and not. Although, with these being laptops, they're probably just bouncing off their configured power limits in each case.
I'm pretty sure this was just the low power limit being hit, rather than something like this (from https://www.anandtech.com/show/16535/intel-core-i7-11700k-review-blasting-off-with-rocket-lake/2)
121878.png


And am too lazy to find out if anyone did an AVX512/no-AVX512 test.
 
  • Like
Reactions: bit_user

bit_user

Polypheme
Ambassador
AMD CPUs have multiple AVX execution units. For AVX-512, two units operate in lock-step, each operating on different lanes of the same vector.
No, actually the way it works is by splitting most operations into two halves and sending them each down the same 256-bit execution port. This is like how Intel CPUs implemented SSE, prior to the Core microarchitecture.

Sandy Bridge actually implemented AVX the way you're saying, by fusing two 128-bit ports so that a 256-bit op could be dispatched every cycle.
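
Here's a toy model of the "two halves down the same 256-bit port" idea, in case it helps. It's only a sketch of the concept with made-up function names. It says nothing about how the real hardware schedules things, and it ignores any front-end savings from issuing fewer instructions.

```python
LANES_512 = 16  # sixteen 32-bit lanes in a 512-bit vector
LANES_256 = 8   # eight 32-bit lanes fit through a 256-bit execution unit


def add_256(a, b):
    """One pass through the modeled 256-bit unit: lane-wise 32-bit add."""
    assert len(a) == len(b) == LANES_256
    return [(x + y) & 0xFFFFFFFF for x, y in zip(a, b)]


def add_512_double_pumped(a, b):
    """Model a 512-bit add as two passes through the same 256-bit unit."""
    assert len(a) == len(b) == LANES_512
    low = add_256(a[:LANES_256], b[:LANES_256])    # first half
    high = add_256(a[LANES_256:], b[LANES_256:])   # second half
    passes = 2
    return low + high, passes


a = list(range(16))
b = [10] * 16
result, passes = add_512_double_pumped(a, b)
lanes_per_pass = LANES_512 / passes
print(result, passes, lanes_per_pass)
```

In this model a 512-bit op still only gets eight lanes done per pass, the same as a plain 256-bit op would.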
 

bit_user

Polypheme
Ambassador
I'm pretty sure this was just the low power limit being hit, rather than something like this (from https://www.anandtech.com/show/16535/intel-core-i7-11700k-review-blasting-off-with-rocket-lake/2)
121878.png


And am too lazy to find out if anyone did an AVX512/no-AVX512 test.
Yeah, but also remember that Rocket Lake is the 14 nm backport of Ice Lake, whereas the Intel CPUs in Phoronix' benchmark are made on Intel 10 nm.

But, you're almost certainly right that the Intel would use more power on AVX-512, if not clock/power/thermally throttled. The reason Zen 4 might be an exception is that it can do the same amount of work per cycle, either using AVX/AVX2 instructions or AVX-512, given how they implemented AVX-512.

Edit: I found this benchmark of AVX-512 on Ice Lake SP showing AVX-512 used 23.3% more power for only 34.1% more performance:

On Sapphire Rapids, power consumption stayed about the same with AVX-512 as without, but performance jumped by 44.2%. In this case, OpenVINO seemed to play a much smaller role. So, I trust those results a bit more.
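
To put those two results on the same footing, here's the performance-per-watt arithmetic as a quick sketch. I'm treating "about the same" power on Sapphire Rapids as exactly 0% change, which is an assumption on my part.

```python
def perf_per_watt_gain(perf_gain_pct, power_gain_pct):
    """Percent change in performance-per-watt, given percent changes in each."""
    perf = 1.0 + perf_gain_pct / 100.0
    power = 1.0 + power_gain_pct / 100.0
    return (perf / power - 1.0) * 100.0


# Ice Lake SP: 34.1% more performance for 23.3% more power.
ice_lake_sp = perf_per_watt_gain(34.1, 23.3)
# Sapphire Rapids: 44.2% more performance, power assumed unchanged (0%).
sapphire_rapids = perf_per_watt_gain(44.2, 0.0)
print(f"Ice Lake SP: {ice_lake_sp:+.1f}%, Sapphire Rapids: {sapphire_rapids:+.1f}%")
```

So Ice Lake SP still comes out ahead on efficiency with AVX-512, by a bit under 9%, while Sapphire Rapids keeps essentially the whole 44.2%.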
 
Yeah, but also remember that Rocket Lake is the 14 nm backport of Ice Lake, whereas the Intel CPUs in Phoronix' benchmark are made on Intel 10 nm.

But, you're almost certainly right that the Intel would use more power on AVX-512, if not clock/power/thermally throttled. The reason Zen 4 might be an exception is that it can do the same amount of work per cycle, either using AVX/AVX2 instructions or AVX-512, given how they implemented AVX-512.
What I'm trying to understand is if this is something I should actually be impressed by, or simply something that's on track for what's expected and this is just websites sensationalizing something again.

Although digging around some more, AnandTech did test AVX512 on Alder Lake and found that the power consumption issue went away. So had Intel kept it around, I'd expect this behavior going forward.
 
What I'm trying to understand is if this is something I should actually be impressed by, or simply something that's on track for what's expected and this is just websites sensationalizing something again.

Although digging around some more, AnandTech did test AVX512 on Alder Lake and found that the power consumption issue went away. So had Intel kept it around, I'd expect this behavior going forward.
Kinda irrelevant if you can't use it though.
 
The evidence that AVX512 doesn't cause a significant power draw in Alder Lake answers my question about power consumption issues that AVX512 had in the past.

So it's totally relevant if this pressures Intel to enable AVX512 in future chips.
Well then I welcome Intel to the game when they show up. But last I read Intel forbade it from being used on Alder Lake for...instability issues. They didn't want to deal with the support. But if I were to guess it was for higher yields and market segmentation.

Competition is a good thing. But until then AMD had a strong advantage in emulators (Dolphin) and AI for home use/small business use.
 

bit_user

Polypheme
Ambassador
What I'm trying to understand is if this is something I should actually be impressed by, or simply something that's on track for what's expected and this is just websites sensationalizing something again.
I'm not sure exactly which aspect you're referring to, but I'll tell you my main takeaways from Phoronix' testing:
  • AVX-512 offered performance improvements on all 3 CPUs, even at the same power.
  • For Intel client CPUs (i.e. without the extra FMA port), Sunny Cove and Willow Cove got about a 34% improvement from AVX-512 on workloads that stand to benefit from it.

I wish there were a GeoMean which excluded those fp16 OpenVINO tests, so we could say how the efficiency of Zen 4's implementation compared on an apples-to-apples basis. As it stands, that just skews the GeoMean too much for it to be comparable. I think we'd see roughly the same benefit from AVX-512 on all 3 CPUs.

Although digging around some more, AnandTech did test AVX512 on Alder Lake and found that the power consumption issue went away.
Yes, it stayed within its 241 W PL2, but it clearly did that by clock-throttling. That's exactly what the 14 nm Xeons did. Yes, the clock throttling is a lot less than on those Xeons, and that's a good thing. However, I wouldn't say that Intel's AVX-512 power consumption is completely solved. Enough not to really worry about it, though.
 

bit_user

Polypheme
Ambassador
So it's totally relevant if this pressures Intel to enable AVX512 in future chips.
So far, it doesn't look like it will return in Meteor Lake, Arrow Lake, or Lunar Lake. So, don't hold your breath.

until then AMD had a strong advantage in emulators (Dolphin) and AI for home use/small business use
Intel and AMD are both going the route of dedicated AI engines.


AMD hasn't announced any plans to include Ryzen AI in desktop CPUs, but it's probably just a matter of time.
 
  • Like
Reactions: digitalgriffin