I don't entirely disagree, but there have been some interesting applications of it to accelerate string processing.
However, that data appears to be just for AVX2 (uploaded March 2021; its filename suggests it was measured on a Zen 2 EPYC). When optimizing with AVX-512, they managed to find another 60% performance improvement! (lemire.me)
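To make the string-processing point concrete, here's a minimal sketch of the kind of trick those write-ups rely on. find_byte is a hypothetical helper I made up for illustration, not code from the linked article, and a real version would need a masked tail loop for lengths that aren't a multiple of 64:

    /* AVX-512BW: compare 64 bytes at once against a target byte and get
       the matches back directly as a 64-bit mask, one bit per lane. */
    #include <immintrin.h>
    #include <stddef.h>

    /* Hypothetical helper: index of the first occurrence of c in buf, or -1.
       Assumes len is a multiple of 64 to keep the sketch short. */
    static ptrdiff_t find_byte(const char *buf, size_t len, char c) {
        __m512i needle = _mm512_set1_epi8(c);
        for (size_t i = 0; i < len; i += 64) {
            __m512i chunk = _mm512_loadu_si512(buf + i);
            __mmask64 m = _mm512_cmpeq_epi8_mask(chunk, needle);
            if (m)  /* lowest set bit = first matching byte in this chunk */
                return (ptrdiff_t)(i + (size_t)_tzcnt_u64(m));
        }
        return -1;
    }

The whole inner loop is just a load, a compare, and a branch over 64 bytes at a time; that density is where much of the speedup comes from.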
Something else about it that a lot of people might not know is that it's not restricted to processing 512-bit vectors. The same instructions will also operate on 128-bit and 256-bit operands (the AVX-512VL subset). Furthermore, it has features that facilitate vectorization, such as a dedicated set of eight mask registers (k0-k7) that provide per-lane predication. It also doubles the number of software-visible vector registers, from 16 to 32. Along with a few other details, these improvements make it a superior alternative to all of the prior x86 vector ISA extensions, such as the SSE family and AVX/AVX2.
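For instance, here's a minimal sketch showing both of those features at once, per-lane predication and a sub-512-bit operand, via a masked add on a 256-bit register (assumes a compiler targeting AVX-512F + AVX-512VL):

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        __m256 a = _mm256_set_ps(8, 7, 6, 5, 4, 3, 2, 1);
        __m256 b = _mm256_set1_ps(10.0f);

        /* Mask-register predication: only lanes whose mask bit is set get
           a + b; "maskz" zeroes the disabled lanes instead of merging. */
        __mmask8 k = 0x0F;  /* enable the low 4 of the 8 float lanes */
        __m256 c = _mm256_maskz_add_ps(k, a, b);

        float out[8];
        _mm256_storeu_ps(out, c);
        for (int i = 0; i < 8; i++)
            printf("%g ", out[i]);  /* prints: 11 12 13 14 0 0 0 0 */
        printf("\n");
        return 0;
    }

Before AVX-512, you'd have to emulate that predication with separate compares and blends, burning extra instructions and registers.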
When you look at it that way, its benefits really needn't be limited to "professional" and scientific applications. However, broader adoption is unlikely now that Intel has withdrawn support for it on their mainstream CPUs. Instead, we'll have to wait a couple more years, until AVX10 support rolls out and gains enough market share for developers to target. AVX10.1 is basically just window dressing on AVX-512, except that it allows implementations limited to just 128-bit and 256-bit operands, which Intel has said it intends to use in its client CPUs.
For just matrix operations, homogeneous coordinates only need 128-bit vectors (assuming fp32 coefficients): a 4-component (x, y, z, w) coordinate is exactly 4 x 32 = 128 bits, as in the sketch below. There are ways to use wider vectors than that, but mainly if you switch to a SIMD-oriented programming model (e.g. structure-of-arrays layouts that transform several vertices per instruction).
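Here's what the 128-bit case looks like (a minimal sketch; the column-major mat4 type is my own assumption, not taken from any particular library):

    #include <immintrin.h>

    /* Hypothetical column-major 4x4 fp32 matrix: each column fits in one
       128-bit SSE register, as does the homogeneous coordinate itself. */
    typedef struct { __m128 col[4]; } mat4;

    static inline __m128 mat4_mul_vec4(const mat4 *m, __m128 v) {
        /* Broadcast each component of v and accumulate col[i] * v[i]. */
        __m128 r = _mm_mul_ps(m->col[0], _mm_shuffle_ps(v, v, 0x00));         /* v.x */
        r = _mm_add_ps(r, _mm_mul_ps(m->col[1], _mm_shuffle_ps(v, v, 0x55))); /* v.y */
        r = _mm_add_ps(r, _mm_mul_ps(m->col[2], _mm_shuffle_ps(v, v, 0xAA))); /* v.z */
        r = _mm_add_ps(r, _mm_mul_ps(m->col[3], _mm_shuffle_ps(v, v, 0xFF))); /* v.w */
        return r;
    }

To fill a 512-bit register, you'd instead pack, say, the x components of sixteen vertices into one register (and likewise for y, z, w), which is the structure-of-arrays model mentioned above.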
CPU-based rendering and video compression also benefit from it, but perhaps you lump those in with "professional" applications.