I don't entirely disagree, but there have been some interesting applications of it to accelerate string processing.
However, that data appears to be just for AVX2 (uploaded March 2021; its filename suggests it was measured on a Zen 2 EPYC). When optimizing with AVX-512, they managed to find another 60% performance improvement! (lemire.me)
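To make the string-processing point concrete, here's a minimal sketch of the kind of trick those write-ups rely on. find_byte is a hypothetical helper I made up for illustration, not code from the linked article, and a real version would need a masked tail loop for lengths that aren't a multiple of 64:

    /* AVX-512BW: compare 64 bytes at once against a target byte and get
       the matches back directly as a 64-bit mask, one bit per lane. */
    #include <immintrin.h>
    #include <stddef.h>

    /* Hypothetical helper: index of the first occurrence of c in buf, or -1.
       Assumes len is a multiple of 64 to keep the sketch short. */
    static ptrdiff_t find_byte(const char *buf, size_t len, char c) {
        __m512i needle = _mm512_set1_epi8(c);
        for (size_t i = 0; i < len; i += 64) {
            __m512i chunk = _mm512_loadu_si512(buf + i);
            __mmask64 m = _mm512_cmpeq_epi8_mask(chunk, needle);
            if (m)  /* lowest set bit = first matching byte in this chunk */
                return (ptrdiff_t)(i + (size_t)_tzcnt_u64(m));
        }
        return -1;
    }

The whole inner loop is just a load, a compare, and a branch over 64 bytes at a time; that density is where much of the speedup comes from.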
Something else about it that a lot of people might not know is that it's not restricted to processing 512-bit vectors. The same instructions will also operate on 128-bit and 256-bit operands (the AVX-512VL subset). Furthermore, it has features that facilitate vectorization, such as a dedicated set of eight mask registers (k0-k7) that provide per-lane predication. It also doubles the number of software-visible vector registers, from 16 to 32. Along with a few other details, these improvements make it a superior alternative to all of the prior x86 vector ISA extensions, such as the SSE family and AVX/AVX2.
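For instance, here's a minimal sketch showing both of those features at once, per-lane predication and a sub-512-bit operand, via a masked add on a 256-bit register (assumes a compiler targeting AVX-512F + AVX-512VL):

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        __m256 a = _mm256_set_ps(8, 7, 6, 5, 4, 3, 2, 1);
        __m256 b = _mm256_set1_ps(10.0f);

        /* Mask-register predication: only lanes whose mask bit is set get
           a + b; "maskz" zeroes the disabled lanes instead of merging. */
        __mmask8 k = 0x0F;  /* enable the low 4 of the 8 float lanes */
        __m256 c = _mm256_maskz_add_ps(k, a, b);

        float out[8];
        _mm256_storeu_ps(out, c);
        for (int i = 0; i < 8; i++)
            printf("%g ", out[i]);  /* prints: 11 12 13 14 0 0 0 0 */
        printf("\n");
        return 0;
    }

Before AVX-512, you'd have to emulate that predication with separate compares and blends, burning extra instructions and registers.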
When you look at it that way, its benefits really needn't be limited to "professional" and scientific applications. However, broader adoption is unlikely now that Intel has withdrawn support for it on their mainstream CPUs. Instead, we'll have to wait a couple more years, until AVX10 support rolls out and gains enough market share for developers to target. AVX10.1 is basically just window dressing on AVX-512, except that it allows implementations limited to just 128-bit and 256-bit operands, which Intel has said it intends to use in its client CPUs.
For just matrix operations, homogeneous coordinates only need 128-bit vectors (assuming fp32 coefficients): a 4-component (x, y, z, w) coordinate is exactly 4 x 32 = 128 bits, as in the sketch below. There are ways to use wider vectors than that, but mainly if you switch to a SIMD-oriented programming model (e.g. structure-of-arrays layouts that transform several vertices per instruction).
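Here's what the 128-bit case looks like (a minimal sketch; the column-major mat4 type is my own assumption, not taken from any particular library):

    #include <immintrin.h>

    /* Hypothetical column-major 4x4 fp32 matrix: each column fits in one
       128-bit SSE register, as does the homogeneous coordinate itself. */
    typedef struct { __m128 col[4]; } mat4;

    static inline __m128 mat4_mul_vec4(const mat4 *m, __m128 v) {
        /* Broadcast each component of v and accumulate col[i] * v[i]. */
        __m128 r = _mm_mul_ps(m->col[0], _mm_shuffle_ps(v, v, 0x00));         /* v.x */
        r = _mm_add_ps(r, _mm_mul_ps(m->col[1], _mm_shuffle_ps(v, v, 0x55))); /* v.y */
        r = _mm_add_ps(r, _mm_mul_ps(m->col[2], _mm_shuffle_ps(v, v, 0xAA))); /* v.z */
        r = _mm_add_ps(r, _mm_mul_ps(m->col[3], _mm_shuffle_ps(v, v, 0xFF))); /* v.w */
        return r;
    }

To fill a 512-bit register, you'd instead pack, say, the x components of sixteen vertices into one register (and likewise for y, z, w), which is the structure-of-arrays model mentioned above.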
CPU-based rendering and video compression also benefit from it, but perhaps you lump those in with "professional" applications.