News Intel Reveals More Ice Lake-SP Details: Claims 32-Core Ice Lake Is Faster Than 64-Core AMD EPYC

I wish Intel would leave AVX512 for their GPU accelerators and just stop trying to shove it into every CPU.

Unless you want every CPU manufacturer to double the decoder width of their CPU just to accommodate AVX512, I don't think it's worth it.
 
Intel will still be faster on AVX512. But AVX512 is a very specific scenario and very few apps need AVX512.
Yes, in AVX512 the Ice Lake chip will be faster, but not by as much. However, what I was getting at is that Intel was comparing their upcoming chip (not shipping at all until Q1 2021) to the previous-generation Epyc. Had Gen 3 Epyc not already been shipping to customers like cloud providers, the comparison would have been more valid. Then again, there is always the possibility that Intel couldn't get their hands on a Gen 3 Epyc.
 
You forgot to mention the fine print: those programs are heavy with AVX512 workloads. Anything else and Ice Lake gets obliterated.

AVX512 or not, it will matter for the markets where this is used.

I do wonder how well AMD has been at rebuilding their relationships with HPC clients. I also hope they have the process capacity to fill orders if they start getting big in that market again.
 
I just about spit out my coffee upon reading that headline. A 32-core CPU beating a 64-core CPU in real-world use? Lol, nope. How stupid does Intel think people are? Those sorts of claims make Intel look desperate and deceitful. I don't care about your idealized benchmarks and cherry-picked usage scenario -- connect to my database and serve some customers faster.
 
I just about spit out my coffee upon reading that headline. A 32-core CPU beating a 64-core CPU in real-world use? Lol, nope. How stupid does Intel think people are? Those sorts of claims make Intel look desperate and deceitful. I don't care about your idealized benchmarks and cherry-picked usage scenario -- connect to my database and serve some customers faster.

It's very possible depending on the actual use case. Some applications benefit heavily from AVX instructions, and Intel is vastly ahead of AMD in that respect. Now, in an application that doesn't, and relies on raw core count and power? Probably not.
 
AVX512 speed is improved on Ice Lake Server. It also has 8 memory channels vs. 6 on the prior server generation.

Sapphire Rapids adds bfloat16 in AVX512 and in a new tiled matrix processing engine.
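For anyone unfamiliar with bfloat16: it's essentially just the top 16 bits of a standard IEEE-754 float32, so it keeps the full float32 exponent range while giving up most of the mantissa. A minimal sketch of the conversion (my own toy example; it truncates rather than rounds, whereas real hardware rounds):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// bfloat16 is the top 16 bits of an IEEE-754 float32:
// 1 sign bit, 8 exponent bits, 7 mantissa bits.
// This toy conversion truncates; real hardware rounds to nearest-even.
static uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<uint16_t>(bits >> 16);
}

static float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

int main() {
    float x = 3.14159265f;
    // Same exponent range as float32, but only ~2-3 decimal digits of
    // precision -- usually fine for deep-learning weights and activations.
    std::printf("%f -> %f\n", x, bf16_to_float(float_to_bf16(x)));
}
```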

Yes, Intel keeps adding more to the CPU, but they have determined that this is more efficient than offloading to a GPU in some cases. Those cases may become fewer as Intel adds the CXL bias-based cache coherency in Sapphire Rapids.

Intel's Client 2.0 presentations indicate they are moving towards more accelerators in the same package as the CPU. Perhaps the interconnect protocols are the same, but with lower power and latency.

I've seen rumors that AMD will add AVX512 in Zen 4. Any confirmation of that from AMD?
 
Did anyone notice that in the article it states that the chips are to launch in the 1st quarter of this year - 2020? 🙄
 
Intel will still be faster on AVX512. But AVX512 is a very specific scenario and very few apps need AVX512.

That's because Intel invented it as an instruction set they could market to speed up the workloads it applies to. As Linus Torvalds said: "I hope AVX-512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on."
 
AVX512 or not, it will matter for the markets where this is used.

...
That's not true AT ALL. Didn't you read the article?

Intel has yet to formally introduce its 3rd Generation Xeon Scalable 'Ice Lake' processors, but a number of its customers, including the Korea Meteorological Administration, the Max Planck Computing and Data Facility, the National Institute of Advanced Industrial Science and Technology (AIST), the University of Tokyo, Osaka University, and Oracle, have already announced plans to deploy the new CPUs for their HPC needs.

HPC IS the intended market for this chip, because it isn't particularly competitive in anything that does not use AVX512.

I find it rather amusing that there is not one word mentioned about power consumption. That is usually a very high priority for server systems. It is obvious HPC applications that use AVX512 are the specific target for this chip.
 
It's very possible depending on the actual use case. Some applications benefit heavily from AVX instructions, and Intel is vastly ahead of AMD in that respect. Now, in an application that doesn't, and relies on raw core count and power? Probably not.
But those same applications (i.e. those that benefit from hugely parallel SIMD) may see a much larger benefit from GPU acceleration. E.g. one of the benchmarks that Intel touts in its slides is LAMMPS, a molecular dynamics simulator. Using AVX-512 significantly improves performance compared to running it on non-AVX-512 CPUs. But based on some benchmarks that Anandtech did on another molecular dynamics simulator, doing it on a GPU is still far faster than an AVX-512 CPU.

AVX-512 isn't Intel's first attempt at 512-bit SIMD on x86: they did it on Xeon Phi as well. Has HPC changed significantly since then? Is having it in the CPU itself rather than in a co-processor important enough that AVX-512 makes sense where Xeon Phi didn't? Or is Intel again trying to use x86 to do things that are better addressed with other tools? I honestly don't know.
 
But those same applications (i.e. those that benefit from hugely parallel SIMD) may see a much larger benefit from GPU acceleration. E.g. one of the benchmarks that Intel touts in its slides is LAMMPS, a molecular dynamics simulator. Using AVX-512 significantly improves performance compared to running it on non-AVX-512 CPUs. But based on some benchmarks that Anandtech did on another molecular dynamics simulator, doing it on a GPU is still far faster than an AVX-512 CPU.
That would mean that researchers would have to learn a whole new program which would set them back for months.
And that is if the other simulator is as accurate as the one they are using.
 
That would mean that researchers would have to learn a whole new program which would set them back for months.
And that is if the other simulator is as accurate as the one they are using.
Not if the software they're using already has GPU acceleration available, as LAMMPS does.
1.3.1. General features
[...]
GPU (CUDA and OpenCL), Intel Xeon Phi, and OpenMP support for many code features
https://lammps.sandia.gov/doc/Speed_gpu.html
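(If I'm reading that page right, the GPU package is enabled at run time with command-line switches along the lines of -sf gpu -pk gpu 1, so the input script itself doesn't need to change.)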

If you're running the same computations then accuracy isn't going to change based on GPU vs CPU.

Also, many compute programming models/frameworks/APIs (e.g. Intel's own OneAPI) are specifically designed to be hardware agnostic, to allow software to take advantage of whatever accelerators (e.g. GPUs) are available. So software developed using tools like those wouldn't necessarily even need any explicit development effort to run on GPUs.
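To make that concrete, here is roughly what the hardware-agnostic style looks like with oneAPI's SYCL/DPC++ model. This is just a toy vector-add I sketched, not anything from LAMMPS; the names and sizes are made up. The same kernel source gets dispatched to a GPU if one is present, otherwise it falls back to the CPU:

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    constexpr std::size_t N = 1024;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

    // Default device selector: picks a GPU if available, otherwise the CPU.
    sycl::queue q;
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    {
        sycl::buffer<float> A(a.data(), sycl::range<1>(N));
        sycl::buffer<float> B(b.data(), sycl::range<1>(N));
        sycl::buffer<float> C(c.data(), sycl::range<1>(N));

        q.submit([&](sycl::handler& h) {
            sycl::accessor ra(A, h, sycl::read_only);
            sycl::accessor rb(B, h, sycl::read_only);
            sycl::accessor wc(C, h, sycl::write_only, sycl::no_init);
            // The same kernel body runs on whichever device the queue chose.
            h.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
                wc[i] = ra[i] + rb[i];
            });
        });
    }   // buffers go out of scope: results are copied back into c

    std::cout << "c[0] = " << c[0] << "\n";   // prints 3
}
```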
 
Here is one of the best brief explanations I've found to illustrate and list some of the types of use cases where I would imagine normal PC users might benefit from this type of hardware acceleration.

[Source - https://www.prowesscorp.com/what-is-intel-avx-512-and-why-does-it-matter/ ]
Intel AVX-512 increases the width of the register 16-fold compared to the original 32-bit register and enables twice the FLOPS enabled by Intel AVX2

Intel AVX-512 can accelerate performance for workloads and use cases such as scientific simulations, financial analytics, artificial intelligence (AI)/deep learning, 3D modeling and analysis, image and audio/video processing, cryptography, and data compression.

I would imagine that many video processes will eventually benefit from something like this, especially re-scaling, conversion and translation operations.
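To put the quoted numbers in perspective: a 512-bit ZMM register holds 16 single-precision floats, twice as many as AVX2's 256-bit YMM registers, which is where the "twice the FLOPS" figure comes from. Here's a minimal sketch of what that looks like with intrinsics (my own toy example; it assumes an AVX-512F capable CPU, a compiler flag like -mavx512f, and an array length that is a multiple of 16):

```cpp
#include <immintrin.h>  // AVX-512F intrinsics
#include <cstddef>

// c[i] = a[i] + b[i], 16 floats per iteration using 512-bit ZMM registers.
// A real kernel would also handle a remainder/tail, e.g. with masked ops.
void add_arrays_avx512(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);              // load 16 floats
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(c + i, _mm512_add_ps(va, vb));  // 16 adds per instruction
    }
}
```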
 
But those same applications (i.e. those that benefit from hugely parallel SIMD) may see a much larger benefit from GPU acceleration. E.g. one of the benchmarks that Intel touts in its slides is LAMMPS, a molecular dynamics simulator. Using AVX-512 significantly improves performance compared to running it on non-AVX-512 CPUs. But based on some benchmarks that Anandtech did on another molecular dynamics simulator, doing it on a GPU is still far faster than an AVX-512 CPU.

Well, yes, but GPUs don't have access to 6TB of memory (or whatever the number is). GPUs' applicability is more limited.
 
GPUs do have access to system RAM; it just has to traverse the PCIe bus. Done right, the software would swap the required data between system RAM and VRAM all the time.
PCIe bandwidth is far lower than CPU memory bandwidth. What you suggest is often not as practical for this reason, and it is harder to implement.
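To put rough numbers on that (back-of-envelope, assuming Ice Lake-SP's 8 channels of DDR4-3200 and a PCIe 4.0 x16 slot): the CPU sees about 8 x 25.6 GB/s = ~205 GB/s of local memory bandwidth, while the GPU reaches system RAM over a link that tops out around 32 GB/s in each direction. So streaming data out of system RAM through the GPU runs at something like a sixth of the bandwidth the CPU cores get, before you even count the extra latency.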