Breakthrough DL Training Algorithm on Intel Xeon CPU System Outperforms 8 Volta GPUs By 3.5x

JayNor

Reputable
May 31, 2019
From the paper ... all the info on the CPUs/threads is there:
"All the experiments are conducted on a server equipped with two 22-core/44-thread processors (Intel Xeon E5-2699A v4 2.40GHz) and one NVIDIA Tesla V100 Volta 32GB GPU."

"As mentioned before, our machine has 44 cores, and each core can have 2 threads. However, we disable multithreading and the effective number of threads and cores is the same. Hence, we interchangeably use the words “threads” and “cores” from here on. We benchmark both frameworks with 2, 4, 8, 16, 32, 44 threads."
 

Gomez Addams

Prominent
Mar 4, 2020
What I think would be interesting is whether they could adapt their algorithm to GPUs. Then it could be even more parallel.

Yes, I read that it needs a lot of memory, and that could be prohibitive since GPUs currently max out at 32GB. Maybe Nvidia will up the ante at the end of this month and announce a GPU with 64GB of RAM, or more.
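If I'm reading the paper right, the speedup comes from locality-sensitive hashing: hash tables pick out the few neurons worth updating for each input, and those tables are also where the large memory footprint comes from. A rough SimHash sketch of the idea (all names here are illustrative, not the authors' code; real systems keep several tables per layer, not one):

```python
# SimHash-style neuron selection sketch -- illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, n_neurons, n_bits = 128, 100_000, 16

W = rng.standard_normal((n_neurons, d))    # layer weights, one row per neuron
planes = rng.standard_normal((n_bits, d))  # random hyperplanes for SimHash

# Signature per neuron: which side of each hyperplane its weight vector falls on.
sigs = np.packbits((W @ planes.T) > 0, axis=1)   # (n_neurons, 2) bytes

# Bucket neurons by signature. Tables like this, kept per layer and usually
# several per layer, are the memory cost mentioned above.
table = {}
for i, s in enumerate(sigs):
    table.setdefault(s.tobytes(), []).append(i)

# At "training" time, hash the input and touch only the colliding neurons.
x = rng.standard_normal(d)
active = table.get(np.packbits(planes @ x > 0).tobytes(), [])
print(f"activated {len(active)} of {n_neurons} neurons")
```

The data-dependent table lookups and tiny irregular updates are exactly the access pattern GPUs dislike, which is presumably why it runs so well on CPUs in the first place.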
 
Mar 6, 2020
I'm sorry, but "Tom" has been played by Intel "research".


"Updated 11:00am PT: Corrected the article to reflect that the tests were conducted with a single V100 GPU."
The original claim was eight GPUs; good catch.

"The researchers have not talked about any plans or prospects of their algorithm for commercial adoption"

"... this effectively moves the performance crown of fastest chip for training to CPUs."
 

alextheblue

Distinguished
(quoting the post above)
"Updated 11:00am PT: Corrected the article to reflect that the tests were conducted with a single V100 GPU."
"... this effectively moves the performance crown of fastest chip for training to CPUs."
I mean, it's still pretty impressive if two 22-core CPUs are faster than a single V100.
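Some rough peak-FLOPS arithmetic puts that in perspective (approximate spec-sheet numbers; assumes dual-FMA AVX2 on Broadwell and FP32 peak for the V100, ignoring clock throttling):

```python
# Back-of-envelope peak FP32 throughput, approximate spec-sheet values only.
cores, ghz, flops_per_cycle = 22, 2.4, 32  # Broadwell AVX2: 2 FMA ports x 8 fp32 x 2
xeon_tflops = 2 * cores * ghz * flops_per_cycle / 1000  # two sockets
v100_tflops = 15.7                                      # V100 SXM2 FP32 peak

print(f"2x Xeon E5-2699A v4: ~{xeon_tflops:.1f} TFLOPS FP32")  # ~3.4
print(f"1x Tesla V100:       ~{v100_tflops:.1f} TFLOPS FP32")
# Roughly a 4-5x raw-FLOPS deficit, so any CPU win has to come from doing
# far less work per sample, not from brute force.
```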
 
Mar 13, 2020
TensorFlow, although commercialized by Google, is one of the slowest, if not the slowest, deep learning libraries that exist. So this article is inaccurate and misleading in saying "3.5 times slower than GPU training". Also, a 44-core CPU system might not be a single machine. These tests and comparisons need to benchmark fairly using all the major deep learning libraries (especially PyTorch and Chainer).
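The fairness point is at least easy to sanity-check at small scale. Here is a sketch that times one identical SGD step in both frameworks on CPU (illustrative only; a serious comparison would also pin thread counts, match BLAS builds, and use real models and data):

```python
# Same tiny MLP, one SGD step, timed in TensorFlow and PyTorch on CPU.
# Illustrative only -- not a rigorous cross-framework benchmark.
import time
import numpy as np
import tensorflow as tf
import torch
import torch.nn as nn

BATCH, D_IN, D_HID, D_OUT, ITERS = 256, 784, 256, 10, 50
x_np = np.random.randn(BATCH, D_IN).astype("float32")
y_np = np.random.randint(0, D_OUT, size=BATCH).astype("int64")

# TensorFlow model and train step
tf_model = tf.keras.Sequential([
    tf.keras.layers.Dense(D_HID, activation="relu"),
    tf.keras.layers.Dense(D_OUT),
])
tf_opt = tf.keras.optimizers.SGD(0.01)
tf_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def tf_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf_loss(y, tf_model(x))
    grads = tape.gradient(loss, tf_model.trainable_variables)
    tf_opt.apply_gradients(zip(grads, tf_model.trainable_variables))

# PyTorch model and train step
pt_model = nn.Sequential(nn.Linear(D_IN, D_HID), nn.ReLU(), nn.Linear(D_HID, D_OUT))
pt_opt = torch.optim.SGD(pt_model.parameters(), lr=0.01)
pt_loss = nn.CrossEntropyLoss()

def pt_step(x, y):
    pt_opt.zero_grad()
    loss = pt_loss(pt_model(x), y)
    loss.backward()
    pt_opt.step()

def bench(step, x, y):
    step(x, y)                      # warmup (also traces the tf.function)
    t0 = time.perf_counter()
    for _ in range(ITERS):
        step(x, y)
    return (time.perf_counter() - t0) / ITERS * 1000

print(f"TensorFlow: {bench(tf_step, tf.constant(x_np), tf.constant(y_np)):.2f} ms/step")
print(f"PyTorch:    {bench(pt_step, torch.from_numpy(x_np), torch.from_numpy(y_np)):.2f} ms/step")
```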