Breakthrough DL Training Algorithm on Intel Xeon CPU System Outperforms 8 Volta GPUs By 3.5x

JayNor

Reputable
May 31, 2019
From the paper ... all the info on the CPUs/threads is there:
"All the experiments are conducted on a server equipped with two 22-core/44-thread processors (Intel Xeon E5-2699A v4 2.40GHz) and one NVIDIA Tesla V100 Volta 32GB GPU."

"As mentioned before, our machine has 44 cores, and each core can have 2 threads. However, we disable multithreading and the effective number of threads and cores is the same. Hence, we interchangeably use the words “threads” and “cores” from here on. We benchmark both frameworks with 2, 4, 8, 16, 32, 44 threads."
 

Gomez Addams

Prominent
Mar 4, 2020
What I think would be interesting is whether they could adapt their algorithm to GPUs. Then it could be even more parallel.

Yes, I read that it needs a lot of memory, and that could be prohibitive since GPUs currently max out at 32GB. Maybe Nvidia will up the ante at the end of this month and announce a GPU with 64GB of RAM, or more.
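If I'm reading the paper right, the speedup comes from locality-sensitive hashing: hash tables pick out the few neurons worth updating for each input, and those tables are also where the large memory footprint comes from. A rough SimHash sketch of the idea (all names here are illustrative, not the authors' code; real systems keep several tables per layer, not one):

```python
# SimHash-style neuron selection sketch -- illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, n_neurons, n_bits = 128, 100_000, 16

W = rng.standard_normal((n_neurons, d))    # layer weights, one row per neuron
planes = rng.standard_normal((n_bits, d))  # random hyperplanes for SimHash

# Signature per neuron: which side of each hyperplane its weight vector falls on.
sigs = np.packbits((W @ planes.T) > 0, axis=1)   # (n_neurons, 2) bytes

# Bucket neurons by signature. Tables like this, kept per layer and usually
# several per layer, are the memory cost mentioned above.
table = {}
for i, s in enumerate(sigs):
    table.setdefault(s.tobytes(), []).append(i)

# At "training" time, hash the input and touch only the colliding neurons.
x = rng.standard_normal(d)
active = table.get(np.packbits(planes @ x > 0).tobytes(), [])
print(f"activated {len(active)} of {n_neurons} neurons")
```

The data-dependent table lookups and tiny irregular updates are exactly the access pattern GPUs dislike, which is presumably why it runs so well on CPUs in the first place.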
 
Mar 6, 2020
I'm sorry, but "Tom" has been played by Intel "research".


"Updated 11:00am PT: Corrected the article to reflect that the tests were conducted with a single V100 GPU."
The original claim was eight GPUs; good catch.

"The researchers have not talked about any plans or prospects of their algorithm for commercial adoption"

"... this effectively moves the performance crown of fastest chip for training to CPUs."
 

alextheblue

Distinguished
(quoting the post above)
"Updated 11:00am PT: Corrected the article to reflect that the tests were conducted with a single V100 GPU."
"... this effectively moves the performance crown of fastest chip for training to CPUs."
I mean, it's still pretty impressive if two 22-core CPUs are faster than a single V100.
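Some rough peak-FLOPS arithmetic puts that in perspective (approximate spec-sheet numbers; assumes dual-FMA AVX2 on Broadwell and FP32 peak for the V100, ignoring clock throttling):

```python
# Back-of-envelope peak FP32 throughput, approximate spec-sheet values only.
cores, ghz, flops_per_cycle = 22, 2.4, 32  # Broadwell AVX2: 2 FMA ports x 8 fp32 x 2
xeon_tflops = 2 * cores * ghz * flops_per_cycle / 1000  # two sockets
v100_tflops = 15.7                                      # V100 SXM2 FP32 peak

print(f"2x Xeon E5-2699A v4: ~{xeon_tflops:.1f} TFLOPS FP32")  # ~3.4
print(f"1x Tesla V100:       ~{v100_tflops:.1f} TFLOPS FP32")
# Roughly a 4-5x raw-FLOPS deficit, so any CPU win has to come from doing
# far less work per sample, not from brute force.
```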
 
Mar 13, 2020
TensorFlow, although commercialized by Google, is one of the slowest, if not the slowest, deep learning libraries that exist. So this article is inaccurate and misleading in saying "3.5 times slower than GPU training". Also, a 44-core CPU system might not be a single machine. These tests and comparisons need to benchmark fairly using all the major deep learning libraries (especially PyTorch and Chainer).
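The fairness point is at least easy to sanity-check at small scale. Here is a sketch that times one identical SGD step in both frameworks on CPU (illustrative only; a serious comparison would also pin thread counts, match BLAS builds, and use real models and data):

```python
# Same tiny MLP, one SGD step, timed in TensorFlow and PyTorch on CPU.
# Illustrative only -- not a rigorous cross-framework benchmark.
import time
import numpy as np
import tensorflow as tf
import torch
import torch.nn as nn

BATCH, D_IN, D_HID, D_OUT, ITERS = 256, 784, 256, 10, 50
x_np = np.random.randn(BATCH, D_IN).astype("float32")
y_np = np.random.randint(0, D_OUT, size=BATCH).astype("int64")

# TensorFlow model and train step
tf_model = tf.keras.Sequential([
    tf.keras.layers.Dense(D_HID, activation="relu"),
    tf.keras.layers.Dense(D_OUT),
])
tf_opt = tf.keras.optimizers.SGD(0.01)
tf_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def tf_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf_loss(y, tf_model(x))
    grads = tape.gradient(loss, tf_model.trainable_variables)
    tf_opt.apply_gradients(zip(grads, tf_model.trainable_variables))

# PyTorch model and train step
pt_model = nn.Sequential(nn.Linear(D_IN, D_HID), nn.ReLU(), nn.Linear(D_HID, D_OUT))
pt_opt = torch.optim.SGD(pt_model.parameters(), lr=0.01)
pt_loss = nn.CrossEntropyLoss()

def pt_step(x, y):
    pt_opt.zero_grad()
    loss = pt_loss(pt_model(x), y)
    loss.backward()
    pt_opt.step()

def bench(step, x, y):
    step(x, y)                      # warmup (also traces the tf.function)
    t0 = time.perf_counter()
    for _ in range(ITERS):
        step(x, y)
    return (time.perf_counter() - t0) / ITERS * 1000

print(f"TensorFlow: {bench(tf_step, tf.constant(x_np), tf.constant(y_np)):.2f} ms/step")
print(f"PyTorch:    {bench(pt_step, torch.from_numpy(x_np), torch.from_numpy(y_np)):.2f} ms/step")
```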