[quotemsg=18462118,0,1425755]Hate to break it to you, but the 512-bit AVX SIMDs on those Phis don't support 8-bit precision; they only support single (32-bit) & double (64-bit) ones (however, it does support unaligned access, but no one will use it), & neither is suitable for deep learning, which actually uses 4, 8 & 12-bit ones. So it's actually 16 (32-bit ones) per 512-bit SIMD & 32 per Atom.[/quotemsg]You're correct about the first part. However, rather than use 32-bit floats, one would likely use AVX2, which allows 32 x 2 bytes to be processed in parallel, per core (see the sketch below). I checked, and Knights Landing does support AVX2.
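Just to make the 16-bit path concrete, here's a rough sketch of the kind of AVX2 inner loop I have in mind: an int16 dot product built on _mm256_madd_epi16. The function name, the int16 quantization, and the missing remainder/overflow handling are my own simplifications, not anything from Intel's material.
[code]
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Sketch: int16 dot product using AVX2.  _mm256_madd_epi16 multiplies 16
 * pairs of int16 values and sums adjacent products into 8 int32 lanes, so
 * each 256-bit instruction touches 16 two-byte elements.
 * (Tail elements and accumulator overflow for very long vectors are ignored
 * here -- this is only an illustration.)
 */
static int32_t dot_i16_avx2(const int16_t *a, const int16_t *b, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    for (size_t i = 0; i + 16 <= n; i += 16) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        /* 16 x (int16 * int16), pairwise-summed into 8 x int32 */
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(va, vb));
    }
    /* horizontal sum of the 8 int32 lanes */
    __m128i lo = _mm256_castsi256_si128(acc);
    __m128i hi = _mm256_extracti128_si256(acc, 1);
    __m128i s  = _mm_add_epi32(lo, hi);
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
}
[/code]
Each madd touches 16 two-byte elements per instruction, which is exactly where the throughput advantage over 32-bit floats comes from.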
[quotemsg=18462118,0,1425755]You did mention the 3x clock speed of the Atoms (even though it's not the case; it's only 1.9x)[/quotemsg]So, you accuse me of not reading things, but the EV6x designs (the ones including the 800 MAC/cycle option) are all 500 MHz. Go and see for yourself. So, 1.5 GHz / 500 MHz = the 3x multiplier I used.
[quotemsg=18462118,0,1425755]along with Nvidia dropping support for FP16 math processing[/quotemsg]The GP100 definitely does support fp16. I only said that Knights Landing lost it, relative to Knights Corner. It's a sad loss, but I can see why they wanted to move to a standard instruction set extension. I hope we'll get it back in the next generation.
[quotemsg=18462118,0,1425755]
Regarding power, Knights Landing can run 72 cores @ 1.5 GHz + 16 GB of HBM + 6-channel DDR4 memory controller + 40 lanes of PCIe 3.0 + the cache-coherent bus tying it all together, in only 245 W. So, while we don't know exactly how many watts each core uses, it's likely in the vicinity of 2 W.
Speculating again, aren't you? You really need to work on your math! It's more likely they use a bit over 3 W per core, as HBM is really low on power consumption.[/quotemsg]3 W is definitely too high (see the arithmetic below). You're forgetting the other things I mentioned, including the DDR4 controller. Also, the cache-coherent bus connecting all the tiles and memory controllers probably uses a significant amount of power, itself.
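Just to spell out the arithmetic: the only inputs are the 245 W TDP and 72 cores already mentioned above; the split between cores and uncore is obviously an estimate on my part.
[code]
#include <stdio.h>

/* Back-of-envelope check using only the numbers already in this thread:
 * 245 W package TDP, 72 cores.  How much is left for everything that
 * isn't a core, at 3 W/core vs. 2 W/core?
 */
int main(void)
{
    const double tdp_w = 245.0;   /* Knights Landing package TDP */
    const int    cores = 72;

    double uncore_at_3w = tdp_w - 3.0 * cores;   /* 245 - 216 = 29 W  */
    double uncore_at_2w = tdp_w - 2.0 * cores;   /* 245 - 144 = 101 W */

    printf("3 W/core leaves %.0f W for HBM + DDR4 + PCIe + mesh\n", uncore_at_3w);
    printf("2 W/core leaves %.0f W for HBM + DDR4 + PCIe + mesh\n", uncore_at_2w);
    return 0;
}
[/code]
At 3 W per core, only ~29 W would be left for the HBM, the 6-channel DDR4 controller, 40 lanes of PCIe and the mesh combined, which seems implausibly low; at ~2 W per core, the uncore gets roughly 100 W.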
[quotemsg=18462118,0,1425755]Really, your math is zero[/quotemsg]You're pretty close to being reported.
[quotemsg=18462118,0,1425755]Now we add the architectural difference & say how the DSP will have at least 150% of the performance of the SIMD/CISC Atom[/quotemsg]And where did you get that number?
[quotemsg=18462118,0,1425755]no compiler nor math lib can actually harvest all the capabilities of the Phi's AVX SIMDs, & how that will significantly cripple its performance[/quotemsg]Intel already has libraries optimized for it, and GCC already supports the new instructions. Most developers tuning their code for this chip will use the compiler intrinsics (see the sketch below). However, if you read Intel's whitepaper, which is linked from the news article, they cite a version of Caffe that they've already optimized for it.
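For anyone wondering what "using the intrinsics" actually looks like on the instructions Knights Landing does have (AVX-512F), here's a minimal saxpy-style sketch; the kernel and its names are mine, purely for illustration.
[code]
#include <immintrin.h>
#include <stddef.h>

/* Sketch: y = a*x + y over 16 single-precision lanes per 512-bit FMA,
 * using AVX-512F intrinsics.  The compiler is only asked to do register
 * allocation and scheduling, not auto-vectorization.
 */
static void saxpy_avx512(float *y, const float *x, float a, size_t n)
{
    __m512 va = _mm512_set1_ps(a);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);
        __m512 vy = _mm512_loadu_ps(y + i);
        _mm512_storeu_ps(y + i, _mm512_fmadd_ps(va, vx, vy));
    }
    for (; i < n; i++)        /* scalar remainder */
        y[i] = a * x[i] + y[i];
}
[/code]
Nothing exotic: once the intrinsics are in place, the vendor compiler and GCC generate essentially the same inner loop.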
[quotemsg=18462118,0,1425755]If you use open source compilers, which are about 2x slower, it rises to 80x.[/quotemsg]Intel does provide a compiler, so it's not a given that people will use open source compilers. However, I'm curious where you got this "2x" number.
[quotemsg=18462118,0,1425755]
My position is simply that it's neither bad, nor ineffective
so your position is dead wrong. Seems somehow you are stuck with it.[/quotemsg]Okay, it seems we have a disagreement about fundamental semantics, here. You've created a strawman, in the form of the fastest special-purpose accelerator of which you can conceive, and, through some mysterious arithmetic of your own, shown that it's sufficiently faster to satisfy your position. Which is where we get to the bit about semantics (i.e. "bad and ineffective" being open to interpretation).
I'm okay with whatever you'd like to believe. It seems whatever I argue, you're going to put the goal posts out of reach. And there's no way either of us can prove anything about hardware that doesn't exist. So, I guess that's it.
[quotemsg=18462118,0,1425755]Although they did make significant improvements compared to the previous generation.[/quotemsg]We agree on something!
[quotemsg=18462118,0,1425755]
I don't understand the strong distinction you're drawing between DSPs and GPUs, and I think I do know a bit about each. Please enlighten us, if you wish.
I won't really go into any details here![/quotemsg]Thanks for the interesting analogy. I don't happen to agree, but that's a completely different debate.
Since you expressed interest in energy-efficient computation, I'll leave you with this link:
http://www.nextplatform.com/2015/03/12/the-little-chip-that-could-disrupt-exascale-computing/
I'm more than a bit skeptical of their claims. I think they're underestimating the threat posed by GPUs (or GPU-like chips, such as Nvidia's GP100), but their core idea of simplifying the hardware to the bare bones & letting software deal with caching and memory indirection is rather compelling. It's like taking the VLIW philosophy to another level. I expect that idea to grow some legs.