hcl123 :
juanrga :
Certain problems require exascale-level compute. The problem is that current petascale supercomputers cannot be scaled up 1000x due to their immense power consumption. Therefore research groups concluded that ARM was the kind of efficient architecture needed to build supercomputers at that scale.
The research group has shown that ancient Tegra chips are more efficient than the i7 in single-core performance, and have the same efficiency as the i7 in multicore at the same frequency.
Ivy Bridge is about 10-15% better than Sandy, and Haswell is (being generous here) about 10% better than Ivy. I don't know how Broadwell and Skylake will perform, but I doubt they will add more than 20%.
However, the current Tegra 4 is 10x/5x better than the Tegra 2/3 they tested. And Parker will be about 100x better and comes in 2015, probably on 14nm. Intel Broadwell also comes on 14nm, but it won't make it to desktops:
http://www.fudzilla.com/home/item/32524-broadwell-won%E2%80%99t-make-it-to-desktop
For the sake of comparison, 100x is a 9900% improvement: 9900% >>> 20%.
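Spelling the arithmetic out, a speedup factor $s$ converts to a percentage improvement as

$$\text{improvement} = (s - 1) \times 100\%, \qquad s = 100 \Rightarrow 9900\%, \qquad s = 1.2 \Rightarrow 20\%.$$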
Ummm... I think the key to exascale is heterogeneous computing, a kind of APU, though not necessarily one with a full GPU for the heterogeneous part. Perhaps that is why ARM is in HSA; HSA is not about iGPUs, as many seem to reason... it's about everything heterogeneous... yes, HSA in an exascale supercomputer (OpenMPI is already foreseen).
All this ARM vs i7 is awkward. Even the biased and blind can see that an A57 has much more perf/watt... it's not only 28nm vs 22nm FinFET (theoretically double the price), it's also power management and turbo, so it's not just 1.3GHz vs 1.49GHz. ARM has rather rudimentary power management and no turbo, while Intel has the best power management of them all (this should please hafijur)...
An A57 on 22nm FinFET with Intel-like power management (turbo) would deliver a severe beating not only in perf/watt, but also in pure performance (where it can already win in most benches).
Yes, we are talking about heterogeneous supercomputers: the Mont Blanc project uses ARM+CUDA.
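To make the division of labour concrete, here is a minimal, hypothetical CUDA sketch (my own illustration, not Mont Blanc project code): the host CPU, which could just as well be an ARM core, only allocates, copies, and launches, while the GPU executes the bulk floating-point work.

```
// Hypothetical ARM+CUDA sketch: the host CPU only orchestrates,
// while the GPU executes the heavy math. Illustration only.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// GPU kernel: each thread computes one element of y = a*x + y
__global__ void axpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) side: prepare the data
    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device (GPU) side: allocate and copy in
    float *dx, *dy;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // The CPU merely launches the kernel; the FLOPs happen on the GPU
    axpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);  // expect 5.0 = 3*1 + 2

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

Since the CPU's role in this model is mostly coordination, its power draw matters more than its raw throughput, which is the whole ARM argument.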
The CPU in current x86 + GPU/accelerator supercomputers accounts for up to 40% of the total power consumption. Substituting more efficient ARM CPUs for the x86 CPUs is a necessary step to scale up.
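As a rough worked number, assume (purely for illustration) a 4x CPU efficiency gain on that 40% share:

$$P_{\text{new}} = 0.60\,P + \frac{0.40\,P}{4} = 0.70\,P,$$

i.e. the CPU swap alone trims about 30% off total system power, before any gains on the accelerator side.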
A note: the Nvidia Parker APU will use custom cores more advanced than the A57, and it will be made on a FinFET process. The node size is not known, probably 14nm, but some rumours say 16nm whereas others say 10nm.
Cazalan :
I didn't ignore it. It is less efficient at single-core performance at the lower clock speeds, but when you use it as it's designed, with 4 cores and higher clock speeds, Intel becomes more efficient. Chop 2 legs off a horse and yes, a human will outrun that horse. In one of the same slides they overclocked an ARM chip and saw the performance/energy actually go down.
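That drop is exactly what first-order CMOS scaling predicts, since supply voltage has to rise along with frequency. Ignoring leakage:

$$P \approx C V^2 f \quad\Rightarrow\quad E_{\text{op}} = \frac{P}{f} \approx C V^2,$$

so overclocking raises the energy per operation even as it raises performance.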
You're taking Nvidia marketing terms at face value and comparing them against CPU-only Intel advancements. Those Nvidia numbers are HEAVILY weighted towards the GPU capability they're adding to the device, not the CPU capability. It's apples and oranges.
Now if you take Sandy GPU benchmarks compared to Haswell GPU benchmarks, what do you see?
http://www.sisoftware.net/?d=qa&f=cpu_intel_sb
http://www.sisoftware.co.uk/?d=qa&f=gpu_hsw
Sisoftware Float Vectorized compute shader - Sandy 19.44MPix/s
Sisoftware Float Vectorized compute shader - Haswell 537MPix/s
A 27x improvement in 2 generations. Yes, that was cherry-picked, but the point is that Nvidia just needs 1 benchmark showing a 100x improvement for that marketing slide to be true. That's not very hard to do with benchmark magic.
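For example, take three made-up per-benchmark speedups of 1.1x, 1.2x, and 100x. The slide quotes the 100x, while the geometric mean across the suite is only

$$(1.1 \times 1.2 \times 100)^{1/3} \approx 5.1\times.$$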
You can't take ANY marketing slide at face value. That goes for EVERY company.
PS: MIPS is considered more energy-efficient than ARM; it just didn't have a champion until Imagination bought it last year. Now they have their own energy-efficient 64-bit core to bundle with their PowerVR Rogue (used in the iPhone 5S).
It says: "ARM multicores as efficient as Intel at the same frequency"... and that using the ancient Tegra 3 (the supercomputer will use Tegra 6).
The Tegra 3 phone chip offers the same efficiency, or better, up to its maximum frequency of 1.3GHz, whereas the Intel chip (which is not a phone chip) can continue up to 2.4GHz, among other reasons because its cores are fed by about 4x more memory bandwidth and 7x more cache. There is no reason why you cannot clock an ARM core up to 3.5GHz; this is unrelated to x86 vs ARM, it is simply that the rest of the chip has to be designed accordingly.
Look at the single-core efficiency: "ARM platforms more energy-efficient than Intel platform". There, a single core is being fed adequately.
If the tests were repeated with the i7's extra 6MB of cache disabled and the chip limited to a single memory channel, the i7 would perform much worse while consuming about the same energy. I think this would offer a better perspective on the efficiency of ARM vs x86.
About overclocking: again, this is unrelated to ARM vs x86. Pay attention also to the note "Thermal package not designed for sustained full-power operation".
You are right, I was comparing CPU improvements with CPU+GPU improvements, my mistake. Still, my point holds: the CPU improvement for Intel was about 10-15% between Sandy and Ivy, and then dropped to about 5% for Haswell.
The Tegra 3 CPU is about 2x faster than the Tegra 2 CPU. Now look at how the Exynos (dual A15) offers the same performance as the Tegra 3 (quad A9) clock for clock: two A15 cores doing the work of four A9 cores implies the A15 is about 2x faster than the A9. Look at the new Apple A7:
"Apple doesn't quite hit the 2x increase in CPU performance here, but it's very close at a 75% perf increase compared to the iPhone 5."
About MIPS, I think what you say is not correct, but the question here is not one RISC vs another RISC; it is RISC vs x86 (CISC).