Yeah, I agree; I wasn't disputing any of that. The thing that is likely to confuse people, though, is that to my knowledge there currently aren't any ARM cores competing in the ~90W space, so x86 appears (in pure performance terms) faster core for core, but it isn't as efficient.
What I was describing is related to the AMD A57 server parts that we will see first. They're not a particularly high-performance core (as ARM designs are all geared towards low-power devices), so one A57 core isn't going to outrun an Intel or AMD 'big' core in absolute terms. It should be much more efficient though, so a server with lots of them makes a lot of sense.
Based on the titbits of information released on K12, that should be an ARM based 'big' core which can be more readily compared to x86 'big' cores. It will be interesting to see how well ARM scales up to much higher TDP, and I think AMD are the ideal company to do it.
I know you agree. I was merely trying to add some more info to your excellent post.
What you say about the A57 is again correct. The A57 core has been designed to fit inside a phone. It is a phone-class core. Thus, although it already surpasses Jaguar/Piledriver in IPC, it is clocked relatively low to keep power consumption under control. For this reason the 8 A57 cores in AMD Seattle are rated at 25W, whereas 8 Haswell cores, on a better node, are rated at 140W. Evidently a 25W SoC will not beat a 140W CPU in raw performance.
The SoCs that will beat Intel Xeons and AMD Opterons are different ones. Those SoCs will use high-performance cores (sometimes called server-class cores) and will not be used in phones.
Cavium is designing ~90W SoCs that will surpass Intel's top Xeons in performance. I estimate the performance of their ARM SoC at 960 GFLOP/s. The FX-8350 tops out at 256 GFLOP/s. The fastest Xeon tops out at 518 GFLOP/s.
AMD K12 is also designed to compete with Opteron/Xeons. Broadcom is designing high-performance custom ARM cores (4-wide with SMT4) for servers, and rumours point to Nvidia designing ARM cores to beat Opteron/Xeon CPUs as well. Nvidia's project is named Boulder:
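For anyone wondering where numbers like these come from: theoretical peak throughput is usually just cores x clock x FLOPs per core per cycle. Here is a rough sketch of that arithmetic. The core counts, clocks and FLOPs-per-cycle values are not given in the post; they are assumptions picked because they reproduce the quoted figures, so treat them as illustrative rather than as confirmed specs.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOP/s: cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


# ASSUMED configurations (cores, GHz, FLOPs/cycle/core). None of these
# breakdowns appear in the post; they are chosen to match its totals.
chips = {
    "Cavium ARM SoC (estimate)": (48, 2.5, 8),
    "AMD FX-8350": (8, 4.0, 8),
    "Fastest Xeon": (12, 2.7, 16),
}

for name, (cores, clock_ghz, flops_per_cycle) in chips.items():
    gflops = peak_gflops(cores, clock_ghz, flops_per_cycle)
    print(f"{name}: {gflops:.0f} GFLOP/s peak")
```

Keep in mind that these are paper peaks. Sustained performance depends on memory bandwidth, the instruction mix and how well the code vectorises.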
http://www.techpowerup.com/172683/nvidia-to-take-on-xeon-and-opteron-with-a-boulder.html
We know that the reason Intel is increasing the performance of Xeons, increasing the number of cores (15 cores or more), and lowering prices is that it is preparing for the battle with the ARMy.
🙂
Intel is following Nvidia here. Phones are too competitive a market.
Several MIPS licensees have shifted from MIPS to ARM recently.
Audi and Lamborghini are already integrating ARM SoCs in their cars.
The memory point is worth discussing here. Yes, HSA allows the CPU and GPU to exist in the same memory context. The downside to this is the fact that VRAM has to go away, leaving the GPU at the mercy of main memory bandwidth. We see the results when large datasets are processed: the GPU gets starved. Contrast that to non-HSA dGPUs, with, what, 4GB of VRAM now? You have a high initial latency, but after that, the higher bandwidth of the VRAM makes up the initial performance disadvantage.
Hence the downside of HSA: every time you can't fit a dataset into the cache, you have to access main memory again, which is slow (from a computational standpoint). dGPUs just get around this with massive amounts of VRAM. If you want a dedicated gaming APU, I really can't see how to do it without greatly expanding the size of the CPU cache, upward to the hundreds of MB, at a minimum, simply due to the relative slowness of main memory.
Replacing VRAM with system RAM is not a problem by itself. If the dGPU uses DDR3 and you replace that with DDR3 system memory, there is no bandwidth problem. What you are describing is the bandwidth problem that appears when a fast memory such as GDDR5 is replaced by slow DDR3 memory.
The PS4 APU uses GDDR5 memory instead of slow DDR3.
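To put rough numbers on the trade-off discussed above (a dGPU pays an up-front copy cost, then enjoys faster memory), here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the PCIe 3.0 x16 link at about 15.75 GB/s, GDDR5 at 336 GB/s, dual-channel DDR3-1600 at 25.6 GB/s, and a workload that does nothing but stream the whole dataset a given number of times. Latency, caches and compute time are ignored.

```python
# ASSUMED theoretical bandwidths in GB/s; real-world figures are lower.
PCIE3_X16_GBS = 15.75   # host-to-dGPU copy link
GDDR5_GBS = 336.0       # dGPU local memory
DDR3_GBS = 25.6         # dual-channel DDR3-1600 shared by CPU and GPU


def dgpu_time_s(size_gb: float, passes: int) -> float:
    """Copy the dataset over PCIe once, then stream it from VRAM `passes` times."""
    return size_gb / PCIE3_X16_GBS + passes * size_gb / GDDR5_GBS


def apu_time_s(size_gb: float, passes: int) -> float:
    """No copy needed, but every pass streams from DDR3 system memory."""
    return passes * size_gb / DDR3_GBS


def break_even_passes() -> float:
    """Number of passes at which the dGPU's copy cost is paid back."""
    return (1 / PCIE3_X16_GBS) / (1 / DDR3_GBS - 1 / GDDR5_GBS)


for passes in (1, 2, 10):
    print(passes, round(dgpu_time_s(4.0, passes), 3), round(apu_time_s(4.0, passes), 3))
print("break-even passes:", round(break_even_passes(), 2))
```

Under these assumptions the shared-memory APU wins when the data is touched only once, and the dGPU pulls ahead once the data is reused a couple of times. Swap the DDR3 figure for a GDDR5-class one and the copy cost is pure overhead, which is the PS4 case.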
This problem with slow DDR memory is only temporary. AMD, Nvidia, Intel, and ARM will replace slow DDR system memory with fast stacked system memory. The first will be Intel, which next year will release a KL CPU with 8/16GB of stacked RAM with a bandwidth of 500GB/s. For the sake of comparison, both the Nvidia 780Ti and the Titan Black have GDDR5 memory with only 336GB/s.
Next will be AMD with HBM stacked RAM, and finally Nvidia with HMC stacked RAM.
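As a quick sanity check on the bandwidth figures quoted above, peak memory bandwidth is bus width in bytes times the per-pin transfer rate. A minimal sketch follows; the 384-bit bus at 7 GT/s for the GDDR5 cards and the dual-channel (128-bit) DDR3-1600 system are assumptions on my part, not numbers from the thread. The 500GB/s stacked-RAM figure is simply taken from the post.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_gtps: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes x transfers per second (GT/s)."""
    return bus_width_bits / 8 * transfer_rate_gtps


# ASSUMED memory configurations, for illustration only.
gddr5_gbs = peak_bandwidth_gbs(384, 7.0)   # 384-bit GDDR5 at 7 GT/s
ddr3_gbs = peak_bandwidth_gbs(128, 1.6)    # dual-channel DDR3-1600
stacked_gbs = 500.0                        # stacked-RAM figure quoted in the post

print(f"GDDR5:   {gddr5_gbs:.1f} GB/s")
print(f"DDR3:    {ddr3_gbs:.1f} GB/s")
print(f"Stacked: {stacked_gbs:.1f} GB/s "
      f"({stacked_gbs / gddr5_gbs:.2f}x GDDR5, {stacked_gbs / ddr3_gbs:.1f}x DDR3)")
```

With those assumptions the GDDR5 figure comes out at the quoted 336GB/s, and the gap to dual-channel DDR3 is more than an order of magnitude, which is the gap stacked memory is meant to close.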
More PRO APUs surface, including a 3.9 / 4.2 GHz model.
http://www.cpu-world.com/news_2014/2014061601_New_AMD_A4_PRO_series_processors_surfaced.html
That is a Richland model.