After the disappointment of learning the details of the new AMD arch (the K in K12 doesn't stand for rocKs but sucKs), Nvidia has finally presented its long-awaited CPU:
http://blogs.nvidia.com/blog/2014/08/11/tegra-k1-denver-64-bit-for-android/
As expected, it is a highly innovative CPU design. I think it deserves the name GCPU: a CPU made from the perspective of a GPU maker!
I was expecting a VLIW design with Transmeta-like translation from ARMv8. Instead, Nvidia has presented an in-order design (yes, like the cores in GPUs) and has added an innovative DCO (Dynamic Code Optimization) technique to simulate OoO logic. The white paper claims a much larger instruction window (up to 10x larger than that of usual OoO hardware).
This could open the door to a new kind of higher-ILP CPU beyond traditional OoO superscalar designs. In fact, Nvidia claims that Denver can execute up to 7 instructions per clock. For the sake of comparison, the ARM core used in Seattle peaks at 3 instructions per clock, and Haswell designs are 4-wide.
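To make the window and issue-width argument above a bit more concrete, here is a toy sketch of why a scheduler that can see further ahead finds more ILP. To be clear, this is not Nvidia's DCO algorithm: the function names (`make_chains`, `cycles_needed`), the unit-latency instructions and the greedy policy are all my own assumptions for illustration.

```python
# Toy model only: unit-latency instructions, no memory effects, and a greedy
# scheduler. This is NOT Nvidia's DCO algorithm, just an illustration.


def make_chains(num_chains, chain_len):
    """Build a program of independent dependency chains laid out back to back.

    deps[i] is the set of earlier instruction indices that instruction i needs.
    """
    deps = []
    for i in range(num_chains * chain_len):
        is_chain_head = (i % chain_len == 0)
        deps.append(set() if is_chain_head else {i - 1})
    return deps


def cycles_needed(deps, width, window):
    """Count the cycles needed to issue every instruction.

    Each cycle the scheduler looks at the oldest `window` unissued
    instructions and issues up to `width` of them whose inputs were
    produced in an earlier cycle.
    """
    issued_at = {}                      # instruction index -> issue cycle
    pending = list(range(len(deps)))    # unissued instructions, program order
    cycle = 0
    while pending:
        ready = []
        for i in pending[:window]:
            if len(ready) == width:
                break
            # An input counts only if its producer issued in an earlier cycle.
            if all(issued_at.get(d, cycle) < cycle for d in deps[i]):
                ready.append(i)
        for i in ready:
            issued_at[i] = cycle
        pending = [i for i in pending if i not in issued_at]
        cycle += 1
    return cycle


# Four independent chains of 8 dependent instructions each (32 total).
program = make_chains(num_chains=4, chain_len=8)
for window in (1, 4, 32):
    print("window", window, "->", cycles_needed(program, width=7, window=window), "cycles")
```

With a window of 1 this degenerates to one instruction per cycle (32 cycles), no matter how wide the machine is. With the whole program visible, the four chains advance in lockstep and it finishes in 8 cycles. The wide issue only pays off when the scheduler can see far enough ahead to find independent work, which is the point of the larger window claim.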
With the larger caches and wide issue operation, Denver can achieve IPC rates comparable to an Intel Core processor, but consuming only the power of a mobile processor. Amazing!
NVIDIA's performance goals for Denver are quite aggressive - it can actually be compared to desktop CPUs, with some benchmarks demonstrating performance on par with an Intel Haswell-based Celeron processor (Figure 4). It’s also extremely fast for a mobile processor as seen when compared to the Apple A7 processor on those same benchmarks. The performance levels shown will make Denver a snappy processor, suitable for super-smartphones, tablets, and Chromebooks.
Now waiting for the Boulder core... which will be a scaled-up version of this.