Intel Kaby Lake: 14nm+, Higher Clocks, New Media Engine


bit_user

Polypheme
Ambassador
Why is that funny? Were you expecting 13%?

Given that they didn't change the architecture, the most speedup they could get from a clock speed increase would be 12.9%. Performance scaling at 93% of the clock speed delta is actually pretty good.

And that beats the incremental performance increases we've seen in all of Intel's desktop CPUs since Sandy Bridge.
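The scaling-efficiency arithmetic above can be sketched in a few lines. This is a minimal illustration using the figures from this thread (~12.9% clock increase, ~12% measured speedup); the function name is my own.

```python
# Scaling-efficiency check for a pure clock bump (no IPC change).
# Figures from the discussion: ~12.9% clock increase, ~12% speedup.
def scaling_efficiency(clock_gain, perf_gain):
    """Fraction of a clock-speed increase that shows up as performance."""
    return perf_gain / clock_gain

print(f"{scaling_efficiency(0.129, 0.12):.0%}")  # 93%
```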
 

bit_user

Polypheme
Ambassador
Intel's revenues are down, because people aren't upgrading their PCs as often. If Intel knew of any more incremental architectural improvements they could implement (that were power & cost effective), they'd have done it by now. In that sense, I don't really agree with people who say they're holding back.

However, I do agree that Intel is being somewhat cautious. I think they got burned too many times, by things like i860 and Itanium. Thanks to that and Pentium 4, they almost missed the boat, on x86-64. It's understandable why they're in such an incremental mindset, especially when they dominate the market. They've got everything to lose, by doing something bold and risky.

Maybe some upstart, like Soft Machines, will shake things up a little bit.
 

InvalidError

Titan
Moderator

If increasing IPC beyond current levels was so easy to achieve, AMD would have caught up with Intel years ago and would be at risk from getting leapfrogged by AMD's Zen.

But that isn't happening. Why? Because CPUs are approaching the practical limits of how much ILP/IPC they can extract from typical applications' instruction stream. When you approach the practical limits of anything, improvements become incrementally smaller.

The only thing Intel is holding back is increasing the core count in mainstream CPUs. For the average mainstream user, this makes no difference, since next to no mainstream software makes meaningful use of more than two or three threads anyway. It makes no sense from Intel's and AMD's side of things to seed the market with more powerful chips until there is enough mainstream software capable of leveraging the extra processing power to justify lowering the entry cost of 4C8T and beyond.

Maybe Zen will up the pressure a bit, but I would not bet too much on that. Maybe i7-6700 performance near i5-6600K prices. Enough bang-per-buck to make Intel fans think twice, but still ~$100 more than what AMD's CPUs currently cost.
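The few-threads argument above is essentially Amdahl's law. A minimal sketch, assuming (hypothetically) that only half the work in a typical mainstream program parallelizes:

```python
# Amdahl's-law sketch: if mainstream software only parallelizes about
# half its work (p = 0.5, an assumed figure), extra cores quickly stop
# paying off.
def amdahl_speedup(p, n):
    """Ideal speedup on n cores when fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 8):
    print(f"{n} cores: {amdahl_speedup(0.5, n):.2f}x")
# 2 cores: 1.33x, 4 cores: 1.60x, 8 cores: 1.78x
```

Doubling from 4 to 8 cores buys only ~11% here, which is why seeding the mainstream with more cores does little until software catches up.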
 
Retro gaming...so I assume your user ID is a reference to Day of the Tentacle?

 
There's no denying that I'm both very excited about and nervous about Zen.

It's a bit telling that they're showing benchmarks matching it against Broadwell when it will actually be competing against Kaby Lake, two generations newer. Still, if they can even match Broadwell while keeping their current pricing profile, it's back to being a fight.

Think of something matching an i7-4790K for $200. That beats a Kaby Lake i5 while coming in $30 under its price point, sticking it to the man, and offering a 3-5 year socket upgrade path.

 
"The desktop PC refresh cycle is lengthening from 3-4 years to 5-6 years. But while the mainstream PC segment is contracting (Intel noted the majority of PCs are five years or older)".
Seriously? Perhaps that has something to do with the fact that neither Intel nor software devs are bringing anything new to the table?
 

srmojuze

Commendable


Certainly, for someone who got a Skylake Pentium or Skylake Core i3, a Kaby Lake K (unlocked) Core i7 will give you a real kick in CPU power when the time comes, while (Lord willing) letting you reuse current components such as the mobo, DDR4 RAM, etc.

What is concerning, though, is the somewhat minimal improvement. Go back to the graph they showed of the 8x to 10x performance-per-watt improvement. That's all well and good, but that curve is no longer anywhere near a straight line; it's showing a clear limiting function/asymptote. And if Cannon Lake is 10nm and also not an architecture improvement per se (it's mainly a die shrink)...

TL;DR Intel is far from dead, but the glory days of high profit margins and huge leaps in performance and performance/watt are for the most part behind us. I personally think silicon at sub-10nm is a dead end. The silicon atom itself is said to be about 0.2nm across.

Thus we are at the end of a current age/epoch and at the cusp of a new post-silicon one. Don't get me wrong, a Core i7 Kaby Lake running at 5GHz on all 4C/8T will be absolutely beastly compared to a 2.5GHz Sandy Bridge dual core. That's exciting. However, Sandy Bridge quads overclocked to 4GHz are holding their own in most practical situations to this day. And anything very CPU-bound, like 3D rendering, lightmapping, physics, simulation, etc., is orders of magnitude faster on the latest GPUs: https://www.youtube.com/watch?v=gVe6FNd4yQc ...And that's with the Nvidia Maxwell Titan X. I believe the Nvidia Pascal Titan X has been released, and 2017's Nvidia Volta architecture will probably blow away any chance of CPUs being competitive at most CPU-intensive work in performance, performance-per-watt, performance-per-dollar, etc.

In the gaming world we are already seeing the attitude of "what's the least amount I can spend on my CPU so I can spend the most amount on my GPU". Video encode/decode is an area where, if I am not mistaken, GPU acceleration is also key. Consumer video encode/decode on the latest Intel CPUs is alright and useful, but again, as per above, where the CPU is really needed, GPGPU is only gaining further distance.

Xeon and Servers are of course a different kettle of fish and that is apparently 50% of Intel's business - but in terms of enthusiast and mainstream non-ARM PCs, things are drying up.



While I personally wouldn't deride Intel, the thing is their communication and choice of benchmarks and comparisons are not wrong per se, but overall they show that transitioning out of sub-10nm silicon to something else will now be their biggest challenge, while fending off the huge momentum on the ARM side of things, for the foreseeable future.



That would be my current understanding. Since Cannon Lake is ostensibly not an architectural revamp per se but a die shrink, and since people are saying 10nm should be skipped(!) in favor of going 14nm-->7nm, Intel will still produce great chips, but the glory days do appear to be behind us, at least in terms of silicon.

That said it is exciting yet very scary to see what comes out post-sub-10nm-silicon.

On the business side of things, perhaps one thing not being noted is that Intel's business and marketing strategy is very much geared towards Tick-Tock. It is what got them out of their funk when Athlon 64 dual cores reigned supreme; as I understand it, "Core" was born out of the Pentium III architecture/philosophy(?), and Tick-Tock has brought us to where we are now. The fly in the ointment is how Intel has used this strategy in relation to pushing their chips through OEMs: the refresh rate and marketing/hype cycles are all tied to the physical manufacturing cycles.

The challenge Intel faces now is that if they don't skip 10nm, they face quite a bit of cost, since within a very short time they have to hit 7nm. But if they ~do~ skip 10nm, they might face delays, get "out of sync" with the business/sales/marketing/hype cycles, and both Intel and OEMs may suffer sales. Think about Windows 10: it appears not to have led to the desired PC purchases, and in any case Microsoft started giving away Windows 10; 8GB RAM, an SSD, and Windows 10 are more than enough to rejuvenate a 1-5 year old PC. "Yoga" form factors are interesting, but the Surface form factor is where the sales are at, and that is low-margin for Intel, AFAIK.

So, all in all, I shall end my very long post with: "the end of the silicon age is upon us". It's been a fun ride in nanometer-land (from apparently 800nm in 1989) through these ~30 years.
 

InvalidError

Titan
Moderator

At the moment, the two most likely candidates are InGaAs and InP. Most other materials that have been considered have too many process complications or costs to be economically viable.


Intel has many fabs with multiple production lines each. It could very well decide to upgrade only enough lines to produce the minimum volume requirement of 10nm chips and wait for 7nm for broader fab upgrades.

Given the huge backlog at TSMC, GloFo, and other foundries at 14/16nm (enough that their clients have had to keep designing lower-end chips on 28nm to meet demand), Intel is not going to run out of potential clients for its excess 10/14nm fabs any time soon if it decides to make them available for hire.
 
I remember an article ~6-7 years ago about Intel touting that they were able to scale the Core architecture to 80 cores. So I have to admit that I didn't expect that after all these years we would still be on quad-core CPUs. I know the current state of software, but I'm wondering if the software isn't better threaded because the mainstream products from Intel haven't scaled higher, and devs don't want to put in the extra time and effort to serve the small part of the market that buys LGA 2011-v3 CPUs for gaming.

I doubt, though, that Intel will make a quick jump to 7nm, considering the problems they have run into with 10nm. I would assume the 10nm issues would only be worse and harder to overcome at 7nm.

Maybe it's time for "quantum computing"?
 

InvalidError

Titan
Moderator

This must be the chip you are thinking about:
http://www.cnet.com/news/intel-shows-off-80-core-processor/

Those aren't Core 2 Duo cores on there, since the 80-core chip has almost the same transistor count as a single Core 2. This particular chip doesn't even use the x86 instruction set.

Intel does make massively multi-core chips based on simplified x86 cores, look at Xeon Phi. Those chips are generally useless for the typical consumer though due to low individual thread performance and most mainstream software's strong dependence on a few high performance threads.
 
I find this launch REALLY interesting, unlike most other people. As someone who is still on Haswell (probably the same for people on 3rd-generation Intel), Skylake was attractive because of new technologies (not performance), but wasn't enough to warrant an upgrade. Now it's Skylake made a bit more attractive.
I guess it will be the same every time; the PAO (Process-Architecture-Optimization) method will probably make the "O" quite attractive for people who are just a bit outdated.
 

InvalidError

Titan
Moderator

I'll take your 88 threads and raise you 200: the new Xeon Phi chips launching soon have up to 72 quad-threaded cores, for a total of up to 288 threads per chip.
http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html
 

TJ Hooker

Titan
Ambassador

Hmm, interesting. Although the original comment about 80 cores was with regard to the Core architecture, and Intel uses a different architecture for Phi.
 

InvalidError

Titan
Moderator

AFAIK, Intel's 80-cores chip (the one demoed in 2007) wasn't even x86-based at all. The Xeon Phi on the other hand does run quad-threaded Atom-like x86 cores.
 

bit_user

Polypheme
Ambassador
Just remember, these threads are each 1/4th of a 1.6 GHz Silvermont Atom (with AVX-512), rather than 1/2 of a 2.2 GHz Broadwell. Depending on the workload, the Broadwell might be a better choice.

Also, I was intrigued by this statement by Al Gara, chief exascale architect at Intel and one of the designers of IBM’s BlueGene family of massively parallel supercomputers:
The size of that [CPU] is really a cost optimization, the number of cores we put onto a chip. But one of the net impacts of this kind of architecture is that we cannot get quite enough memory into a package because of the size of the CPU die. That doesn’t mean there isn’t a solution. If we actually portioned this into a smaller CPU die, it won’t have any effect on writing MPI and MPI ranks since each one of these would have had multiple MPI ranks within it even after we have divided it up.
We have to be really disciplined in this because the thing that we will tend to think – and I know at Intel we always want to make things bigger – is that it is attractive to add more cores to this. What happens is when you add more and more cores, you end up needing more and more power, and because your system has a finite power limit, what will happen is that you will get fewer of these as you add more cores here, and as you get fewer of these blocks, you will end up constraining yourself. You will not be in an optimal solution. You may have as many cores, because most of the power is driven proportional to the cores, but you will get much less memory capacity per core than you would have wanted.
Source: http://www.nextplatform.com/2015/08/03/future-systems-intel-ponders-breaking-up-the-cpu/
(includes an interesting historical analysis of GB/sec per GFLOPS)

Another way of looking at it might be that 72 cores is where a cache-coherent, unified memory model really starts to break down. Their option of partitioning the chip into 4 separate memory domains suggests that 18 cores is still in the sweet spot.
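The per-thread arithmetic at the top of this post can be made explicit. This is a naive illustration only; it ignores IPC, AVX-512, and how SMT actually shares a core:

```python
# Naive per-thread clock budget: a Knights Landing thread is 1/4 of a
# 1.6 GHz core, while a Hyper-Threaded Broadwell thread is 1/2 of a
# 2.2 GHz core. A rough lower-bound comparison, nothing more.
phi_thread_ghz = 1.6 / 4   # 0.4 GHz nominal budget per Phi thread
bdw_thread_ghz = 2.2 / 2   # 1.1 GHz nominal budget per Broadwell thread

print(f"{bdw_thread_ghz / phi_thread_ghz:.2f}x")  # 2.75x
```

So on this crude measure a Broadwell thread has nearly three times the clock budget, which is why workloads that depend on a few fast threads favor the Xeon over the Phi.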
 
It was 80 cores in a single CPU, but I didn't remember whether they were full cores or not. The link provided is likely the story I was thinking of, but from a different source, and the source I had at the time was probably light on information.
I also remember someone from Intel saying that they didn't want to engage AMD in a "core race" similar to the "GHz race" they were in back in the P4 days. I believe it was in response to a statement from AMD knocking Pentium Ds as not being true dual cores compared to what AMD was building, just two P4s mashed into one package.

It was more interesting when Intel and AMD were busy one-upping each other. Hopefully Zen brings that back, since that is when Intel really innovated and kicked up the performance. Intel needs a fire lit under them, lol.
 

bit_user

Polypheme
Ambassador
@InvalidError is correct. The 80-core chip was this:

https://en.wikipedia.org/wiki/Teraflops_Research_Chip

It's the predecessor to Larrabee, which eventually emerged as Xeon Phi (two generations later).

https://en.wikipedia.org/wiki/Xeon_Phi#Background

It wasn't feasible to put 80 Core 2-gen cores on a single chip until probably 14nm. Even now, I'm not sure you could do it. Consider that the first-gen Xeon Phi (Knights Corner) at 22nm had only ~61 modified Pentium cores.
 

InvalidError

Titan
Moderator

Cache-coherent memory becomes a nightmare much sooner than that, which is why ccNUMA or similar application-appropriate schemes are necessary in any form of massively parallel system to avoid saturating interconnects with cache snoops instead of useful data.

It has only become largely forgotten because the glue logic/uncore in single-socket CPUs has plenty of bandwidth to conceal the problem up to much higher core counts, making it a non-issue for most applications running on single-socket systems.
 

srmojuze

Commendable
I am not familiar with Xeon Phi and similar "80-core" or so CPUs, but the Wikipedia article above mentions Xeon Phi is for HPC and Nvidia Tesla is a competitor.

So in terms of HPC, and certainly in terms of enthusiast applications that require so many "cores", I would put forward that GPGPU is where the real action is, because (I'm guessing) scaling up "cores" and parallelism in OpenCL, CUDA, etc. is far more efficient/easier than in Intel's x86/x64 or similar architectures.

Again, in the enthusiast and professional space (e.g. 3D rendering of feature-length movies, effects for feature-length movies, etc.), a 32-core Core i7 or Xeon would be nice, but Nvidia Quadro, Tesla, Grid, etc. are very compelling alternatives.

At the end of the day this reiterates Intel's challenges, since even if they could have a "64-core Skylake" chip at 14nm or even 7nm the application side is "dry".



Yeah, fair enough, but I think the jump from a Sandy Bridge, Ivy Bridge, or Haswell dual-core up to a Kaby Lake quad-core will be the real value, kick-in-the-pants upgrade.

A fast 2-core to a somewhat faster 2-core won't be that big an upgrade.

Even 2-core/2-thread is enough for most things; 2-core/4-thread is nice, but 4-core/4-thread is the sweet spot. 4-core/8-thread is somewhat of a luxury, in the sense that if you really need significant CPU compute you'd go to a Xeon, where in single- and dual-socket setups you can get quite reasonable prices for 8-core and 16-core configurations, AFAIK.
 

srmojuze

Commendable


Now that Intel is ostensibly going to manufacture ARM chips, I wonder how much of their spare capacity they're currently selling, how that affects their own chips, and how this will change if the margins on high-end Intel CPUs do indeed dry up (as I understand they are).
 

InvalidError

Titan
Moderator

Intel's high-end margins are doing better than ever; high-end sales have improved 20% year-on-year despite price increases all around. It is the low to mid-range that is faltering, due to longer upgrade cycles and people ditching their low-end PC/laptop for tablets, smartphones, 2-in-1s, Chromebooks, etc.
 

bit_user

Polypheme
Ambassador
I don't follow the distinction you're drawing between ccNUMA and ???. In a meaningful sense, Knights Landing does have ccNUMA.

I could have dropped the word "unified", as I'm talking about all forms of cache-coherent memory. NUMA only solves the bandwidth problem - doesn't make cache coherency any easier or more scalable.

BTW, two further points:

    ■ Knights Landing is organized into two-core tiles, with one L2 block per tile. So, at 18 cores, we're talking about only 9 tiles.
    ■ The scalability obviously has to do with the relative speed of the cores (and how memory-intensive the app is) vs. the speed of the interconnect. Knights Landing uses a 2-D mesh interconnect, as opposed to the ring model employed by the their Core-architecture CPUs. This is also a factor.


Therefore, there's no one-size-fits-all limit on the scalability of cache coherent memory systems. What we can say is that Intel seems to feel that Knights Landing users might see a knee in the performance curve, somewhere above 18 cores.
 