Skylake: Intel's Core i7-6700K And i5-6600K

Zen comparable to Sandy Bridge? Well, we'd need to wait and see on that one. Even if it was on a per-core basis, it's an octo-core CPU, so it would theoretically blow past SB and IB hexa-core CPUs in CineBench. Assuming anybody pays attention to CineBench, that is.

Assuming that Excavator is approx. 10% faster than Steamroller, which is probably 10% faster than Piledriver, which is probably... you get the point... another 40% on top of Excavator's performance would make a Zen core about the speed of a Bulldozer module. Big assumption, mind you, though until Steamroller there was a decode bottleneck, so it can't be too far from reality.
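Just to make the compounding explicit, here is a rough back-of-the-envelope sketch in Python. The 10%/10%/40% figures are the assumptions from the paragraph above, and the module-scaling note is the commonly quoted CMT figure, not a measurement:

```python
# Rough compounding of the assumed per-generation gains (pure speculation,
# using the percentages from the post above as inputs).
piledriver = 1.00                  # baseline per-core throughput
steamroller = piledriver * 1.10    # assumed ~10% over Piledriver
excavator = steamroller * 1.10     # assumed ~10% over Steamroller
zen = excavator * 1.40             # AMD's claimed ~40% over Excavator

print(f"Zen vs. a Piledriver core: ~{zen:.2f}x")  # prints ~1.69x

# A Bulldozer-style module (two integer cores sharing a front end) is often
# quoted as scaling to roughly 1.8x of a single core on threaded integer
# loads, which is how the post lands on "about the speed of a module".
```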

Knowing how Excavator performs would make IPC projections a lot more meaningful. We still don't know whether that figure is per core or per clock, what tests were being performed and so on. AMD seems to alternate between comparing their products to roughly equivalent Intel models and to their own, so without them saying "it's x% faster than ix-xxxx at yyyyyy" we can speculate until the cows come home. One thing we can definitely agree on is that getting rid of CMT is a good thing for now.
 


It makes total business sense for Intel to hold back and go incremental while they currently have no competitors, and that is without a doubt what they are doing. But I do agree that all chip makers are starting to hit the fundamental limits of silicon; the effects will really start to be felt once 10nm is reached, imo.
AMD have suffered from design flaws, in my opinion, and not from the limits of silicon or manufacturing processes just yet.
 


Yes, I do have plenty of sources, but my claim is NOT that they are obligated; it's just a Moore's-law-type observation that has held up since the invention of computers. Some claim the doubling period is as low as 12 months.

Ray Kurzweil made the observation and prediction decades ago; it's similar to Moore's law.


http://www.godandscience.org/images/computingpower.jpg
http://www.razorfish5.com/volume2/images/book/fullsizeGraphs/graph1.png
http://declineofscarcity.com/wp-content/uploads/2012/03/supercomputers.jpg
http://4.bp.blogspot.com/_urKgj-8wIt8/S9RFy4raqmI/AAAAAAAABHI/rjqPQ3fnOg0/s1600/supercomputing.png
http://www.shellypalmer.com/wp-content/images/2015/08/kurzweil-cps-per-1k-curve.jpg


 

Moore's law is about transistor count, and transistor count alone does not double performance. Back when performance doubled roughly every 18 months, there were also massive clock frequency increases from die shrinks, architectural tweaks and material improvements. The performance doubling was achieved through the combination of all of those factors, not transistor count by itself.

You cannot just throw transistors at CPUs and expect to gain performance: every transistor you add worsens the timing margins along its signal path, and if those transistors sit in the critical path, achievable clock frequencies drop. This is why L1 caches have hardly grown since the original Pentium's 16KB from 20 years ago: L1 latency is so critical to performance that CPU designers cannot afford to trade latency for size. The same goes for the renamed register file, branch prediction tables and many other structures.

If you add a pipeline stage to cut a critical path in half, the extra cycle of execution latency will cause other instructions' data dependencies to stall the pipeline for one extra clock, increasing the already tricky scheduling work the CPU needs to do to keep its execution units busy and reducing the number of instructions it can issue and retire per cycle. If you doubled the transistor count in a CPU core by making all structures twice as deep/wide in the hope of doing twice as much work per clock, you would end up with a CPU that runs at half the clock frequency, comes nowhere near doubling IPC to compensate, and spends twice as much die space per core.
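To put rough numbers on that last point, here is a toy throughput model; every scaling factor in it is an illustrative assumption, not a measurement of any real core:

```python
# Toy model: throughput ~ clock * IPC. All numbers are made up to illustrate
# the argument above, not taken from any real CPU.
baseline_clock = 4.0e9   # Hz
baseline_ipc = 2.0       # average instructions retired per cycle

# Hypothetical core with every structure twice as deep/wide: assume a very
# generous +40% IPC, but the longer critical paths halve the clock.
fat_clock = baseline_clock * 0.5
fat_ipc = baseline_ipc * 1.4

baseline_perf = baseline_clock * baseline_ipc
fat_perf = fat_clock * fat_ipc

print(f"baseline:     {baseline_perf:.2e} instr/s")
print(f"2x-wide core: {fat_perf:.2e} instr/s "
      f"({fat_perf / baseline_perf:.0%} of baseline, at 2x the die area)")
```

Even with that generous IPC assumption, the doubled core ends up at about 70% of the baseline's throughput while costing twice the area per core.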

The only thing that can change that is broad adoption of multi-threaded programming by developers, to justify mainstream CPUs with more cores and SMT. But until tools that eliminate most of the additional design effort required to write decent multi-threaded software become available and widely adopted, most developers will only bother writing non-trivial threaded code when they have no other choice.
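For a sense of what even the easy case of multi-threaded code looks like, here is a minimal Python sketch that spreads an independent, CPU-bound loop across cores; the work() function is a hypothetical stand-in for real per-item work:

```python
# Minimal sketch: spreading an independent, CPU-bound workload across cores.
from concurrent.futures import ProcessPoolExecutor

def work(n: int) -> int:
    # Placeholder CPU-bound task; real code would do something useful here.
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total

def run_parallel(items):
    # Processes (not threads) are used so the work actually runs on
    # multiple cores despite CPython's GIL.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(work, items))

if __name__ == "__main__":
    print(sum(run_parallel([200_000] * 8)))
```

And this is the trivially parallel case; as soon as threads have to share mutable state, you are into locks, races and non-deterministic bugs, which is exactly the extra design effort being talked about here.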
 
Oh boy!!! My PC is really dated: an X58 chipset with triple-channel memory and a crummy 6-core i7 CPU. I need to upgrade!!!!!
 
I think people are a bit naive when it comes to understanding the fallback in graphics. For one thing, the new Skylake processors, like all new generations, have higher IPC and more transistors. Transistors take up space in the CPU, and I recall seeing an image of a Broadwell CPU where the iGPU took up 80% of the processor's die area. Broadwell was suited for iGPU gaming in small ITX machines and for budget builders who want a solid CPU without buying a dedicated GPU. That is also why its clock rates were so low: to save power and stay small and efficient.

Anyone using these Skylake CPUs should be using a dedicated GPU. It could just be that they ran out of space on the die and had to cut back on the iGPU. First everyone says, "Broadwell? No one cares about the iGPU at all," and now they say, "Intel! Why cut back on the GPU?" Hypocrites.
 


The best part of posts like these is that people seem to focus so much on the CPU. Yes, it is the core of the system, but there is more to it than that. SATA and other buses are bottlenecks for these super fast CPUs, and that's what people forget to look at. A few of the technologies coming up, such as DX12 reducing CPU overhead so more of the GPU's power can be utilized, and 14nm GPUs that should boost performance, could make having PCIe 3.0 a real benefit.

Then we have M.2, faster USB/Thunderbolt buses and faster memory.

All of these advancements can be of benefit in many ways. Of course that remains to be seen, but I would rather have the extra headroom so I don't have to upgrade sooner.
 

But you will want to upgrade next year anyway since Skylake-E might have PCIe 4.0

Personally, I have made my peace with "future proofing" since by the time I genuinely need an upgrade, I can build a whole new PC with the money I would have had to spend on the previous build to make it usable for maybe two years longer.
 


Seriously, if Intel really were holding back, AMD would have no problems catching up, but they continue to fall behind. It is possible Intel is not taking any chances with radical design changes because they don't need to, but it is not a matter of holding back. It's a matter of playing it safe.
 

There is no playing it safe there; it is simply a matter of the architecture having reached a point of maturity where there are no further tweaks that can yield major performance increases.

You can see the very same pattern with ARM-based CPUs: early designs were pure in-order cores with minimal caches that thoroughly sucked at instructions per clock and overall performance. Then ARM CPU designers added a full cache hierarchy, superscalar out-of-order execution, branch prediction and all the other stuff that x86 already had but that was too power-hungry to integrate in mobile CPUs in the early years, and those additions increased performance tenfold. Now that ARM designers have integrated all the low-hanging, proven CPU performance tweaks, ARM CPU performance is leveling off. By the time the 14-16nm generation is through, ARM CPUs will be on a very similar incremental upgrade cycle to what x86 CPUs have been on for the past five years.

There is no foul play here, just the architecture maturing, approaching the theoretical maximum of its per-thread instruction-level parallelism with each refinement.
 


I understand that. I was saying that they might not have done anything radical, as in, changing their architecture, but they aren't holding back on their current path.
 
I mean anybody who isn't happy with the performance from these processors is crazy. This wasn't what you were expecting? Aw, that's too bad, I guess you'll have to use an inferior architecture.
 


I frequently use those cases for Mini-ITX builds, one of which is my current file server, so let me be the one to tell you that your idea isn't going to work. You have an extremely limited amount of space inside for a cooler, even less if you want to go with a 2.5-inch HDD instead of an M.2 SSD. The max TDP you can expect is 65W, with 45W being the more realistic target. These cases are not suited for something like the 7800 or any of Intel's big CPUs. I have a 7600 running in mine only because it was left over after I upgraded my HTPC to a 7650; I have the iGPU turned down and it's running a server OS with almost zero load on the iGPU in order to save thermal room for better CPU performance. Realistically you're going to have to use a low-power i3 to reach your goals, which isn't going to have very good graphics performance. Intel would need to release an i3 with Crystal Well graphics to hit that target market, which I hope they do.
 
This is a bit late but a response for those criticizing the lower iGPU performance.

That is an engineering decision made to maximize CPU performance while staying within a cost margin. There is a saying that "nothing is free", and it applies in CPU design even more so. They are working within a physical space limitation: the die can only get so large before it starts costing astronomical amounts to produce. How would you guys like to pay $1000 or more for a desktop CPU? Didn't think so.

So within that same physical space they need to make some decisions: how many cores, how much cache, what kinds of interconnects, how many iGPU processing units, and whether to add off-die cache or not. Each of these decisions affects the size, thermal budget and cost of the product. Four cores, not six or eight; 8MB of L3 cache, not 4, 12 or 16; and so on. The result is that you tailor your product for a specific market segment, in this case the desktop enthusiast market, which virtually always has a dedicated external GPU. The previous C chips were aimed at those looking to make a semi-powerful system without a dedicated GPU and were more of a technology demonstration than a real product; expect to only really see them in OEM systems. To make them, Intel had to lower the clock speeds and raise the cost to account for the additional physical space and the external eDRAM. These chips don't have that big iGPU or the eDRAM, so Intel can raise the clock speed and lower the cost for a better desktop CPU.
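Purely as a way of picturing that budget, here is a sketch with invented area numbers (they are not Skylake's real floorplan figures); the only point is that every block you grow has to come out of something else, or out of the total die size and price:

```python
# Illustrative die-area budget. All numbers are invented for the sake of
# argument; they are not any real chip's floorplan figures.
DIE_BUDGET_MM2 = 122.0

blocks = {
    "4 CPU cores":       40.0,
    "8MB L3 cache":      20.0,
    "iGPU":              45.0,
    "system agent / IO": 17.0,
}

print(f"used {sum(blocks.values()):.0f} of {DIE_BUDGET_MM2:.0f} mm^2")

# Want six cores (or a bigger iGPU)? Something else has to shrink,
# or the die -- and the price -- has to grow.
blocks["6 CPU cores"] = blocks.pop("4 CPU cores") * 1.5
print(f"with six cores: {sum(blocks.values()):.0f} mm^2 -> over budget")
```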

Personally I would prefer if there were no iGPU and that space were used for more cache or a lower price, but virtually every OEM these days (and OEMs account for most system purchases) demands there be an iGPU so they can sell a "value" version of the box without a dedicated GPU. It's the reality we live in.
 
What should really happen is that Intel should tell them, NO IGP, go buy a "value" GPU and stop whining.
 

In terms of general CPU or execution pipeline architecture, the only truly radical change I can think of in the past 15 years is Netburst's trace cache, one of the very few things Intel salvaged from the P4 when they put Core 2 together, along with the uOps fusion they added on top to make the trace cache more efficient. All architectures since are mostly a mix-and-match of already well-established concepts, just trying to get the best balance of everything that can be crammed in on a given process.

I doubt we are going to see architectural changes anywhere near that magnitude again for as long as mainstream software continues depending heavily on single-threaded performance.

And at the other end of the spectrum, you have the UltraSparc T-series, Power7 and Xeon Phi which have adopted simple in-order cores with four or eight threads each for extreme thread-level parallelism to keep their cores' execution units busy, at the expense of horrible single-thread performance that will make third-gen Atom look fast.
 
What should really happen is that Intel should tell them, NO IGP, go buy a "value" GPU and stop whining.

I wish it were that simple. The OEMs are customers, and when your customer makes a request, you attempt to fulfill it, especially if those same customers are paying you tens of millions of USD for that request. As much as we want otherwise, we are second-class citizens in Intel's mind.
 
And at the other end of the spectrum, you have the UltraSparc T-series, Power7 and Xeon Phi which have adopted simple in-order cores with four or eight threads each for extreme thread-level parallelism to keep their cores' execution units busy, at the expense of horrible single-thread performance that will make third-gen Atom look fast.

The UltraSparc is now out of order, has been since the T2.

https://en.wikipedia.org/wiki/SPARC_T5

They just happen to be insanely wide, very large chips that cost a ton of cash: 16 cores, 8 threads per core, 3.6GHz and a 478 mm² die with a ton of I/O built in for multi-socket systems. Expensive and extremely niche in their target market.
 

The T2 and T3 were in-order. The biggest change from T1 to T2 was adding an integer and a floating-point execution pipeline to each core, making them look a lot like AMD's modules: eight threads per core divided between two integer execution units sharing a single floating-point unit. The T4 was the first superscalar out-of-order T-series.

The T4/T5 (Sparc S3 core) pipeline is considerably simpler than x86 chips:
[attached images: Sparc S3 core pipeline diagram and Haswell execution port diagram]

Only a 40-instruction-deep, two-issue-port-wide pick queue, vs a 60-instruction-deep pick queue with eight issue ports backed by a 192-instruction-deep reorder buffer. Haswell is the insanely wide chip here: eight execution ports dedicated to one or two threads for Haswell vs two execution units shared by eight threads for the Sparc S3 core. Two-instruction-wide decoders? Intel and AMD chips decode three or four instructions per cycle per core, dedicated to one quarter as many threads. The only thing wide about the T5 is its total thread count.

Yes, the Sparc S3 core is capable of superscalar out-of-order execution, but only the bare minimum necessary to let a single thread make reasonable use of its already limited execution units. Haswell is a monster by comparison. Such is the cost of squeezing more instruction-level parallelism out of a dry rock to provide just a little more of the single-threaded performance most desktop software craves.
 


I wish you were my teacher.
 
What the heck, Intel? So you provide great integrated graphics in Broadwell, then nerf it for Skylake? I guess you had to find a way to help sell your 'paper launch' of Broadwell. I really hope Zen makes you guys wake up, although it more than likely won't.

Ever considered that this is the first of the Skylake line, and that better iGPU performance will come in other models that WILL come out? This is more for the enthusiasts, I bet, who will drop video cards into their systems.
 


Yeah. Idk why people are complaining about poor iGPU performance, since this is an overclocking CPU, not a CPU like the -R models.
 