News Intel 14nm Rocket Lake-S Leaked: New Core Architecture, Xe Graphics, PCIe 4.0

I'm talking about specific benchmarks that don't need any kind of upstream or downstream complexity, where you can just feed all the instructions into the CPU and the output needs no sorting.
Regardless of whether your algorithm extensively exercises a new architecture's deeper re-order buffers, wider execution units, etc., the extra transistors are still baked into critical paths and still dictate maximum attainable clock frequencies.

Since one of the things Intel did with Ice Lake is double the throughput of some AVX instructions, things that can heavily leverage those will obviously see some substantial gains. For more typical workloads, though, the IPC improvement barely offsets the clock frequency deficit.
 
Regardless of whether your algorithm extensively exercises a new architecture's deeper re-order buffers, wider execution units, etc., the extra transistors are still baked into critical paths and still dictate maximum attainable clock frequencies.
Are they though? Isn't that why we have AVX offset? They can completely shut down or circumvent AVX when it's not running to get you the highest clocks.
For more typical workloads though, the IPC improvement barely offsets the clock frequency deficit.
Yes, absolutely. In games and other very serial loads, I don't see it being much faster, if at all.
But in all the DC benchmarks that all the sites are focusing on, this CPU will have huge gains.
 
Are they though? Isn't that why we have AVX offset? They can completely shut down or circumvent AVX when it's not running to get you the highest clocks.
Not using AVX does not magically reduce the scheduler's and other resources' increased complexity, these things are still there regardless of what instructions you use. The AVX offset serves a different purpose: keeping the TDP manageable when using exceptionally power-hungry instructions.
 
Not really. Skylake is so hot because it runs at crazy frequencies. I invite you to run a 9900K at 3.6-3.8GHz and you'll be surprised that it fits in a 95W TDP. If this is based on Tiger Lake and brings a big IPC boost (25-30%), then they can run it at 4GHz with 8 cores at a reasonable TDP (105-125W), which is decent given 14nm.
Of course the 9900K will fit into the 95W TDP @ 3.6GHz; that is its base clock, and Intel determines TDP at base clock. If this is a back-ported design to 14nm, there is NO WAY they will be able to add the logic & increase the base clock. It will be a huge die if they do, and it will require Intel's industrial chiller to run at a 4GHz base clock.
 
If this is a back-ported design to 14nm, there is NO WAY they will be able to add the logic & increase the base clock.
Intel's 10nm designs already cannot match Coffee/Comet so yeah, 14nm back-ports are pretty much guaranteed to be considerably worse for most typical workloads. That's kind of the whole point of Intel offering Comet-Lake U for people who prioritize processing power over Ice Lake-U's power efficiency.
 
You get 25% more allocators and 25% more ports and that will show in benchmarks.
Going wider isn't free, clock-wise. Those ports have to be scheduled and, in a bigger core, signals have farther to travel from, say, a register file to an execution pipeline.

According to what we have seen with Sunny Cove, 25% more units got them about 18% better results in multithreaded benchmarks.
At 14 nm? Or are we comparing 14 nm with 10 nm? Apples & Oranges, you know...

You don't need to change any path lengths, so there is no influence on attainable clocks; you just get faster in DC/workstation loads without any negatives, unless you count the core size as a negative.
We'll see about that.

If the performance is there, power draw is secondary.
Well, if a CPU requires water cooling, then it will lose most of the mainstream users. So, that's the point where power becomes a deal-breaker.
 
Kick it old school and read a Conroe review comparing it to the previous-generation Presler. Both were 65nm, Conroe with much lower clock speeds and power usage, and it still crushed Presler across the board.
That's a bad analogy. The Netburst architecture was designed to clock high, first and foremost. Pentium M and Conroe were about IPC. So, in that comparison, you're looking at the effects of two different design philosophies.

Fast forward to today, and the lessons of Conroe and its successors have not been forgotten. Coffee Lake and Willow Cove are both efficiency-oriented designs, so you wouldn't expect to see the same kind of improvement as happened back then.
 
What you are talking about is the number of instructions you can get when running all sorts of different things.
No, I don't think he was.

You're taking an oversimplified view of CPU design and function. When you have more execution ports, the CPU needs to do more work to schedule them. That adds complexity.

In 7-Zip, for example, they state a 75% increase, and that's because it now fits into cache and there is no complexity to it.
Bigger caches are more complex. It takes longer for signals to propagate around a physically larger structure, and if it has more sets, then lookup and eviction logic also becomes more involved.

But, caches are about one of the easiest performance wins you can get, by simply throwing transistors at the problem. There's a reliable, but diminishing benefit that CPU architects can count on.

The win usually is just a few %. Getting a 75% benefit from a cache change is either a case of hitting the jackpot, or there's just some poorly-written code that was previously thrashing it.
 
The win usually is just a few %. Getting a 75% benefit from a cache change is either a case of hitting the jackpot, or there's just some poorly-written code that was previously thrashing it.
One of the changes from Skylake to Ice Lake is that Intel doubled the throughput of some AVX instructions, so algorithms that depend heavily on those can hypothetically see up to a 100% speedup provided they were the sole bottleneck.

Once you step away from software that consists almost entirely of AVX-friendly number-crunching, the benefits become much smaller.
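As a rough illustration of why the gains shrink, here is an Amdahl's-law-style sketch in Python (the AVX-bound fractions are made up for illustration, not measured from any real workload):

```python
def speedup(avx_fraction, avx_gain=2.0):
    """Overall speedup when only `avx_fraction` of runtime
    benefits from an `avx_gain`x throughput increase."""
    return 1.0 / ((1.0 - avx_fraction) + avx_fraction / avx_gain)

# Hypothetical workloads: fully AVX-bound, half, and barely AVX-bound.
for f in (1.0, 0.5, 0.1):
    print(f"{f:.0%} AVX-bound -> {speedup(f):.2f}x overall")
```

Only the fully AVX-bound case approaches the 100% ceiling; at 10% AVX-bound, the overall gain is about 5%.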
 
PCIe 4 off the CPU, PCIe 3 off the chipset. Tracks well with our previous reporting that Intel had issues implementing PCIe 4.0 in the chipset for Socket 1200 boards (both Comet and Rocket use LGA1200) https://www.tomshardware.com/news/i...lans-then-nixes-pcie-40-support-on-comet-lake

Also, the argument about the socket is a fallacy. Just because Z490 and Z590 will be using the same LGA1200 socket doesn't mean they will have the same chipset or the same limitations. Look no further than X570 and X470 boards, which both use the same AM4 socket but have totally different chipsets and PCIe capabilities. Also, that rumour about the Z490 boards makes no sense to begin with. Given that Comet Lake CPUs don't have PCIe4 lanes, how can anyone expect the chipset to have them? I have never seen a platform where the chipset lanes are faster than the direct CPU lanes!

If this is based on Tigerlake and brings a big IPC boost (25-30%) then they can run it at 4Ghz with 8 cores with reasonable TDP (105-125W), which is decent given 14nm.
The 18% IPC figure that we have for Sunny Cove is the improvement averaged across several workloads. For some workloads, like tile-based rendering (such as Cinebench/Cinema 4D), the improvement is actually around 25%. So another 5-10% average IPC over Sunny Cove could very well mean at least a 12% increase for those same workloads, resulting in a 40% IPC improvement over Skylake. And let's not forget that the all-core 4.8GHz we see in Comet Lake is only 20% faster than an all-core 4GHz turbo. Also, since it is the 14nm+++ process, it will probably still come with very high single-core speeds of around 5GHz despite the increase in core complexity.
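The clock-versus-IPC arithmetic here can be sanity-checked with a couple of lines of Python (the IPC and clock figures are the post's hypotheticals, not confirmed specs):

```python
def relative_perf(ipc_gain, clock_ghz, base_clock_ghz=4.8):
    """Throughput vs. a Skylake-IPC baseline at base_clock_ghz,
    assuming performance scales as IPC x clock."""
    return ipc_gain * clock_ghz / base_clock_ghz

# Hypothetical +40% IPC over Skylake at an all-core 4.0 GHz,
# versus Comet Lake's Skylake-IPC cores at an all-core 4.8 GHz:
print(f"{relative_perf(1.40, 4.0):.2f}x")  # ~1.17x despite the clock deficit
```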
 
Ironically, the best argument for PCIe 4.0 would be to double the speed of the DMI link without increasing lane count (and therefore pin-count and cost). In fact, I thought Intel might even do that while keeping the graphics link @ 3.0. Intel went the exact opposite direction, however.

This is entirely independent of what flavor the chipset-connected lanes are. I mean, they already doubled DMI bandwidth while keeping the chipset-connected lanes at 3.0. They could've just done it another, possibly cheaper way.
The output from the CPU is most likely an x4 PCIe4 link converted into an x8 PCIe3 link. In any case, I actually very much PREFER this approach, as this way you get a ton of PCIe3 lanes for various peripherals, like you would have on an HEDT board. What a lot of people ignore is that if you put a PCIe3 device into a PCIe4 slot, you are essentially losing available bandwidth. For example, if you put a PCIe3 x4 peripheral into a PCIe4 x4 slot, the whole link will run at PCIe3 x4. There is no way for the slot to intelligently operate at PCIe3 x4 while its link to the CPU uses PCIe4 x2 (which is equivalent in bandwidth to PCIe3 x4) and leaves the other PCIe4 x2 worth of bandwidth for another purpose. So if you had a PCIe4 x4 link and wanted to use PCIe3 M.2 drives, the situation would be functionally equivalent to having a PCIe3 x4 link, as has been the case for Z270, Z370, Z390 and Z490 boards. You could only use one M.2 PCIe3 drive at full speed; two used simultaneously would saturate the channel. With Intel's approach this problem is solved, and you can have two PCIe3 x4 M.2 drives running simultaneously at full speed from the chipset.
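The bandwidth equivalences being argued here are easy to check numerically; a small Python sketch (per-lane rates after 128b/130b encoding, ignoring protocol overhead):

```python
# Per-lane bandwidth in GB/s: raw GT/s x 128/130 encoding efficiency / 8 bits.
PER_LANE_GBPS = {3: 8 * 128 / 130 / 8, 4: 16 * 128 / 130 / 8}

def link_bw(gen, lanes):
    """Usable bandwidth of a PCIe link, in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

print(f"PCIe4 x4: {link_bw(4, 4):.1f} GB/s")  # ~7.9 GB/s
print(f"PCIe3 x8: {link_bw(3, 8):.1f} GB/s")  # ~7.9 GB/s -- same bandwidth
print(f"PCIe3 x4: {link_bw(3, 4):.1f} GB/s")  # ~3.9 GB/s -- would bottleneck
                                              # two Gen3 x4 NVMe drives
```

So an x4 Gen4 uplink and an x8 Gen3 one carry the same traffic; the difference is only in lane count and in which downstream devices can run at full speed simultaneously.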

Realistically, given the cost of PCIe4 drives (around double the price for the same capacity of PCIe3 drives), people will not be buying them to use as secondary storage drives and connect them to chipset lanes. They will only be buying 1 to connect directly to the CPU (and Intel will be offering a direct x4 PCIe4 link for that purpose) and use as their main drive to install their OS. For secondary drives a PCIe3 NVMe drive is more than enough – after all most people are still using SATA SSDs or hard drives as secondary drives.
 
Realistically, given the cost of PCIe4 drives (around double the price for the same capacity of PCIe3 drives), people will not be buying them to use as secondary storage drives and connect them to chipset lanes.
That is nothing more than the early-adopter/bleeding-edge tax. Give it a year or two and 4.0 SSDs will cost about the same or may even be cheaper than 3.0 ones due to 4.0 mass-manufacturing as 3.0 is getting phased out.
 
Remember Broadwell? Their desktop platform basically skipped it and went straight to Skylake. They could surprise us, and do it again.
Yep, I thought of mentioning that, but it's a bit different in that unlike desktop Broadwell, a full lineup is apparently planned for 10th-gen Comet Lake. Also, Comet Lake will apparently require new motherboards running a new 400-series chipset. And if the rumor cited in this article were to be believed, those would be replaced months later by another set of motherboards running a 500-series chipset? That seems rather unlikely. It seems a lot more likely that we won't be seeing those processors until well into next year, as prior leaks have suggested.
 
Also, the argument about the socket is a fallacy. Just because Z490 and Z590 will be using the same LGA1200 socket doesn't mean they will have the same chipset or the same limitations.
I would point to Sandy Bridge vs. Ivy Bridge. Both share the same socket (in fact, I run an i7-2600K CPU in an Ivy Bridge-capable DH77KC motherboard). With a Sandy Bridge CPU in it, the CPU-connected x16 slot is only 2.0, but it switches to PCIe 3.0 when the board is used with an Ivy Bridge CPU.
 
The output from the cpu is most likely x4 PCIe4 but is converted into x8 PCIe3 link.
That makes no sense. Why would they make the chipset link x8 3.0 if the CPU's DMI link were x4 4.0? That just means you have to put additional logic on the board to convert it, which wastes money & board real estate.

In any case I actually very much PREFER this approach as this way you get a ton of PCIe3 lanes for various peripherals like you would have on an HEDT board. What a lot of people ignore is that if you put a PCIe3 device into a PCIe4 slot, you are essentially losing from the available bandwidth.
I was only talking about their DMI link, which is hard-wired between the CPU and chipset. The chipset has a switch and can downgrade all the way to 1.0. But, all of its subordinate links could remain 3.0, in the current widths, and you'd get the same bandwidth to the CPU with either a x4 4.0 or x8 3.0 DMI link.
 
Also, Comet Lake will apparently require new motherboards running a new 400-series chipset. And if the rumor cited in this article were to be believed, those would be replaced months later by another set of motherboards running a 500-series chipset?
What I'm saying is that they could completely cancel Comet Lake and its boards. Right now, there are supply issues, and probably some softening of demand - at least at the upper end.
 
Performance gain from the larger uop buffer should be free though, I'd imagine, since the CPU is avoiding unnecessary work. Decoding instructions is pretty costly in terms of power.
Caches aren't too much of a problem, at worst you have to re-arrange a few things to better accommodate an extra latency cycle in the pipeline.

Things are different for structures like the re-order buffer. Skylake has a 224-entry re-order buffer while Ice Lake bumps it to 352 - that's the CPU looking ahead up to 352 instructions to find things to keep execution units busy with. The re-order buffer is where all data dependencies and jockeying for execution units happens, including balancing throughput between hardware threads and handling speculative execution based on branch predictor output. The more resource reservations, result retirements, execution ports, in-flight instructions, etc. you have to track and manage, the more things you need to cross-check to determine the most efficient scheduling order. If you cannot complete all of that extra work within one clock cycle, you either have to increase the scheduler's latency by one cycle or lower clocks.

Just in case you are about to reply that they can just add pipeline stages to raise clocks: both AMD and Intel tried that, with Bulldozer and Netburst. Both ended up in their respective manufacturers' halls of shame for failing to deliver performance and being most effective as space heaters. Deep pipelines carry too high a latency and power cost for branch mispredictions and wasted work from speculative execution.
 
What I'm saying is that they could completely cancel Comet Lake and its boards. Right now, there are supply issues, and probably some softening of demand - at least at the upper end.
Maybe, but it sounds like some 400-series boards have already been designed, and I can't imagine Intel cancelling them at this point. I am also not aware of any rumors suggesting that the lineup would be cancelled or cut down, which is something you would have probably heard about by now.

Desktop Comet Lake also sounds like a reasonably viable product that wouldn't make much sense to cancel. Restoring Hyperthreading won't really increase manufacturing costs, but will increase heavily-multithreaded performance substantially. It's at the very least more than what the 9th-gen processors offered over 8th-gen, and should be enough to give Intel all-around competitive performance relative to AMD's existing lineup. Maybe AMD will pull ahead with their 4000-series processors later in the year, but I can't see Intel rushing a release just to counter that.

The suggestion that Rocket Lake desktop CPUs will be coming this year seems about as plausible as that rumor suggesting Ryzen 3000 CPUs would launch in January of last year at suspiciously low prices. Hey, maybe it could happen, but I wouldn't count on Rocket Lake coming to desktops within the next 12 months.
 
Maybe, but it sounds like some 400-series boards have already been designed, and I can't imagine Intel cancelling them at this point.
Intel could pull a reverse-97 on the 400 series: the *97 chipsets were launched to support Broadwell, but Broadwell never materialized in any meaningful way on the desktop, so most ended up only hosting Haswell-Refresh CPUs. Here, the *400 chipsets may launch but have no meaningful availability of CPUs for 'em until the 11th-gen launches.
 
Not liking the argument because it disproves your point doesn't make it factually inaccurate.
We're not disagreeing on the facts of what happened with Conroe, but on whether your analogy applies to this scenario.

I can't definitively say it doesn't, not knowing what the Willow Cove architecture looks like, but I can tell you that Intel hasn't departed from prioritizing IPC in a way that would make it apply here. As a matter of fact, every generation since Conroe has brought IPC improvements. So the idea that they magically found so much more IPC just by scrounging between the couch cushions is hard for me to swallow.

To unlock more IPC potential, they have to go to a smaller process node, as they did with Ice Lake. Otherwise, clock speeds are going to suffer, and it's hard to believe there's enough additional IPC to be had that they can make up for it and then some. We're certainly not looking at another Core 2-style revolution.
 
We're not disagreeing on the facts of what happened with Conroe, but on whether your analogy applies to this scenario.

I can't definitively say it doesn't, not knowing what the Willow Cove architecture looks like, but I can tell you that Intel hasn't departed from prioritizing IPC in a way that would make it apply here. As a matter of fact, every generation since Conroe has brought IPC improvements. So the idea that they magically found so much more IPC just by scrounging between the couch cushions is hard for me to swallow.

To unlock more IPC potential, they have to go to a smaller process node, as they did with Ice Lake. Otherwise, clock speeds are going to suffer, and it's hard to believe there's enough additional IPC to be had that they can make up for it and then some. We're certainly not looking at another Core 2-style revolution.
At a die size of ~177mm2, Intel could add all the additional logic to a 14nm chip. Will the die increase in size? Yes, but the original quad-core Nehalem was 260mm2. Even adding more cores on 14nm isn't out of the question, as going from 4+2 Kaby Lake to 6+2 Coffee Lake only increased die size by about ~26mm2, and again another ~26mm2 from Coffee Lake to Coffee Lake Refresh. Does Intel want to add all that extra silicon to the newest CPU? Probably not. The larger the die, the greater the chance of defects, and usually the lower the clock speeds. Not to mention they can make fewer CPUs out of a wafer. However, if that is what they have to do to remain competitive because 10nm just isn't working, then that is what they will do.
https://www.anandtech.com/show/13400/intel-9th-gen-core-i9-9900k-i7-9700k-i5-9600k-review - this has the numbers for the die sizes I was quoting.