News Panther Cove will reportedly arrive with big IPC improvements, support for Intel APX

federal

Distinguished
May 23, 2009
Intel returning to "tick tock" releases? They've kind of caught up to AMD and TSMC after a years-long mad scramble (assuming 18A really has been proven at this point and is entering early production). Maybe this is a sign that they're feeling more confident and settling in for a more predictable and sustainable future. For their sake, I hope not too confident...
 

ThomasKinsley

Notable
Oct 4, 2023
It's not great that Intel is pinning all its hopes on 18A. I guess we'll see how that works out. On a side note, I always felt it was better to buy during Intel's "tick" cycle. The tocks seemed like a wasted year to me.
 

Mama Changa

Proper
Sep 4, 2024
Cougar Cove is the all-new P-core for Panther Lake; it is not a tweak of Lion Cove. Panther Cove would have to be for Nova Lake in late 2026, and would then be on 16A or 14A, I presume, as Panther Lake is on 18A along with Clearwater Forest.

What would you prefer they pin their hopes on? 18A, if successful, is more advanced than anything TSMC has and will put Intel more than a generation ahead of AMD, which only gets N3 for Zen 6 and N2 for Epyc, maybe in late 2026 or early 2027. AMD won't get N2 for desktop until Zen 7, at which point Intel will be on 14A.
 

Thunder64

Distinguished
Mar 8, 2016
Cougar Cove is the all-new P-core for Panther Lake; it is not a tweak of Lion Cove. Panther Cove would have to be for Nova Lake in late 2026, and would then be on 16A or 14A, I presume, as Panther Lake is on 18A along with Clearwater Forest.

What would you prefer they pin their hopes on? 18A, if successful, is more advanced than anything TSMC has and will put Intel more than a generation ahead of AMD, which only gets N3 for Zen 6 and N2 for Epyc, maybe in late 2026 or early 2027. AMD won't get N2 for desktop until Zen 7, at which point Intel will be on 14A.

Because Intel has executed so well with processes this past decade, including most recently canceling 20A and fabbing ARL at TSMC.

I’ll believe it when I see it.
 

Kondamin

Proper
Jun 12, 2024
I doubt things will go as smoothly over the next couple of years as they have so far. I think things will grind to a halt at 2nm, and that we're in for a decade of growing CoWoS packages for those who need more computing power.

So this is probably going to be an 18A+ product, maybe with an n2w GPU.
 

rluker5

Distinguished
Jun 23, 2014
I doubt things will go as smoothly over the next couple of years as they have so far. I think things will grind to a halt at 2nm, and that we're in for a decade of growing CoWoS packages for those who need more computing power.

So this is probably going to be an 18A+ product, maybe with an n2w GPU.
That seems like a reasonable expectation for standard EUV seeing as how the world was hung up with DUV at 14nm and barely got to 10nm and renamed it. And it already seems like the same thing is happening with EUV.
But high-NA EUV is a slightly smaller step from EUV than EUV was from DUV, so there should be some more room for improvement for fabs using high NA.
https://www.anandtech.com/show/17415/asmls-highna-update-coming-to-fabs-in-2024-2025
 

rluker5

Distinguished
Jun 23, 2014
Also Intel will need a more efficient node so they can realistically run those higher IPC designs at reasonable clockspeeds.
Remember Rocket Lake? It's a good example of increasing IPC too much without offsetting the extra power needed with a more efficient process.
 

Kondamin

Proper
Jun 12, 2024
Kewl. What is IPC and why do I want it?
You are a member since 2017 but don’t know IPC?

Instructions per clock, the amount of work the processor can do per tick of the clock.
It's like a truck carrying 5 tonnes of goods at a speed of 70: it can transport more goods per unit of time than a sports car carrying a quarter of a tonne, even if the car is going 300, because the car needs to make more round trips.

In cases where you just need to transport a quarter of a tonne, that raw speed is better.
In other cases you want a truck.

Ideally we would have operating systems and chips that could figure out whether a task is better suited to a slow, high-IPC core or a fast, low-IPC core.

Let's say the fast one reads out data from a series of sensors and places it in a spreadsheet, while the other one does transformations on the total collected data.
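
The truck analogy can be put into numbers. This is only a rough sketch: the route length (`distance`) is a made-up value, the speeds are unitless as in the post, and the helper names are just for illustration. Throughput is payload divided by round-trip time, while the time for one small delivery depends only on speed.

```python
# Sketch of the truck vs. sports car analogy. The route length is an
# assumed value; only the payloads and speeds come from the post above.

def throughput(payload_tonnes: float, speed: float, distance: float = 100.0) -> float:
    """Tonnes delivered per unit of time when making repeated round trips."""
    round_trip_time = 2 * distance / speed
    return payload_tonnes / round_trip_time


def one_way_time(speed: float, distance: float = 100.0) -> float:
    """Time to complete a single one-way delivery."""
    return distance / speed


truck = throughput(payload_tonnes=5.0, speed=70.0)
car = throughput(payload_tonnes=0.25, speed=300.0)

print(f"truck throughput: {truck:.3f}, car throughput: {car:.3f}")
print(f"truck single trip: {one_way_time(70.0):.2f}, car single trip: {one_way_time(300.0):.2f}")
```

The truck wins on sustained throughput and the car wins on the latency of one small delivery, which is roughly the trade-off between a wide, slower core and a narrow, faster one.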
 

Kondamin

Proper
Jun 12, 2024
Also Intel will need a more efficient node so they can realistically run those higher IPC designs at reasonable clockspeeds.
Remember Rocket Lake? It's a good example of increasing IPC too much without offsetting the extra power needed with a more efficient process.
Rocket Lake would have been good if Intel had introduced it the moment they realised they had a problem and hit 14nm++.
They should have added all that to Coffee Lake, when we were still used to getting 4-core CPUs.
 

ottonis

Reputable
Jun 10, 2020
That's not exactly surprising news, as Intel has literally no other choice than to keep massively improving power efficiency while pushing performance beyond ARM/Qualcomm, Apple Silicon, and AMD. So, a redesign of the P-core is definitely in order to keep pace with the competition and ensure the x86 architecture stays relevant for a few more years.
The only thing that irritates me a little is the question of why Intel considers it necessary to let the competition look so far in advance into its cards... and this has most certainly to do with Intel's current financial and technological struggles (see the abandoned 20A process node etc), so they may feel the urge to feed the stock analysts with some sort of positive outlook.
Well, considering that Panther Cove is in early developmental stages, AMD and Co. will certainly listen carefully to what Intel is up to and maybe incorporate similar or even better strategies themselves.
 

bit_user

Titan
Ambassador
On a side note, I always felt it was better to buy during Intel's "tick" cycle. The tocks seemed like a wasted year to me.
There's a reason for the tick-tock cadence. For the ticks to deliver real gains, they need the additional density afforded by node improvements, which historically couldn't proceed on a yearly basis. Also, new nodes tend to have lower yields, so porting the same basic microarchitecture to a new node, before making a bunch of changes that increased transistor usage (and thereby die area) also made a lot of sense. There were probably other benefits, like gaining experience with the node and filling out the cell libraries for it.

Even though Intel says its designs are now based more on standard EDA tooling (somewhat decoupling them from choice of node), I think it's still too demanding to expect major architectural improvements every generation.

FWIW, Raptor Lake was sort of a "tock". Leaving aside the degradation issues (pretty hard to overlook, I know), it was a pretty epic one, at that. Meteor Lake couldn't outperform it, which is why we got a second year of it.
 

bit_user

Titan
Ambassador
Kewl. What is IPC and why do I want it?
As @Kondamin said, it stands for "Instructions Per Clock", although I hate that name. As currently used, it doesn't actually represent a specific instruction count. Rather, it's used to represent the relative clock-normalized performance between two different CPUs.

It roughly corresponds to the microarchitectural sophistication of a core, but can be influenced by things like cache sizes/architecture, memory speed & latency, as well as new instructions (e.g. AVX-512). In the case of APX, one benefit is coming from eliminating some memory stores & loads of intermediate values, so it's really an oxymoron to say you're "increasing IPC" by eliminating some of the instructions! ; )

In a simplistic way, performance of a CPU is modeled as: IPC * clock_speed. It's sort of nonsensical, in the abstract, but becomes more meaningful when comparing the performance of two different CPUs. Once you account for the difference in real clock speeds, pretty much the rest of the performance difference is regarded as a difference in IPC.

IPC is also a squirrely figure, because it's classically measured on single-threaded code, yet you almost never get linear scaling across multiple cores.
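
To make the clock-normalized comparison concrete, here is a minimal sketch of the `IPC * clock_speed` model turned around to solve for relative IPC. The scores and clock speeds are made-up placeholders, not measurements of any real CPU, and `relative_ipc` is just an illustrative helper name.

```python
def relative_ipc(score_a: float, clock_a_ghz: float,
                 score_b: float, clock_b_ghz: float) -> float:
    """Clock-normalized performance of CPU A relative to CPU B.

    Uses the simplistic model performance = IPC * clock_speed, so
    IPC is proportional to score / clock.
    """
    return (score_a / clock_a_ghz) / (score_b / clock_b_ghz)


# Placeholder numbers: CPU A scores higher despite a lower clock.
ratio = relative_ipc(score_a=120.0, clock_a_ghz=5.0,
                     score_b=100.0, clock_b_ghz=5.5)
print(f"CPU A has {(ratio - 1) * 100:.1f}% higher 'IPC' than CPU B")
```

Note that the result only means something as a ratio between two CPUs on the same workload, which is why the name is a bit misleading.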

It stands for "Intel's Poor Choices".
: D
 

bit_user

Titan
Ambassador
The article said:
The most interesting tidbit about Panther Cove is the addition of Intel APX. APX stands for Intel Advanced Performance Extensions and serves as an extension of the entire x86 instruction set. According to Intel, APX adds more registers and various new features that improve general-purpose performance, without significantly increasing power consumption or silicon area.

Specifically, APX doubles the amount of general-purpose registers from 16 to 32, which allows the compiler to keep more values in registers. APX also adds new conditional forms of load, store, and compare/test instructions to help offset the performance issues of out-of-order CPUs, which take advantage of branch predictors. These conditional forms purportedly cut down on the number of branches that may incur misprediction penalties.
Here's the article where APX was announced. It goes into a little more detail:

It really should've gotten its own article, rather than being tacked onto the AVX10 announcement.

Regarding the increase in general-purpose registers, APX catches up to 64-bit ARM and RISC-V. For those who like to parrot the claim that "ISA doesn't matter", APX is literally Intel contradicting you! We will see just how much ISA doesn't matter, once we can compare between "APX on" vs. "APX off".
 

bit_user

Titan
Ambassador
The only thing that irritates me a little is the question of why Intel considers it necessary to let the competition look so far in advance into its cards...
When we're talking about new technologies, like APX and AVX10, Intel needs to give software developers lots of advance notice about when CPUs implementing them might launch. Otherwise, launch day will come and Intel will have these fancy new chips with virtually no software that can showcase their benefits!

Regarding their product roadmap, some of that is flirting with customers, partners, and investors to keep them interested. Intel's competitors can probably deduce a lot of this info from leaks and rumors, so I think they're not giving away too much with their announcements. IMO, the biggest risk they have from being public about their roadmaps is potentially drawing attention to how far behind they really are, when they suffer delays or cancel products.

Well, considering that Panther Cove is in early developmental stages, AMD and Co. will certainly listen carefully to what Intel is up to and maybe incorporate similar or even better strategies themselves.
According to some details shared around the launch of Raptor Lake, the CPU development pipeline is something like 24 to 30 months. Whatever AMD will put up against Panther Cove is likely already locked down, by now. Otherwise, don't you think AMD would've added AVX10.1 to Zen 5? For a CPU that already implements a fairly recent/complete subset of AVX-512, adding support for AVX10.1 is a rather trivial tweak. Yet, even having more than 1 year since the announcement of AVX10 wasn't enough time for AMD. I'll bet Zen 6 will support it, though.
 
Hm... Reading on APX, it'll definitely be interesting to see how any apps re-compiled to take advantage of it perform compared to "traditional X86" ones. Better control on registers is kind of big and softening misprediction penalties can be really important for future uArchs.

The differences, at a high level, read subtle, but important enough to be more relevant than the whole AVX XYZ-bit push of the last decade. Intel should've done this earlier, but good it's here at least.

Regards.
 

bit_user

Titan
Ambassador
Hm... Reading on APX, it'll definitely be interesting to see how any apps re-compiled to take advantage of it perform compared to "traditional X86" ones.
One interesting decision they made was just to have an extra instruction prefix byte. This means you can mix it with standard x86-64 code inside the same process, which opens the door for optimized runtime libraries.
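
Because APX code can sit next to ordinary x86-64 code in one process, a library could ship two builds of a hot routine and choose one at load time. Below is a rough sketch of that dispatch pattern, written in Python only to show the control flow. The feature name `"apx"`, both `sum_squares_*` functions, and the `select_sum_squares` helper are hypothetical, and real feature detection would go through CPUID rather than a hand-built set.

```python
from typing import Callable, Iterable, Set


def sum_squares_baseline(values: Iterable[int]) -> int:
    """Stand-in for the routine compiled for plain x86-64."""
    return sum(v * v for v in values)


def sum_squares_apx(values: Iterable[int]) -> int:
    """Stand-in for the same routine compiled with APX enabled.

    It must return identical results; only the generated machine code
    would differ in a real library.
    """
    return sum(v * v for v in values)


def select_sum_squares(cpu_features: Set[str]) -> Callable[[Iterable[int]], int]:
    """Pick an implementation once, based on the detected CPU features."""
    if "apx" in cpu_features:  # hypothetical feature name
        return sum_squares_apx
    return sum_squares_baseline


# Callers use one name and never need to know which build was chosen.
sum_squares = select_sum_squares({"avx2", "apx"})
print(sum_squares([1, 2, 3]))
```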

JIT-compiled code, such as what powers the client-side web experience, is another avenue for potentially reaping immediate benefits.
 

TheHerald

Respectable
BANNED
Feb 15, 2024
According to every test I've seen, P-cores are way ahead of the competition in both efficiency and performance. How much better can they get?
 

bit_user

Titan
Ambassador
With Lunar Lake, Intel has the most efficient x86 CPU ...
I didn't see anything in the video to substantiate that claim. I went to the section marked "Efficiency Testing", but I don't see any graphs or data being presented. Please provide timestamps of anything in that ~1 hour video you'd like us to see, because there's no way I'm watching the whole thing and even trying to search through and parse the transcript is a bit of a chore, when I don't even know what I'm looking for.

Now, the reason I'm interested is that I've been curious about Lunar Lake, and the lone review on this site really left a lot to be desired. So, I thought I'd head on over to NotebookCheck and see what they found. They reviewed the Core Ultra 7 258V, which seems like it should be within the efficiency sweet spot, as it's a few steps down from the highest spec Lunar Lake model.

In single-thread efficiency (using an external monitor), they got 5.36 points/W on CB 24. This indeed edges out the best AMD model they tested, which was the Ryzen AI 9 365 at 4.01 points/W, but came in below all of the Snapdragon X models, which ranged from 6.32 to 8.32 points/W, and the Apple M3, which scored a whopping 12.7 points/W!!

In multi-thread efficiency, it got 17.7 points/W, as compared with the Ryzen AI 9 HX 370's 19.7 points/W. Snapdragon X mostly scored higher, with a range of 17.3 to 22.2 points/W. Again, the Apple M3 smashed everyone with 28.3 points/W!

When tweaking with different configurations, the Ryzen AI 9 HX 370 got 25.2 points/W when limited to just 15 W. The best result shown by the 258V was 19.3 points/W in "whisper mode".

Apple and Qualcomm's efficiency is even more impressive when you consider that two Snapdragon X models and the Apple M3 beat it on single-threaded performance. So, it really does need that qualifier you used of being the most efficient x86 CPU. We should probably also mention something about lightly-threaded workloads, because the stock Ryzen AI 9 HX 370 beat even Lunar Lake's "whisper mode" on multi-threaded efficiency!

Caveats:
  1. Since NotebookCheck only analyzed one model, we can't really make pronouncements about the efficiency of the entire model range.
  2. Cinebench is only one workload. In the article, they did test performance on others, but only have efficiency data for CB24 single & multi.
  3. I should also note that the Ryzen AI 9 HX 370 consistently outperformed the 258V on the web benchmarks, so I have to wonder how its efficiency would compare on those.
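
For a quick sense of scale, the multi-thread figures can be expressed relative to the stock 258V. This is just a small sketch: the numbers are the points/W values cited in this post, and `percent_vs` is an illustrative helper name.

```python
# CB24 multi-thread efficiency in points/W, as quoted in this post.
multi_thread_ppw = {
    "Core Ultra 7 258V (stock)": 17.7,
    "Core Ultra 7 258V (whisper mode)": 19.3,
    "Ryzen AI 9 HX 370 (stock)": 19.7,
    "Ryzen AI 9 HX 370 (15 W limit)": 25.2,
    "Apple M3": 28.3,
}


def percent_vs(value: float, baseline: float) -> float:
    """Percentage difference of `value` relative to `baseline`."""
    return (value / baseline - 1.0) * 100.0


baseline = multi_thread_ppw["Core Ultra 7 258V (stock)"]
ranked = sorted(multi_thread_ppw.items(), key=lambda item: item[1], reverse=True)
for name, ppw in ranked:
    print(f"{name:34s} {ppw:5.1f} points/W  {percent_vs(ppw, baseline):+6.1f}%")
```

The same caveats apply: this covers one Lunar Lake model and one workload.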

FWIW, I didn't dig into the iGPU (but this post is already long enough).
 

JRStern

Distinguished
Mar 20, 2017
You are a member since 2017 but don’t know IPC?

Instructions per clock, the amount of work the processor can do per tick of the clock.
It's like a truck carrying 5 tonnes of goods at a speed of 70: it can transport more goods per unit of time than a sports car carrying a quarter of a tonne, even if the car is going 300, because the car needs to make more round trips.

In cases where you just need to transport a quarter of a tonne, that raw speed is better.
In other cases you want a truck.

Ideally we would have operating systems and chips that could figure out whether a task is better suited to a slow, high-IPC core or a fast, low-IPC core.

Let's say the fast one reads out data from a series of sensors and places it in a spreadsheet, while the other one does transformations on the total collected data.
Thanks.
I know about fifty IPCs and couldn't make it out in context, and thought it was supposed to be a standard convention in any writing to spell it out at first use.
 

TheHerald

Respectable
BANNED
Feb 15, 2024
I didn't see anything in the video to substantiate that claim. I went to the section marked "Efficiency Testing", but I don't see any graphs or data being presented. Please provide timestamps of anything in that ~1 hour video you'd like us to see, because there's no way I'm watching the whole thing and even trying to search through and parse the transcript is a bit of a chore, when I don't even know what I'm looking for.

Now, the reason I'm interested is that I've been curious about Lunar Lake, and the lone review on this site really left a lot to be desired. So, I thought I'd head on over to NotebookCheck and see what they found. They reviewed the Core Ultra 7 258V, which seems like it should be within the efficiency sweet spot, as it's a few steps down from the highest spec Lunar Lake model.

In single-thread efficiency (using an external monitor), they got 5.36 points/W on CB 24. This indeed edges out the best AMD model they tested, which was the Ryzen AI 9 365 at 4.01 points/W, but came in below all of the Snapdragon X models, which ranged from 6.32 to 8.32 points/W, and the Apple M3, which scored a whopping 12.7 points/W!!

In multi-thread efficiency, it got 17.7 points/W, as compared with the Ryzen AI 9 HX 370's 19.7 points/W. Snapdragon X mostly scored higher, with a range of 17.3 to 22.2 points/W. Again, the Apple M3 smashed everyone with 28.3 points/W!

When tweaking with different configurations, the Ryzen AI 9 HX 370 got 25.2 points/W when limited to just 15 W. The best result shown by the 258V was 19.3 points/W in "whisper mode".

Apple and Qualcomm's efficiency is even more impressive when you consider that two Snapdragon X models and the Apple M3 beat it on single-threaded performance. So, it really does need that qualifier you used of being the most efficient x86 CPU. We should probably also mention something about lightly-threaded workloads, because the stock Ryzen AI 9 HX 370 beat even Lunar Lake's "whisper mode" on multi-threaded efficiency!

Caveats: Cinebench is only one workload. In the article, they did test performance on others, but only have efficiency data for CB24 single & multi. I should also note that the Ryzen AI 9 HX 370 consistently outperformed the 258V on the web benchmarks, so I have to wonder how its efficiency would compare on those.

FWIW, I didn't dig into the iGPU (but this post is already long enough).
There are multiple battery life tests where the Intel CPUs (with smaller batteries than the competition) are at the top of the chart. I posted this a few days ago.

Here is one, minute 9:51


And here is a battery life test; the two Intel laptops have smaller batteries than the AMD laptops. See the 5:15 and 6:31 minute marks.