Intel's Arrow Lake performance fix is now available — another update coming next month

So now even CPUs need a zero-day patch...
Ehhh, my X570 platform/Zen 3 was pretty flaky running at DDR4-3600 until about three BIOS revisions in (I bought my 5950X very early); not really that unusual anymore.

Technically, though, DDR4-3600 is out of spec for the Infinity Fabric at the ratio DOCP sets on the CPU, so I guess that's not an entirely fair comparison.
 
So some things I noticed while looking up further information:
  1. Intel's post makes it sound like the PPM issue was related to 24H2 as opposed to a general Windows issue (I'm still on 23H2 due to reviewer reports upon launch):
    Symptoms: Unusual CPU scheduling behaviors; artificial performance increases when cores are manually disabled or affinitized; high run-to-run variation in benchmarks; reduced single-threaded scores or performance; intermittent DRAM latency spikes (~1.5-2.0x expected); and unexplained performance differences between Windows 11 23H2 and 24H2.
  2. It looks like the Intel ME firmware is actually finally being updated via BIOS updates. Looking through Asus, ASRock, and Gigabyte BIOS updates, they all mention that the update includes Intel ME firmware. This seems like the proper solution, and one Intel really should have implemented long ago.
  3. edit: Asus's update has the 0x114 microcode but includes ME 19.0.0.1827 (19.0.0.1854v2.2 or newer is required per the Intel update)
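Of the symptoms Intel lists, "high run-to-run variation in benchmarks" is the easiest to eyeball yourself. A minimal sketch: time the same CPU-bound workload several times and report the spread. The workload below is a hypothetical stand-in, not any specific benchmark:

```python
# Time one fixed workload repeatedly and report run-to-run variation.
import statistics
import time

def workload(n: int = 200_000) -> int:
    # Simple CPU-bound loop standing in for a real benchmark.
    total = 0
    for i in range(n):
        total += i * i
    return total

times = []
for _ in range(5):
    t0 = time.perf_counter()
    workload()
    times.append(time.perf_counter() - t0)

mean = statistics.mean(times)
cv = statistics.stdev(times) / mean  # coefficient of variation
print(f"mean {mean * 1000:.1f} ms, run-to-run variation {cv:.1%}")
```

On a healthy system with a correct PPM package, the variation for a trivial loop like this should be in the low single-digit percent range; large swings point at scheduling or power-management trouble.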
 
Even if Intel fixes their CPU problems, they need to be punished for releasing a beta version of their CPU. Where were the testers and the engineering samples? What are you doing, Intel? Did you even bother to test your products before releasing them? NO THANK YOU. You need to be punished.
I hope everyone skips this generation. Intel really needs a hard lesson.
I was hoping Intel could pull off a miracle with Arrow Lake, which they don't even manufacture, and they still messed it up. I bought a 9800X3D to replace my 13700K; it has been stable out of the box and insanely fast. What a role reversal for AMD. Intel lost a lifelong customer.
 
I mean, seriously, we need all these tweak options just to say hello?
Can't I just ask ChatGPT or Copilot or Clippy to inspect my system and fix everything and print a report?
 
More lies from Intel.

"Intel announced that its first wave of updates to address the gaming performance issues plaguing the Arrow Lake..."

About a month ago Intel said that they would have a fix in early December:

A: Back then Intel said "fix"; now they are calling it a "first wave of updates."
B: It's now past early December.
C: The fix has already been tested, and it does eff all.

Keep lying, Intel, and we will keep believing you.
 
How much more refinement do you think is even still possible? Raptor Lake is already using something like their 5th-gen 10nm DUV node! Cannon Lake used the first gen, Ice Lake was second gen (10nm+), Tiger Lake was 3rd gen (SuperFin), Alder Lake is 4th gen (Enhanced SuperFin), and Raptor Lake is 5th gen (Enhanced SuperFin+).


Using Intel 7 wouldn't have enabled them to provide the substantial IPC increases of Skymont and Lion Cove. The resulting cores would've been too big and power-hungry, leading to a "Rocket Lake" situation.

Also, Intel 7 was lagging on efficiency for laptop products, which is why they felt a need for Meteor Lake to swerve so sharply in the direction of improving efficiency.

I would agree that maybe Intel should've tried harder to use Intel 3 for Lunar Lake and Arrow Lake. I honestly wonder what the performance and efficiency differences would've been, compared to TSMC N3B.
The cores wouldn't have been too big, as they don't have twice the transistors or anything, and dropping 100 MHz could save them at least 30 mV, if not 50; that's why I said to lower the clocks a little, maybe even 200 MHz. But 3nm does sound like a better option.
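The intuition behind that claim is the usual dynamic-power relation, roughly P ∝ C·V²·f: a small clock drop that also permits a voltage drop saves power quadratically in V. A hedged sketch with purely illustrative V/f points, not measured Raptor Lake values:

```python
# Relative dynamic CPU power under the approximation P ∝ V^2 * f.
def relative_dynamic_power(v: float, f: float, v0: float, f0: float) -> float:
    """Dynamic power relative to the (v0, f0) baseline."""
    return (v / v0) ** 2 * (f / f0)

baseline_v, baseline_f = 1.40, 5.8  # volts, GHz -- hypothetical top bin
tuned_v, tuned_f = 1.35, 5.7        # ~50 mV and ~100 MHz lower, per the post

p = relative_dynamic_power(tuned_v, tuned_f, baseline_v, baseline_f)
print(f"~{tuned_f / baseline_f:.1%} of the clocks for ~{p:.1%} of the dynamic power")
```

Under these assumed numbers, giving up under 2% of the clocks takes back close to 9% of the dynamic power, which is the general shape of the argument, though real chips also have static leakage this model ignores.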
 
Sounds like Intel had someone on the inside screwing things up intentionally, if the up-to-30% figure is true.
That was due to a missing PPM package, which is a small thing that has big consequences:

"Missing, incomplete, or malfunctioning PPM updates can negatively impact CPU frequency ramp speed, core parking, C-State entry/exit/latency, and other functional DVFS or power management capabilities."

A lot of the things on their list probably happen quite routinely, but are normally caught in pre-release testing. That's probably the main thing lacking, here. Given how their release schedule was sliding, until less than a month before it actually launched, I'd guess they were working right up to the last minute and had no time left for internal validation.
 
....except their own slide performance was about what the original reviews were getting...so it was intended to be that bad...
I hadn't taken Intel's response, that reviewers' numbers were matching theirs, to mean the performance was as intended. Similar to what TCA_ChinChin said, marketing wanted it launched by a certain date and wouldn't budge. I'd wager the engineers knew it needed more time; I've been there on product launch schedules.
 
The cores wouldn't have been too big as they don't have twice the transistors or anything,
My rough estimate of Raptor Cove is about 274.7 MTr, based on an estimated area of 8.52 mm^2 and transistor density estimates I borrowed from Alder Lake.

Lion Cove is 4.53 mm^2 and roughly 775.0 MTr, based on applying density figures from Apple's M3, which is made on the same node (N3B).

While these numbers are very fuzzy, I think it's quite clear that Lion Cove has a lot more transistors than Raptor Cove. If you have a better source, please share. From what I've seen, Intel has been very stingy about sharing transistor counts since about the Skylake era.

But, what makes CPUs on smaller nodes faster isn't just the fact of having more transistors. It's also the fact that the transistors are packed more closely together and each require less energy. If you take a design targeted at a denser node and back-port it, you lose frequency because the signal takes longer to propagate. It also becomes hotter, because the transistors are less efficient and you're exposed to more wire resistance. Finally, in order to recover some top-end frequency, they have to insert more buffers and split more things across more clock cycles, which means inserting even more transistors into the design and losing a little IPC.

These are the ingredients which resulted in Rocket Lake's poor efficiency and high power consumption. It's hard to truly decouple a design from a manufacturing node, because there's a definite area, frequency, and power window they want to hit and the microarchitecture of the core is fundamentally based on that.
 
"Intel's Application Performance Optimizer (APO) boosts game performance in several game titles. This software utility is now automatically installed by default in Windows"
-Don't have to install an app to make the CPU work properly on AMD

You do if you're using a dual CCD AMD CPU. From the Techpowerup 9900X review:

Just two days ago, AMD notified us that "Ryzen 9 9900X and 9950X have Windows Game Mode core parking optimizations installed by the AMD PPM Provisioning File Driver," and that "Windows game mode must be enabled," "Xbox Game Bar must be enabled and up to date in the Microsoft Store," "the legacy Control Panel Power Options must be set to the default 'Balanced' scheme," and that "sometimes Windows does not apply the correct provisioning after the CPU installed has changed. You can try uninstalling then re-installing the AMD Chipset Driver as a workaround, but a fresh install of Windows is ideal." That's a lot of hoops to jump through for a dual CCD processor. Yes, we tested both 9900X and 9950X with those optimizations enabled. Previously this was required only for the dual CCD X3D models, so that games could be pushed onto the cores with 3DV cache. On Zen 5, AMD is using the same mechanism to improve game thread allocation, probably to put them on the cores with the highest default clocks. The difference should be pretty small though. I doubt it's more than a few percent and I probably would rather get rid of Game Bar instead.


Note the part where AMD recommends reinstalling Windows if the driver doesn't appear to be working after initial install. That's much worse than just having to use an automatically installed app.
 
You do if you're using a dual CCD AMD CPU. From the Techpowerup 9900X review:

Note the part where AMD recommends reinstalling Windows if the driver doesn't appear to be working after initial install. That's much worse than just having to use an automatically installed app.
Yup. AMD CPUs, particularly certain dual-CCD X3D models, must have the AMD Chipset Driver installed. Even worse, once it's installed for a dual-CCD CPU, if you swap to a single-CCD CPU (like the 7800X3D or 9800X3D) you lose performance and have to completely reinstall the OS! Paul covered this in an article previously.
 
Yup. AMD CPUs, particularly certain dual-CCD X3D models, must have the AMD Chipset Driver installed. …
Yeah, that's bad, which is why the 9800X3D is ideal.
 
My rough estimate of Raptor Cove is about 274.7 MTr, based on an estimated area of 8.52 mm^2 and transistor density estimates I borrowed from Alder Lake. …
WikiChip Fuse has the relative node densities, but most of ARL is N5, N6, and SRAM. There would not be as big a difference in total chip size as you are implying; even on Intel 7 it wouldn't be that big. Also, Intel used to make node-specific chip designs back in the Rocket Lake days that didn't transfer between nodes well. Now they are making node-agnostic designs. How close to Intel 2 is n3b? And yet it works. On very short conversion time. Intel 7 is likely more like Intel 2 than N3B is.
And once again, shaving the top clocks can improve power consumption drastically. I bet that in efficiency per unit of compute, a tuned Raptor can beat a stock Arrow; it just wouldn't be quite as productive. Also, the node wasn't the only efficiency improvement in ARL. But they are still fairly close in efficiency, even though the node difference looks huge on paper.
 
most of ARL is n5, n6 and sram. There would not be that big of a difference in total chip size as you are implying,
The main point of my post was how it's not just raw transistor counts that matter. Backporting Ice Lake to 14nm to form Rocket Lake is exactly the sort of thing you're talking about here. That didn't work well. Building Lion Cove on Intel 7 would be even worse.

Also Intel used to make node specific chip designs back in the rocket lake days that didn't transfer between nodes well. Now they are making node agnostic designs.
As I explained, they can't be truly node-agnostic, because a lot of the decisions at the microarchitecture level are tradeoffs to try and balance power, cost, and performance. Those are directly tied to what node it's being made on.

I think what Intel really meant is that they're using a standard backend toolchain, so they can take a design targeted at one of the IFS nodes and instead target it at a comparable TSMC node. Retargeting to a substantially different node isn't something these tools can simply paper over. It might technically work, but that doesn't mean it'll hit the requisite sweet spot on that node.

How close to Intel 2 is n3b? And yet it works. On very short conversion time.
That's misleading, I think. Lunar Lake was always targeted at TSMC N3B. The compute tile of Arrow Lake has the same cores. So, the decision to use N3B for Arrow Lake probably involved taking the work they did for Lunar Lake and just changing the core counts + layout, as well as adapting the ring bus to interface with their existing I/O tile from Meteor Lake (which they always planned to reuse).

I expect most of the work they did to fab Arrow Lake on Intel 20A just went into the trash.

shaving the top clocks can improve power consumption drastically.
Intel isn't dumb. If they could've simply used Intel 7, it would've been much more profitable for them. Trust that they had good reasons for using a smaller node.

So, here's what Intel showed about perf/W scaling on Lion Cove (TSMC N3B) vs. Redwood Cove (Intel 4):

[image: Intel's perf/W scaling curves, Lion Cove (TSMC N3B) vs. Redwood Cove (Intel 4)]
Redwood Cove has similar IPC to Raptor Cove, but presumably better power-efficiency, due to using the Intel 4 node.
 
The main point of my post was how it's not just raw transistor counts that matter. Backporting Ice Lake to 14nm to form Rocket Lake is exactly the sort of thing you're talking about here. That didn't work well. Building Lion Cove on Intel 7 would be even worse.
Even if Lion Cove weren't a lot larger than Raptor Cove (and it's possible this would be the case, since AVX-512 and HT aren't in client LC), Skymont would be significantly larger than Gracemont.
 
For many, it is a big deal.
It can be, but I also understand that for a significant parts change it's always better to do a fresh Windows install; that's a habit from back in the days of Win ME and stupid driver crashes... I'm not saying that's a good thing, but at least there's a known solution to get full performance. Meanwhile, the current state of Arrow Lake is deja vu of the fix-after-fix degradation saga of the Raptor Lake gen... At least with RPL, degradation needed time to creep in, whereas with a performance fix you immediately notice whether it worked. As the product meant to rescue Intel from the quicksand of RPL, and as the debut of the new Ultra branding, it is a bad, bordering-on-disaster launch.
 
the current state of Arrow Lake is deja vu of the fix-after-fix degradation saga of the Raptor Lake gen...
If you go look at TPU's initial updated testing, almost everything fixed relates to Win 11 24H2 (and it still hasn't fixed everything vs. 23H2) rather than to ARL in particular. Perhaps the BIOS/IME update in January will change this, but as of right now it seems to just be more problems with 24H2, which is hardly limited to Intel/ARL.

This is a part of their conclusion:
It seems that Intel did not "fix Arrow Lake," but they "fixed Arrow Lake on 24H2".

Given the very real-world performance increase from CDPR updating CP2077 on their own, I get the impression a chunk of the performance issues (or at least the outliers) lie in how games handle scheduling.
 
If you go look at TPU's initial updated testing, almost everything is to fix Win 11 24H2 rather than something to do with ARL in particular. …
That actually doesn't matter; it's the timing of all this negativity. Most of Intel's core customers, the audience the Core Ultra branding is aimed at, don't dig that deep into the details, and the negativity from the problems is enough to sway them away.

And on the game optimization part: yes, it is about how threads are handled, but the games and engines were out before ARL was. So in their internal testing, Intel either had to use launch-day performance in their promotional materials or announce the settings needed to make the CPU perform as advertised. It just looks worse when the promotional material was already unflattering, with an overall regression compared to Raptor Lake in the first place.
 
The main point of my post was how it's not just raw transistor counts that matter. Backporting Ice Lake to 14nm to form Rocket Lake is exactly the sort of thing you're talking about here. …
Why isn't the M4 also another Rocket Lake?

I'm not saying Intel 7 was the best possible choice for Arrow Lake. My original point in bringing it up was that there were a lot of combined failures in the execution of bringing Arrow Lake to market, and that ARL's performance in typical client use could have been better even in the worst-case scenario of Intel 7 (and its associated drawbacks) if they had kept the memory controller on the compute tile. Would it have been perfect? No. But as a worst case it could still have been better than whatever benefits were achieved by the underperforming tile layout they chose.

Why not keep latency low for the latency sensitive stuff and just tile off things like the iGPU and SOC stuff?
Maybe it is a step towards using AMD's model for compute chiplets with the somewhat coherent memory access. I think client is the wrong market for that though. Intel already makes different stuff for servers so keep that there and maybe release one of the server chips for HEDT if there is sufficient demand.

The uplift in average thread IPC in ARL, combined with CUDIMMs and the extra 50% L2 on the fast cores, should have been enough to make RPL look like Skylake next to Arrow Lake. Instead we got bungled node execution, bungled CPU component layout, and bungled software and firmware deployment, and wound up with a year or two of a CPU competing with the refresh of a previous generation. This seems like the sort of thing that the chief executive officer of a company might have to take responsibility for.

a little rant
Even if ARL on Intel 7 had to drop clocks 10% from RPL (just turning off HT should have bought enough thermal and power headroom for the IPC increase, and the E-cores may already have enough thermal and power-density headroom on RPL for the IPC increase to ARL, so a 10% clock reduction on Intel 7 is probably more than needed), Lion Cove would have been faster, the slowest threads would have been like Alder Lake P-cores, memory latency would have been at DDR4 levels with more than DDR5 bandwidth, and the L2 would have had a lot more hits if they had kept the memory controller with the cores. RPL would only be able to compete in games that preferred 32 slow threads over 24 faster ones and didn't care about memory latency; I can't think of any, but there are probably a few. ARL also would have been snappier and more responsive in typical client PC tasks.
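The tradeoff being argued fits in one line of arithmetic: per-thread throughput scales roughly as (1 + IPC gain) × (1 + clock change), so an IPC uplift bigger than the clock cut still nets a gain. The 14% figure below is purely illustrative, not a measured ARL number:

```python
# Net per-thread throughput change when IPC goes up but clocks come down.
def relative_throughput(ipc_gain: float, clock_change: float) -> float:
    """Throughput ratio vs. baseline: (1 + IPC gain) * (1 + clock change)."""
    return (1 + ipc_gain) * (1 + clock_change)

# Illustrative: a 14% IPC uplift against the post's hypothetical 10% clock drop.
r = relative_throughput(ipc_gain=0.14, clock_change=-0.10)
print(f"net per-thread change: {r - 1:+.1%}")
```

Under those assumed numbers the design still comes out slightly ahead per thread, which is the crux of the worst-case-Intel-7 argument above.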