News 13th and 14th Gen Intel CPU instability also hits servers — W680 boards with Core i9 K-series chips are crashing


TheHerald

Respectable
BANNED
Feb 15, 2024
1,633
501
2,060
Intel's own Turbo Boost technology, running within specified boost limits will boost according to how heavy the workload is and will only boost to 5.6 on light workloads while the heaviest of workloads will run at close to base clocks.
That is not true though. The CPU will try to boost to 5.7 even in the heaviest of workloads. Of course power limit might get in the way depending on how you configured it.
 

TheHerald

In your own words, you said:
"actually he says that the most stable settings was to set a max multiplier of 53, for 5.3Ghz"​

If it's the max multiplier, then it's not a fixed multiplier, as you're now claiming.

And no, I don't believe for 1 second that it will override the PL settings!
In the video he doesn't specify exactly what settings they used. There are multiple ways to overclock or underclock Intel chips: changing the turbo ratios, boosting specific cores, setting an all-core limit, etc. I assume they left the option at AUTO and just typed in a 5.3, which basically makes the CPU run at 5.3 under all scenarios, multi or single, again assuming power and amp limits don't get in the way.
 

bit_user

Titan
Ambassador
I assume they left the option at AUTO and just typed in a 5.3, which basically makes the CPU run at 5.3 under all scenarios, multi or single, again assuming power and amp limits don't get in the way.
Well, if it would normally run up to 5.7 GHz, except for power limits, then 5.3 GHz isn't an overclock. Right?? Terry is calling it an overclock.
 

TheHerald

Well, if it would normally run up to 5.7 GHz, except for power limits, then 5.3 GHz isn't an overclock. Right?? Terry is calling it an overclock.
Not enough data to answer that. Technically, if you exceed either the maximum clock speed (5.7 GHz), the maximum TDP (253 W), or the maximum current (307 A), it's an overclock. So running 5.3 GHz is not an overclock as long as you hit those clocks within the above TDP and amperage.

Now for my personal opinion: the CPU shouldn't be freaking crashing at 5.3 GHz even if it exceeds TDP and amp limits, especially since we've been exceeding TDP and amp limits for the last 3-4 generations of Intel chips and it has been fine.
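To make the three-part test above concrete, here is a minimal sketch. The limit values (5.7 GHz, 253 W, 307 A) are the ones quoted in this post; the `Limits` class and `is_overclock` function are hypothetical names made up for illustration, not anything from Intel's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Values quoted in the post above; treat them as assumptions.
    max_clock_ghz: float = 5.7
    max_power_w: float = 253.0
    max_current_a: float = 307.0


def is_overclock(clock_ghz: float, power_w: float, current_a: float,
                 limits: Limits = Limits()) -> bool:
    """Return True if ANY of the three limits is exceeded."""
    return (
        clock_ghz > limits.max_clock_ghz
        or power_w > limits.max_power_w
        or current_a > limits.max_current_a
    )


# 5.3 GHz inside the power and current limits: not an overclock.
print(is_overclock(5.3, 250.0, 300.0))  # False
# Same 5.3 GHz, but drawing 320 W: counts as an overclock by this definition.
print(is_overclock(5.3, 320.0, 300.0))  # True
```

By this definition the answer to "is 5.3 GHz an overclock?" depends entirely on the power and current the board allows, which is why the clock figure alone isn't enough data.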
 
In the video he said 50% of Intel chips used in servers crash once every week running 24/7, or something like that, right? If the mobos are running 253w then this shouldn't be happening, not even once a month. If they are running with no power limits then sure, that's a different thing.
While the Asus boards can run 253W, and potentially more, the Supermicro ones can't and he said the failure rates were the same between the two. So even if somehow the Asus ones were over TDP that couldn't be the problem that's manifesting here.
 
That is not true though. The CPU will try to boost to 5.7 even in the heaviest of workloads. Of course power limit might get in the way depending on how you configured it.
Yeah, technically it will even try to boost to 6 GHz on any and all workloads. My point was about what it will be able to boost to, not what it will try to boost to.
 
Well, if it would normally run up to 5.7 GHz, except for power limits, then 5.3 GHz isn't an overclock. Right?? Terry is calling it an overclock.
5.8 GHz actually, and that's the Turbo Boost Max 3.0 clock.

Since you decided to pretend to not know anything about computers all of a sudden, go ahead and read this.
"For workloads that predominantly rely on CPU, short duration workloads are bound by the ‘turbo’ operations, whereas sustained workloads are bound by the ‘base’ parameters."
https://www.intel.com/content/www/us/en/support/articles/000098324/processors.html
Which processor specification is more important for performance, ‘base’ or ‘turbo’?
All aspects of the processor are important for performance.

Intel has multiple compute engines within the processor, with the CPU being analogous to the traditional processor. For workloads that predominantly rely on CPU, short duration workloads are bound by the ‘turbo’ operations, whereas sustained workloads are bound by the ‘base’ parameters. Many workloads tend to be a mixture of some ‘turbo’ and some ‘base’ operations. Intel processors provide compelling end user experiences that utilize various aspects of the processor. Performance hybrid architecture, multiple compute engines, multiple IO ports, and IPs intelligently share resources within a single chip while operating across a wide dynamic operating range of power and frequency. All aspects of the processor ultimately contribute to the desired spectrum of performance in the real-world.
 

bit_user

Titan
Ambassador
Since you decided to pretend to not know anything about computers all of the sudden go ahead and read this.
"For workloads that predominantly rely on CPU, short duration workloads are bound by the ‘turbo’ operations, whereas sustained workloads are bound by the ‘base’ parameters."
It's funny you claim I'm acting like I don't know anything, because that article is aimed at people who are really clueless. It's oversimplifying to the point of being irrelevant to this discussion.

Troll harder, I guess. Because, so far, you're only making a fool of yourself.
 
Jul 12, 2024
20
19
15
Not only did Intel do architecture-level tweaking, as mentioned by @Eximo , but they also used a modified process node. That's the main way they were able to achieve higher boost clocks.
Raptor-Lake-Slides_28_crop.png

It could also be that modified process node that's the culprit behind these apparent cases of accelerated silicon aging.
I thought about the process node as well. Perhaps Intel hasn’t ironed out all the problems with its foundry and those high-performance chips suffer from it. It’d be interesting, as you mentioned, to see Xeon error rates.
 

bit_user

Titan
Ambassador
I thought about the process node as well. Perhaps Intel hasn’t ironed out all the problems with its foundry and those high-performance chips suffer from it. It’d be interesting, as you mentioned, to see Xeon error rates.
Speaking of which, I wonder just what's the reach of this problem. Could it affect even Emerald Rapids?
 

DS426

Upstanding
May 15, 2024
254
190
360
Come on folks, even if OC is supported on these boards, I really doubt that more people than I can count on my hands are doing it. While these are consumer-grade CPUs and not server-grade like Xeons, if the application is server or professional use, who's going to bother squeezing out a few hundred extra MHz and risk instability, increased power consumption, etc. for a little more performance?

To me, this just shows that there's a root-cause problem that Intel still hasn't figured out. Unfortunately for Team Blue, this nightmare is continuing to get worse.
 
"Level1Techs highlights one server provider that is charging more than $1,000 more for its Core i9-14900K-based servers compared to its Ryzen 9 7950X-powered servers for labor and onsite repair services alone ($139 vs. $1,280 for the 7950X and 14900K)."

Who the hell compares a $139 price to a $1280 saying the latter is "$1000 more expensive" rather than saying the latter is "9 times more expensive"? Come on!
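For reference, the arithmetic behind both framings, using only the two prices quoted from the article ($139 and $1,280). The variable names are just illustrative.

```python
ryzen_fee = 139    # labor/onsite repair fee quoted for the 7950X server
intel_fee = 1280   # labor/onsite repair fee quoted for the 14900K server

difference = intel_fee - ryzen_fee   # absolute gap in dollars
ratio = intel_fee / ryzen_fee        # how many times more expensive

print(f"${difference:,} more, or about {ratio:.1f}x the price")
# prints: $1,141 more, or about 9.2x the price
```

Both statements are true; "more than $1,000 more" and "about 9 times more expensive" describe the same gap.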
 

TJ Hooker

Titan
Ambassador
Intel's specs have the exact same limitation as the B series: memory overclocking only. It's the same physical chipset as Z690, so motherboard manufacturers can circumvent that and enable everything if they want to. I assume Intel just doesn't care about that, as tends to be its attitude towards motherboard manufacturers.
Do you have a link to the specs where it says it only supports memory OC?

Also:
"While the last couple of W-series chipsets allowed users to overclock the memory on supported processors, Intel has opened this up to unlocking the core frequency when combined with its K-series processors on W680."

https://www.anandtech.com/show/17308/the-intel-w680-chipset-overview-ecc-for-alder-lake-workstations
 
Jul 12, 2024
20
19
15
Speaking of which, I wonder just what's the reach of this problem. Could it affect even Emerald Rapids?
I had to look up the code name. I’m so confused by Intel’s naming scheme that every time I have to check what it actually is.
I checked the lithography for the i9-13900 and i9-14900. The first is Intel 10 and the second Intel 7. Perhaps, forced to be competitive with AMD and Zen’s power/watt efficiency, they pushed those CPUs beyond what the node is capable of at the moment? Intel 14 (and 12? Correct me if I’m wrong) was very mature and Intel really pushed those processors to the limit. Xeons might not be affected as much, due to lower clocks, but those run 24/7 and utilisation is high. It’s odd that only i9 failures are reported, and not so many for Xeons.
 
Apr 11, 2024
8
10
15
I had to look up the code name. I’m so confused by Intel’s naming scheme that every time I have to check what it actually is.
I checked the lithography for the i9-13900 and i9-14900. The first is Intel 10 and the second Intel 7. Perhaps, forced to be competitive with AMD and Zen’s power/watt efficiency, they pushed those CPUs beyond what the node is capable of at the moment? Intel 14 (and 12? Correct me if I’m wrong) was very mature and Intel really pushed those processors to the limit. Xeons might not be affected as much, due to lower clocks, but those run 24/7 and utilisation is high. It’s odd that only i9 failures are reported, and not so many for Xeons.

Both the 13900 and 14900 are on a tweaked version of Intel 7. The 12th gen CPUs were on the original Intel 7 process. So far the problem seems to mainly be with the higher-clocked CPUs (i9s but some i7s as well), so Xeons don't seem to be affected as they run lower max clocks.
 
Do you have a link to the specs where it says it only supports memory OC?

Also:
"While the last couple of W-series chipsets allowed users to overclock the memory on supported processors, Intel has opened this up to unlocking the core frequency when combined with its K-series processors on W680."

https://www.anandtech.com/show/17308/the-intel-w680-chipset-overview-ecc-for-alder-lake-workstations
https://www.intel.com/content/www/us/en/products/sku/218834/intel-w680-chipset/specifications.html

https://ark.intel.com/content/www/us/en/ark/products/218834/intel-w680-chipset.html

contrast with (they're the same physical chipset):
https://ark.intel.com/content/www/us/en/ark/products/218833/intel-z690-chipset.html

https://www.intel.com/content/www/us/en/products/sku/218833/intel-z690-chipset/specifications.html

edit: As I said before, Intel clearly doesn't care about it being unlocked. It may also be set up that way on purpose so that the big enterprise customers can choose whether or not to allow overclocking.
 

bit_user

Titan
Ambassador
Both the 13900 and 14900 are on a tweaked version of Intel 7. The 12th gen CPUs were on the original Intel 7 process. So far the problem seems to mainly be with the higher-clocked CPUs (i9s but some i7s as well),
I believe the Sapphire Rapids Xeons share the same process node as Alder Lake (12th gen) and Emerald Rapids Xeons share the same node as Raptor Lake (13th and 14th gen).

Xeons don't seem to be affected as they run lower max clocks.
Clock speed is probably not the real issue. The issue is probably voltage, which correlates with clockspeed within the same CPU architecture. Because the i7 and i9 K-series run at the highest clockspeeds, among the Raptor Lake CPUs, they're also pushing the most voltage among those CPUs and experiencing most of the problems. I don't know what sorts of voltages various Emerald Rapids CPUs are pushing.

Emerald Rapids launched at the end of last year, before these issues gained publicity.
 
That is also some userbenchmark kind of crap. The 13900k and 14900k are vastly faster and more efficient than the 12900k. They are not repackaged, just stop.
except they are?

the 13000 lineup was a very minor adjustment to the 12000... they basically made some tweaks to the memory controller and raised the clocks.

the 14000 lineup was a direct copy of the 13000 lineup, just overclocked even more.

If you set all 3 chips to the same speed (5 GHz), you'll see the IPC is identical, meaning whatever changes were made between the generations were minor/insignificant.
 
except they are?

the 13000 lineup was a very minor adjustment to the 12000... they basically made some tweaks to the memory controller and raised the clocks.

the 14000 lineup was a direct copy of the 13000 lineup, just overclocked even more.

If you set all 3 chips to the same speed (5 GHz), you'll see the IPC is identical, meaning whatever changes were made between the generations were minor/insignificant.
All 3 of them use the same P (Golden Cove) and E (Gracemont) cores. 13th gen added clock speed and E-cores over 12th gen. 14th gen just increased clock speed over 13th gen. You are correct that in ST applications they have equal performance at the same clock, so IPC is the same. However, there were uarch tweaks in 13th gen that allowed for higher clocks, plus other minor changes. Overall, the uarch changes between the generations weren't enough to make them a new core.
 

TheHerald

except they are?

the 13000 lineup was a very minor adjustment to the 12000... they basically made some tweaks to the memory controller and raised the clocks.

the 14000 lineup was a direct copy of the 13000 lineup, just overclocked even more.

If you set all 3 chips to the same speed (5 GHz), you'll see the IPC is identical, meaning whatever changes were made between the generations were minor/insignificant.
First of all, 13th gen has extra cache, lower cache latency, cache frequency decoupled from the E-cores, and higher clocks at the same voltage. Why would you run all 3 chips at the same speed? That's the whole point: you CAN'T run the 12900K at the 5.5 GHz that the 13900K can achieve.
 
Clock speed is probably not the real issue. The issue is probably voltage, which correlates with clockspeed within the same CPU architecture. Because the i7 and i9 K-series run at the highest clockspeeds, among the Raptor Lake CPUs, they're also pushing the most voltage among those CPUs and experiencing most of the problems. I don't know what sorts of voltages various Emerald Rapids CPUs are pushing.

Emerald Rapids launched at the end of last year, before these issues gained publicity.
The highest-clocked EMR SKU has a 4.2 GHz boost on an 8-core part, so the voltage required for that, even if all 8 cores are running at 4.2 GHz, is relatively low. I imagine the V/F curve on the Xeon enterprise parts is also probably much more finely tuned than on the desktop parts.
 
All 3 of them use the same P (Golden Cove) and E (Gracemont) cores. 13th gen added clock speed and E-cores over 12th gen. 14th gen just increased clock speed over 13th gen. You are correct that in ST applications they have equal performance at the same clock, so IPC is the same. However, there were uarch tweaks in 13th gen that allowed for higher clocks, plus other minor changes. Overall, the uarch changes between the generations weren't enough to make them a new core.
Except for that pesky little detail called Raptor Cove.
 
Except for that pesky little detail called Raptor Cove.
They called it Raptor Cove, but let's be honest here: it is just Golden Cove with a few minor uarch changes. If it were really that different, you would see IPC increases beyond 2% or so. I remember seeing benchmarks of the 3 generations with them all set to the same clock speed, and in ST applications they all had identical performance. The only reason for the MT performance increase was more E-cores on 13th and 14th gen. Therefore you can say that Raptor Cove doesn't even equal a Tick, as there really wasn't any IPC increase.
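The same-clock comparison described above boils down to dividing a single-thread score by the clock it ran at and comparing the results. A rough sketch follows; the scores are made-up placeholder numbers, not benchmark results, and `per_clock_gain_pct` is a hypothetical helper name.

```python
def per_clock_gain_pct(score_old: float, clock_old_ghz: float,
                       score_new: float, clock_new_ghz: float) -> float:
    """Percent change in score-per-GHz (a crude IPC proxy) from old to new."""
    old_per_clock = score_old / clock_old_ghz
    new_per_clock = score_new / clock_new_ghz
    return (new_per_clock / old_per_clock - 1.0) * 100.0


# Placeholder scores, both chips locked to 5.0 GHz: a small per-clock gain.
print(round(per_clock_gain_pct(2000, 5.0, 2040, 5.0), 1))  # 2.0

# Placeholder scores where the newer chip is faster only because it clocks
# higher (5.5 vs 5.0 GHz): the per-clock gain comes out to zero.
print(round(per_clock_gain_pct(2000, 5.0, 2200, 5.5), 1))  # 0.0
```

This only isolates per-clock throughput in a single-thread test. It says nothing about how high each chip can actually clock, which is the other side of the argument in this thread.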