News 13th and 14th Gen Intel CPU instability also hits servers — W680 boards with Core i9 K-series chips are crashing

TheHerald · Jul 12, 2024

TerryLaze said:
Intel's own Turbo Boost technology, running within the specified boost limits, will boost according to how heavy the workload is: it will only reach 5.6 GHz on light workloads, while the heaviest workloads will run close to base clocks.

That is not true, though. The CPU will try to boost to 5.7 GHz even in the heaviest of workloads. Of course, the power limit might get in the way, depending on how you configured it.

TheHerald · Jul 12, 2024

bit_user said:
In your own words, you said:

"actually he says that the most stable setting was to set a max multiplier of 53, for 5.3 GHz"

If it's the max multiplier, then it's not a fixed multiplier, as you're now claiming.

And no, I don't believe for 1 second that it will override the PL settings!

In the video he doesn't specify exactly which settings they used. There are multiple ways to overclock or underclock Intel chips: changing the turbo ratios, boosting specific cores, setting an all-core limit, etc. I assume they left the option at AUTO and just typed in 5.3, which basically makes the CPU run at 5.3 GHz in all scenarios, multi- or single-threaded, again assuming power and current limits don't get in the way.
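For readers following the multiplier argument: on these chips the core clock is the ratio (multiplier) times the base clock (BCLK), nominally 100 MHz, which is why a multiplier of 53 means 5.3 GHz. A minimal sketch, assuming a hypothetical per-active-core turbo ratio table (the values are illustrative, not Intel's spec for any SKU), of how a user-set cap clips that table. It ignores power and current limits entirely.

```python
from typing import Optional

BCLK_MHZ = 100  # nominal base clock; boards can change it, assumed stock here

# Hypothetical stock turbo table: max ratio for N active P-cores.
# Illustrative values only, not taken from any Intel spec sheet.
STOCK_RATIOS = {1: 57, 2: 57, 3: 55, 4: 55, 5: 55, 6: 55, 7: 55, 8: 55}


def core_clock_ghz(active_cores: int, ratio_cap: Optional[int] = None) -> float:
    """Requested P-core clock in GHz, ignoring power and current limits."""
    ratio = STOCK_RATIOS[active_cores]
    if ratio_cap is not None:
        # A user-set cap such as "53" clips every entry of the table.
        ratio = min(ratio, ratio_cap)
    return ratio * BCLK_MHZ / 1000


print(core_clock_ghz(1))                # 5.7
print(core_clock_ghz(8, ratio_cap=53))  # 5.3
```

With every table entry at or above 53, a cap of 53 and a fixed all-core 53 give the same requested clock, which is part of why the "max" versus "fixed" multiplier distinction is hard to settle from the video alone.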

bit_user · Jul 12, 2024

TheHerald said:
I assume they left the option at AUTO and just typed in 5.3, which basically makes the CPU run at 5.3 GHz in all scenarios, multi- or single-threaded, again assuming power and current limits don't get in the way.

Well, if it would normally run up to 5.7 GHz, power limits permitting, then 5.3 GHz isn't an overclock, right? Terry is calling it an overclock.

TheHerald · Jul 12, 2024

bit_user said:
Well, if it would normally run up to 5.7 GHz, power limits permitting, then 5.3 GHz isn't an overclock, right? Terry is calling it an overclock.

Not enough data to answer that. Technically, if you exceed the maximum clock speed (5.7 GHz), the maximum turbo power (253 W), or the maximum current (307 A), it's an overclock. So running at 5.3 GHz is not an overclock as long as you hit that clock within the above power and current limits.

Now, in my personal opinion, the CPU shouldn't be freaking crashing at 5.3 GHz even if it exceeds the power and current limits, especially since we've been exceeding those limits for the last 3-4 generations of Intel chips and it has been fine.
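TheHerald's working definition of an overclock can be written as a simple check. A minimal sketch: the three limits (5.7 GHz, 253 W, 307 A) are the figures quoted in the post, not values verified against a specific SKU's datasheet, and `is_overclock` is a hypothetical name used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_clock_ghz: float = 5.7    # figure quoted in the post
    max_power_w: float = 253.0    # figure quoted in the post
    max_current_a: float = 307.0  # figure quoted in the post


def is_overclock(clock_ghz: float, power_w: float, current_a: float,
                 limits: Limits = Limits()) -> bool:
    """True if any part of the operating point exceeds the quoted limits."""
    return (clock_ghz > limits.max_clock_ghz
            or power_w > limits.max_power_w
            or current_a > limits.max_current_a)


print(is_overclock(5.3, 240, 280))  # False: 5.3 GHz within power and current limits
print(is_overclock(5.3, 300, 280))  # True: same clock, but over the power limit
```

Under this definition the same 5.3 GHz setting can be stock or overclocked depending only on what the board lets the chip draw, which is why the answer hinges on the unstated board settings.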

thestryker · Jul 12, 2024

TheHerald said:
In the video he said 50% of Intel chips used in servers crash once every week running 24/7, or something like that, right? If the mobos are running 253 W, then this shouldn't be happening, not even once a month. If they are running with no power limits, then sure, that's a different thing.

While the Asus boards can run 253 W, and potentially more, the Supermicro ones can't, and he said the failure rates were the same for both. So even if the Asus boards were somehow running over the power limit, that couldn't be the cause of the problem manifesting here.

TerryLaze · Jul 12, 2024

TheHerald said:
That is not true, though. The CPU will try to boost to 5.7 GHz even in the heaviest of workloads. Of course, the power limit might get in the way, depending on how you configured it.

Yeah, technically it will even try to boost to 6 GHz on any and all workloads. My point was about what it will be able to boost to, not what it will try to boost to.

TerryLaze · Jul 12, 2024

bit_user said:
Well, if it would normally run up to 5.7 GHz, power limits permitting, then 5.3 GHz isn't an overclock, right? Terry is calling it an overclock.

5.8 GHz, actually, and that's the Turbo 3 clock.

Since you've decided to pretend you don't know anything about computers all of a sudden, go ahead and read this:
"For workloads that predominantly rely on CPU, short duration workloads are bound by the ‘turbo’ operations, whereas sustained workloads are bound by the ‘base’ parameters."
https://www.intel.com/content/www/us/en/support/articles/000098324/processors.html

Which processor specification is more important for performance, ‘base’ or ‘turbo’?
All aspects of the processor are important for performance.

Intel has multiple compute engines within the processor, with the CPU being analogous to the traditional processor. For workloads that predominantly rely on CPU, short duration workloads are bound by the ‘turbo’ operations, whereas sustained workloads are bound by the ‘base’ parameters. Many workloads tend to be a mixture of some ‘turbo’ and some ‘base’ operations. Intel processors provide compelling end user experiences that utilize various aspects of the processor. Performance hybrid architecture, multiple compute engines, multiple IO ports, and IPs intelligently share resources within a single chip while operating across a wide dynamic operating range of power and frequency. All aspects of the processor ultimately contribute to the desired spectrum of performance in the real-world.
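Intel's FAQ wording is simplified. The mechanism usually described behind "short workloads run at turbo, sustained workloads at base" is a running average of package power compared against PL1: while that average stays below PL1, the chip may draw up to PL2, and once it catches up, power is held to PL1. The sketch below is a deliberately simplified model under stated assumptions: an exponentially weighted moving average with time constant tau, a 1-second step, and illustrative values of PL1 = 125 W, PL2 = 253 W, and tau = 56 s. Real firmware behavior is more involved, and many boards ship with PL1 = PL2, which removes the drop entirely.

```python
import math


def simulate_power(demand_w, pl1=125.0, pl2=253.0, tau_s=56.0, dt_s=1.0):
    """Granted package power per time step under a simplified PL1/PL2 model.

    Assumption: the running average is an EWMA with time constant tau_s.
    While it is below PL1 the chip may draw up to PL2; otherwise it is
    clamped to PL1.
    """
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # EWMA weight for one step
    avg = 0.0  # start from idle
    granted = []
    for want in demand_w:
        cap = pl2 if avg < pl1 else pl1
        power = min(want, cap)
        avg += alpha * (power - avg)
        granted.append(power)
    return granted


# A sustained all-core load asking for 300 W for five minutes:
trace = simulate_power([300.0] * 300)
print(trace[0], trace[-1])  # 253.0 125.0
```

A short burst (a few seconds) never lets the average reach PL1, so it runs at PL2 the whole time. A long one ends up at PL1, which is the "base" behavior the FAQ refers to. Passing `pl1=253.0` models the unlimited-style board default where the drop never happens.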

bit_user · Jul 12, 2024

TerryLaze said:
Since you've decided to pretend you don't know anything about computers all of a sudden, go ahead and read this:
"For workloads that predominantly rely on CPU, short duration workloads are bound by the ‘turbo’ operations, whereas sustained workloads are bound by the ‘base’ parameters."

It's funny you claim I'm acting like I don't know anything, because that article is aimed at people who are really clueless. It's oversimplifying to the point of being irrelevant to this discussion.

Troll harder, I guess. Because, so far, you're only making a fool of yourself.

tommo1982 · Jul 12, 2024

bit_user said:
Not only did Intel do architecture-level tweaking, as mentioned by @Eximo, but they also used a modified process node. That's the main way they were able to achieve higher boost clocks.

It could also be that modified process node that's the culprit behind these apparent cases of accelerated silicon aging.

I thought about the process node as well. Perhaps Intel hasn’t ironed out all the problems with its foundry, and those high-performance chips suffer from it. It’d be interesting, as you mentioned, to see Xeon error rates.

bit_user · Jul 12, 2024

tommo1982 said:
I thought about the process node as well. Perhaps Intel hasn’t ironed out all the problems with its foundry, and those high-performance chips suffer from it. It’d be interesting, as you mentioned, to see Xeon error rates.

Speaking of which, I wonder just what the reach of this problem is. Could it even affect Emerald Rapids?

DS426 · Jul 12, 2024

Come on, folks: even if OC is supported on these boards, I really doubt more than a handful of people are doing it. While these are consumer-grade CPUs and not server-grade like Xeons, if the application is server/professional use, who's going to bother squeezing out a few hundred extra MHz and risk instability, higher power consumption, etc. for a little more performance?

To me, this just shows that there's a root-cause problem that Intel still hasn't figured out. Unfortunately for Big Blue, this nightmare keeps getting worse.

Blas · Jul 12, 2024

"Level1Techs highlights one server provider that is charging more than $1,000 more for its Core i9-14900K-based servers compared to its Ryzen 9 7950X-powered servers for labor and onsite repair services alone ($139 vs. $1,280 for the 7950X and 14900K)."

Who the hell compares a $139 price to a $1,280 one by saying the latter is "$1,000 more expensive" rather than saying it is "9 times more expensive"? Come on!
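Both framings come from the same two numbers in the quoted article ($139 for the 7950X servers, $1,280 for the 14900K servers). A quick check shows what each one actually says:

```python
amd_repair_usd = 139
intel_repair_usd = 1_280

difference = intel_repair_usd - amd_repair_usd  # absolute gap in dollars
ratio = intel_repair_usd / amd_repair_usd       # relative price

print(f"${difference:,} more, or {ratio:.1f}x the price")
# $1,141 more, or 9.2x the price
```

The absolute gap is what a buyer pays extra; the ratio is what shows how lopsided the two service quotes are.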

TJ Hooker · Jul 12, 2024

thestryker said:
Intel's specs have the exact same limitation as the B-series: memory overclocking only. It's the same physical chipset as Z690, so motherboard manufacturers can circumvent that and enable everything if they want to. I just assume Intel doesn't care, as tends to be their attitude toward motherboard manufacturers.

Do you have a link to the specs where it says it only supports memory OC?

Also:
"While the last couple of W-series chipsets allowed users to overclock the memory on supported processors, Intel has opened this up to unlocking the core frequency when combined with its K-series processors on W680."

https://www.anandtech.com/show/17308/the-intel-w680-chipset-overview-ecc-for-alder-lake-workstations

tommo1982 · Jul 12, 2024

bit_user said:
Speaking of which, I wonder just what the reach of this problem is. Could it even affect Emerald Rapids?

I had to look up the code name. I’m so confused by Intel’s naming scheme; every time, I have to check what it actually is.
I checked the lithography for the i9-13900 and i9-14900. The first is Intel 10 and the second Intel 7. Perhaps, forced to compete with the performance per watt of AMD’s Zen, they pushed those CPUs beyond what the node is currently capable of? Intel 14 (and 12? correct me if I’m wrong) was very mature, and Intel really pushed those processors to the limit. Xeons might not be affected as much due to lower clocks, but those run 24/7 and utilisation is high. It’s odd that only i9s are reported, but not so much Xeons.

rfdevil · Jul 12, 2024

tommo1982 said:
I had to look up the code name. I’m so confused by Intel’s naming scheme; every time, I have to check what it actually is.
I checked the lithography for the i9-13900 and i9-14900. The first is Intel 10 and the second Intel 7. Perhaps, forced to compete with the performance per watt of AMD’s Zen, they pushed those CPUs beyond what the node is currently capable of? Intel 14 (and 12? correct me if I’m wrong) was very mature, and Intel really pushed those processors to the limit. Xeons might not be affected as much due to lower clocks, but those run 24/7 and utilisation is high. It’s odd that only i9s are reported, but not so much Xeons.

Both the 13900 and 14900 are on a tweaked version of Intel 7. The 12th-gen CPUs were on the original Intel 7 process. So far the problem seems to be mainly with the higher-clocked CPUs (i9s, but some i7s as well), so Xeons don't seem to be affected, as they run lower max clocks.

thestryker · Jul 12, 2024

TJ Hooker said:
Do you have a link to the specs where it says it only supports memory OC?

Also:
"While the last couple of W-series chipsets allowed users to overclock the memory on supported processors, Intel has opened this up to unlocking the core frequency when combined with its K-series processors on W680."

https://www.anandtech.com/show/17308/the-intel-w680-chipset-overview-ecc-for-alder-lake-workstations

https://www.intel.com/content/www/us/en/products/sku/218834/intel-w680-chipset/specifications.html

https://ark.intel.com/content/www/us/en/ark/products/218834/intel-w680-chipset.html

Contrast with (they're the same physical chipset):
https://ark.intel.com/content/www/us/en/ark/products/218833/intel-z690-chipset.html

https://www.intel.com/content/www/us/en/products/sku/218833/intel-z690-chipset/specifications.html

Edit: As I said before, Intel clearly doesn't care about it being unlocked. It may also be set up that way on purpose so that big enterprise customers can choose whether or not to allow overclocking.

bit_user · Jul 12, 2024

rfdevil said:
Both the 13900 and 14900 are on a tweaked version of Intel 7. The 12th-gen CPUs were on the original Intel 7 process. So far the problem seems to be mainly with the higher-clocked CPUs (i9s, but some i7s as well),

I believe the Sapphire Rapids Xeons share the same process node as Alder Lake (12th gen) and Emerald Rapids Xeons share the same node as Raptor Lake (13th and 14th gen).

rfdevil said:
Xeons don't seem to be affected, as they run lower max clocks.

Clock speed is probably not the real issue. The issue is probably voltage, which correlates with clock speed within the same CPU architecture. Because the i7 and i9 K-series run at the highest clock speeds among the Raptor Lake CPUs, they're also pushing the most voltage and experiencing most of the problems. I don't know what sorts of voltages the various Emerald Rapids CPUs are pushing.

Emerald Rapids launched at the end of last year, before these issues gained publicity.

ingtar33 · Jul 12, 2024

TheHerald said:
That is also some UserBenchmark kind of crap. The 13900K and 14900K are vastly faster and more efficient than the 12900K. They are not repackaged; just stop.

Except they are?

The 13000 lineup was a very minor adjustment to the 12000... they basically made some tweaks to the memory controller and raised the clocks.

The 14000 lineup was a direct copy of the 13000 lineup, just overclocked even more.

If you set all 3 chips to the same speed (5 GHz), you'll see the IPC is identical, meaning whatever changes were made between the generations were minor/insignificant.

jeremyj_83 · Jul 12, 2024

ingtar33 said:
Except they are?

The 13000 lineup was a very minor adjustment to the 12000... they basically made some tweaks to the memory controller and raised the clocks.

The 14000 lineup was a direct copy of the 13000 lineup, just overclocked even more.

If you set all 3 chips to the same speed (5 GHz), you'll see the IPC is identical, meaning whatever changes were made between the generations were minor/insignificant.

All 3 of them use the same P-cores (Golden Cove) and E-cores (Gracemont). 13th gen added clock speed and E-cores over 12th gen; 14th gen just increased clock speed over 13th gen. You are correct that in ST applications they have equal performance at the same clock, so the IPC is the same. However, there were uArch tweaks in 13th gen that allowed for higher clocks, plus other minor changes. Overall, the uArch changes between the generations weren't enough to make them a new core.

TheHerald · Jul 12, 2024

ingtar33 said:
Except they are?

The 13000 lineup was a very minor adjustment to the 12000... they basically made some tweaks to the memory controller and raised the clocks.

The 14000 lineup was a direct copy of the 13000 lineup, just overclocked even more.

If you set all 3 chips to the same speed (5 GHz), you'll see the IPC is identical, meaning whatever changes were made between the generations were minor/insignificant.

First of all, 13th gen has extra cache, lower cache latency, a cache frequency decoupled from the E-cores, and higher clocks at the same voltage. Why would you run all 3 chips at the same speed? That's the whole point: you CAN'T run the 12900K at the 5.5 GHz that the 13900K can achieve.

jeremyj_83 · Jul 12, 2024

TheHerald said:
Why would you run all 3 chips at the same speed?

To see the IPC increase gen over gen.
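The iso-clock method being argued over amounts to dividing a single-threaded score by the clock it ran at, so that clock-speed gains and per-clock (IPC) gains are separated. A minimal sketch; the scores below are hypothetical placeholders, not benchmark results for any real chip, and the function names are illustrative.

```python
def perf_per_ghz(score: float, clock_ghz: float) -> float:
    """Single-threaded score divided by clock: a proxy for relative IPC."""
    return score / clock_ghz


def ipc_uplift_pct(new_score: float, new_clock: float,
                   old_score: float, old_clock: float) -> float:
    """Percentage change in per-clock performance from old chip to new chip."""
    old = perf_per_ghz(old_score, old_clock)
    new = perf_per_ghz(new_score, new_clock)
    return (new / old - 1.0) * 100.0


# Hypothetical scores, both chips locked to 5.0 GHz:
print(f"{ipc_uplift_pct(1020, 5.0, 1000, 5.0):.1f}%")  # 2.0%
# Hypothetical: a 10% higher score that comes purely from a 10% higher clock:
print(f"{ipc_uplift_pct(1100, 5.5, 1000, 5.0):.1f}%")  # 0.0%
```

The second case is the crux of the disagreement: a chip can be clearly faster at stock (TheHerald's point) while showing no per-clock gain (jeremyj_83's point). Both statements can be true at once.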

thestryker · Jul 12, 2024

bit_user said:
Clock speed is probably not the real issue. The issue is probably voltage, which correlates with clock speed within the same CPU architecture. Because the i7 and i9 K-series run at the highest clock speeds among the Raptor Lake CPUs, they're also pushing the most voltage and experiencing most of the problems. I don't know what sorts of voltages the various Emerald Rapids CPUs are pushing.

Emerald Rapids launched at the end of last year, before these issues gained publicity.

The highest-clocked EMR SKU is an 8-core part with a 4.2 GHz boost, so the voltage required for that, even with all 8 cores running at 4.2 GHz, is relatively low. I imagine the V/F curve on the enterprise Xeon parts is also much more finely tuned than on the desktop parts.

thestryker · Jul 12, 2024

jeremyj_83 said:
All 3 of them use the same P-cores (Golden Cove) and E-cores (Gracemont). 13th gen added clock speed and E-cores over 12th gen; 14th gen just increased clock speed over 13th gen. You are correct that in ST applications they have equal performance at the same clock, so the IPC is the same. However, there were uArch tweaks in 13th gen that allowed for higher clocks, plus other minor changes. Overall, the uArch changes between the generations weren't enough to make them a new core.

Except for that pesky little detail called Raptor Cove.

TheHerald · Jul 12, 2024

jeremyj_83 said:
To see the IPC increase gen over gen.

And what is the goal of that?

jeremyj_83 · Jul 12, 2024

thestryker said:
Except for that pesky little detail called Raptor Cove.

They called it Raptor Cove, but let's be honest here: it is just Golden Cove with a few minor uArch changes. If it were really that different, you would see IPC increases beyond 2% or so. I remember seeing benchmarks of the 3 generations all set to the same clock speed, and in ST applications they all had identical performance. The only reason for the MT performance increase was the additional E-cores on 13th and 14th gen. Therefore you could say Raptor Cove doesn't even amount to a Tick, as there really wasn't any IPC increase.
