News 13th and 14th Gen Intel CPU instability also hits servers — W680 boards with Core i9 K-series chips are crashing


Deleted member 2731765

The data from the Level1Techs YouTube channel ALSO shows errors in Oodle game telemetry. Intel's 13th and 14th Gen CPUs account for a major portion of the error logs.

Intel accounted for 1,431 of the 1,584 decompression errors logged over 90 days, while AMD had only four, far fewer than Intel.
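As a quick sanity check, here is a minimal Python sketch that turns the quoted counts into percentages. It uses only the numbers in the post. Note that these are shares of logged errors, not per-CPU failure rates, and the post doesn't say where the remaining errors came from.

```python
# Counts quoted in the post (decompression errors logged over 90 days).
TOTAL_ERRORS = 1584
INTEL_ERRORS = 1431
AMD_ERRORS = 4


def pct(part: int, total: int) -> float:
    """Return part / total as a percentage."""
    return 100.0 * part / total


intel_pct = pct(INTEL_ERRORS, TOTAL_ERRORS)
amd_pct = pct(AMD_ERRORS, TOTAL_ERRORS)
# Errors not broken down by vendor in the quoted figures.
remaining = TOTAL_ERRORS - INTEL_ERRORS - AMD_ERRORS

print(f"Intel: {intel_pct:.1f}% of logged errors")
print(f"AMD:   {amd_pct:.2f}% of logged errors")
print(f"Other/unspecified: {remaining} errors")
```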

The breakdown shows that more than 70% of Intel's CPUs were prone to errors, compared to 30% of AMD's.


[Chart: Intel vs. AMD share of CPUs reporting errors]
 
Go to 17:40 in the video... "Closer look at configurations"
The 14900K has a P-core base clock of 3.2 GHz, and the servers were running them at 5.3 GHz. Well, actually he says the most stable setting was a max multiplier of 53, for 5.3 GHz, so they had them clocked even higher than 5.3 GHz...
Just because people have servers doesn't make them any less prone to being dumb.
Excessive overclocking has never been stable. Just because more mature nodes can take more abuse without blowing up right away doesn't mean they can remain stable as well.

Also, did he mention whether they set the BIOS to Intel's baseline/default settings? I don't remember the whole video, but I don't think he does.
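For reference, the frequency figures in this post follow from core frequency = multiplier × BCLK. The sketch below assumes the base clock (BCLK) is left at its 100 MHz default; the function name is illustrative only.

```python
# Assumption: BCLK left at its 100 MHz default.
BCLK_MHZ = 100.0
BASE_CLOCK_GHZ = 3.2  # 14900K P-core base clock, as quoted in the post


def core_freq_ghz(multiplier: int, bclk_mhz: float = BCLK_MHZ) -> float:
    """Core frequency = multiplier x BCLK, converted from MHz to GHz."""
    return multiplier * bclk_mhz / 1000.0


cap_ghz = core_freq_ghz(53)
print(f"multiplier 53 -> {cap_ghz:.1f} GHz")
print(f"{cap_ghz / BASE_CLOCK_GHZ:.2f}x the P-core base clock")
```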
 

PCWarrior

May 20, 2013
The data from the Level1Techs YouTube channel ALSO shows errors in Oodle game telemetry. Intel's 13th and 14th Gen CPUs account for a major portion of the error logs.

Intel accounted for 1,431 of the 1,584 decompression errors logged over 90 days, while AMD had only four, far fewer than Intel.

The breakdown shows that more than 70% of Intel's CPUs were prone to errors, compared to 30% of AMD's.


[Chart: Intel vs. AMD share of CPUs reporting errors]
That's not what he said or what this chart shows. Go to 5:07 in the video and rewatch it. The chart simply says that of the CPUs that reported errors, 70% are Intel and 30% are AMD. It definitely doesn't say that 70% of Intel CPUs and 30% of AMD CPUs have problems or are prone to errors. And the 70%-30% split of reported errors can also be attributed to market/user share, as he explicitly says just 8 seconds later.
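To make the distinction concrete, here is a minimal sketch with hypothetical install-base numbers (the thread gives none). It shows that the same 70/30 split of error reports can correspond to identical per-CPU error rates or to quite different ones, depending on how many CPUs of each vendor are in the sample.

```python
# Hypothetical: 100 CPUs reported errors, split 70/30 as in the chart.
REPORTING = {"Intel": 70, "AMD": 30}

# HYPOTHETICAL install bases in the telemetry sample; the thread gives none.
SCENARIOS = {
    "70/30 user share": {"Intel": 7_000, "AMD": 3_000},
    "50/50 user share": {"Intel": 5_000, "AMD": 5_000},
}


def report_shares(reporting: dict[str, int]) -> dict[str, float]:
    """Fraction of all error reports that came from each vendor."""
    total = sum(reporting.values())
    return {vendor: n / total for vendor, n in reporting.items()}


def per_cpu_rates(reporting: dict[str, int], installed: dict[str, int]) -> dict[str, float]:
    """Fraction of each vendor's installed CPUs that reported errors."""
    return {vendor: reporting[vendor] / installed[vendor] for vendor in reporting}


print("share of reports:", report_shares(REPORTING))
for name, installed in SCENARIOS.items():
    rates = per_cpu_rates(REPORTING, installed)
    print(name, {vendor: f"{r:.1%}" for vendor, r in rates.items()})
```

In the first scenario both vendors have a 1.0% per-CPU rate despite the 70/30 split of reports; only in the second does Intel's rate (1.4%) exceed AMD's (0.6%).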
 
Or they know but can't say so publicly, as it would lead to a class-action lawsuit over a defective product.
Doubtful. AMD released a whole generation of CPUs with thermal protection so faulty that they would outright burn out and blow up like it's 1999, back in the days when AMD didn't have any thermal protection at all.
If that didn't lead to a class action, then just having crashes when overclocking definitely won't lead to one, no matter what the cause turns out to be.
 

TJ Hooker

The server-based motherboards used by these 13900K and 14900K servers are focused entirely on stability and running the chips within specifications, with no way to overclock these chips.
The W680 chipset does support overclocking. It may be odd to do so on a platform targeted at stable, professional use (and I would hope the motherboard vendors aren't juicing the settings by default like they were on their regular consumer boards), but there's nothing preventing it.

Edit: Unless motherboard OEMs are disabling OC on their W680 boards.
 

Eximo

This is one of those use cases where lower core count and high clock speeds were likely the point from the beginning for the people that bought it. Not much sense in using a consumer CPU if you can't clock it like one.
 

TJ Hooker

Go to 17:40 in the video... "Closer look at configurations"
The 14900K has a P-core base clock of 3.2 GHz, and the servers were running them at 5.3 GHz. Well, actually he says the most stable setting was a max multiplier of 53, for 5.3 GHz, so they had them clocked even higher than 5.3 GHz...
Just because people have servers doesn't make them any less prone to being dumb.
Excessive overclocking has never been stable. Just because more mature nodes can take more abuse without blowing up right away doesn't mean they can remain stable as well.

Also, did he mention whether they set the BIOS to Intel's baseline/default settings? I don't remember the whole video, but I don't think he does.
The 14900K is specified to boost to 5.6 GHz (up to 6 GHz on select cores, if temperatures are kept in check). Are you arguing that Intel's own Turbo Boost technology, running within specified boost limits, constitutes "excessive overclocking"?
 
Asus W680 boards definitely run at maximum TDP and have a decent VRM setup for a client workstation board. The primary difference from desktop boards is that there's no multicore enhancement-type feature and no unlimited power profile. W680 is also locked down similarly to the B-series chipsets, so while it has overclocking options, they're not like the Z series' (though they're the same chip).

Supermicro lists its own TDPs for its workstation boards, which have very limited VRMs, so I imagine those TDP listings reflect maximum operation.

Given that these issues have cropped up on Supermicro boards, the only thing that immediately makes sense is the V/F curve on the chips themselves. If it were something endemic to the die/architecture, it wouldn't be predominantly the 13900K/14900K and up, since every RPL SKU using the RC die is the same B0 stepping.
 
In what ways is W680 locked down compared to Z series? Everything I can find indicates the chipset fully supports OC.
Intel's specs have exactly the same limitation as the B series: memory overclocking only. It's the same physical chipset as Z690, so motherboard manufacturers can circumvent that and enable everything if they want to. I just assume Intel doesn't care, as tends to be its attitude toward motherboard manufacturers.
 

bit_user

Go to 17:40 in the video... "Closer look at configurations"
The 14900K has a P-core base clock of 3.2 GHz, and the servers were running them at 5.3 GHz. Well, actually he says the most stable setting was a max multiplier of 53, for 5.3 GHz, so they had them clocked even higher than 5.3 GHz...
I knew you were going to spin this, I just wasn't sure how.

As I'm sure you know, the "base clock" is defined as the guaranteed minimum steady-state, all-core clock speed for a heavy real-world workload (at PL1 = 125 W). Typically, all-core workloads run much faster than that, since most workloads aren't nearly as stressful as the ones used to estimate base clocks.
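The PL1/PL2 relationship behind that definition can be sketched as a simple running-average power budget. This is a simplified model, not Intel's exact algorithm: PL1 = 125 W comes from the post, PL2 = 253 W is the figure mentioned later in the thread, and the 56 s averaging window (TAU_S) is an assumption.

```python
import math

# Assumed limits for illustration only.
PL1_W = 125.0  # long-duration limit, from the post
PL2_W = 253.0  # short-duration limit, figure mentioned later in the thread
TAU_S = 56.0   # averaging window in seconds (assumption)


def simulate_package_power(demand_w: float, seconds: int, dt: float = 1.0) -> list[float]:
    """Simplified PL1/PL2 model (not Intel's exact algorithm).

    The package may draw up to PL2 while the exponentially weighted moving
    average (EWMA) of its power is below PL1; once the average reaches PL1,
    draw is held to PL1.
    """
    alpha = 1.0 - math.exp(-dt / TAU_S)  # EWMA weight per time step
    avg_w = 0.0  # start from an idle package
    trace = []
    for _ in range(int(seconds / dt)):
        limit_w = PL2_W if avg_w < PL1_W else PL1_W
        power_w = min(demand_w, limit_w)
        avg_w += alpha * (power_w - avg_w)
        trace.append(power_w)
    return trace


heavy = simulate_package_power(demand_w=300.0, seconds=300)
light = simulate_package_power(demand_w=90.0, seconds=300)
print("heavy load:", heavy.count(PL2_W), "s at PL2, then", heavy[-1], "W")
print("light load: stays at", light[-1], "W")
```

With these assumed values, the heavy load draws PL2 for about 39 s before settling at PL1, while the light load never touches either limit.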

Just because people have servers doesn't make them any less prone to being dumb.
Excessive overclocking has never been stable,
What you quoted them doing is limiting the maximum all-core clock speed. Whether this constitutes overclocking depends on what the stock limit is. Now, I didn't have the easiest time determining the highest all-core clock a stock Raptor Lake would use, but it seems to be 5.7 GHz, based on this:

"up to 5.7 GHz when any non-favored core is active up to when all cores are active."

Source: https://skatterbencher.com/2023/12/...0-mhz/#Intel_Core_i9-14900K_Stock_Performance

TechPowerUp experimentally observed the same, on P-core only workloads:

It's pretty disappointing that's all you could muster. It looks like you at least managed to fool @scottslayer .
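The comparison being made here reduces to checking the user-set cap against the stock all-core limit rather than against the 3.2 GHz base clock. A minimal sketch, assuming a 100 MHz BCLK and taking the 5.7 GHz (ratio 57) stock all-core figure from the sources quoted in this post; the function name is illustrative.

```python
BCLK_MHZ = 100  # assumption: BCLK left at its 100 MHz default
STOCK_ALL_CORE_RATIO = 57  # 5.7 GHz stock all-core, per the sources quoted


def classify_max_ratio(max_ratio: int) -> str:
    """Compare a user-set maximum all-core ratio with the stock all-core limit."""
    if max_ratio > STOCK_ALL_CORE_RATIO:
        return "overclock (above stock all-core boost)"
    if max_ratio < STOCK_ALL_CORE_RATIO:
        return "limit (below stock all-core boost)"
    return "stock"


for ratio in (53, 57, 60):
    print(f"max ratio {ratio} = {ratio * BCLK_MHZ / 1000:.1f} GHz: {classify_max_ratio(ratio)}")
```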
 

bit_user

Doubtful. AMD released a whole generation of CPUs with thermal protection so faulty that they would outright burn out
That only actually happened a handful of times before they issued a BIOS fix. If Intel could trade these problems for AMD's, I'm sure they'd do it in a heartbeat.

just having crashes when overclocking definitely won't lead to one, no matter what the cause turns out to be.
The article is pretty clear that these systems are not being overclocked. In fact, that's basically the whole point of it!
 

bit_user

The W680 chipset does support overclocking. It may be odd to do so on a platform targeted at stable, professional use (and I would hope the motherboard vendors aren't juicing the settings by default like they were on their regular consumer boards), but there's nothing preventing it.

Edit: Unless motherboard OEMs are disabling OC on their W680 boards.
The article specifically cites ASUS and Supermicro motherboards. I believe ASUS indeed allows overclocking of K-series CPUs on at least some of their W680 boards, but Supermicro is another matter entirely!

Here's the user manual for Supermicro's X13 series motherboards. I defy you to find any evidence they support overclocking!
 

bit_user

This is one of those use cases where lower core count and high clock speeds were likely the point from the beginning for the people that bought it. Not much sense in using a consumer CPU if you can't clock it like one.
Not until Q4 of last year did Intel release an E-series Xeon for the LGA1700 socket. In fact, their unprecedented decision to allow ECC memory on consumer-oriented Alder Lake & Raptor Lake models telegraphed that they wouldn't.


Now, what I want to know is whether even the Xeon E-24xx models are experiencing these problems, given that they show every indication of being based on the Raptor Lake stepping B0 dies (with E-cores disabled).
 
It could also be that modified process node that's the culprit behind these apparent cases of accelerated silicon aging.
This doesn't seem very likely, as the problems aren't occurring across the whole stack. If it's silicon-related, it would have to be something specific to the high-end SKUs that triggers it.
Now, what I want to know is whether even the Xeon E-24xx models are experiencing these problems, given that they show every indication of being based on the Raptor Lake stepping B0 dies (with E-cores disabled).
I wonder how much uptake these have had, since they don't work in W680 boards and the boards they do work in don't support other LGA 1700 SKUs.
The article specifically cites ASUS and Supermicro motherboards. I believe ASUS indeed allows overclocking of K-series CPUs on at least some of their W680 boards, but Supermicro is another matter entirely!
I'd be surprised if any other W680 boards support overclocking, since it's not part of Intel's spec. All of the non-Asus boards I'm aware of are enterprise boards, so I wouldn't expect them to buck Intel at all.
 

TheHerald

Feb 15, 2024
Go to 17:40 in the video... "Closer look at configurations"
The 14900K has a P-core base clock of 3.2 GHz, and the servers were running them at 5.3 GHz. Well, actually he says the most stable setting was a max multiplier of 53, for 5.3 GHz, so they had them clocked even higher than 5.3 GHz...
Just because people have servers doesn't make them any less prone to being dumb.
Excessive overclocking has never been stable. Just because more mature nodes can take more abuse without blowing up right away doesn't mean they can remain stable as well.

Also, did he mention whether they set the BIOS to Intel's baseline/default settings? I don't remember the whole video, but I don't think he does.
Oh come on now, let's not stoop to UserBenchmark levels. The base clocks are for 125 W operation running something incredibly heavy like y-cruncher or Prime95. Realistically, you'll never see those clocks in normal (even server) workloads. If a 14900K keeps crashing even at 5.3 GHz, it's a dud or there's an underlying hardware problem.

Again, although I haven't experienced any issues on either the 13900K or the 14900K, the data cannot lie. This is serious.
 

TheHerald

Feb 15, 2024
I knew you were going to spin this, I just wasn't sure how.

As I'm sure you know, the "base clock" is defined as the guaranteed minimum steady-state, all-core clock speed for a heavy real-world workload (at PL1 = 125 W). Typically, all-core workloads run much faster than that, since most workloads aren't nearly as stressful as the ones used to estimate base clocks.


What you quoted them doing is limiting the maximum all-core clock speed. Whether this constitutes overclocking depends on what the stock limit is. Now, I didn't have the easiest time determining the highest all-core clock a stock Raptor Lake would use, but it seems to be 5.7 GHz, based on this:
"up to 5.7 GHz when any non-favored core is active up to when all cores are active."

TechPowerUp experimentally observed the same, on P-core only workloads:
[Chart: TechPowerUp boost clock analysis]

It's pretty disappointing that's all you could muster. It looks like you at least managed to fool @scottslayer .
Yep, the maximum all-core boost is 5.7 GHz, even though Intel's site says 5.6 GHz. I have no idea how or why.
 
Oh come on now, let's not stoop to UserBenchmark levels. The base clocks are for 125 W operation running something incredibly heavy like y-cruncher or Prime95.
Guess what they are running in the video at the point I'm talking about?!
For 24 hour periods even.
The 14900K is specified to boost to 5.6 GHz (up to 6 GHz on select cores, if temperatures are kept in check). Are you arguing that Intel's own Turbo Boost technology, running within specified boost limits, constitutes "excessive overclocking"?
Intel's own Turbo Boost technology, running within specified boost limits, boosts according to how heavy the workload is: it only reaches 5.6 GHz on light workloads, while the heaviest workloads run close to base clocks.
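The idea that heavier workloads boost lower under a fixed power limit can be illustrated with a toy model that picks the highest multiplier whose estimated power fits under that limit. Everything here is hypothetical: the linear V/F curve, the scale factor K_W (chosen so a full-activity load sustains roughly base clock at 125 W), and the activity values. It sketches the principle only and is not Intel's boost algorithm.

```python
# All constants below are HYPOTHETICAL and chosen only for illustration.
MIN_RATIO, MAX_RATIO = 32, 57  # 3.2 GHz base and 5.7 GHz all-core, from the thread
K_W = 48.0  # hypothetical scale factor (W per V^2*GHz at full activity)


def voltage(ratio: int) -> float:
    """Hypothetical linear V/F curve: 0.90 V at ratio 32, +0.02 V per bin."""
    return 0.90 + 0.02 * (ratio - MIN_RATIO)


def package_power(ratio: int, activity: float) -> float:
    """Dynamic-power approximation P ~ activity * K * V^2 * f (f in GHz, BCLK = 100 MHz)."""
    return activity * K_W * voltage(ratio) ** 2 * (ratio / 10)


def sustained_ratio(activity: float, power_limit_w: float) -> int:
    """Highest ratio whose estimated power fits under the limit (floored at base)."""
    for ratio in range(MAX_RATIO, MIN_RATIO - 1, -1):
        if package_power(ratio, activity) <= power_limit_w:
            return ratio
    return MIN_RATIO  # a real chip could throttle further; not modeled here


for name, activity in [("light", 0.4), ("medium", 0.5), ("heavy", 1.0)]:
    print(f"{name:>6}: {sustained_ratio(activity, 253.0) / 10:.1f} GHz at 253 W")
print(f" heavy: {sustained_ratio(1.0, 125.0) / 10:.1f} GHz at 125 W")
```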
What you quoted them doing is limiting the maximum all-core clock speed.
LOL, talk about desperately trying to spin something...
Setting a specific multiplier for a specific number of cores forces the cores to run at that multiplier all the time. It's all-core overclocking: you bypass all of Intel's boost features and self-regulation and force the CPU to run badly.
 

bit_user

Setting a specific multiplier for a specific number of cores forces the cores to run at that multiplier all the time. It's all-core overclocking: you bypass all of Intel's boost features and self-regulation and force the CPU to run badly.
In your own words, you said:

"actually he says that the most stable settings was to set a max multiplier of 53, for 5.3Ghz"

If it's the max multiplier, then it's not a fixed multiplier, as you're now claiming.

And no, I don't believe for 1 second that it will override the PL settings!
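The difference between a maximum multiplier and a forced one can be expressed as a clamp. In this minimal sketch (hypothetical inputs; the function name is illustrative), a cap takes the minimum of what the boost algorithm requests, what the power limits allow, and the cap itself, so a max multiplier of 53 can lower clocks but cannot by itself force 5.3 GHz.

```python
def capped_ratio(requested: int, power_ratio: int, max_ratio: int) -> int:
    """Effective core ratio under a user-set *maximum* multiplier.

    requested:   ratio the boost algorithm asks for (hypothetical input)
    power_ratio: highest ratio the power limits currently allow (hypothetical)
    max_ratio:   the user-set cap, e.g. 53 for 5.3 GHz at 100 MHz BCLK
    """
    # A cap only clamps: the result never exceeds any of the three inputs.
    return min(requested, power_ratio, max_ratio)


# Heavy all-core load where power limits only allow ratio 42:
print(capped_ratio(requested=57, power_ratio=42, max_ratio=53))  # -> 42
# Light load where boost asks for 57 and power is not the constraint:
print(capped_ratio(requested=57, power_ratio=60, max_ratio=53))  # -> 53
# Lightly loaded core where boost only asks for 32:
print(capped_ratio(requested=32, power_ratio=60, max_ratio=53))  # -> 32
```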
 

TheHerald

Feb 15, 2024
Guess what they are running in the video at the point I'm talking about?!
For 24 hour periods even.
In the video he said 50% of the Intel chips used in servers crash about once a week running 24/7, or something like that, right? If the mobos are running at 253 W, this shouldn't be happening, not even once a month. If they're running with no power limits, then sure, that's a different thing.