News Intel finds root cause of CPU crashing and instability errors, prepares new and final microcode update


bit_user

Titan
Ambassador
If such a workload even did exist, and if that's why their processors failed (PC World had a mobo which turned out to be killing RPL CPUs for example) and if they were even telling the truth in the first place.
It wasn't just one mobo. They said it was a mix between Supermicro and (I think) ASUS boards, suggesting multiple examples of each.

These are the folks who claimed laptops were failing in the same manner. Have you heard any other reports of laptops failing? I haven't, and Intel has certainly denied it at every turn.
I'm trying to find the article where they mentioned it failing on laptops, because I'd like to know precisely what they said. It could be that they only had a couple instances where a laptop failed and maybe that was caused by something else? It's also not impossible that what Intel said is wrong.

However, the data they presented on their servers was quite comprehensive and demonstrable. I would not discount that, just because an off-hand remark about laptops also failing might have been incorrect.

It seems like you look for any excuse you can find to discount any data you don't like.

It is completely unbelievable that a single developer has some sort of workload that can kill a part that nobody else has.
I don't believe nobody else has, but it's probably not very common to use Raptor Lake desktop dies in server workloads like that. Maybe nobody else just dared to go public, like they did, or got picked up by an outlet like L1Techs.

Unless you think you're omniscient, how could you claim that nobody else has? There are undoubtedly people who would just quietly switch CPUs, after suffering a second such failure.

Especially when there are other examples of workloads which were triggering failures but weren't 100%.
I don't know what you mean by "triggering", here. Do you mean a workload on which a malfunction is detected, or do you mean a workload which actually causes the degradation?

The problem with most cases we've heard about is that they're just normal users who are doing fairly normal things with their PCs. That makes it very non-deterministic, and hard to know whether variation is due more to varying usage patterns or varying silicon.
 

YSCCC

Notable
Dec 10, 2022
445
341
1,060
If such a workload even did exist, and if that's why their processors failed (PC World had a mobo which turned out to be killing RPL CPUs for example) and if they were even telling the truth in the first place. These are the folks who claimed laptops were failing in the same manner. Have you heard any other reports of laptops failing? I haven't, and Intel has certainly denied it at every turn.

It is completely unbelievable that a single developer has some sort of workload that can kill a part that nobody else has. Especially when there are other examples of workloads which were triggering failures but weren't 100%.
I do think it's about whether the failure is detectable in other scenarios. I recall Wendell saying, in a podcast or YouTube discussion (I forget with whom), that his ~50% failure rate with non-XMP memory and industrial motherboards came from a special, tailor-made test suite for the extreme stability his and his clients' server workloads need, and he guesstimated that for casual users gaming and the like, most chips showing 1-2 errors in a 24-hour stress test might show nothing during, say, 6 hours of daily use, or only a split-second stutter. An error being that minor doesn't mean the chip isn't degrading or degraded, just that it's not significant enough for casual users to detect.
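To put rough numbers on that (just a back-of-the-envelope sketch, assuming errors arrive independently at a constant rate and that casual use stresses the chip only a fraction as hard as the test suite; the 25% figure is a made-up assumption):

```python
# Back-of-envelope: a chip shows ~2 errors in a 24-hour stress test.
# How likely is a 6-hour casual session to show no error at all?
import math

errors_per_stress_hour = 2 / 24     # rate observed under sustained stress
light_use_factor = 0.25             # assumption: casual use loads the chip ~25% as hard
session_hours = 6

expected_errors = errors_per_stress_hour * light_use_factor * session_hours
p_no_error = math.exp(-expected_errors)   # Poisson probability of zero events

print(f"Expected errors per session: {expected_errors:.3f}")          # 0.125
print(f"Chance the session looks perfectly fine: {p_no_error:.0%}")   # ~88%
```

So a chip that is already marginal can look perfectly healthy to a casual user for a long time.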

And that's why I would say the ones who can't bear it and come out calling out Intel are mostly the industrial users, as it really costs them productivity.

As for running out of spares, yes, it could be that they stocked fewer, but since these are still current parts and it's not as if the Intel 7 node is at full capacity for other products, running out of stock this soon at least means the failure rate is much higher than they predicted or prepared for. That alone is a big issue; it doesn't matter how many chips out there were simply thrown away or replaced with an AMD rig, or whether 90% still work fine because they were only used as YouTube rigs…

I have two friends who have owned a 13700K and a 13700KF respectively since the RPL launch. One of them recently had his die and just said "F it, there's my excuse to tell the wife it's time to upgrade" and got a 7800X rig. Not everyone will go through an RMA, so no numbers will really reflect the percentage of issues. One use case with strict sensitivity to failures will see a much higher, or even 100%, failure rate within 3 months, while another use case with the exact same parts could still be working after 6 months.
 
  • Like
Reactions: bit_user

YSCCC

Notable
Dec 10, 2022
445
341
1,060
I know the point you're making and agree, but the 7950X (and obviously 9950X) is the only 16 core desktop part from AMD so it in eco mode would be the only logical comparison anyone could make!
Just to add one more general thought, and I'm not deciding this along fan lines: I'm not sure whether they do it on purpose or for some technical reason (like using one or two compute dies in Zen), but each generation tends to have certain segments that are a no-brainer to choose, say the 7800X3D for gaming, some that are "pick your preferred brand", and some that are a no-brainer to avoid. The nice thing about outlets like TH is that, at stock settings and not in some weird, artificially limited scenarios, we have data on the relevant performance to choose between the different options, and if nothing like the degradation issue pops up, we know what should work reliably and safely for, say, 3-5 years.

I chose to do an in-socket upgrade from the 12700KF to the 14900K because MT performance has basically doubled and Adobe is well optimised for Intel, so I got it for my Photoshop usage, believing it would be great enough for the gear I'll be using over the next 5 years, and said "go F the 253 W max power usage".

It is usually kinda stupid, and rare, for a whole generation to be a complete failure at every price point and for every use case.
 

TheHerald

Notable
Feb 15, 2024
1,289
355
1,060
Okay, but then it's not a straight comparison. It'd be a comparison of those CPUs with a modified configuration vs. the i9-14900T.

The configuration is an intrinsic part of how it behaves. In some cases, it's as important as which model number you're using.
No, it really isn't. The configuration is one of the only things, if not the only thing, the end user can change. You can't change the number of cores, the cache, the IO die, or the process node; the only thing you can change is the power limit. From within Windows, no less.
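(For what it's worth, here is a minimal sketch of what "changing the power limit" looks like in practice. On Windows you would typically use the BIOS or a vendor tool such as XTU; on Linux the powercap/RAPL sysfs interface exposes it directly. The zone path below is the typical one but can vary by platform, so treat this as an assumption-laden illustration rather than a recipe.)

```python
# Minimal sketch (Linux, run as root): read and set the package long-term
# power limit (PL1) through the powercap/RAPL sysfs interface.
from pathlib import Path

PKG_ZONE = Path("/sys/class/powercap/intel-rapl:0")   # package-0 power zone (typical path)

def read_pl1_watts() -> float:
    # constraint_0 is normally the long-term limit, reported in microwatts
    return int((PKG_ZONE / "constraint_0_power_limit_uw").read_text()) / 1_000_000

def set_pl1_watts(watts: float) -> None:
    # Writes the long-term package limit in microwatts (requires root)
    (PKG_ZONE / "constraint_0_power_limit_uw").write_text(str(int(watts * 1_000_000)))

if __name__ == "__main__":
    print(f"Current PL1: {read_pl1_watts():.0f} W")
    # set_pl1_watts(125)   # e.g. cap a 253 W part to 125 W (uncomment to apply)
```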

Someone who wants a 14900T will obviously care about how the 7950X performs when limited to the same power. Why the heck wouldn't he? It doesn't even make sense to suggest otherwise. Who in their right mind would buy an inferior product just because the superior product has a higher out-of-the-box TDP than the one he wants? It really sounds silly.
 

TheHerald

Notable
Feb 15, 2024
1,289
355
1,060
Just to add one more general thought, and I'm not deciding this along fan lines: I'm not sure whether they do it on purpose or for some technical reason (like using one or two compute dies in Zen), but each generation tends to have certain segments that are a no-brainer to choose, say the 7800X3D for gaming, some that are "pick your preferred brand", and some that are a no-brainer to avoid. The nice thing about outlets like TH is that, at stock settings and not in some weird, artificially limited scenarios, we have data on the relevant performance to choose between the different options, and if nothing like the degradation issue pops up, we know what should work reliably and safely for, say, 3-5 years.

I chose to do an in-socket upgrade from the 12700KF to the 14900K because MT performance has basically doubled and Adobe is well optimised for Intel, so I got it for my Photoshop usage, believing it would be great enough for the gear I'll be using over the next 5 years, and said "go F the 253 W max power usage".

It is usually kinda stupid, and rare, for a whole generation to be a complete failure at every price point and for every use case.
Has your CPU failed?
 

bit_user

Titan
Ambassador
No, it really isn't. The configuration is one of the only things, if not the only thing, the end user can change.
It doesn't matter that they can change it, the test results for a given configuration are specific to that configuration. You can't pair efficiency data from one configuration with performance data from another, just because they both came from the same CPU. The configuration is as important a distinguishing factor as the model number itself.

As I've said before, TechPowerUp gets this. They usually test 3 different configurations and provide full data on each.

What they sadly don't do is provide comparisons against other CPUs in any but the stock configuration, which is unrealistic for the type of user who would use one of the other configs. An overclocker is going to want to compare overclocked performance of all CPUs, not one overclocked CPU vs. another at stock!

You can't change the number of cores, the cache, the IO die,
You can actually do things like reduce the number of active E-cores, disable SMT, and even take individual cores offline. As for the I/O die, you can change various DRAM and PCIe settings, depending on the board and its BIOS.
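(For reference, a minimal sketch of the "take individual cores offline" part on Linux; which logical CPU numbers map to E-cores or SMT siblings is platform-specific, so check lscpu first, and CPU 16 below is just an example index. On Windows the equivalent is normally done through the BIOS.)

```python
# Minimal sketch (Linux, run as root): toggle a logical CPU via the standard
# /sys/devices/system/cpu/cpuN/online hotplug interface.
from pathlib import Path

def set_cpu_online(cpu: int, online: bool) -> None:
    Path(f"/sys/devices/system/cpu/cpu{cpu}/online").write_text("1" if online else "0")

def is_online(cpu: int) -> bool:
    node = Path(f"/sys/devices/system/cpu/cpu{cpu}/online")
    # cpu0 usually has no 'online' file because it cannot be taken offline
    return not node.exists() or node.read_text().strip() == "1"

if __name__ == "__main__":
    set_cpu_online(16, False)                  # take logical CPU 16 offline
    print("cpu16 online:", is_online(16))
    set_cpu_online(16, True)                   # bring it back
```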
 

YSCCC

Notable
Dec 10, 2022
445
341
1,060
Has your CPU failed?
I undervolted it within an hour and hard-capped it to run cooler in my 30°C environment. If even that had failed in less than a year, Intel would have a 100% failure rate by now. And I have one friend who didn't undervolt and had his fail within 1.5 years; that's a bad enough experience.

And does it matter whether my own has failed yet? Having to dial in a ton of settings, use a contact frame so it doesn't bend like the 12700KF (whose temperature behaviour worsened within a year), and, since July, update the BIOS three times, re-dial all the tunings, and follow the latest guidelines is bad enough for me. I bought it to squeeze out extra performance, not to spend my time tweaking and praying it survives.
 
  • Like
Reactions: bit_user
There isn't even a performance incentive, the difference from 5.7 to 5.6 ghz is 1%.

It's pretty obvious that a significant Intel incentive lately has been chasing headline GHz figures. The only reason the 14900KS existed was to say "Look!!! 6.2 GHz!!!"

Claims that Intel wouldn't have done XYZ just to gain a few % or MHz would charitably be described as shaky at best.

Everyone does this? Have you seen a presentation from competing companies? They have their clockspeeds front back and center. I can provide you with some links if you want
Exactly. So don't claim that there wasn't an incentive.

The facts are that lots of people, from private consumers to commercial customers, who had run K-series processors perfectly well for years suddenly ran into loads of stability problems with 13th and 14th gen. Despite initially trying to point the finger at motherboard manufacturers, Intel found three different mistakes in their microcode that were causing this and are replacing processors that exhibit this instability under warranty, on the basis that they've been physically damaged. So even Intel isn't subscribing to an "it's the users / nothing to see here" stance.
 
  • Like
Reactions: Saldas and bit_user

TheHerald

Notable
Feb 15, 2024
1,289
355
1,060
It doesn't matter that they can change it, the test results for a given configuration are specific to that configuration. You can't pair efficiency data from one configuration with performance data from another, just because they both came from the same CPU. The configuration is as important a distinguishing factor as the model number itself.

As I've said before, TechPowerUp gets this. They usually test 3 different configurations and provide full data on each.

What they sadly don't do is provide comparisons against other CPUs in any but the stock configuration, which is unrealistic for the type of user who would use one of the other configs. An overclocker is going to want to compare overclocked performance of all CPUs, not one overclocked CPU vs. another at stock!


You can actually do things like reduce the number of active E-cores, disable SMT, and even take individual cores offline. As for the I/O die, you can change various DRAM and PCIe settings, depending on the board and its BIOS.
Disabling cores negatively affects your performance and makes no sense since you are literally just downgrading your cpu to a lower end model for no reason. It's not the same as power limiting where you are getting efficiency increases.
 

TheHerald

Notable
Feb 15, 2024
1,289
355
1,060
I undervolted it within an hour and hard-capped it to run cooler in my 30°C environment. If even that had failed in less than a year, Intel would have a 100% failure rate by now. And I have one friend who didn't undervolt and had his fail within 1.5 years; that's a bad enough experience.

And does it matter whether my own has failed yet? Having to dial in a ton of settings, use a contact frame so it doesn't bend like the 12700KF (whose temperature behaviour worsened within a year), and, since July, update the BIOS three times, re-dial all the tunings, and follow the latest guidelines is bad enough for me. I bought it to squeeze out extra performance, not to spend my time tweaking and praying it survives.
Yeah, right. I've used 3 different mobos with 4 different CPUs with no frame. No change in temperatures. In fact, my 12900K has been in its socket without a frame for a full 3 years now. Still nothing. But your 12700 bent within a year. Okay.
 

bit_user

Titan
Ambassador
Disabling cores negatively affects your performance and makes no sense since you are literally just downgrading your cpu to a lower end model for no reason. It's not the same as power limiting where you are getting efficiency increases.
Lots of people disable E-cores for gaming. There's also a halfway solution of disabling all but one E-core per cluster, so that each remaining E-core has all of the L2 cache and ring bandwidth to itself. In some games, that has been shown to provide better performance than simply disabling all of the E-cores.

As for disabling P-cores, I generally agree that there's not usually a good reason for it. If it ever did help, it'd probably be due to a weird software quirk/bug.
 
  • Like
Reactions: Saldas

YSCCC

Notable
Dec 10, 2022
445
341
1,060
Yeah, right. I've used 3 different mobos with 4 different CPUs with no frame. No change in temperatures. In fact, my 12900K has been in its socket without a frame for a full 3 years now. Still nothing. But your 12700 bent within a year. Okay.
It's not night and day, but if I open HWiNFO to check, there's always a P-core that starts thermal throttling for a second under load, and after changing to the 14900K, a flat ruler held up to the light shows the bending, plus the paste pattern pooling in the centre shows me the effect. Whether your CPU has no bending for whatever reason, or you just didn't check, is another matter I don't care about.
 
  • Like
Reactions: Saldas and bit_user

TheHerald

Notable
Feb 15, 2024
1,289
355
1,060
Lots of people disable E-cores for gaming. There's also a halfway solution of disabling all but one E-core per cluster, so that each remaining E-core has all of the L2 cache and ring bandwidth to itself. In some games, that has been shown to provide better performance than simply disabling all of the E-cores.

As for disabling P-cores, I generally agree that there's not usually a good reason for it. If it ever did help, it'd probably be due to a weird software quirk/bug.
Nah, disabling them is bad. If some games don't work well with them (the new Warhammer game is a prime example), you just use something like CapFrameX; with a single button you can move the whole game onto the P-cores.
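(Under the hood that kind of button is just setting the process's CPU affinity. A hedged sketch of the idea: which logical CPUs are the P-core threads is an assumption you'd verify against your own topology, and "game.exe" is a made-up name.)

```python
# Sketch: pin all processes with a given name onto the P-cores by setting
# their CPU affinity (works on Linux and Windows via psutil).
import psutil

P_CORE_CPUS = list(range(16))   # assumption: logical CPUs 0-15 are the P-core threads (8P + HT)

def pin_to_p_cores(process_name: str) -> int:
    moved = 0
    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] == process_name:
            proc.cpu_affinity(P_CORE_CPUS)   # restrict scheduling to these CPUs
            moved += 1
    return moved

if __name__ == "__main__":
    print(f"Updated affinity for {pin_to_p_cores('game.exe')} process(es)")
```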
 

TheHerald

Notable
Feb 15, 2024
1,289
355
1,060
It's not night and day, but if I open HWiNFO to check, there's always a P-core that starts thermal throttling for a second under load, and after changing to the 14900K, a flat ruler held up to the light shows the bending, plus the paste pattern pooling in the centre shows me the effect. Whether your CPU has no bending for whatever reason, or you just didn't check, is another matter I don't care about.
The fact that the CPU is bent doesn't mean the temperatures are worse. It entirely depends on your cooler; a lot of coolers are not flat but concave or convex, in which case a bent IHS works better than a flat one.
 

YSCCC

Notable
Dec 10, 2022
445
341
1,060
The fact that the CPU is bent doesn't mean the temperatures are worse. It entirely depends on your cooler; a lot of coolers are not flat but concave or convex, in which case a bent IHS works better than a flat one.
Noctua literally had to make a special high-base-convexity model of their latest cooler for LGA1700, beyond the normal curvature of an IHS. But this is only a minor issue in real life; the whole point is how many issues this generation has.
 
  • Like
Reactions: Saldas and bit_user

TheHerald

Notable
Feb 15, 2024
1,289
355
1,060
Noctua literally had to make a special high-base-convexity model of their latest cooler for LGA1700, beyond the normal curvature of an IHS. But this is only a minor issue in real life; the whole point is how many issues this generation has.
No, Noctua does not HAVE to make anything. I'm using a non-special Noctua cooler and it works insanely well.

But, since you mentioned it, and you made your bias even more obvious, you realize Noctua, Arctic, and a bunch of other companies are making specific brackets and offset mounts for the AMD chips, right? Do you realize that the cooler you just mentioned also has a specific AMD version called LBC? Ah, must have slipped your mind. Keep bashing Intel, don't let facts get in your way.

As Theoden wisely said "what can Intel do against such reckless hate?"

Noctua themselves say the standard version works "Good" on both LGA1700 and AM5, but sure, they HAD to make a special version, because why not just make stuff up to sh** on Intel for zero reason?

[attached image]
 
Last edited:

YSCCC

Notable
Dec 10, 2022
445
341
1,060
No, Noctua does not HAVE to make anything. I'm using a non-special Noctua cooler and it works insanely well.

But, since you mentioned it, and you made your bias even more obvious, you realize Noctua, Arctic, and a bunch of other companies are making specific brackets and offset mounts for the AMD chips, right? Do you realize that the cooler you just mentioned also has a specific AMD version called LBC? Ah, must have slipped your mind. Keep bashing Intel, don't let facts get in your way.

As Theoden wisely said "what can Intel do against such reckless hate?"

Noctua themselves say the standard version works "Good" on both LGA1700 and AM5, but sure, they HAD to make a special version, because why not just make stuff up to sh** on Intel for zero reason?

[attached image]
Of course I am aware there is an LBC version, which means the IHS isn't deforming the way previous generations used to; no or low deformation IMO isn't a defect, as the curvature exists to compensate for the original bending issue. Meanwhile, the offset mounting is somewhat of an issue, since the heat centre is offset while traditional coolers concentrate on the centre. But that's even more minor, and AM5's real issue is the unnecessarily thick IHS, which hinders cooling a bit. That's off topic for Intel, though: there are literally cases where, without liquid metal as TIM, people tested that even crazy open-loop cooling solutions had the centre portion hitting thermal throttling much more easily, and that is something of a design issue, just as the extra-thick IHS is for AM5. That alone isn't a deal-breaker, but it shows how many minor things have impacted the platform throughout, which is unheard of in Intel's history. As such, it can't be argued that 13th and 14th gen were a fine generation.
 
  • Like
Reactions: Saldas and bit_user

bit_user

Titan
Ambassador
But, since you mentioned it, and you made your bias even more obvious, you realize Noctua, Arctic, and a bunch of other companies are making specific brackets and offset mounts for the AMD chips, right? Do you realize that the cooler you just mentioned also has a specific AMD version called LBC? Ah, must have slipped your mind. Keep bashing Intel, don't let facts get in your way.

As Theoden wisely said "what can Intel do against such reckless hate?"
LOL, methinks the lady doth protest too much.

Noctua themselves say the standard version works "Good" on both LGA1700 and AM5, but sure, they HAD to make a special version, because why not just make stuff up to sh** on Intel for zero reason?

[attached image]
Apparently, you don't understand the difference between "Good" and "Excellent". If you want merely "good" cooling, you wouldn't spend NH-D15 money on your CPU cooler.

FWIW, here's what they actually say about it:

"the NH-D15 G2 is available in a regular, standard version and two specialised variants: The regular NH-D15 G2 uses the same medium base convexity as most other Noctua heatsinks, which makes it a perfect all-rounder that provides optimal results on AM5 with the included offset mounting and on LGA1700 CPUs when utilising the included NM-ISW1 shim washers (or optional, so-called contact frames ) to reduce CPU deformation from ILM pressure. The HBC (High Base Convexity) variant is specifically optimised for LGA1700 processors that are used with full ILM pressure or have become permanently deformed in long-term use, providing excellent contact quality despite the CPU’s concave shape."

Source: https://noctua.at/en/noctua-release...gship-model-cpu-cooler-and-nf-a14x25r-g2-fans
 
  • Like
Reactions: Saldas and YSCCC
I'm trying to find the article where they mentioned it failing on laptops, because I'd like to know precisely what they said.
https://www.reddit.com/r/hardware/comments/1e13ipy/comment/lcyythb/

https://wccftech.com/intel-says-14t...-by-same-instability-issues-as-desktop-chips/

easy to find, and absolutely not an offhand remark

However, the data they presented on their servers was quite comprehensive and demonstrable.
What data did they release? I don't recall seeing any, but may have easily missed it.
 
Last edited:

TheHerald

Notable
Feb 15, 2024
1,289
355
1,060
Apparently, you don't understand the difference between "Good" and "Excellent". If you want merely "good" cooling, you wouldn't spend NH-D15 money on your CPU cooler.
Apparently neither do you, because you only get "Good" on AM5 as well. Again, showing your bias as per usual and ignoring it.
 

bit_user

Titan
Ambassador
Apparently neither do you, because you only get "Good" on AM5 as well.
Yes, I obviously saw that. The point is they had to make special accommodations for both LGA1700 and AM5.

What I find funny about this is how a thinner AM5 IHS, like @thestryker wants, would create an even worse hot spot and a correspondingly greater need for an offset bracket. I wish we could see AM5 IHS-thinning data for both water block and conventional heatpipe-based heatsink, because as much as thinning helps water blocks, I could imagine it might produce worse results on at least some heatpipe-based coolers.

Again, showing your bias as per usual and ignoring it.
My bias? You probably think everyone is biased who doesn't see the world exactly the same as you. Maybe you're the one who's biased?

Oh, and get ready for when Arrow Lake has a major hot spot issue from its CPU cores being all scrunched up together in a corner. If they're reusing other chiplets from Meteor Lake, here's what we should expect the layout to look like:

[attached image: expected tile layout, reused from Meteor Lake]


I'll be sure to remind you of the aspersions you cast upon AM5, when this comes to light and people start making special bracket for Arrow Lake!
 
Last edited:

bit_user

Titan
Ambassador
So, your conjecture is that he's just an Intel-hater intent on spreading lies to damage them? Is there not a possibility in your mind that what he's saying is factually accurate? What if these were gaming laptops, tuned and not even running stock Intel settings?

What data did they release? I don't recall seeing any, but may have easily missed it.
I think it came out in L1Techs' reporting on them: exactly what they had tried and which configuration they found provided the best stability.
 
  • Like
Reactions: Saldas

bit_user

Titan
Ambassador
I'll be sure to remind you of the aspersions you cast upon AM5, when this comes to light and people start making special bracket for Arrow Lake!
Wow! What a coincidence! I didn't expect this statement to be proven right so soon!

"Arrow Lake's new LGA 1851 form factor has shifted the CPU's temperature hot spot to a different position than it was previously. Der8auer reports on the Overclock forums, that the hotspot has shifted north compared to LGA 1700 Alder Lake and Raptor Lake CPUs, requiring new CPU cooler designs for ultra high-performance waterblocks and coolers to extract the maximum amount of heat from Intel's new Arrow Lake chips.

The hotspot on 1851 is quite a bit further north than it was on 1700. This means for ideal cooling a shift of the cooling center is required to fight the hotspot. It also means that rotating the block 180° would harm the performance."

https://www.tomshardware.com/pc-com...paring-water-blocks-for-core-ultra-200-series

Let's see you spin your way out of this, @TheHerald !
 
I think it came out in L1Techs' reporting on them: exactly what they had tried and which configuration they found provided the best stability.
I don't recall Wendell ever identifying his sources of information. Feel free to find out if you want to support your stance.
So, your conjecture is that he's just an Intel-hater intent on spreading lies to damage them? Is there not a possibility in your mind that what he's saying is factually accurate?
I'm suggesting he's an angry developer who saw an opportunity and ran with it. I have no doubt they saw some failures and that it created a negative experience for them. I also have no doubt some of them likely were due to Intel's CPU problem. When their claimed experience doesn't line up with any of the other reporting on the situation that makes them the outlier who isn't to be taken at face value.
What if these were gaming laptops, tuned and not even running stock Intel settings?
If they ran them outside of specifications in some fashion that ended up killing them, that isn't really on Intel now, is it?
 
What I find funny about this is how a thinner AM5 IHS, like @thestryker wants, would create an even worse hot spot and a correspondingly greater need for an offset bracket. I wish we could see AM5 IHS-thinning data for both water block and conventional heatpipe-based heatsink, because as much as thinning helps water blocks, I could imagine it might produce worse results on at least some heatpipe-based coolers.
I'm not sure thinning would make it worse, but there have been heatpipe coolers that flat out perform worse due to hotspot location already. I wish I could remember which cooler they were testing so I could link to it but Hardware Canucks had one which performed much better proportionally on Intel than AMD. I want to say it was one of the ones with larger heatpipes that were slightly more spread out (something like the Frost Commander as opposed to the Phantom Spirit).
Wow! What a coincidence! I didn't expect this statement to be proven right so soon!
This was always going to be the case as soon as they showed the Compute tiles were going to be at an edge right? The only question is where the hotspot would be.

The notable thing is that Intel chose north while AMD does south, but maybe this means vapor chamber air coolers will become more common (or everyone's going to get into the offsets game).