News Intel finds root cause of CPU crashing and instability errors, prepares new and final microcode update

Keep moving those goalposts. No reason for us to continue this back and forth as clearly you've got your agenda and I've got mine and nothing will change.

going to delete my last response then this one in a bit.
Actually, it is Intel moving the goalposts. Two whole generations of top-of-the-line CPUs should work out of the box, with enough self-protection that they don't self-destruct within warranty or get into trouble within a year.

And the goalposts moved from user overclocking to unlimited PL, then to the eTVB bug, then to VID spikes past 1.55 V, and now this latest fix. It's really convincing that this one will finally fix it.

In the history of PC parts, I don't recall any major hardware product having this many issues announced.
 
Is Intel going to offer an app to test if the chip is defective? I have had a couple of BSOD's, but really can't afford to be without my PC while I return it to Intel and wait for them to test it and then finally ship a replacement or return the original, if it tests OK. If there is a user test available, could someone please share a link to it? Thanks.
I think they currently accept results from something like the Intel diagnostic tool, or the test someone came up with of installing the NVIDIA driver 10 times in a row. Do you recall the BSOD code? I would also disable XMP (if enabled) first, to see whether it's the RAM that is failing.
 
BLAH, BLAH, BLAH, (Kirk interrupts: "No Blah, Blah Blah"), Star Trek TOS. This whole thread is getting kinda Nitpicky and Annoying. I knew it would. There is nothing wrong with good debates. That is how we can peacefully sway opinions. Maybe I am the one who is overreacting by calling this thread "Nitpicky and Annoying"? Probably.

I like all the information. I like the opinions. I hope everyone keeps posting on this.
 
Intel says it has found the "root cause" of the problems with their latest desktop CPUs. It's too early to say whether this actually fixes everything, isn't it? Nobody knows for sure if this is the "final, root cause" because this is actually the third or fourth "final, root cause" of the architecture's problems...😉 So it remains to be seen whether this is the solution, or yet another debacle along the way. We shall see.
 
"Intel also took time to assure its customers once again that its existing mobile processors as well as upcoming codenamed Lunar Lake and Arrow Lake processors are not affected by this issue."


So... if that is the case, why is Dell issuing the microcode updates to HX processors? Surely Dell knows the difference between mobile and desktop parts, yes? Intel has been shifty as heck (I'd have harsher words but Tom's) through this whole affair, so barring any evidence to the contrary (I simply do not believe Intel is being fully truthful at this point, and I observed voltage above 1.5 V using HWiNFO pre-update), I'd say mobile processors ARE affected, specifically those that are allowed to be overclocked, aka the HX series.
 
That's funny. I bought 2R memory just for the performance benefits over 1R. Maybe not if you overclock, where 1R DIMMs tend to clock higher, but I don't.

I don't actually need 64 GiB. 16 would be enough for me, currently. If I'm buying for the next 5 years, I'd get 32 GiB. I never expect I'll need 64 GiB.
2R is faster on DDR4, but I'm not so sure about DDR5. It restricts the speeds you can run too much.
 
And for God's sake, the real power draw of a 14900K running unlimited PL would have a tough time going past 320 W even with the default VID and frequencies, apart from benchmarking AVX workloads. Maxing all cores at 5.7 with the default AVX offset, it would sit at around 300 W.
Do you actually have the chip? If you are running it unlimited, it shoots to over 350 W at default. I'm not sure where it stops, because I'm getting thermally throttled, but someone with a big AIO or a custom cooler can probably go further.
 
Of course I own it. I tried it once with voltage limited to 1.45 V and the unlimited power setting: all-core 5.4 with AVX only takes around 300 W, and that only happens in Cinebench. For those who only game, with whatever PL setting, say in MSFS, it stays at all-core 5.7 at just 90 W (final undervolt with a 1.4 V cap) to 110 W.

When one can hit 350 W+, that's also when the voltage from that crazy Intel V/F curve is hitting 1.5 V+, so there it goes. I do wonder whether you actually use the CPU day to day; the everyday usage where it degrades is at low wattage, but with the crazy VID spikes.
 
In your previous post you said that even with the default vid and frequencies it only hits 320w on AVX workloads, which is not true. Now you are saying you restricted it in both clocks and voltages. And no, it doesn't need to hit anywhere near 1.5v for it to draw 350w. Run CBR23 at stock, you will hit 350w+ at 1.3v or even lower.
 
I said at default: all-core 57x non-AVX and 54x all-core during AVX, which was the default on a lot of motherboards, with unlimited power and stock voltages. That was how the original reviews were done: stock multiplier and unlimited voltage. My own sample hit 1.49 V in its initial run of R23, scoring 41k points; HWiNFO64 showed a max TDP of 310 W, with no thermal throttling on a cold winter test night with the cooler at full blast.

I later tried undervolting after the initial run, and it maxed out at 260 W.

Of course, if people initially trusted Intel for safe regulation and used the Intel XTU thing to do an AI overclock, they would exceed 57x, with even higher voltages suggested by XTU. But that would be a minor proportion and, cough, still something “Intel suggested”.
 
There was a guy on the news a couple of years ago. He bought a Porsche Carrera GT, a car that is notoriously hard to drive and that didn't have traction control. Ended up crashing and dying. Sure, you could argue it's Porsche's fault for not including traction control, but if you don't know how to drive, don't buy a car like that?

Now apply that analogy to unlocked power unlimited chips that are trying to boost to as high clockspeeds as possible. If you don't know what you are doing, don't buy them. Buy the locked parts.
If you are talking about Paul Walker from The Fast and the Furious, let me tell you that the car was probably modified by Roger Rodas' garage, "Always Evolving", and it's clear that they were not up to snuff.

About the issue, I'm looking to buy a discounted MSI GT77 with an Intel® Core™ i9-13980HX. Is there any official statement about this kind of processor, or even from MSI itself, or any BIOS update? They are pretty expensive machines.
 
The official statement is that mobile chips aren't affected, but from the post above it seems Dell has issued a microcode update, so it's your own decision whether to trust that and spend your money or not.
 
I know Intel keeps saying mobile chips aren't affected but they also said anything above 60w tdp is sus, so I'd err on the side of caution.
 
A lot of reading comprehension problems around these parts. I said the scale is overblown and the way a lot of people are acting is like every CPU is a ticking time bomb that is going to die.
Please cite some examples of where people are acting like that.

They sell a lot of PCs with them, including in house along with the retail products. It's not in their best interest to create angry customers.
Intel said it would be solved and if not, they could potentially sue Intel (or recoup some of their losses by other means).

The gravity of the situation was more the ticking time bomb aspect, than being inundated. That doesn't mean their numbers were wrong, as they did some curve-fitting and tried to predict their eventual liability via extrapolation.

Plus, if Dell stops selling Raptor Lake i9-14900K and HP or Lenovo doesn't, everyone who wanted a machine with that CPU will just buy it from one of their competitors. Furthermore, each new machine they sell has some time before it fails. So, the only situation where they would plausibly stop selling them is if Intel said a mitigation weren't possible. Otherwise, they'd be counting on Intel to deliver it before those new machines started failing.

And yet you're defending presuming you're right
talk about a massive amount of hypocrisy right here.
Right about what?

At least my stance is based upon what information is available
No, a lack of information is a lack of information. You can't treat an unknown as implying a negative, which is exactly what you're doing.

rather than using confirmation bias to yell the sky is falling.
It's not confirmation bias, because I didn't go searching for something that confirmed a particular presumption. The data I had seen was from that video, so I cited it. The only other statistical data I've seen is from Puget, but that's of limited applicability for the reasons I mentioned.

Edit: actually, here's another one: https://www.tomshardware.com/pc-com...have-4x-higher-return-rate-than-the-prior-gen

Yeah, it's only 4x, but the nasty thing about degradation is that the failure rate will increase with time! So, if we had their precise sales & returns data, we could try to extrapolate it.
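
The point about failure rates rising with time can be sketched numerically. This is only an illustration, assuming wear-out failures follow a Weibull distribution with shape greater than 1 (a common reliability model, but an assumption here, not anything Intel or the retailer has published). The shape and scale values below are made up; with real sales and returns data you would fit them instead.

```python
import math

def weibull_cumulative_failures(t_months, shape, scale_months):
    """Fraction of units expected to have failed by time t (Weibull CDF)."""
    return 1.0 - math.exp(-((t_months / scale_months) ** shape))

def weibull_hazard(t_months, shape, scale_months):
    """Instantaneous failure rate at time t; it rises with t when shape > 1."""
    return (shape / scale_months) * (t_months / scale_months) ** (shape - 1)

# ASSUMED, made-up parameters purely for illustration (not real failure data).
SHAPE = 2.0           # > 1 means wear-out: failure rate grows with age
SCALE_MONTHS = 120.0  # characteristic life, hypothetical

for month in (6, 12, 24, 36):
    failed = weibull_cumulative_failures(month, SHAPE, SCALE_MONTHS)
    rate = weibull_hazard(month, SHAPE, SCALE_MONTHS)
    print(f"month {month:2d}: cumulative failed {failed:.2%}, hazard {rate:.5f}/month")
```

With a shape above 1, the per-month failure rate at 24 months is higher than at 12 months, which is why an early snapshot of returns would understate the eventual total under this kind of model.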

If you have other data, you should cite it.

They've been acknowledging issues for a lot longer than 2 months.
As recently as 4 months ago, they were still trying to throw motherboard makers under the bus:
 
Actually, it is Intel moving the goalposts. Two whole generations of top-of-the-line CPUs should work out of the box, with enough self-protection that they don't self-destruct within warranty or get into trouble within a year.

And the goalposts moved from user overclocking to unlimited PL, then to the eTVB bug, then to VID spikes past 1.55 V, and now this latest fix. It's really convincing that this one will finally fix it.

In the history of PC parts, I don't recall any major hardware product having this many issues announced.
That's just it, isn't it? If we look at the totality of how much has needed fixing, thus far, it's pretty astonishing. All the moreso that it's happening 1.5+ years into the product's life-cycle! For these defects to reach the field, at all, is bad enough. That they are only getting addressed at this late stage is what makes it seem like something is rotten, at Intel.

Intel says the new CPUs won't have this same defect, and I tend to believe them. However, without fixing their broken processes and understaffed/overworked QA department, how can we trust their new processors won't have other defects of similar severity? And, given their recent financial woes & unprecedented layoffs, how are we to believe they're actually fixing the structural problems which resulted in this situation?
 
That's my whole trust issue with Intel at the moment. This is a major product line with which they dominated the market, two whole generations, and at literally the time of EOL they announce that it's finally fixed for new CPUs and that old ones will stop degrading further, all while promoting their new gen. That isn't reassuring about the QA department for the new line, which was developed during the time of the troubles the current gen is facing.
 
If you are talking about Paul Walker from The Fast and the Furious, let me tell you that the car was probably modified by Roger Rodas' garage, "Always Evolving", and it's clear that they were not up to snuff.
Oh, if that's the one then they would've disabled traction control, anyhow. They were out for a "fun" drive, which you generally do with it off, so you can do donuts, power slides, drifting, and other general hooning.

Traction control is good for cruising, rain, and novice drivers who don't know about things like lift-off oversteer and aren't familiar with the car's limits. However, most people who buy a supercar probably don't do so with the intention of leaving traction control on all the time. It would seriously limit the amount of fun you could have with the car at low speeds, which is actually where it's less dangerous.

Some people even claim that traction control is dangerous, because it gives the driver an impression that the car can correct for any mistakes they make. I think that's probably going too far, but the problem of overconfidence is real.

Some cars even have an option to disable ABS, but I think that's going too far. The slight improvement you get in stopping distance isn't worth the potential loss of control under braking, should you accidentally lock up the wheels and not realize it.
 
Full price still?
There must be enough people still buying them then. If nobody bought then prices would drop.
Most people shopping at Best Buy, and the like, would have no clue about these issues with Intel's chips, much less how to attempt to fix them. "Buyer beware" is almost surely an idea long gone from most consumers' minds nowadays. There are probably not many people even within Best Buy - other than maybe the bean counters - who know of the problem. Plus, they likely have some agreement with Intel about returns/RMAs on PCs containing their chips anyway.
 
Please cite some examples of where people are acting like that.
All over every public forum for months? You upvoted one on the first page of this thread so let's just drop this one.
Intel said it would be solved and if not, they could potentially sue Intel (or recoup some of their losses by other means).

The gravity of the situation was more the ticking time bomb aspect, than being inundated. That doesn't mean their numbers were wrong, as they did some curve-fitting and tried to predict their eventual liability via extrapolation.

Plus, if Dell stops selling Raptor Lake i9-14900K and HP or Lenovo doesn't, everyone who wanted a machine with that CPU will just buy it from one of their competitors. Furthermore, each new machine they sell has some time before it fails. So, the only situation where they would plausibly stop selling them is if Intel said a mitigation weren't possible. Otherwise, they'd be counting on Intel to deliver it before those new machines started failing.
Intel seemingly didn't tell OEMs about the oxidation in early RPL batches based on what information leaked out. What makes you think Intel promised them any solution before they did so publicly to everyone? We know OEMs sure don't run crazy power settings on their systems so the public statements about motherboards doesn't apply to them.

If these problems were actually happening at a high enough rate they would have done something about it.
Right about what?
In the statement you'd quoted I was referring to you and all the others who keep saying Intel doesn't have sufficient QA (which you're still doing). That's why you got the response you did.
No, a lack of information is a lack of information. You can't treat an unknown as implying a negative, which is exactly what you're doing.
This was a childish edit on my part that I shouldn't have put in, but by the time I reassessed it seemed disingenuous to remove it.
It's not confirmation bias, because I didn't go searching for something that confirmed a particular presumption. The data I had seen was from that video, so I cited it. The only other statistical data I've seen is from Puget, but that's of limited applicability for the reasons I mentioned.
Puget is very applicable for the vast majority of systems sold with these parts. It certainly isn't for any direct retail sales or servers, however. To date I believe it's the only official data on the issue, and everything else is varied levels of hearsay.

Wendell himself has gone to great lengths to explain repeatedly that he's not being given specifics as to failures just that they're higher. Of course ADL also seems to be unusually good regarding failures so that adds another potential layer of context (it also may mean nothing, but without more information it's what we've got).
As recently as 4 months ago, they were still trying to throw motherboard makers under the bus:
Does this somehow invalidate that Intel acknowledged stability issues back in Feb?
That's just it, isn't it? If we look at the totality of how much has needed fixing, thus far, it's pretty astonishing. All the moreso that it's happening 1.5+ years into the product's life-cycle! For these defects to reach the field, at all, is bad enough. That they are only getting addressed at this late stage is what makes it seem like something is rotten, at Intel.

Intel says the new CPUs won't have this same defect, and I tend to believe them. However, without fixing their broken processes and understaffed/overworked QA department, how can we trust their new processors won't have other defects of similar severity? And, given their recent financial woes & unprecedented layoffs, how are we to believe they're actually fixing the structural problems which resulted in this situation?
Why would you believe them if you're willing to pass off your feelings as statements with such conviction?

You're so dead set on it being some level of incompetence which led to this making it into production. You don't seem to be thinking of this stance from the other side: if it was so easy they should have caught it before launch why did it take 6+ months for them to figure out the problem once they knew there was one? If the first is true then the second exposes Intel as being bad at one of their core competencies and nothing they do should be trusted.

edit: I'm sure you know my stance, but others might not:

Intel has handled the entire situation in the exact awful manner one expects from any publicly traded company. Deflect, blame and hope it goes away. I have never and would never defend their response and how they dragged their feet publicly at every turn.
 
They admitted there was an instability issue back then, but not that it was their issue; it was supposedly everyone else's mishandling causing the trouble. They admitted it's an Intel problem only in July, when it was shown that no vendor is immune to the degradation issues.

And as said before, a serious issue like this slipping past their accelerated wear testing and hitting two generations of products, with Intel only now saying it's finally fixed after one mitigation after another, right at the launch of the next gen, isn't reassuring about the new generation's quality. It still sounds possible that it's not really fixed, just delayed. The flaw itself isn't a fatal mistake for Intel, IMO; how they came to admit it was their issue and how they subsequently handled it is what's kind of fatal to them.
 
Most people shopping at Best Buy, and the like, would have no clue about these issues with Intel's chips, much less how to attempt to fix them. "Buyer beware" is almost surely an idea long gone from most consumers' minds nowadays.
Computers are also a complex product and most people who don't know what they're buying probably consider it too complex to understand and have a basic expectation that whatever they buy will "just work", even if it's not the fastest or most cost-effective option. Heck, my own mother once bought a laptop without consulting me, and it turned out to be a pretty bad deal for her.

There are probably not many people even within Best Buy - other than maybe the bean counters - who know of the problem.
Right. So, the most telling detail would be how much they charged for the extended warranty, if this were right before Intel started talking about warranty extensions. We've seen other examples of service contract prices shooting up on those systems, when they remained stable on AMD CPUs. That's because some actuary, somewhere, was sitting down and doing the math.
 
Puget is very applicable for the vast majority of systems sold with these parts. It certainly isn't for any direct retail sales or servers, however. To date I believe it's the only official data on the issue, and everything else is varied levels of hearsay.
I cited another source where a retailer experienced 4x the return rate for Gen 13 CPUs as Gen 12. That wasn't filtered by K-series, either, which implies the K-series return rate should be even higher:

"According to data from Les Numeriques, only 1% of AMD processors were returned in 2020, while Intel had a 1.75% return rate then. So, if AMD’s return rate remained stable since then, we can extrapolate that the Raptor Lake chips have a return rate of 4% to 7% while Raptor Lake Refresh processors would have 3% to 5.25%. We should also note that these numbers only reflect return rates that went through the retailer channels, not those that went straight to Intel."

https://www.tomshardware.com/pc-com...have-4x-higher-return-rate-than-the-prior-gen
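
For anyone checking the arithmetic in that quote, here is a small sketch. The 1% and 1.75% baselines come from the quoted passage; the 4x and 3x multipliers are my inference from the quoted ranges (4% to 7% and 3% to 5.25%), and the function name is just a placeholder.

```python
# Baseline 2020 return rates quoted from the article, in percent.
BASELINE_LOW_PCT = 1.0    # AMD, 2020
BASELINE_HIGH_PCT = 1.75  # Intel, 2020

def extrapolated_range(multiplier):
    """Scale the baseline range by a relative return-rate multiplier."""
    return (multiplier * BASELINE_LOW_PCT, multiplier * BASELINE_HIGH_PCT)

# ASSUMPTION: multipliers inferred from the quoted ranges, not stated per family.
for name, multiplier in (("Raptor Lake", 4), ("Raptor Lake Refresh", 3)):
    low, high = extrapolated_range(multiplier)
    print(f"{name}: {low}% to {high}% (at {multiplier}x the 2020 baseline)")
```

Note that this only reproduces the article's extrapolation; it says nothing about returns that went straight to Intel, which the quote explicitly excludes.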

You're so dead set on it being some level of incompetence which led to this making it into production. You don't seem to be thinking of this stance from the other side: if it was so easy they should have caught it before launch why did it take 6+ months for them to figure out the problem once they knew there was one? If the first is true then the second exposes Intel as being bad at one of their core competencies and nothing they do should be trusted.
This is some weird logic. Just because a problem their QA failed to catch is tricky for them to both debug and mitigate doesn't mean the oversight by their QA team is excusable. All the QA team had to do was catch the symptom, which is often a lot easier than finding the root cause of a problem. Were it not so, then you'd see some of the best-paid positions and highest job qualifications being in QA, yet they tend to be among the lowest of R&D employees.

What they clearly should've done is some detailed testing of the CPU's internal voltage regulation & management, to make sure it was always staying within safe limits. That seems like it ought to be pretty near the top of the list, maybe just underneath ensuring all the instructions work correctly.

As for why it's taken so long for these mitigations to dribble out, people can & will speculate as they wish. It definitely gives the feeling of Intel trying to run down the clock, even if that's not the reality.

Intel has handled the entire situation in the exact awful manner one expects from any publicly traded company. Deflect, blame and hope it goes away. I have never and would never defend their response and how they dragged their feet publicly at every turn.
That's a BS, over-the-top cynical take that does nobody any good. I'll agree that companies will tend to do whatever they think they can get away with (though there are exceptions). However, it's only by holding them to a higher standard and ensuring they suffer the full injury they're due that we can realistically expect they & others will ever do better.