News Intel finds root cause of CPU crashing and instability errors, prepares new and final microcode update

Does this issue affect mobile processors? Asking because my new laptop has the 14900HX.
They've been pretty consistent saying no (from this latest community update):
Intel® reaffirms that both Intel® Core™ 13th and 14th Gen mobile processors and future client product families – including the codename Lunar Lake and Arrow Lake families - are unaffected by the Vmin Shift Instability issue. We appreciate our customers’ patience throughout the investigation, as well as our partners’ support in the analysis and relevant mitigations.
 
Let’s be realistic: they are moving on to their Ultra line of CPUs. It's a very convenient time for them to come out and declare past problems fixed, as if to say, "Move along, we fixed things, please buy our new products." But at this point there's not much else they can really do, even if this actually is the fix.
 
You know why that is? Because the truth of the "crisis" is far less dramatic than the overblown news stories, and far less than the techtubers (who really only care about your views) make it out to be. Since only a very small part of the market owns or owned i9 K-SKU silicon, and almost all of those owners know how to navigate a modern BIOS, the actual number of issues is far lower than the outraged are reporting.
If it were a real problem across the whole range of K silicon, major retailers would be so tangled up in customer-service cases that it would have been the real news.

It is bizarre that retail outlets such as Best Buy and Dell are still selling systems with these chips at full price.
 
Yikes. It absolutely is Intel's fault for having limits that they don't actively enforce by default. Intel has knowingly been letting this slide for years, and when reviewers asked Intel about it, they even said outright that exceeding the recommended limits is still considered within spec.

Now Intel will finally be enforcing the advertised limits. It took them partially bricking one or two generations to actually address this.
There was a guy on the news a couple of years ago. He bought a Porsche Carrera GT, a car that is notoriously hard to drive and that didn't have traction control. He ended up crashing and dying. Sure, you could argue it's Porsche's fault for not including traction control, but if you don't know how to drive, why buy a car like that?

Now apply that analogy to unlocked, power-unlimited chips that are trying to boost to the highest clock speeds possible. If you don't know what you are doing, don't buy them. Buy the locked parts.
 
There was a guy on the news a couple of years ago. He bought a Porsche Carrera GT, a car that is notoriously hard to drive and that didn't have traction control. He ended up crashing and dying. Sure, you could argue it's Porsche's fault for not including traction control, but if you don't know how to drive, why buy a car like that?
I guess a driver blaming Porsche for providing more power than they can handle when they mash their foot down is like Intel trying to blame the motherboard manufacturers when their microcode errors are demanding too much voltage and ignoring elevated temperatures.
 
I guess a driver blaming Porsche for providing more power than they can handle when they mash their foot down is like Intel trying to blame the motherboard manufacturers when their microcode errors are demanding too much voltage and ignoring elevated temperatures.
Again, if you don't know how to drive, don't buy an unlocked motherboard and an unlocked CPU. That's the whole point of locked parts.
 
It's interesting that Puget Systems, a manufacturer of high-end workstations, reported that over the same period they actually had more failures in AMD processors (though no higher than normal). The reason? They never exceeded Intel's recommended power limits.
Oh, dear. We already discussed this ad nauseam, in other threads. Toms even reported on it with a specific article dedicated to it. Your characterization is simplistic to the point of being misleading.

Puget's customers are the types of corporate and professional users whose usage patterns don't necessarily match others' who've experienced these failures. It's unknown how much of a role that might play, but it's been well-established that certain workloads are more likely to trigger degradation than others.

Second, what Puget actually said was that the pre-ship failure rate for AM5 CPUs was higher, but the rate of field failures was the second lowest of all the CPUs for which they provided data. Also, we don't know how the pre-ship AM5 failures were distributed across time and model numbers. It could've been mostly near launch, a bad batch, etc. Furthermore, they said their sample size was pretty small, since they mostly sell Intel, which makes the data more susceptible to noise.

Third, the likelihood of these failures increases with time. So, you'd expect the rate of pre-ship failures of Raptor Lake to be low (although they're higher than Gen 12) and the field failure rate would only increase over time (indeed, it's higher for Gen 13 than Gen 14).

Finally, did you know the Puget CEO is on Intel's board of technical advisors? That makes him very much not a disinterested party (aside from the fact that he's also a customer).
 
There was a guy on the news a couple of years ago. He bought a Porsche Carrera GT, a car that is notoriously hard to drive and that didn't have traction control. He ended up crashing and dying. Sure, you could argue it's Porsche's fault for not including traction control, but if you don't know how to drive, why buy a car like that?

Now apply that analogy to unlocked, power-unlimited chips that are trying to boost to the highest clock speeds possible. If you don't know what you are doing, don't buy them. Buy the locked parts.
Horrible analogy. First, the K-series CPUs come with a warranty and they're supposed to work correctly, at stock settings, and yet it's been proven that they will even degrade in that scenario.

Second, Intel K-series differ in more respects than just whether or not overclocking is locked out. They also have higher stock clock speed limits and power limits. The CPU is supposed to work correctly within those limits. So, you really can't blame the victim, if they buy the fastest CPU and try to use it at the supported settings. There's no warning label on it that conveys the risk its users faced, even in that scenario.

Finally, Intel knows they're at fault, which is why they extended the warranty - even of OEM processors!

Stop victim blaming (leaving aside the Porsche guy, whose case is both irrelevant and probably involved other factors).
 
I really wonder how such a catastrophic mistake could slip through all the quality control? Don't chipmakers do any rigorous stress-testing of their chips in order to eliminate just such problems BEFORE releasing a chip?
Not only that, but you'd expect them to do failure analysis of the failed Gen 13 Raptor Lake CPUs that had been returned under warranty.

The most charitable interpretation is that Intel simply did too much cost-cutting in their QA and returns departments. However, I can think of less flattering reasons they dragged their feet on this issue for so long.
 
Not only that, but you'd expect them to do failure analysis of the failed Gen 13 Raptor Lake CPUs that had been returned under warranty.
Why would they, without a specific reason? You should know how expensive that is compared with just replacing the part. They undoubtedly have criteria that have to be met before further investigation is done.

Now, don't get me wrong: their response has been awful, and sweeping the early oxidation issue under the rug is arguably even worse. A lot of people seem to think this issue is simple, but how long they've been discussing it publicly should be a giant clue that it isn't.
I really wonder how such a catastrophic mistake could slip through all the quality control? Don't chipmakers do any rigorous stress-testing of their chips in order to eliminate just such problems BEFORE releasing a chip?
There's a decent chance this got through because, guess what, not every chip is failing. I'm so tired of people not grasping how overblown the scale of this issue has been. If failure were really imminent for even a majority of chips, we'd be talking about millions of failures. Wendell, who started the first discussions about game servers dying, said maybe 50% had issues that may have been related, and that's a specific workload which was killing chips.

They clearly were playing too fast and loose to guarantee high clock speeds. This is something Intel has been pretty notorious for since 10th Gen. The voltages were just a lot (relatively speaking) higher this go-around.
 
guess what, not every chip is failing. I'm so tired of people not grasping how overblown the scale of this issue has been. If failure were really imminent for even a majority of chips, we'd be talking about millions of failures.
Quit gaslighting. We've seen lots of people on here with failures. This is not some fairy tale.

If you don't have data on how big a problem it is, then you can't say it's overblown.
 
Quit gaslighting. We've seen lots of people on here with failures. This is not some fairy tale.

If you don't have data on how big a problem it is, then you can't say it's overblown.
Try reading: I said the scale of the issue.

It's obvious that if a majority of chips were failing, retailers would have pulled them from shelves, OEMs would have refused shipments, and Intel would have been forced to halt sales and issue a recall.

If you can't see that then you're the one with the blind spot.
 
Try reading: I said the scale of the issue.

It's obvious that if a majority of chips were failing, retailers would have pulled them from shelves, OEMs would have refused shipments, and Intel would have been forced to halt sales and issue a recall.

If you can't see that then you're the one with the blind spot.
I note even a 1% failure rate could easily be a good half million chips.

Fact is, we don't have the necessary data to quantify how bad the problem actually is. Regardless, it's clear there *is* a problem, and the fact that the majority of users don't use the CPU in a way that triggers it does not absolve Intel and its partners: there is, in fact, a problem.
 
Why would retailers pull stock and OEMs make themselves unpopular with Big Blue, breaking their commitment contracts... if Intel is handling RMAs directly (and consumers are still buying)?

This is not like crashing into a tree. It would be more like a systematic engine failure due to faulty fuel-injector control software. Yes, some people drove too fast, but the system is expected to protect itself to a large degree and not self-destruct if you put in high-octane fuel. (All the people who actually know something about cars, please forgive me and move on.)
 
I note even a 1% failure rate could easily be a good half million chips.
The failure rates they were seeing early on could also have been within expected parameters and not raised any red flags so this is meaningless.
Fact is, we don't have the necessary data to quantify how bad the problem actually is.
Which also means there's no basis for the widespread fearmongering going on either.
Regardless, it's clear there *is* a problem, and the fact that the majority of users don't use the CPU in a way that triggers it does not absolve Intel and its partners: there is, in fact, a problem.
So what? Who's absolving Intel of anything?

I certainly see a lot of people making uninformed statements accusing Intel with zero evidence though.
 
Try reading: I said the scale of the issue.

It's obvious that if a majority of chips were failing, retailers would have pulled them from shelves, OEMs would have refused shipments, and Intel would have been forced to halt sales and issue a recall.

If you can't see that then you're the one with the blind spot.
Mate, TH's own @JarredWaltonGPU had to RMA his CPU due to the same issue, this is not some smear campaign against Intel.
 
Try reading: I said the scale of the issue.

It's obvious that if a majority of chips were failing, retailers would have pulled them from shelves, OEMs would have refused shipments, and Intel would have been forced to halt sales and issue a recall.
That's certainly an outcome we could've been headed for, without Intel's mitigations.

OEMs have to weigh the cost of a potential liability against the cost of losing immediate revenue. If Intel is telling them it's issuing mitigations, then it's much easier for them to stay the course.

As for retailers not selling them, please explain why. Retailers basically have no skin in the game, other than product A using up shelf space that could instead be used for better-selling product B. Why would a retailer stop selling a product (in the US, at least) when the burden really falls on the manufacturer if the product is defective?
 
The failure rates they were seeing early on could also have been within expected parameters and not raised any red flags so this is meaningless.

Which also means there's no basis for the widespread fearmongering going on either.

So what? Who's absolving Intel of anything?

I certainly see a lot of people making uninformed statements accusing Intel with zero evidence though.
Actually, the mere fact that right now, two months after Intel admitted these issues, they have run out of replacement i9s worldwide says it isn't such a minor percentage. Not to mention the increasingly frequent reports of RMAs getting held up somewhere and needing more than a month of emails back and forth to get a response.
 
Which also means there's no basis for the widespread fearmongering going on either.
The basis is that real people were experiencing the failures and OEMs were reportedly saying they thought "between 10 and 25% of CPUs have a problem or are marginal in some way."

Source:


I certainly see a lot of people making uninformed statements accusing Intel with zero evidence though.
It's not zero evidence.

There's a basic contradiction in your statement. You're attacking people for over-hyping it without data, but you also lack the data to say that it is being over-hyped. All we can say is that we don't know how big it'd have been without Intel's mitigations and pressuring board makers to dial back their defaults. Absent the data, you don't get to presume an outcome in your favor.

P.S. it's no surprise @rluker5 is backing you on this. He tried to deny it even after Intel had publicly come out and admitted it! So, you've got a nice little Flat Earth faction going.
 
Horrible analogy. First, the K-series CPUs come with a warranty and they're supposed to work correctly, at stock settings, and yet it's been proven that they will even degrade in that scenario.

Second, Intel K-series differ in more respects than just whether or not overclocking is locked out. They also have higher stock clock speed limits and power limits. The CPU is supposed to work correctly within those limits. So, you really can't blame the victim, if they buy the fastest CPU and try to use it at the supported settings. There's no warning label on it that conveys the risk its users faced, even in that scenario.

Finally, Intel knows they're at fault, which is why they extended the warranty - even of OEM processors!

Stop victim blaming (leaving aside the Porsche guy, whose case is both irrelevant and probably involved other factors).
The Porsche Carrera GT comes with a warranty as well, and it's supposed to work correctly. Still, if you don't know how to drive, kaboom. It has no warning sticker either.
 

TRENDING THREADS