News Intel's CPU instability and crashing issues also impact mainstream 65W and higher 'non-K' models — damage is irreversible, no planned recall

Status
Not open for further replies.
I am completely out of my league in this conversation, but Intel's statement is consistent with what greymaterial suggested when he said C0 dies are likely not affected.
It was meant as sarcasm.

To address what you're saying: yes, it's at the very least due to the die types. There are some anomalies in the stack where C0 and B0 dies are both used for the same SKU, like the 13400 and 14400. I assume these parts likely do not need a microcode update, as they use ADL specifications despite using both ADL and RPL dies. Unless Intel releases a list of potentially affected SKUs, nobody can know for sure about these specific ones.
 
What does any of this matter? If it were guaranteed to affect the operation of every single part I'd agree, but there's no evidence to support that.

Money... you don't seem to understand how much a recall of all 65W+ 13th/14th Gen parts would cost them. It would cost them in the billions of dollars to do a recall like that and there's no evidence to support that being warranted.

You can just take a gander at how Intel was dragged kicking and screaming into replacing Pentiums in the 90s which had an unfixable silicon bug (predated microcode) where a recall was the only option.

Well, we have seen what a terminal failure can lead to with the recent CrowdStrike issue.

Typically you can mitigate against CPU bugs by coding around the bugs. With an actual silicon degradation over time, which is what this issue is, there's no such possibility as it can fail randomly at any given time and cause crashes that are virtually impossible to pinpoint, mitigate or fix.

That is why there should be a recall for this: there's no way to mitigate an already damaged CPU, and there's no way to know whether any of the CPUs currently in use are damaged, or to what degree. For all we know, every single one of these CPUs could fail at some point in the future.

It's a total disaster for Intel and a recall costing billions of dollars might be cheaper than the damage this will do over time to Intel and their stock value.
 
I have fairly good experience with this problem. I have owned both a Core i5-13500 and Core i7-14700 (the i7 was an upgrade from the i5 earlier this year). Both CPUs were installed in the same motherboard and same chassis. The motherboard was an ASUS ROG STRIX B760-I mini-ITX board. I was good about staying current with the BIOS. I did the minimal overclocking of the CPU provided by this BIOS for non-K parts. I also overclocked the RAM from 4800 -> 5600 MT/s in the BIOS. I am using Team Group DDR5 RAM. All components in this PC have been the same except the two CPUs mentioned above.

Anyway I have had zero problems with either of these CPUs. When I first heard about instability issues it seemed that the issues were related to overclocking/volting the CPU, so I immediately disabled any overclocking and installed a new BIOS back in April this year which included an update to the microcode version 123. Still things were stable in this PC.

Today I finally got around to installing the latest available ASUS BIOS for this motherboard which is version 1661 and includes microcode update 0x125. Again, I am not overclocking the CPU, but I am still overclocking the RAM to 5600. Again zero instability issues. I ran Cinebench R24 after this latest update for 10 minutes and got multi-core score of 1572. Geekbench 6 scores are slightly improved from before the latest BIOS update. The best score I got today is:
2936 (s)19224 (m)

Anyway, I gotta say, from my perspective this whole thing is overblown... I just haven't had any problems with these two supposedly affected CPUs. They are both 65W variants and boost up to over 156w and 225w respectively. They've been rock solid ever since I put this PC together from (all new) parts.
 
Anyway, I gotta say, from my perspective this whole thing is overblown... I just haven't had any problems with these two supposedly affected CPUs. They are both 65W variants and boost up to over 156w and 225w respectively. They've been rock solid ever since I put this PC together from (all new) parts.
Maybe you don't run it hot very often, and that's the critical factor?
 
For right now they're not doing a general recall. Of course they'd rather not. But I'll betcha they also have Plans B, C, D, E, F, and G all mapped out as things develop over the next year or three.

Anyway my procrastination in getting a new workstation has paid off, hurrah! But as others have said now I'm nervous about the newer chips, too. When can I run Win11 at home? LOL
 
90 percent of gamers with these chips aren't currently running into any issues.
It's quite probable that 75+ percent of gamers will never have problems before they replace the CPU.

Having 10-25 percent of CPUs cause weird errors on an intermittent basis is a big problem for everyone, even if most people will never encounter the issue.

We wouldn't have gotten to the point where NVidia is publicly blaming Intel for certain GPU errors and gaming companies adding "Your CPU is bad" error messages if this wasn't the worst CPU issue in decades.

Of course if Intel would just release a tool to check for bad CPUs and extend the warranty, it would probably blow over. Those FAQ entries and error messages would just ask you to run the tool and follow the RMA process if it detected a problem
 
I have fairly good experience with this problem. I have owned both a Core i5-13500 and Core i7-14700 (the i7 was an upgrade from the i5 earlier this year). Both CPUs were installed in the same motherboard and same chassis. The motherboard was an ASUS ROG STRIX B760-I mini-ITX board. I was good about staying current with the BIOS. I did the minimal overclocking of the CPU provided by this BIOS for non-K parts. I also overclocked the RAM from 4800 -> 5600 MT/s in the BIOS. I am using Team Group DDR5 RAM. All components in this PC have been the same except the two CPUs mentioned above.

Anyway I have had zero problems with either of these CPUs. When I first heard about instability issues it seemed that the issues were related to overclocking/volting the CPU, so I immediately disabled any overclocking and installed a new BIOS back in April this year which included an update to the microcode version 123. Still things were stable in this PC.

Today I finally got around to installing the latest available ASUS BIOS for this motherboard which is version 1661 and includes microcode update 0x125. Again, I am not overclocking the CPU, but I am still overclocking the RAM to 5600. Again zero instability issues. I ran Cinebench R24 after this latest update for 10 minutes and got multi-core score of 1572. Geekbench 6 scores are slightly improved from before the latest BIOS update. The best score I got today is:
2936 (s)19224 (m)

Anyway, I gotta say, from my perspective this whole thing is overblown... I just haven't had any problems with these two supposedly affected CPUs. They are both 65W variants and boost up to over 156w and 225w respectively. They've been rock solid ever since I put this PC together from (all new) parts.
Have you ever stressed your CPU for long periods?
Have you ever done real stress tests to check stability? (A few minutes of Cinebench does not mean your system is stable.)
 
Have you ever stressed your CPU for long periods?
Have you ever done real stress tests to check stability? (A few minutes of Cinebench does not mean your system is stable.)
I got a used 13500T and a 13600T from eBay.
Gaming on these CPUs at 50/65 W (35 W stock), I've had no problems at all. A few days ago I set the CPU to draw 90 W, and not a single blue screen =] The 13600T has a stock clock of 1.8 GHz and runs at 4.1 GHz all day long...
 
Have you ever stressed your CPU for long periods?
Have you ever done real stress tests to check stability? (A few minutes of Cinebench does not mean your system is stable.)
Stressing it doesn't mean anything. Cinebench isn't even particularly heavy. There are CPUs that can run y-cruncher for hours and then crash during an Nvidia driver installation, which is a very light load.

The issue is way more complicated than just "stress your cpu"
 
For right now they're not doing a general recall. Of course they'd rather not. But I'll betcha they also have Plans B, C, D, E, F, and G all mapped out as things develop over the next year or three.

Anyway my procrastination in getting a new workstation has paid off, hurrah! But as others have said now I'm nervous about the newer chips, too. When can I run Win11 at home? LOL
Switch to AMD. Simple, problem and crisis averted. It's not like we as consumers only have one CPU manufacturer to choose from. Yeah, maybe brand loyalty or something keeps you from it. But it is an option and it can run Windows 11 at home! Wooooo!!!
 
90 percent of gamers with these chips aren't currently running into any issues.
It's quite probable that 75+ percent of gamers will never have problems before they replace the CPU.

Having 10-25 percent of CPUs cause weird errors on an intermittent basis is a big problem for everyone, even if most people will never encounter the issue.

We wouldn't have gotten to the point where NVidia is publicly blaming Intel for certain GPU errors and gaming companies adding "Your CPU is bad" error messages if this wasn't the worst CPU issue in decades.

Of course if Intel would just release a tool to check for bad CPUs and extend the warranty, it would probably blow over. Those FAQ entries and error messages would just ask you to run the tool and follow the RMA process if it detected a problem
Can you post a link for your references? From what I've seen and heard, the affected percentage is larger. Plus the PC world isn't just gamers. Millions of other Intel CPUs from 13th and 14th Gen are in use.
 
I have fairly good experience with this problem. I have owned both a Core i5-13500 and Core i7-14700 (the i7 was an upgrade from the i5 earlier this year). Both CPUs were installed in the same motherboard and same chassis. The motherboard was an ASUS ROG STRIX B760-I mini-ITX board. I was good about staying current with the BIOS. I did the minimal overclocking of the CPU provided by this BIOS for non-K parts. I also overclocked the RAM from 4800 -> 5600 MT/s in the BIOS. I am using Team Group DDR5 RAM. All components in this PC have been the same except the two CPUs mentioned above.

Anyway I have had zero problems with either of these CPUs. When I first heard about instability issues it seemed that the issues were related to overclocking/volting the CPU, so I immediately disabled any overclocking and installed a new BIOS back in April this year which included an update to the microcode version 123. Still things were stable in this PC.

Today I finally got around to installing the latest available ASUS BIOS for this motherboard which is version 1661 and includes microcode update 0x125. Again, I am not overclocking the CPU, but I am still overclocking the RAM to 5600. Again zero instability issues. I ran Cinebench R24 after this latest update for 10 minutes and got multi-core score of 1572. Geekbench 6 scores are slightly improved from before the latest BIOS update. The best score I got today is:
2936 (s)19224 (m)

Anyway, I gotta say, from my perspective this whole thing is overblown... I just haven't had any problems with these two supposedly affected CPUs. They are both 65W variants and boost up to over 156w and 225w respectively. They've been rock solid ever since I put this PC together from (all new) parts.
Yeah, you're stable now, but will you be tomorrow or the next day? That sucks having that in the back of your mind every time you use your PC. Any little hiccup you'll wonder if the degradation has started.
 
Stressing it doesn't mean anything. Cinebench isn't even particularly heavy. There are CPUs that can run y-cruncher for hours and then crash during an Nvidia driver installation, which is a very light load.

The issue is way more complicated than just "stress your cpu"
But if you don't squeeze the CPU (games, rendering, whatever), it is unlikely to be damaged.
And to confirm stability, 10 minutes of Cinebench is not enough.
In the end, saying "my CPU is rock solid" without heavy workloads and appropriate testing means nothing.
 
There's no way to test for silicon degradation. There's also no way to know how it will fail, since there's no way to even determine what is actually degraded inside the silicon.

That is why there should be a full recall: there's simply no way to know whether your CPU has degraded, what percentage of CPUs out there are degraded, how many more have degraded but still operate mostly properly, etc...
 
Switch to AMD. Simple, problem and crisis averted. It's not like we as consumers only have one CPU manufacturer to choose from. Yeah, maybe brand loyalty or something keeps you from it. But it is an option and it can run Windows 11 at home! Wooooo!!!
Yeah, wish it was that simple. There are many reasons people buy Intel, and for them switching to AMD isn't really an option.
 
Yeah, you're stable now, but will you be tomorrow or the next day? That sucks having that in the back of your mind every time you use your PC. Any little hiccup you'll wonder if the degradation has started.
For me, the biggest concern isn't bluescreens, but rather it's data corruption. In the worst case, it could theoretically trash your filesystem, resulting in data loss. Another bad scenario is that it silently introduces corruption in some of the files you write, and you don't notice until later, when it's no longer possible to revert to an earlier version.

This is why I don't overclock and I prefer to use ECC memory. However, if the CPU is generating bad data, there's nothing you can do to mitigate against that.
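
For what it's worth, the "silent corruption you don't notice until later" case can at least be detected after the fact by keeping checksums of the files you care about and re-checking them now and then. Here is a minimal sketch in Python, assuming nothing beyond the standard library; the function names (`file_sha256`, `build_manifest`, `find_changed`) are made up for illustration. It does not stop a faulty CPU from writing bad data, and a misbehaving CPU could in principle miscompute a hash too, so treat it as a tripwire rather than a fix.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its current SHA-256."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changed(old_manifest, new_manifest):
    """Return paths present in both manifests whose hashes differ."""
    return sorted(
        name
        for name, old_hash in old_manifest.items()
        if name in new_manifest and new_manifest[name] != old_hash
    )
```

The idea would be to save the result of `build_manifest` for a folder you don't expect to change (photos, archives), rebuild it later, and look at what `find_changed` reports. Files you edited on purpose will show up as well, so it only helps for data that is supposed to stay put.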
 
I have a 13980HX in my laptop and that thing can pull 157W at load and 100W sustained I think. What am I supposed to do if the processor died and it’s past ASUS’ warranty period?

Intel either needs to tell OEMs to extend warranty or do a special exchange that will cover life of the motherboard
 
To stress test my AM5 system I run the benchmarks and games at 30C ambient (whole room fully saturated), benchmarks for 20-30 minutes and games for a few hours, all spread over many weeks in my spare time before I officially call it stable. Despite being "stable", there are still occasional random shutdowns and BSODs every 1-2 weeks, and despite throwing a few grand at it, I haven't figured out why they happen.

Computers are a bit like engines: they have to work harder in the heat, and my system must be stable at 30C, as that is the average real-world temperature in summer when I use my system. Running tests at 15-20C is a bit like cheating. I have also over-engineered the system so it's under-stressed.

As far as Intel goes, I won't touch their CPUs for probably another 5 years plus. They have done so much damage by staying quiet. I must protect myself and, sorry, I just can't take that risk in a world where there is no protection. RMAs and the rest of it don't work; putting up with faulty hardware is, for me anyway, usually so damaging that a full refund doesn't even come close to truly paying for the damage from my wasted time. No protection!
 
For me, the biggest concern isn't bluescreens, but rather it's data corruption. In the worst case, it could theoretically trash your filesystem, resulting in data loss. Another bad scenario is that it silently introduces corruption in some of the files you write, and you don't notice until later, when it's no longer possible to revert to an earlier version.

This is why I don't overclock and I prefer to use ECC memory. However, if the CPU is generating bad data, there's nothing you can do to mitigate against that.
Data corruption is usually a memory issue; I've never seen it happen from an unstable CPU. Too high a tREFI (over 65k) can lead to even your whole Windows installation being corrupted.
 
I have a 13980HX in my laptop and that thing can pull 157W at load and 100W sustained I think. What am I supposed to do if the processor died and it’s past ASUS’ warranty period?

Intel either needs to tell OEMs to extend warranty or do a special exchange that will cover life of the motherboard
Your CPU is a desktop die; 157 W peak and 100 W sustained is peanuts. Nothing is going to happen to the CPU. I'd be more worried about your laptop's VRMs.
 
To stress test my AM5 system I run the benchmarks and games at 30C ambient (whole room fully saturated), benchmarks for 20-30 minutes and games for a few hours, all spread over many weeks in my spare time before I officially call it stable. Despite being "stable", there are still occasional random shutdowns and BSODs every 1-2 weeks, and despite throwing a few grand at it, I haven't figured out why they happen.

Computers are a bit like engines: they have to work harder in the heat, and my system must be stable at 30C, as that is the average real-world temperature in summer when I use my system. Running tests at 15-20C is a bit like cheating. I have also over-engineered the system so it's under-stressed.

As far as Intel goes, I won't touch their CPUs for probably another 5 years plus. They have done so much damage by staying quiet. I must protect myself and, sorry, I just can't take that risk in a world where there is no protection. RMAs and the rest of it don't work; putting up with faulty hardware is, for me anyway, usually so damaging that a full refund doesn't even come close to truly paying for the damage from my wasted time. No protection!
Your PC is throwing BSODs and crashes, but you won't touch Intel because they're not stable. Oh boy....
 
Data corruption is usually a memory issue, never seen it happen from unstable cpu.
Yes, but we haven't seen a faulty CPU situation like this. If a CPU is producing bad results, all bets are off.

Too high a tREFI (over 65k) can lead to even your whole Windows installation being corrupted.
I don't know what that is. A web search tells me:

tREFI is the "Maximum average periodic refresh"

Yeah, bad RAM settings can quickly & easily cause it to become extremely unreliable.

Did I mention that I don't overclock and use ECC memory? Again, the issue I'd worry about here, is that there's nothing comparable I can do about a faulty CPU.
 
I hope they get taken to the cleaners via a class action lawsuit, and if FTC doesn't do anything, then maybe WTO or EC will do something about it.
Intel still has the majority marketshare and mindshare. Their pockets are very very deep. Annual revenue was $54 billion in 2023 and profit was $22 billion. Even if the lawyers took $10 billion, it's just a drop in the bucket.

FTC and WTO don't handle issues like this. They deal with unfair trade practices. But this is a defective product case. It's a civil matter and Intel says they have a "fix," so I see no reason to intervene.
 
Your PC is throwing BSODs and crashes, but you won't touch Intel because they're not stable. Oh boy....
My old work laptop (Skylake HX model) was having about one BSoD per week, for over a year. It got so bad that I cataloged each and every crash to try to find some kind of pattern. I ran all the hardware diags I could find, but apparently nothing was wrong with the hardware.

Eventually, some driver issue or whatever got sorted out (plus, Windows 11 got launched, so Microsoft stopped mucking about with Win 10) and it became incredibly stable, by the time I got upgraded. I was almost reluctant to give it up.

Point is: there are lots of reasons a PC can bluescreen. That user's bluescreens might have nothing to do with the CPU or motherboard. They need to be debugged to find the root cause.
 
Annual revenue was $54 billion in 2023 and profit was $22 billion. Even if the lawyers took $10 billion, it's just a drop in the bucket.
Shareholders don't see it that way. Such huge costs could eat into their dividends.

Furthermore, profitable companies have layoffs all the time. All it takes is a down quarter, where the amount of profit was lower than expected. Such layoffs can impact future performance (although they generally try to minimize that). So, you really can't say there's no potential threat to Intel, here.

As I said before, I'm not arguing what's morally right. I'm just trying to share how I think Intel probably views the matter of doing a recall (and yes, my back-of-the-envelope math is on the order of $10B).
 