Intel finally announces a solution for CPU crashing errors: claims elevated voltages are the root cause; fix coming by mid-August

Status: Not open for further replies.
Gotcha.
Yeah, I am fully aware and believe that no one (possibly not even Intel, still) knows what is really going on.
But that doesn't mean that extrapolations can't be made. People are definitely doing it for clicks, but it at least looks like Wendell is trying to dive deeper and get more data.

What are we supposed to do if not extrapolate and try to gain more data?
Speculating is fine. Discrediting people with dubious hot takes is not something that should be done, especially when there are people affected by this who are being thrown under the bus if others accept that Intel not doing anything (until this statement, at least) is OK. Keep in mind that if FrameChasers' "misinformation" claim is believed by affected people and they don't do anything, they'll end up with a non-working CPU in the short to medium term. Also keep in mind that warranties expire (more of a problem in the USA) and Intel just won't honor them if you're late, so... I don't even know why someone would try to argue "this is not a problem; everything is a hoax" when there's already pretty darn conclusive evidence that it is not.

EDIT: Clarified text and ideas.

Regards.
 
Yeah, we're on the same page here.
In Intel's case, I believe they quickly took Alder, modified it, did some accelerated (read: not enough) testing and shipped it out the door.

Edit - Intel saying that it's a microcode bug supports my belief. What we don't know is whether or not the issue will truly be fixed in a month or so, when this update gets released.
 

wbfox

Here's Robeytech's video on Intel's announcement (he sort of serves as Intel's mouthpiece):
https://www.youtube.com/watch?v=wkrOYfmXhIc



That is just speculation for the sake of speculation, verging on scaremongering. Intel officially stated what the root cause of the problem is, and they really cannot lie about it. The reason they stated also makes perfect sense given the experience of people who dealt with the issue and solved it, like in this video from FrameChasers:
https://www.youtube.com/watch?v=afN6SaT21cQ
So you obviously didn't watch the original video, which shows that the W680 population was used precisely because those boards run Intel's numbers rather than a motherboard's juiced-up spec, that these systems have seen the very same issues, AND that high RMA rates have been occurring since last year, without Intel doing a single thing about it other than swapping out its B2B customers' bad chips for good ones to keep things quiet and leaving all the average, unimportant customers hanging with a handful of hope... And here's a link to a video that does the same thing. Because, obviously, microcode can deal with physical degradation caused by a motherboard vendor's settings.
 
I have a 13900K that suffered from degradation: after using the system sporadically for 9 months, instability started. I manually configured the power, current and temperature limits in the BIOS and also applied a small undervolt. The system is completely stable now. But clearly some damage was induced previously...

Every cpu gets degraded just by using it. No cpu will be as good as new after even 1 day of usage. Everyone's cpu is degraded no matter what cpu they are using.

But it's the same thing with the guy complaining about his CPU being degraded. He said it works fine with Intel stock settings. So, a big nothing burger.

I dunno. Aren't K processors sold at a premium precisely because they can be run beyond stock settings?

If you like tweaking the performance of your hardware, look for the “K” designation at the end of the processor name. This indicates that the CPU is designed to be overclocked. Assuming you have the right hardware, such as a proper cooling solution and a motherboard that supports overclocking, you can enjoy the benefits of faster clock speeds with an unlocked CPU.

https://www.intel.co.uk/content/www/uk/en/gaming/resources/gaming-cpu.html
Every car engine "degrades" from day one too. But if I paid extra for a performance-tuned engine only for a manufacturer's faulty injection mapping or rev limit to cause it to wear out in a few thousand miles by the amount that should have taken tens of thousands, meaning it had to be run the same as a standard engine instead to prevent it destroying itself, and it now had the performance of one of those standard engines with ten times the mileage of mine...I'm not sure being told "every car engine degrades every day" would cut it.
 

setx

Do you understand what electromigration is? Well, you don't; if you did, you wouldn't ask. Every CPU degrades just by being used, even at super safe settings.

Higher voltages and amperages accelerate the process, but there isn't a cutoff, i.e. some voltage below which it stops degrading.
First you showed us your incompetence; now you show that you can't even read.

I asked how you measured the degradation, not for more empty words.

Not in any measurable way. I mean, even if they have to drop clocks, by how much? 100 MHz? 200? Let's say 300 just for the sake of argument. That would mean a 3% drop in MT performance. It just doesn't matter even if it happens.
How fun. When you try to defend Intel you suddenly understand the concept of measurable. Now apply that to CPU degradation for CPUs without manufacturing defects, like Westmere that I mentioned, and measure anything: electromigration or whatever.
 

A Stoner

Sounds like a massive lawsuit is going to happen over this... The reason it took so long is likely that they were trying to find a way to make it someone else's fault.
 

TheHerald

First you showed us your incompetence; now you show that you can't even read.

I asked how you measured the degradation, not for more empty words.


How fun. When you try to defend Intel you suddenly understand the concept of measurable. Now apply that to CPU degradation for CPUs without manufacturing defects, like Westmere that I mentioned, and measure anything: electromigration or whatever.
Again, if you understood what electromigration is, you wouldn't even be asking. Every conductor on the planet degrades just by being used. Your CPU is a conductor. I don't need to measure it; the work has been done over the last 5-6 decades. Maybe more.
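For context on the "decades of work" mentioned here: a commonly cited empirical model for electromigration lifetime is Black's equation, MTTF = A·J^(-n)·exp(Ea/(k·T)), where J is the current density in the conductor and T its absolute temperature. The minimal Python sketch below only computes the ratio of lifetimes between two operating points, so the unknown prefactor A cancels. The exponent n = 2 and activation energy Ea = 0.7 eV are assumed, illustrative values, not figures for any Intel process, and the model is written in terms of current density rather than voltage directly. It illustrates two claims from this exchange: lifetime is finite at any nonzero current (there is no cutoff), and it shortens quickly as current and temperature rise.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf_ratio(j_ratio: float, t1_c: float, t2_c: float,
                     n: float = 2.0, ea_ev: float = 0.7) -> float:
    """Return MTTF(op2) / MTTF(op1) under Black's equation.

    j_ratio: current density at op2 divided by current density at op1.
    t1_c, t2_c: conductor temperatures at op1 and op2, in degrees Celsius.
    n, ea_ev: assumed, illustrative exponent and activation energy (eV).
    The prefactor A is identical for both points and cancels in the ratio.
    """
    if j_ratio <= 0:
        raise ValueError("current density ratio must be positive")
    t1_k = t1_c + 273.15
    t2_k = t2_c + 273.15
    current_term = j_ratio ** (-n)
    # exp(Ea/kT2) / exp(Ea/kT1) = exp(Ea/k * (1/T2 - 1/T1)); < 1 when T2 > T1
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t2_k - 1.0 / t1_k))
    return current_term * thermal_term


if __name__ == "__main__":
    # Same temperature, 20% more current density: lifetime scales by 1/1.2**2.
    print(f"{black_mttf_ratio(1.2, 70, 70):.3f}")
    # 20% more current density and 100 C instead of 70 C.
    print(f"{black_mttf_ratio(1.2, 70, 100):.3f}")
```

Working with a ratio rather than absolute hours is a deliberate choice: without the real prefactor and process parameters, only relative comparisons between operating points are meaningful.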
 

DrDocumentum

I dunno. Aren't K processors sold at a premium precisely because they can be run beyond stock settings?


Every car engine "degrades" from day one too. But if I paid extra for a performance-tuned engine only for a manufacturer's faulty injection mapping or rev limit to cause it to wear out in a few thousand miles by the amount that should have taken tens of thousands, meaning it had to be run the same as a standard engine instead to prevent it destroying itself, and it now had the performance of one of those standard engines with ten times the mileage of mine...I'm not sure being told "every car engine degrades every day" would cut it.
Clearly this TheHerald guy is related to Intel. He is defending Intel's position with completely unreasonable arguments. He's saying that it is normal to get degradation after a frugal 9 months of usage on a CPU. This is the first time a CPU has degraded on me (I am an IT guy and have been assembling my own systems since 1995). I have two 12900Ks too, which are more than a year older than my 13900K and don't have any stability (or degradation) issues. Why? Because Intel chose the greedy path in order to win the race against AMD and clocked its RL CPUs way higher than they can physically withstand.
 

TheHerald

Clearly this TheHerald guy is related to Intel. He is defending Intel's position with completely unreasonable arguments. He's saying that it is normal to get degradation after a frugal 9 months of usage on a CPU. This is the first time a CPU has degraded on me (I am an IT guy and have been assembling my own systems since 1995). I have two 12900Ks too, which are more than a year older than my 13900K and don't have any stability (or degradation) issues. Why? Because Intel chose the greedy path in order to win the race against AMD and clocked its RL CPUs way higher than they can physically withstand.
I'm just using your own statements. You said the CPU is stable with Intel default settings, which means that beforehand you weren't using the Intel default settings. You were probably running it unlimited and allowed it to draw up to, what, 350+ watts? As an IT guy, you saw your CPU pull 350 watts and thought that's fine, what could possibly happen? Isn't it obvious to you, with your 30 years of experience, that the chip will die under these circumstances? No?
 

setx

I don't need to measure it; the work has been done over the last 5-6 decades.
So, can you show us your work over the last 5-6 decades in the field of electromigration, so that we can see that your words are not empty?

Personally, in the oldest of my CPUs still working (Westmere, ~14 years old), I don't see any degradation (from the user's point of view). Hint: safety margins in properly designed products exist for a reason.
 

TheHerald

safety margins in properly designed products exist for a reason.
EXACTLY. Now you are getting it. It's not that your CPU hasn't degraded; it has. But it already was given more voltage than required out of the box to avoid crashes due to said electromigration.

It's not hard to test your CPUs, especially if you have had them for so long. You find the lowest voltage that lets you run something like y-cruncher without crashing, and then you try it again after a year. Your minimum vcore to pass the test will now be higher. Then you try it again in another year, and it will be higher still, etc.
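The procedure described here can be sketched in Python. `passes` stands for a hypothetical callback that would set the core voltage and run a stress test such as y-cruncher; no real API for doing that is implied. The search assumes stability is monotonic in voltage (if a voltage passes, every higher one passes too), and the drift is an ordinary least-squares slope over repeated measurements. The sketch measures the whole system, so on its own it cannot tell CPU wear apart from board or VRM changes.

```python
from typing import Callable, Sequence, Tuple


def find_min_passing_mv(passes: Callable[[int], bool],
                        lo_mv: int, hi_mv: int, step_mv: int = 5) -> int:
    """Binary-search the lowest voltage (mV, on a step grid) that passes.

    `passes` is a hypothetical callback: set vcore, run the stress test,
    return True if it completed without errors. Assumes monotonic stability.
    """
    n_steps = (hi_mv - lo_mv) // step_mv
    if not passes(lo_mv + n_steps * step_mv):
        raise ValueError("even the highest voltage on the grid fails")
    lo_idx, hi_idx = 0, n_steps
    # Invariant: grid point hi_idx passes; every point below lo_idx fails.
    while lo_idx < hi_idx:
        mid = (lo_idx + hi_idx) // 2
        if passes(lo_mv + mid * step_mv):
            hi_idx = mid
        else:
            lo_idx = mid + 1
    return lo_mv + lo_idx * step_mv


def drift_mv_per_year(readings: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of Vmin (mV) against time (days), in mV per year."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    xs = [day for day, _ in readings]
    ys = [mv for _, mv in readings]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("readings must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx * 365.0
```

A rising slope across yearly runs would be the signature described above; with only a handful of readings, the step size of the search grid limits how small a drift can be seen.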
 

Taslios

Not in any measurable way. I mean, even if they have to drop clocks, by how much? 100 MHz? 200? Let's say 300 just for the sake of argument. That would mean a 3% drop in MT performance. It just doesn't matter even if it happens.
A drop is a drop is a drop... Intel is going to be fighting off lawsuits regardless of what they do at this point, so anything that can limit exposure is likely good...
 

TheHerald

A drop is a drop is a drop... Intel is going to be fighting off lawsuits regardless of what they do at this point, so anything that can limit exposure is likely good...
Well, I'd argue a drop is not a drop is not a drop. Dropping performance by 50% isn't the same as dropping performance by 2%.
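The 100/200/300 MHz figures being argued over are easy to turn into percentages. The Python sketch below assumes throughput scales at most linearly with clock, so the clock loss is an upper bound on the performance loss. The 5.5 GHz base in the usage lines is an assumed round number, not an Intel specification, and `share_affected` is a hypothetical parameter for the case where only part of the multithreaded throughput comes from cores that get down-clocked.

```python
def clock_drop_pct(base_mhz: float, drop_mhz: float) -> float:
    """Clock reduction as a percentage of the base clock.

    If throughput scales at most linearly with frequency (an assumption),
    this is an upper bound on the resulting performance loss.
    """
    if base_mhz <= 0:
        raise ValueError("base clock must be positive")
    if not 0 <= drop_mhz <= base_mhz:
        raise ValueError("drop must be between 0 and the base clock")
    return 100.0 * drop_mhz / base_mhz


def partial_drop_pct(share_affected: float, base_mhz: float,
                     drop_mhz: float) -> float:
    """Loss when only `share_affected` (0..1) of total multithreaded
    throughput comes from down-clocked cores (hypothetical split)."""
    if not 0.0 <= share_affected <= 1.0:
        raise ValueError("share_affected must be in [0, 1]")
    return share_affected * clock_drop_pct(base_mhz, drop_mhz)


if __name__ == "__main__":
    base = 5500  # assumed round clock in MHz, not an Intel spec
    for drop in (100, 200, 300):
        print(f"-{drop} MHz: at most {clock_drop_pct(base, drop):.1f}% slower")
```

Whether a given drop lands nearer 2% or well above it depends on which base clock and which share of the workload are affected, which is exactly what the two posts disagree about.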
 

Geekaycee

I suspect that microcode may not be the "root cause" of the enhanced degradation, but rather an easy way for Intel to ameliorate the symptoms of a much deeper problem.

First of all, it needs to be checked whether undervolted CPUs have been suffering from similar degradation problems as well. If so, then too high a voltage may not be the root cause. Given the millions of 13th- and 14th-gen Core CPUs that Intel has sold, there should be a sizeable number of users out there who have undervolted their CPUs.
What do the crash reports say about such configured systems?

Take my case: a 13700KF installed in January 2023, running under a Noctua NH-D15 with stock settings and a -0.11 V undervolt to avoid thermal throttling from day one. The machine is primarily used for work and a little gaming. Three or four weeks ago I played a bit of GTA 5 Online and got a crash with the message "out of video memory". I found info online about similar cases where people talk about their Intel CPUs degrading over time. Cute...

I immediately ran Cinebench 2024, and that crashed quite quickly as well. I updated the BIOS to the latest version and tried Intel default settings. It helped, but now the CPU only clocked around 4.6-4.8 GHz on the P-cores. After a few hours of manual tuning, I was able to run Cinebench 2024 stably with a -0.05 V undervolt and IccMax, PL1 and PL2 set per Intel specs. Unfortunately, this time thermal throttling is present.
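The check proposed in this post boils down to comparing failure rates between undervolted and stock-configured systems. A minimal Python sketch of that comparison, assuming someone had counts of failed and total systems for each group (no such dataset appears in this thread), is a pooled two-proportion z-test. Undervolters are a self-selected group, so even a clean result would be suggestive rather than conclusive.

```python
import math
from dataclasses import dataclass


@dataclass
class Cohort:
    failed: int  # systems that developed instability
    total: int   # systems observed

    def __post_init__(self) -> None:
        if self.total <= 0 or not 0 <= self.failed <= self.total:
            raise ValueError("need total > 0 and 0 <= failed <= total")

    @property
    def rate(self) -> float:
        return self.failed / self.total


def two_proportion_z(a: Cohort, b: Cohort) -> float:
    """z statistic for H0: both cohorts share one failure rate (pooled SE)."""
    pooled = (a.failed + b.failed) / (a.total + b.total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.total + 1 / b.total))
    if se == 0:
        return 0.0  # both rates are 0 or both are 1: no difference to test
    return (a.rate - b.rate) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))
```

In this framing, a z near zero for undervolted versus stock systems would be the "undervolted CPUs degrade too" outcome the post describes, while a large positive z for stock versus undervolted would point back toward voltage.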
 

setx

But it already was given more voltage than required out of the box to avoid crashes due to said electromigration.
As usual, you are wrong.
That 1.5x overclock has been stable for many years, and I originally applied only a minimal overvoltage margin for it.

It's not hard to test your CPUs, especially if you have had them for so long. You find the lowest voltage that lets you run something like y-cruncher without crashing, and then you try it again after a year. Your minimum vcore to pass the test will now be higher. Then you try it again in another year, and it will be higher still, etc.
You've obviously never designed a scientific experiment, lol. That would prove absolutely nothing, as it doesn't separate out motherboard degradation. In fact, there is a very high chance that that motherboard will fail within a year. I have plenty of working old CPUs without motherboards...


Returning to the topic: there are plenty of CPUs that have been working for 10+ years, even overclocked with a mild overvolt. And there are new Intel CPUs that fail after a year at stock. What does that tell us? That new Intel CPUs had design errors. And as a user, I don't care whether they failed to account for electromigration or something else.
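The objection raised in this post, that a repeated voltage test cannot separate CPU wear from motherboard degradation, is commonly handled by swapping parts. A minimal Python sketch, assuming hypothetical pass/fail results keyed by (cpu, board) pairs: a component that fails with every counterpart is flagged "suspect", one that passes everywhere "ok", anything else "mixed", and one tested in too few pairings "inconclusive". The labelling rule is an illustrative assumption; in practice memory, PSU and BIOS settings would also have to be held fixed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Results = Dict[Tuple[str, str], bool]  # (cpu, board) -> passed stress test


def _classify(outcomes: List[bool], min_pairings: int) -> str:
    if len(outcomes) < min_pairings:
        return "inconclusive"
    if not any(outcomes):
        return "suspect"  # failed with every counterpart it was paired with
    if all(outcomes):
        return "ok"
    return "mixed"


def attribute_failures(results: Results, min_pairings: int = 2) -> Dict[str, str]:
    """Label each CPU and board from a swap matrix (illustrative rule)."""
    by_cpu: Dict[str, List[bool]] = defaultdict(list)
    by_board: Dict[str, List[bool]] = defaultdict(list)
    for (cpu, board), passed in results.items():
        by_cpu[cpu].append(passed)
        by_board[board].append(passed)
    verdict = {f"cpu:{c}": _classify(v, min_pairings) for c, v in by_cpu.items()}
    verdict.update(
        {f"board:{b}": _classify(v, min_pairings) for b, v in by_board.items()}
    )
    return verdict
```

If failures follow one CPU across boards, the CPU is the likely culprit; if they follow one board across CPUs, the board is.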
 
Every cpu gets degraded just by using it. No cpu will be as good as new after even 1 day of usage. Everyone's cpu is degraded no matter what cpu they are using.
You are correct, every time a transistor switches it loses a little performance potential.
The following doesn’t knock Intel.
However, the degradation does not normally reach the point where something fails (SSDs excepted, though they state a lifetime TBW). RAM, CPUs and chipsets do not generally fail within a period of 5 years; in the case of the 14xxx chips, failures are showing up in less than 2 years.

With regard to the chips that have been used, how many have been damaged, degraded not to the point of failure but significantly aged? That number won’t be known or knowable. Would you feel comfortable using something used and worn to within 1% of failure? Would you feel comfortable buying a used i9/i7 from eBay given that you don’t know its history?

I had no CPU problems over the years: 6502, 68000, 286, 386DX and 387, 486SX, P120, P233 (run for 4 years at 266 MHz on an 83 MHz bus), Athlon 700, XP, 64 X2, Phenom 2 (still running), 8350 (the motherboard VRM was weak and the motherboard died), 4790K (still running), 2700X, and currently a 3900X.

In 40 years I haven't seen this sort of problem with a CPU. I haven't seen this sort of problem with memory, and I still haven't seen a problem with a 250GB Samsung 840 (TLC) drive I got new when it was current.

For people with 13th- and 14th-gen i9/i7 chips, I hope this is a fix and not a band-aid. If the fix is to reduce this and that, for example by limiting boost, then Intel is breaking promises made when the chips were sold. These chips were sold on high boost and even further boost: Thermal Velocity Boost (when conditions allow).

It would be interesting to know the MTBF numbers for components, both Intel and AMD, hard to find for both.
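If such MTBF figures were available, they would usually be quoted in hours. Under the common simplifying assumption of a constant failure rate (exponentially distributed lifetimes), an MTBF converts to a probability of failing within a given period, P = 1 - exp(-t/MTBF). The Python sketch below shows that conversion; the 1,000,000-hour figure in the usage lines is a made-up round number, not a published figure for any Intel or AMD part. A constant-rate model describes random failures only and ignores wear-out, which is the kind of degradation this thread is about.

```python
import math

HOURS_PER_YEAR = 24 * 365.25  # average year, in hours


def failure_probability(mtbf_hours: float, period_hours: float) -> float:
    """P(failure within period), assuming a constant failure rate 1/MTBF."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if period_hours < 0:
        raise ValueError("period must be non-negative")
    # 1 - exp(-t/MTBF), computed with expm1 for accuracy when t << MTBF
    return -math.expm1(-period_hours / mtbf_hours)


def annualized_failure_rate(mtbf_hours: float,
                            powered_hours_per_year: float = HOURS_PER_YEAR) -> float:
    """Probability of failing within one year of the given powered-on hours."""
    return failure_probability(mtbf_hours, powered_hours_per_year)


if __name__ == "__main__":
    # 1,000,000 h is a made-up round number, not a published figure.
    for years in (1, 2, 5):
        p = failure_probability(1_000_000, years * HOURS_PER_YEAR)
        print(f"{years} year(s) powered on 24/7: {p:.2%}")
```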
 
But, it was almost by definition far more rare.
Why? Because AMD sells very little in the OEM market, or because AMD caught it sooner? It happened less, but it could happen to anyone and was out of their control until they knew the problem.
You guys' argument is like saying that it's okay for cigarettes to kill people because a certain number of people die from lightning strikes, anyhow, even though the lightning deaths are far more rare. It's a false equivalence.

Not only that, it's classic whataboutism.
No, it really isn't. If Intel had happened to catch this early on, as AMD was able to (I think it took them weeks to maybe a month before the exact problem was identified and how to avoid it entirely was made public), would anyone even be talking about this? In the end, Asus' bad response about voiding warranties over AMD beta BIOSes ran longer than the burning story.
 

bit_user

We tend to be short on memory where it comes to commonalities such as what happened with the 7xxx Ryzen CPU/mobo when it was first released. I mean, IIRC that was actually a FIRE, and not just chip degradation.
I think you're taking liberties with the facts. There were no fires, as in someone's PC going up in flames.

Also, let's please try to avoid false equivalences here. As I've pointed out, only a handful of such incidents are known to have happened, and AMD quickly added protections to safeguard against it.
 
Looking at this from the outside: CPUs that have degraded can't be reversed to an undegraded state. I have a 13900K that suffered from degradation: after using the system sporadically for 9 months, instability started. I manually configured the power, current and temperature limits in the BIOS and also applied a small undervolt. The system is completely stable now. But clearly some damage was induced previously, that CPU will never go back to being as it was when new, and I also can't RMA it, because it works fine under the manually configured parameters (Intel's defaults).
If you've applied an undervolt, that isn't Intel defaults, and unless your temperature limit was over 100°C before, that isn't either. If possible, you should run with just the power and current limits, because you should absolutely RMA that CPU if it's unstable. I personally wouldn't trust that CPU to last, and as it stands you're running out of warranty time.

I'm sure you've already seen the table, but the one at the bottom is their official guidance: https://www.tomshardware.com/pc-com...ashes-stick-to-intels-official-power-profiles
 

bit_user

The extrapolation is the thing done in bad faith (not knowing about VID tables) trying to discredit his findings for... Fanboyism? Bias? Something.
...or the Intel disinformation machine is running in top gear.

I'd love to see some investigative journalism on that. For people skilled in detecting these disinformation networks, I'll bet it wouldn't be that hard to pick apart whatever Intel might be doing.
 
Do you understand what electromigration is? Well you don't, if you did you wouldn't ask. Every cpu degrades just by being used, even at super safe settings.

Higher voltages and amperages accelerate the process but there isn't a cut off, like under this voltage it stops degrading.
Do you know what electromigration is?

Intel does, AMD does; they have been designing complex, electronically dense circuits for decades. Do you genuinely believe that they will not adapt the designs they fabricate or submit for fabrication to minimise the effects of electromigration?

There is always a minimal amount of damage, and yes, pushing for the last possible bit of clock speed will exacerbate the problem, BUT IT'S A KNOWN PROBLEM: run within spec and the device will last years, run slightly out of spec and it should last a little bit less… run excessively out of spec… BOOM.

The problems being discussed in this thread should not be happening. ICs do last for many years; look at your LCD TV, your washing machine, your car, your Casio watch. If run in spec, they should and will continue to do so.

For Intel, something has gone wrong; whether it is a design bug, a power implementation bug... whatever, Intel has a problem that needs addressing.

Arguably, chips like the "KS" chips encourage people to push the operational envelope for those devices… push it just a little harder… a little more V, make a little more power available to draw… rinse and repeat… and it's fun getting the extra speed. A new chip shouldn't be approaching the margins out of the box; they never used to. It was the user's choice to push the chip into its danger zone.
 

bit_user

In Intel's case, I believe they quickly took Alder, modified it, did some accelerated (read: not enough) testing and shipped it out the door.
No, I think it wasn't done so hastily. Maybe too aggressively, in terms of hitting high clock speeds.

Here's an entire article on the differences:


They changed:
  • the manufacturing process.
  • the cache sizes, parameters, policies, and algorithms.
  • optimized critical paths in the P-cores to meet tighter timing constraints
  • the number of E-core clusters
  • the memory controller
 