Intel finally announces a solution for CPU crashing errors: claims elevated voltages are the root cause; fix coming by mid-August

Status
Not open for further replies.

setx

Distinguished
Dec 10, 2014
The AGESA allowed AMD chips to push 1.4+ volts into the SoC just by enabling XMP, and it turned the CPUs into hand grenades.
You are completely incompetent. It wasn't AGESA that pushed dangerous voltages – motherboards applied positive offset to requested voltages or completely ignored them.

Every CPU gets degraded just by using it. No CPU will be as good as new after even one day of use. Everyone's CPU is degraded, no matter which CPU they are using.
Is admitting that you are wrong that hard for you?

My Westmere Xeons bought from AliExpress are as good as new and may even live longer than all of us.
 

rluker5

Distinguished
Jun 23, 2014
I don't know how well just adjusting the microcode on the CPU to limit the voltage will fix the problem. It will likely also take significant collaboration with motherboard manufacturers to limit the voltage to, say, 1.5 V or 1.4 V. Collaboration could include changing BIOS options for every motherboard or giving guidance for every motherboard.

From: https://skatterbencher.com/2021/11/04/alder-lake-overclocking-whats-new/
"Intel Alder Lake Voltage
While Alder Lake clocking resembles Tiger Lake more than Rocket Lake, in the voltage department things are not quite that similar.

Compared to Tiger Lake, Alder Lake transitions away from using FIVR for the Cores, Ring, and integrated graphics. Instead, power gates are used. However, unlike Rocket Lake some parts of the Alder Lake CPU are powered using a FIVR."

So that leaves the motherboard largely in control of what it delivers in response to the CPU's requests. LLC settings, motherboard "enhancements" (some enabled by default), and, on Asus, SVID adjustments can all change peak voltages by over 100 mV without adjusting core clocks or voltages. If anyone has changed any of these, their system may become unstable once new microcode changes the voltages the CPU requests.

For example, the low default LLC settings used as a cheap undervolt by most motherboard manufacturers (see Buildzoid's explanation) are partly responsible for low-thread-count voltage spikes. Voltages across the board have to be raised to ensure stability under heavily drooped all-core loads, but the voltage doesn't droop under high-voltage, high-clock single-core loads. If you lower the 1.5 V spikes seen with the motherboard's default settings by lowering everything, the all-core loads may become unstable due to vdroop.
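The droop trade-off described above can be sketched with a simplified static load-line model, V_die = V_set - I x R_LL. This is a minimal Python illustration only: it assumes a single fixed setpoint and ignores the AC load-line, frequency-dependent VID requests and transient overshoot. Every number and function name in it is hypothetical, not measured from any real board. It shows why heavier droop forces a higher setpoint, which then reaches the CPU almost unattenuated under light single-core loads.

```python
# Simplified static load-line model (hypothetical numbers, not measured data).

def die_voltage(v_set: float, current_a: float, loadline_mohm: float) -> float:
    """Voltage seen by the CPU after vdroop: V_die = V_set - I * R_LL."""
    return v_set - current_a * loadline_mohm / 1000.0


def required_setpoint(v_min_allcore: float, allcore_current_a: float, loadline_mohm: float) -> float:
    """Lowest setpoint that keeps a heavy all-core load at its minimum stable voltage."""
    return v_min_allcore + allcore_current_a * loadline_mohm / 1000.0


# Hypothetical operating points chosen only to illustrate the trend.
V_MIN_ALLCORE = 1.20   # volts assumed necessary for a stable all-core load
ALLCORE_A = 250.0      # amps assumed drawn under an all-core load
SINGLECORE_A = 40.0    # amps assumed drawn under a light, single-thread load

for label, r_ll in [("heavy droop (weak LLC)", 1.1), ("flatter LLC", 0.4)]:
    v_set = required_setpoint(V_MIN_ALLCORE, ALLCORE_A, r_ll)
    v_single = die_voltage(v_set, SINGLECORE_A, r_ll)
    print(f"{label}: setpoint {v_set:.3f} V, light-load die voltage {v_single:.3f} V")
```

In this toy model the heavy-droop case needs the higher setpoint to survive the all-core load, so it also delivers the higher voltage under the light load, which is the spike pattern described in the post.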

Different VRM configurations inherently have different levels of droop, which makes even different models from the same vendor behave differently. Motherboard manufacturers also have different features, different dials and different tunings. In motherboard comparisons one usually just sees performance comparisons, sometimes power consumption comparisons, but I've yet to see a comparison of maximum CPU voltage spikes or voltage behavior. There are likely significant variances.

Seeing as how settings like this vary per motherboard, it is a complicated task. Will the microcode update come with a disclaimer to re-evaluate any changes from stock anyone may have made?

That is what is likely taking the extra time.



I personally do not have stability issues, prefer lower volts most of the time, and value the freedom to control my hardware as I see fit. I used to make my own custom BIOS settings for my Nvidia GPUs, and the risk was worth the reward to me. Right now my BIOS freedoms for my 13900KF so far exceed what Maxwell or Kepler BIOS Tweaker could achieve that it isn't even funny. Clearly this isn't working for some people, and motherboard manufacturers aren't helping matters, but I like not having some governor capping voltage or power at some red line of chip safety or longevity.

For others, many motherboard BIOSes have an upper voltage cap that you can set while you are waiting for this microcode update. 1.4 V should be safe for occasional low-core-count use and 1.3 V should be safe for continuous all-core loads, if you can cool that much (it probably could be higher, but I'm being conservative here). You might have to undervolt, lower your vdroop with LLC, and possibly cap single-core boost to 6 for stability with these.
 

rluker5

Distinguished
Jun 23, 2014
I suspect that microcode may not be the "root cause" of enhanced degradation but rather an easy way for Intel to ameliorate the symptoms of a much deeper problem.

First of all, it needs to be checked whether undervolted CPUs have been suffering from similar degradation problems as well. If so, then excessive voltage may not be the root cause. Given the millions of 13th- and 14th-gen Core CPUs that Intel has sold, there should be a sizeable number of users out there who have undervolted their CPUs.
What do the crash reports say about such configured systems?
Not all undervolts are equal. Some reduce maximum voltage far more than others. Even Jufes's non-undervolt fix of turning off single-core boost probably reduces maximum volts more than the majority of undervolted overclocks.
 

bit_user

Titan
Ambassador
The scale doesn't matter. I mean, let's say for the sake of argument the Intel issue is affecting 40% of Intel users vs. only 3% of AMD users. Does it really matter? It was down to pure luck that their mobo didn't supply 1.4 V VSoC.
I wouldn't say it was luck. If the AM5 problem were more prevalent, it would've arisen during pre-release testing. We see a lot of infrequent issues slip through verification, precisely because they're infrequent. However, that means they also tend not to affect a lot of users.

I think the point is that it was a catastrophic problem which was out of the consumer's hands and could kill hardware albeit a significantly simpler one to solve.
But, it was almost by definition far more rare.

You guys' argument is like saying that it's okay for cigarettes to kill people because a certain number of people die from lightning strikes, anyhow, even though the lightning deaths are far more rare. It's a false equivalence.

Not only that, it's classic whataboutism.
 

TheHerald

Respectable
BANNED
Feb 15, 2024
You are completely incompetent. It wasn't AGESA that pushed dangerous voltages – motherboards applied positive offset to requested voltages or completely ignored them.


Is admitting that you are wrong that hard for you?

My Westmere Xeons bought from AliExpress are as good as new and may even live longer than all of us.
Oh really? And how did you measure they are as good as new?
 

bit_user

Titan
Ambassador
Every CPU gets degraded just by using it. No CPU will be as good as new after even one day of use. Everyone's CPU is degraded, no matter which CPU they are using.
Not to the same degree as we're seeing here. Yes, a normal CPU might not turbo boost quite as high after a couple of years of use, but that shaves off just a couple of percent. The mitigations people have to apply once the instability has started are usually far more impactful. It no longer performs like the same model of CPU they bought.

Oh really? And how did you measure they are as good as new?
I'm sure they were referring to how it performs.
 
I'm a bit skeptical that just a microcode update can fix this. It will also require motherboard manufacturers to actually follow certain new levels with regard to voltage. I just hope Intel does right by its customers and replaces every CPU suffering instability due to this issue.
In the past (even just recently, at the beginning of this whole mess), Intel encouraged motherboard manufacturers to kinda 'do their own thing,' undoubtedly to make sure their chips stayed performance-relevant against the competition. Intel wanted to have its cake and eat it too. Now they're facing angry cake-lovers AND a belly ache!

Don't get me wrong, Intel's P-core CPUs are some of the best-performing CPUs available (if not the best), but Icarus was warned and Icarus didn't listen!
 

TheHerald

Respectable
BANNED
Feb 15, 2024
Not to the same degree as we're seeing here. Yes, a normal CPU might not turbo boost quite as high after a couple of years of use, but that shaves off just a couple of percent. The mitigations people have to apply once the instability has started are usually far more impactful. It no longer performs like the same model of CPU they bought.


I'm sure they were referring to how it performs.
But it's the same thing with the guy complaining about his CPU being degraded. He said it works fine with Intel stock settings. So, a big nothing burger.
 

punkncat

Polypheme
Ambassador
If we are to believe and accept the article at face value then it would seem that Intel is on track to at least diminish the issue, and are offering replacements for those damaged CPU. I have no idea what kind of hoops a customer would have to jump through, but at least they are giving lip service to making things right.

We tend to be short on memory when it comes to commonalities such as what happened with the 7xxx Ryzen CPU/mobo when it was first released. I mean, IIRC that was actually a FIRE, and not just chip degradation. (Edit, with apologies: it was pointed out that the correct term for that failure was "burned out", not "caught fire". My apologies for being unclear and incorrect on the issue. Please see post #146. -P)

The one thing I am not clear on with this issue is exactly which SKUs are affected. Is it only K SKUs, or is it every i7/i9? None of the i5s, even K?
 

TheHerald

Respectable
BANNED
Feb 15, 2024
Oh boy! Looks like Wendell (and crew at L1T) have found a way to query VID on Linux and says it looks like 1.355V is as high as it goes on W680 boards (and these CPUs are still degrading alarmingly fast).

https://x.com/tekwendell/status/1815601269492310032


Still a developing story, but if Raptor (and refresh) CPUs are degrading at a max of 1.355V and ~150W on W680 boards, well, I see a recall coming.
Wendell doesn't even know what VID is, so... grain of salt.
 

TheHerald

Respectable
BANNED
Feb 15, 2024
They are working without errors or slowdowns (and at ~1.5x overclock), so they are as good as new from the user point of view.

Now tell us how you measured that CPUs degrade every day.
Do you understand what electromigration is? Well, you don't; if you did, you wouldn't ask. Every CPU degrades just by being used, even at super-safe settings.

Higher voltages and amperages accelerate the process, but there is no cutoff, i.e., no voltage below which it stops degrading.
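The "no cutoff" point can be illustrated with Black's equation, a classic empirical model for electromigration lifetime: MTTF = A x J^(-n) x exp(Ea / (k x T)), where J is current density and T is absolute temperature. The minimal Python sketch below compares two operating conditions so the unknown prefactor A cancels. The exponent n = 2 and activation energy Ea = 0.7 eV are assumed illustrative values, not Raptor Lake figures, and the model covers only electromigration, not other wear-out mechanisms. Voltage enters only indirectly, by raising current and temperature.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_mttf(j_ratio: float, t_base_c: float, t_stress_c: float,
                  n: float = 2.0, ea_ev: float = 0.7) -> float:
    """Ratio MTTF_stress / MTTF_base from Black's equation.

    Black's equation: MTTF = A * J**(-n) * exp(Ea / (k * T)).
    The prefactor A cancels when two conditions are compared.
    j_ratio is J_stress / J_base; temperatures are in degrees Celsius.
    n and ea_ev are assumed illustrative values.
    """
    t_base_k = t_base_c + 273.15
    t_stress_k = t_stress_c + 273.15
    current_term = j_ratio ** (-n)
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_stress_k - 1.0 / t_base_k))
    return current_term * thermal_term


# Hypothetical conditions: unchanged, harder-driven and hotter, gentler and cooler.
for j_ratio, t_base, t_stress in [(1.0, 70.0, 70.0), (1.3, 70.0, 90.0), (0.8, 70.0, 60.0)]:
    ratio = relative_mttf(j_ratio, t_base, t_stress)
    print(f"J x{j_ratio}, {t_base:.0f} C -> {t_stress:.0f} C: relative MTTF {ratio:.3f}")
```

In its basic form the model has no threshold term: lowering current density or temperature lengthens the predicted lifetime, but never makes it infinite.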
 
Oh boy! Looks like Wendell (and crew at L1T) have found a way to query VID on Linux and says it looks like 1.355V is as high as it goes on W680 boards (and these CPUs are still degrading alarmingly fast).

https://x.com/tekwendell/status/1815601269492310032


Still a developing story, but if Raptor (and refresh) CPUs are degrading at a max of 1.355V and ~150W on W680 boards, well, I see a recall coming.
To complement:
https://www.youtube.com/watch?v=yYfBxmBfq7k


This statement from Intel definitely doesn't pass the sniff test.

Regards.

PS: Do not give clicks to that FrameChasers fella, please. He is a big Asmongold wannabe.
 

TheHerald

Respectable
BANNED
Feb 15, 2024
I did say 'developing story'. I'm also pretty sure he knows EXACTLY what VID is, plus he's got a team working on this with him. Personal attacks are very unbecoming. ;)
Check his Twitter; he really doesn't know what VID is. Or, well, he didn't until yesterday, when Buildzoid told him. It's not a personal attack, it's a fact.
 
I did say 'developing story'. I'm also pretty sure he knows EXACTLY what VID is, plus he's got a team working on this with him. Personal attacks are very unbecoming. ;)
That's what the individual from FrameChasers was parroting and using to justify his "misinformation" spiel about both GN's Steve and L1Techs' Wendell, which is bananas.

That guy is forever banned in my list.

Regards.
 
What exactly is wrong with what they said? (are saying)
Wendell reported on data he was able to harvest from several sources that were given to him. Wendell, to this day, has not presumed to even name a root cause and has only reported on the volume of failures he is aware of and has personally confirmed.

As for the fella from FrameChasers: because Wendell asked BZ whether the information he provided via X/Twatter was what had been requested (it was not), the narrative has become "no one believe what Wendell said, as he doesn't know what a VID table is!!!!!!!", which I find really stupid, since knowing what a VID table is has absolutely nothing to do with the data that was presented.

His only point was "it doesn't happen to me, because I have my CPU undervolted! see? total hoax!"

There are so many parallels you can draw to how dumb that extrapolation is that I won't even humor it. Absolute trash take.

Regards.
 

Taslios

Proper
Jul 11, 2024
So, they think they've found the cause and a solution, but they're going to just let this problem fester for another month before a solution can realistically get into the hands of end users??

Wow, I'm sure glad I don't have a Raptor Lake that's continuing to degrade, in the meantime. You'd think they could at least post some tips for enthusiasts to follow, in order to minimize damage until then. I guess the community is left to follow the advice people have discovered on their own.

I'm guessing they're trying to ride a knife's edge of undervolting, to avoid sacrificing either too much performance or risking instability due to too little voltage. I'll bet the voltage window is really narrow, at those higher frequencies. That's the main reason I can see why it'd take them so long to perfect their solution, before letting anyone else even have a beta version.
The fix will no doubt hurt the performance of the chips. They know AMD is releasing Zen 5 next week.

Even if this is the proper fix, most reviewers are not going to retest after the release, so Intel will at least seem a little better than it actually is in these reviews for a little while longer.
 
Wendell reported on data he was able to harvest from several sources that were given to him. Wendell, to this day, has not presumed to even name a root cause and has only reported on the volume of failures he is aware of and has personally confirmed.

As for the fella from FrameChasers: because Wendell asked BZ whether the information he provided via X/Twatter was what had been requested (it was not), the narrative has become "no one believe what Wendell said, as he doesn't know what a VID table is!!!!!!!", which I find really stupid, since knowing what a VID table is has absolutely nothing to do with the data that was presented.

His only point was "it doesn't happen to me, because I have my CPU undervolted! see? total hoax!"

There are so many parallels you can draw to how dumb that extrapolation is that I won't even humor it. Absolute trash take.

Regards.
So what Wendell reported isn't wrong, but what people are extrapolating is wrong, soooo Wendell is wrong??

(Is this what you are saying?)
 
So what Wendell reported isn't wrong, but what people are extrapolating is wrong, soooo Wendell is wrong??

(Is this what you are saying?)
No. Wendell's information is correct (assuming you trust his reporting), based on the evidence other sources have been providing that corroborates his points.

The extrapolation (about him not knowing about VID tables) is the thing done in bad faith, trying to discredit his findings for... fanboyism? Bias? Something.

Regards.
 

TheHerald

Respectable
BANNED
Feb 15, 2024
The fix will no doubt hurt the performance of the chips. They know AMD is releasing Zen 5 next week.

Even if this is the proper fix, most reviewers are not going to retest after the release, so Intel will at least seem a little better than it actually is in these reviews for a little while longer.
Not in any measurable way. I mean, even if they have to drop clocks, by how much? 100 MHz? 200? Let's say 300, just for the sake of argument. That would be a 3% drop in MT performance. It just doesn't matter, even if it happens.
 
No. Wendell's information is correct (assuming you trust his reporting), based on the evidence other sources have been providing that corroborates his points.

The extrapolation (about him not knowing about VID tables) is the thing done in bad faith, trying to discredit his findings for... fanboyism? Bias? Something.

Regards.
Gotcha.
Yeah, I am fully aware and believe that no one (possibly not even Intel, still) knows what is really going on.
But that doesn't mean that extrapolations can't be made. People are definitely doing it for clicks, but it at least looks like Wendell is trying to dive deeper and get more data.

What are we supposed to do if not extrapolate and try to gain more data?
 