News Intel finally announces a solution for CPU crashing errors — claims elevated voltages are the root cause; fix coming by mid-August

Status
Not open for further replies.

TheHerald

Respectable
BANNED
Feb 15, 2024
1,633
501
2,060
What makes you think there are no restrictions? As already written here, there are three options: air (less than 200W), stationary (about 200W), AIO (about 253W).

You have no experience with these motherboards.

Neither board even knew that for the KS, PL = 320W and IccMax = 400A.
The aio option runs without any power limits. At least on z mobos. Go try it.
 
Mar 10, 2020
420
384
5,070
If you mean by Intel, I wish.

But at the end of the day does it matter? That's the definition of an ad hominem argument. Does it matter if someone is paid by PR or not? If what he is saying is correct, it's correct, and if it's wrong, it's wrong, whether or not he is getting paid to say it doesn't change the validity (or not) of what someone's saying.
It matters.

The forums are a discussion area for excitable users, enthusiastic amateurs and even engineers. I’d like to see a presence from AMD, Intel, Nvidia et al, direct contact with the companies could be interesting.

As to why it matters, for sake of argument, I am a shill for Arm, I post enthusiastically for Arm, I throw shade on AMD in order to persuade the genuine forum members to buy ARM. Effectively I am advertising ARM.

There are advertising rules regarding comparative advertising, the comparisons must be valid. Someone shilling any brand, and paid by the company for doing so is an employee, this should be disclosed and that person must respect the advertising rules/laws.
 

TheHerald

Respectable
BANNED
Feb 15, 2024
1,633
501
2,060
It matters.

The forums are a discussion area for excitable users, enthusiastic amateurs and even engineers. I’d like to see a presence from AMD, Intel, Nvidia et al, direct contact with the companies could be interesting.

As to why it matters, for sake of argument, I am a shill for Arm, I post enthusiastically for Arm, I throw shade on AMD in order to persuade the genuine forum members to buy ARM. Effectively I am advertising ARM.

There are advertising rules regarding comparative advertising, the comparisons must be valid. Someone shilling any brand, and paid by the company for doing so is an employee, this should be disclosed and that person must respect the advertising rules/laws.
If what you are saying about arm is true (supported by data) then I really don't care if you are an employee of an interested party or not. I just want accurate information, where that comes from I really genuinely do not care.

I'd argue that, if anything, disclosing such information gets in the way of actual data, because whatever you say, even if it's completely accurate, will be perceived as you shilling for Arm-based CPUs.
 
Mar 10, 2020
420
384
5,070
If what you are saying about arm is true (supported by data) then I really don't care if you are an employee of an interested party or not. I just want accurate information, where that comes from I really genuinely do not care.

I'd argue that, if anything, disclosing such information gets in the way of actual data, because whatever you say, even if it's completely accurate, will be perceived as you shilling for Arm-based CPUs.
Put simply, in the UK an advert must be identifiable as an advert. An employee making a post promoting their company’s kit is advertising and must disclose the relationship to the company.
 
Well, two general things:
1- Going over 1.5v is bad, especially for long periods of time. What BZ tried to understand is under what circumstances that would happen, for which he didn't have a clear answer. The fact that Intel upped the limit in newer specs is weird (in hindsight), but motherboard vendors are not going "out of spec" by pumping over 1.5v, even if anyone who has ever run an Intel CPU knows you do not want to pump that much voltage into it.
41:35
2- This saves money because it would lower the RMA count and image. If they say "ah, yeah; we screwed up the CPU VID tables so we need to recall them" (for example; a manufacturing defect is another example), that is more expensive for sure, and it implies they'll have to flat out replace full SKU lines. If they tell people they can fix it via microcode by capping voltages and lowering performance, then they'll delay the inevitable and its consequences. I thought this would be somewhat obvious.
So you couldn't be bothered to read what I actually said then?

Also what company would ever recall anything they didn't have to unless they were forced by pertinent government regulation?
 
41:35

So you couldn't be bothered to read what I actually said then?
At that time he described the spec and how they got rid of the "offset" and just flat out put 1.72v. I don't hear anything at that specific time that goes against anything I wrote for #1.

And if it wasn't clear for #2: for CPUs which have been damaged due to the issue, if Intel says "ok, now it is fixed" and people get degradation down the line outside of warranty, they'll be left with a defective CPU, and that is saving Intel money. I don't know how else to phrase this. The ideal scenario is for Intel to just recall all CPUs for people who are unsure whether their CPUs are affected, because how can a normal user know? Will they ever know? Will SIs do right by them after applying "the fix"? Will SIs be willing to replace their pre-builts after this? Will Intel play ball with SIs? How will Intel reach out to the media to address this? Etc.

The real people affected by this are not enthusiasts, but "regular" people that are probably getting crashes and gremlins and have no idea why. Those are the people Intel needs to do right by first, since Enthusiasts will be able to RMA with no problems, I'm sure.

Regards.
 
  • Like
Reactions: stuff and nonesense

TheHerald

Respectable
BANNED
Feb 15, 2024
1,633
501
2,060
14900 MSRP $549 - $579
14900K MSRP $589 - $599

On the second point, I'm not convinced "When we said This indicates that the CPU is designed to be overclocked, we obviously meant This indicates that the CPU is designed to be overclocked but it won't be stable if you do." is a get out.
How could they ever possibly guarantee you an overclock? Ever? What type of overclock? How many MHz? If it was guaranteed why wouldn't they sell it like that out of the box?

OC is out of spec and nobody can guarantee you your chip will work with overclocked settings.
 
  • Like
Reactions: TJ Hooker
Our [extensive] analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor. Intel is delivering a microcode patch...
When Intel are issuing a statement like this, I really don't see that trying to put the blame on the users who have been encountering high failure rates, from home-build individuals to corporations building server farms, mostly with many years of experience, is a sustainable position.

Interestingly, the statement doesn't actually claim that the algorithm itself is incorrect, only that the voltage requests it produces are. If the best number for the algorithm to put out is 42, it's requested that it's written to put out 42, and it puts out 84, that's an error in the microcode. If the best number is 42 but it's requested to put out 84 because somebody thought that was the best number then that's a different kind of mistake. Both situations can be covered by the statement above.
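The distinction between the two kinds of mistake can be sketched in a few lines of Python. This is purely illustrative: the function names are made up, the numbers 42 and 84 come from the analogy above, and nothing here reflects how Intel's microcode actually works.

```python
# Hypothetical illustration only; not a model of real microcode.
BEST_VALUE = 42  # the value that would actually be best for the chip


def implementation_bug():
    """The spec asks for 42, but a coding error doubles it."""
    specified = 42
    return specified * 2  # bug: the code does not do what the spec says


def specification_error():
    """The code faithfully returns what the spec asks for, but the spec asks for 84."""
    specified = 84  # the mistake was made by whoever chose this number
    return specified


def request_is_incorrect(value):
    """From the outside, both cases just look like an incorrect request."""
    return value != BEST_VALUE
```

Both functions produce the same wrong output, so a statement that only describes the output as incorrect does not tell you which of the two happened.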
 
  • Like
Reactions: ewjammer
How could they ever possibly guarantee you an overclock? Ever? OC is out of spec and nobody can guarantee you your chip will work with overclocked settings.
Then don't sell two different versions of the same chip, charge more for the unlocked one and use the phrase "designed to be overclocked".

And the user in question, DrDocumentum, isn't running at stock settings. They've had to apply an undervolt.
 

Geekaycee

Prominent
Dec 28, 2022
15
8
515
You can have my case. A 13700KF was installed in January 2023 and run under a Noctua NH-D15 with stock settings and a -0.11V undervolt to avoid thermal throttling from day one. The machine is primarily used for work and a little gaming. Three or four weeks ago I played a bit of GTA 5 Online and got a crash with the message "out of video memory". Found info online about similar cases where people talk about their Intel CPUs degrading over time. Cute...

Immediately ran Cinebench 2024 and that crashed quite quickly as well. Updated the BIOS to the latest version and tried with Intel default settings. It helped, only now the cpu clocked around 4.6-4.8 GHz on p-cores. After a few hours of manual settings, I was able to run Cinebench 2024 stably with -0.05 V undervolt and per Intel specs on IccMax, PL1 & PL2. Unfortunately, this time the thermal throttling is present.

Sent a support query to Intel and this is their reply. They offered a "standard warranty replacement," which means I send the CPU to Intel for examination and then receive a replacement CPU afterward. According to them, the wait time is 7 business days.

What can one say? How can one be without a work computer for at least a week? And then possibly receive another CPU that breaks down after 9 months or sooner. This is not good enough, Intel. In the worst case, people will have to start selling both the CPU AND the motherboard at a big loss for a switch to AMD. This is unacceptable.
 

DrDocumentum

Reputable
Apr 10, 2020
12
20
4,515
My mainboard (MSI PRO Z690-A) prompted me to choose power limits based on cooling solutions. One option was default limits, one was big air cooler, one was AIO (or something similar; it's been a while since I last looked and reset to factory settings). Default should be Intel limits. Air was PL1 = PL2 = 190W for my 12700k iirc.

Also, at least for power limits, it seriously isn't hard to find those online. Intel literally gives you both PL1 and PL2 limits for all their CPUs on their website, so you can look them up and adhere to them (or not) of your own volition.
https://www.intel.com/content/www/u...-25m-cache-up-to-5-00-ghz/specifications.html
I didn't dig through the datasheet, so no idea if anything else is said in there regarding voltages. But power limits? I'm sorry, if you can't even look up this much, I don't know what to say. It's really not hard; it took me all of 5 seconds.

And there are reviews at default power limits...
The CPU microcode asks for certain voltages (power) based on clock speed and thermal headroom. High (or unlimited) values are board-imposed "limits", not the actual values the CPU should be using. Intel has already acknowledged this fact in its published statements.

Reviews reporting high power usage have been published since those CPUs were released, and Intel was very happy with them and did nothing to correct it. Intel endorsed those high power levels and explained that they were OK and working as designed.

Explanation here:

View: https://www.youtube.com/watch?v=nrWQLFWbQY8&t=13s
 

Silas Sanchez

Proper
Feb 2, 2024
109
65
160
Do you understand what electromigration is? Well you don't, if you did you wouldn't ask. Every cpu degrades just by being used, even at super safe settings.

Higher voltages and amperages accelerate the process but there isn't a cut off, like under this voltage it stops degrading.
This is absolute rubbish and clearly shows you lack the basic understanding of electrical principles, solid state physics, and the nature of scientific theory.
It is incorrect to say that a CPU degrades just by being used. First off, this is more of your unsound generalizations, as this is a blanket statement that makes very obvious assumptions, like treating all CPUs the same.
You claim to know about logical fallacies, but you have made one here: you are committing the fallacy of composition by arguing that CPUs degrade in a way that affects the user. Just because a device technically degrades every day, it does not follow that it actually degrades on a macroscopic, practical level. You have fallen prey to being unable to recognize different definitions. A CPU is not a rubber tire or a battery. You are going to have a hard time explaining how 20-year-old CPUs are still going strong, or why my 20-year-old CPU has spent its entire life powered on, 20 years as a casual gaming rig. It is still used to this day, and the benchmarks and frame rates are nearly the same, give or take. CPUs that are made well don't wear out! According to you, a 10-year-old CPU that doesn't give errors or crashes has degraded. Fallacious.


The reality is you certainly don't know about electromigration, because even the best minds don't. You really don't understand, do you? When you shrink down transistors and keep clocks conservative, this keeps the power density in check and the current density naturally stays lower.
You are just spreading misinformation and your lack of physics is showing badly.
Another phenomenon is current crowding which happens at the interface of a metal and semiconductor, due to the diff in conductivity the current densities in the metal become a potential problem.


Yes, higher voltages accelerate it; as per Ohm's law, a higher current density entails more voltage. In fact, you probably don't know this, but what high enough voltages do is deteriorate solid insulators like the gate dielectric, and that is a far more plausible explanation than electromigration. So stop, you're just coming off as obnoxious.

The final mistake you made here was in your assertion that there is no cut-off. This shows a fundamental lack of understanding of the nature of principles and models. At some point a model will break down and no longer be valid, and for all practical purposes at some point the effect doesn't happen. This is also thanks to the quantized nature of matter and energy. It's not continuous but peters out in a fuzzy way.

Again, if you understood what electromigration is you wouldn't even be asking. Every conductor on the planet degrades just by using it. Your cpu is a conductor. I don't need to measure it, the work has been done the last 5-6 decades. Maybe more.
Again, more of your complete lack of understanding of all things electrical, physics and common sense. No, not all conductors degrade just by being used; the limiting lifespan of many conductors is the insulation, which can last up to 100 years. Many conductors don't ever experience electromigration: long before they do, the actual copper will have been attacked by oxidation or the system will have been replaced. It is complete unscientific nonsense to claim a conductor degrades just by using it.
Yes a CPU is a conductor, but that is irrelevant as you failed to mention the actual parts that are vulnerable.
"the work has been done the last 5-6 decades" Please just stop, stop! You are making stuff up.


EXACTLY. Now you are getting it. It's not like your CPU hasn't degraded, it has. But it already was given more voltage than required out of the box to avoid crashes due to said electromigration.
Again, you don't even understand the basics of solid state physics and you don't know if electromigration is even a real problem in cpus today. You are literally making stuff up.
 

Silas Sanchez

Proper
Feb 2, 2024
109
65
160
A car dealer asked me how many years I expected to have my car and I told him 20 years. He was shocked. I think chip manufacturers would be shocked if I said I needed my CPU for five years. For them it’s maybe a couple of years tops. No surprise we need 1000w power supplies now. Is it bad engineers making a quick buck by depending on ever increasing voltage to hit faster quotas? Will we ever have smarter, faster chips with less voltage?
Well, it's all part of what engineers call dancing on the head of the needle. It's a dynamic interplay between voltage, capacitance, frequency, and leakage currents.
The way logic gates work is that, as they shrink, the frequency naturally increases and the voltage needed to switch them naturally decreases. But higher frequencies require more voltage to be stable, and this increases the power density. Keeping frequency down limits progress, and keeping voltage too low lets too much leakage current through, thus hitting up against the power density limit. So yeah, a general mess, and to progress further we will be forced to spend more on parts and electricity, and things will run hotter. The problem is that things like games only use a few cores, and that means hitting the power density wall. Stacking transistors is a problem for heat, and multi-core chips are at what, nearly 200 cores now, but games and editing software can't use them all.
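The voltage/frequency trade-off can be put into rough numbers with the standard first-order approximation for CMOS switching power, P = a * C * V^2 * f. The sketch below is only that, a sketch: the capacitance, voltages and clock speeds are made-up illustrative values, and leakage power is ignored entirely.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Illustrative values only (assumed, not measured from any real CPU).
base = dynamic_power(1e-9, 1.0, 4.0e9)    # 1.0 V at 4.0 GHz
pushed = dynamic_power(1e-9, 1.2, 5.0e9)  # 1.2 V at 5.0 GHz

# Power rises with the square of voltage and linearly with frequency.
ratio = pushed / base
print(f"Relative switching power: {ratio:.2f}x")
```

Under these assumptions, a 25% clock increase that needs 20% more voltage costs about 80% more switching power in the same silicon area, which is the power density problem described above.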
 
  • Like
Reactions: Gururu

35below0

Respectable
Jan 3, 2024
1,727
744
2,090
13600K? Not enough power for sure. Not a fan of E-Cores. 6P+8E would be moving to the same performance for a lot of money. Also, PCI-e lanes and a more powerful platform that will last longer before I want to upgrade again. I do a lot in AE and Premiere Pro, as well as 3D rendering in various apps. Need the CPU power. And I game on the same rig. I considered a 14900k, but I'm still not really upgrading that much, with the exception of single-thread performance and more modern featureset. I like the idea of being able to buy into a high end platform like X299, get a mid-range CPU, and upgrade the CPU years later for a fraction the price.
It outperforms a 12900K, which is also an upgrade over your existing CPU. That was my reasoning.

The power in the 13900/14900K comes with an asterisk, and the 13600K is unaffected by those problems.
It would not have the longevity of the 900s, that is true.

So I would say it does have the power, it just won't keep up with future demands as well as the i9 CPUs would.

AM5 being tried and tested at this point, it may be the better choice than the next gen Intels.

This is not good enough, Intel. In the worst case, people will have to start selling both the CPU AND the motherboard at a big loss for a switch to AMD. This is unacceptable.
Who are you going to sell a broken chip/mobo to?

13600K is affected by this though? Only pure Alder Lake should be unaffected and the microcode problem can affect even low-power Raptor Lake (not mobile).
13600K and 14600K have not had these issues. i9 and i7 had them, mostly the i9s.
 

TheHerald

Respectable
BANNED
Feb 15, 2024
1,633
501
2,060
This is absolute rubbish and clearly shows you lack the basic understanding of electrical principles, solid state physics, and the nature of scientific theory.
It is incorrect to say that a CPU degrades just by being used. First off, this is more of your unsound generalizations, as this is a blanket statement that makes very obvious assumptions, like treating all CPUs the same.
You claim to know about logical fallacies, but you have made one here: you are committing the fallacy of composition by arguing that CPUs degrade in a way that affects the user. Just because a device technically degrades every day, it does not follow that it actually degrades on a macroscopic, practical level. You have fallen prey to being unable to recognize different definitions. A CPU is not a rubber tire or a battery. You are going to have a hard time explaining how 20-year-old CPUs are still going strong, or why my 20-year-old CPU has spent its entire life powered on, 20 years as a casual gaming rig. It is still used to this day, and the benchmarks and frame rates are nearly the same, give or take. CPUs that are made well don't wear out! According to you, a 10-year-old CPU that doesn't give errors or crashes has degraded. Fallacious.


The reality is you certainly don't know about electromigration, because even the best minds don't. You really don't understand, do you? When you shrink down transistors and keep clocks conservative, this keeps the power density in check and the current density naturally stays lower.
You are just spreading misinformation and your lack of physics is showing badly.
Another phenomenon is current crowding which happens at the interface of a metal and semiconductor, due to the diff in conductivity the current densities in the metal become a potential problem.


Yes, higher voltages accelerate it; as per Ohm's law, a higher current density entails more voltage. In fact, you probably don't know this, but what high enough voltages do is deteriorate solid insulators like the gate dielectric, and that is a far more plausible explanation than electromigration. So stop, you're just coming off as obnoxious.

The final mistake you made here was in your assertion that there is no cut-off. This shows a fundamental lack of understanding of the nature of principles and models. At some point a model will break down and no longer be valid, and for all practical purposes at some point the effect doesn't happen. This is also thanks to the quantized nature of matter and energy. It's not continuous but peters out in a fuzzy way.


Again, more of your complete lack of understanding of all things electrical, physics and common sense. No, not all conductors degrade just by being used; the limiting lifespan of many conductors is the insulation, which can last up to 100 years. Many conductors don't ever experience electromigration: long before they do, the actual copper will have been attacked by oxidation or the system will have been replaced. It is complete unscientific nonsense to claim a conductor degrades just by using it.
Yes a CPU is a conductor, but that is irrelevant as you failed to mention the actual parts that are vulnerable.
"the work has been done the last 5-6 decades" Please just stop, stop! You are making stuff up.



Again, you don't even understand the basics of solid state physics and you don't know if electromigration is even a real problem in cpus today. You are literally making stuff up.
Huge wall of text and what you are basically saying all cpus degrade. Great, that's exactly what I'm saying.
 

Geekaycee

Prominent
Dec 28, 2022
15
8
515
Who are you going to sell a broken chip/mobo to?

Personally, I haven't yet decided what to do with the CPU and the motherboard, as I'm figuring out what Intel will or will not do. There is another option involving the online store from which the CPU was bought, which I haven't yet wanted to explore.

Some people might start selling their CPUs/mobos pretty soon. You think this will not happen? Let's watch the second-hand market.
 
Huge wall of text and what you are basically saying all cpus degrade. Great, that's exactly what I'm saying.
A huge wall of text to explain the difference between theoretical infinitesimal degradation and practical degradation. Your initial claim was
No cpu will be as good as new after even 1 day of usage.
Exactly how much performance are you saying a CPU will lose after one day of usage? Because if there's no practical difference between a 1 day old chip and an unused one, it is by definition "as good as new".
 

TheHerald

Respectable
BANNED
Feb 15, 2024
1,633
501
2,060
A huge wall of text to explain the difference between theoretical infinitesimal degradation and practical degradation. Your initial claim was

Exactly how much performance are you saying a CPU will lose after one day of usage? Because if there's no practical difference between a 1 day old chip and an unused one, it is by definition "as good as new".
It doesn't lose performance. It just requires more voltage as time passes.

I don't have the data, you need to ask amd or intel for specifics.
 
I've no idea about the exact number - but I'd assume everyone that was pushing 1.4v into their soc.
The AMD defined maximum safe SOC voltage for Ryzen 5000 is 1.2-1.25V and no motherboard maker pushed SOC voltage further than that. So how is the user overclocking SOC to 1.4V the same as Intel bricking itself with too high a voltage right out of the box?
 
It doesn't lose performance. It just requires more voltage as time passes.

I don't have the data, you need to ask amd or intel for specifics.
This is logical, since oxidized copper acts the same as installing a low-ohm resistor on the vias. The higher the resistance of the via, the higher the voltage needed to push through the via and ensure the right voltage remains on the other side of it (in layman's terms, think of a resistor as something that will eat up some voltage in exchange for slowing down the current flow). This problem progressively becomes a positive feedback loop: oxidation means higher via resistance, which means a higher required voltage, which means more thermal energy put into the via, which means accelerated copper oxidation, which means higher via resistance again, and so on.
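The feedback loop described above can be sketched as a toy simulation. The voltage drop across the via follows Ohm's law (V = I * R) and the heating follows P = I^2 * R; everything else is an assumption made for illustration: the starting resistance, the current, and especially the rule that resistance grows in proportion to dissipated heat are invented, not measured oxidation behaviour.

```python
def required_supply(target_v, current_a, via_resistance_ohm):
    """Supply voltage needed so that target_v remains after the via's I*R drop."""
    return target_v + current_a * via_resistance_ohm


def simulate(steps, target_v=1.2, current_a=10.0, r0=0.001, growth_per_watt=0.05):
    """Toy model: each step, heat (I^2 * R) increases the via resistance.

    growth_per_watt is a made-up coefficient standing in for oxidation.
    """
    resistance = r0
    history = []
    for _ in range(steps):
        history.append(required_supply(target_v, current_a, resistance))
        heat_w = current_a ** 2 * resistance              # Joule heating in the via
        resistance += resistance * growth_per_watt * heat_w  # assumed oxidation rule
    return history


for step, volts in enumerate(simulate(5)):
    print(f"step {step}: required supply = {volts:.6f} V")
```

In this toy model the required voltage only ever goes up, and each increase is larger than the last, which is what makes it a positive feedback loop rather than a one-time offset.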
 
Perhaps you mean PBO….?
It actually was EXPO.

"We do know that 1.25V is the recommended safe SoC voltage limit, and we're told that 1.4V and beyond definitely increases the likelihood of the condition occurring. To be clear, running beyond 1.4V doesn't ensure that your chip will burn out, but your odds will increase. Conversely, 1.35V appears to be "safe." Proceed at your own risk, though. [EDIT: AMD has issued a statement, clarifying that it will issue firmwares that limit SoC voltage to 1.3V. As such, this appears to be the maximum safe limit.]"

https://www.tomshardware.com/news/a...use-identified-expo-and-soc-voltages-to-blame
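The thresholds in the quoted article can be written down as a small lookup. This is only a sketch of the numbers quoted above (1.25V recommended, 1.3V firmware cap, 1.4V and beyond increasing the likelihood of failure); the function name and the category labels are made up for illustration and are not AMD terminology.

```python
def classify_soc_voltage(volts):
    """Bucket an SoC voltage using the figures from the quoted article."""
    if volts <= 1.25:
        return "within recommended limit"
    if volts <= 1.30:
        return "within firmware cap"
    if volts < 1.40:
        return "above firmware cap"
    return "elevated risk"


for v in (1.185, 1.30, 1.35, 1.40):
    print(f"{v:.3f} V -> {classify_soc_voltage(v)}")
```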
 
Nobody overclocked, the SoC was set to 1.4v just by enabling XMP
No it didn’t, at least in my experience with an Asus motherboard. I can’t say for sure if other motherboard manufacturers pushed SOC voltage further, but technically XMP is overclocking, and AMD only rates their CPUs to handle DDR4-3200, so pushing RAM speed further is not guaranteed. It is not AMD’s fault that motherboard makers ignored AMD’s max safe voltage definitions so that they can advertise their boards as able to achieve DDR4-3800-4000 speeds on even the worst silicon lottery memory controllers.
I have a 32GB 3800MHz CAS-14 dual-ranked kit on a Ryzen 5950X in an Asus Dark Hero motherboard, and the set SOC voltage under XMP was 1.185V. This was before any of the fried chips started showing up and thus before any “fixes” came about.
 
  • Like
Reactions: stuff and nonesense