News Intel announces an extra two years of warranty for its chips amid crashing and instability issues — longer warranty applies to 13th- and 14th-Gen C...

Status
Not open for further replies.
Well, obviously AMD didn't discover and fix the issue; that's why both their Zen 3 and Zen 4 CPUs fail at an alarming rate compared to 12th, 13th and 14th gen.

For some reason GN didn't think it was important to show that graph in his video; he just skipped over it very quickly at the 45-minute mark. Literally, blink and you'll miss it, lol. Really makes me wonder why. Do you have any ideas?
Nope, only 11th-gen Intel is failing at an alarming rate in that whole graph. 2% and 4% from a single builder is well below an alarming rate by any measure. And as said previously, if that low failure rate comes simply from sticking to Intel spec settings, why didn't Intel make those the default during the whole 13th-gen period? That means they deliberately left the unsafe profiles on, so the failure rate has to be viewed from the side of general consumers who didn't know to do all that tweaking following that hidden Intel spec book. It is the CPUs failing in the wild that matter, not Puget's.

GN may have their agenda, as do all those dodgy Intel claims throughout this whole mess of a drama. But if GN overblowing it lets everyone know there is trouble and gets those settings made the default BIOS at the EOL of LGA 1700, it is a good thing: at least consumers know the issue and know who to go to when their hard-earned money got them a faulty machine. Meanwhile, if Intel downplays it with that dodgy BS, it is the consumers who suffer. If nobody calls this out in a dramatic way, it will still be "go find asus/gigabyte/MSI/nvidia and the SW developer for your crashes, not us".
 
I hate to ask a novice-type question, but what does this mean for consumers using the i5 chips like the 14600K? I've read about the MB manufacturers setting the limits too high out of the box, but still have some difficulty determining if this is an Intel problem or MB issue.

Is Intel stating that their chips are deficient, or that, given the settings of MBs, they are likely to become damaged?

My only experience with this is the entry-level gaming PC I built for my son using the i5-14600K and an MSI Pro Z790-A WiFi MB. On first POST, the MB asked for the cooler type and then set the BIOS based on that response. I chose the basic cooler, but wound up using a box cooler. I did go back in later, after updating the BIOS, to lower the voltages to the chip, based on recommendations from users here for addressing the MB default settings.

I guess the stupid question is: what should an average user, who either doesn't OC or does it minimally, do?
IMO, needing to go into the BIOS and dig around those limits inside one of the 3 profiles Intel supplied is Intel's problem. In my 30 years of PC building this is literally the first time I have seen a CPU generation with 3 official profiles and none of them enforced as the safe, in-spec performance default.

And there aren't a hundred major motherboard makers out there. It is Intel's responsibility to make sure the board partners use what Intel is sure is a safe limit and to enforce that, letting them do their own OC tweaks behind a warning message. It has worked that way for decades.

AMD also failed with the melting X3D, but they reacted soon enough not to damage the lineup's reputation...

Back to the question: nobody yet knows exactly what will be safe for the lineup. But with the latest knowledge (kudos to all the geeks out there who even probed the voltage profiles to test them), it seems the latest BIOS should have an Intel profile as the default. Update to it, set that profile, then find the voltage limiter in your BIOS and hard cap it at 1.40v; that is about the best you can do right now. I bet Intel themselves can't really be sure what caused all these problems, or they would have come out and announced it loudly. More likely they are still guessing.
 
Don't you think that GN - within all this drama - should pinpoint exactly what the problem is, instead of just quickly scrolling over the data that clearly shows that intel defaults solve the issue? Don't you think that by hiding that data on purpose it misleads people into buying the wrong CPUs when all they care about is stability? I personally missed the graph when I watched the video, I read about it on reddit. Imagine me buying a zen 3 or zen 4 CPU cause "intel is unstable" only to find out that amd is even more unstable. Even at default settings.


Techtubers = just doing everything for clicks. I'm not suggesting he has any affiliation with amd, but what he did was just incredibly scummy, and it's highly ironic cause all of his videos are basically him complaining about other companies being scummy.
 
intel profile as default, set that and go inside to find the voltage limiter in your bios and hard cap that at 1.40v should be what you could do at best now. I bet intel themselves can't really be sure what actually caused all these problems or they will come out and announce it loud, it more likely is they are still guessing
I've recommended that since before buildzoid made it popular. It's called IA VR limit. We used to use it for extreme single-core overclocks (pushing 1.6+ volts), but it's also nice for keeping the CPU from killing itself. 1.4v is still excessive though; 1.3 - 1.35v is what I'd use for a 24/7 PC.
 
Yet you don't know if 1.4 or 1.35 or 1.1v is the real safe margin. Yes, it makes complete sense, as the default VID gets most i9s to 1.45v+ for single-core boost. But you are not Intel, and you haven't published data guaranteeing what is really safe, nor has buildzoid. So any logical customer would and should assume that what Intel and the board suggested by default was bloody SAFE. You do know that with a hard cap at 1.35v quite a lot of 13900 and 14900K chips can't reach their max advertised speed, don't you? And as you said earlier, it is reasonable to expect a non-overclocked, adequately cooled PC to last close to 10 years.

GN bashed the X3D burning and Nvidia's 12VHPWR plug as hard as they bashed Intel. Anyone sensible should know that their business is to catch attention, but if skipping that single graph is such an evil deed, then what Intel has been doing in everything those videos covered is nowhere near OK either.

If it weren't for those techtubers, guess how many more burnt X3Ds would have been blamed on user error, costing owners thousands to replace just to burn again? How many default-settings i9 users wouldn't even know what to do, with even Intel RMA telling them just to downclock for stability, only for the chip to get more unstable over time while they lose hundreds of hours troubleshooting? How many would even know that the 12VHPWR plug needs to be jammed hard into the socket and, with tall cards, needs some 35mm of clearance to the case side panel, which most cases just don't have? Without such media we would still be kicked around left and right and not even get the specs telling us what to do.

I still remember buying the 14900K on day 1. I pre-ordered it knowing I would want to undervolt it, so once I got the CPU I went to Intel's website and searched for the RPL spec. It wasn't there, only Alder Lake was available, and Google showed the same. Only when I searched again, I think a few months later, did I find all those specs. No, I am not suggesting AMD/GN/whoever else are holy, sinless people, but Intel is the worst this time (generation) around, so they get my personal distrust/hate, if that's what you want to accuse me of.

We don't necessarily need to agree with the techtubers' conclusions, but the exposure of what the big guys did wrong, the ones we can't fight on our own, is important.
 
Perhaps a solution going forward: intMD specify the voltages and currents to be implemented and not exceeded, approve the motherboards, and allow the use of a label marking the motherboards as approved.
IntMD might have to annoy the motherboard manufacturers by enforcing the limits, not allowing the motherboard makers to run roughshod over the design specs.

As with GPUs, an alternate switchable BIOS could be implemented allowing free rein, with a tell-tale fuse on the CPU going pop if this is activated (bye bye RMA).

This might limit a particular board’s flexibility (overclocks, undervolts, etc.) on the approved BIOS but allow tinkering as a conscious act, with switching to an alternate “open” BIOS maybe being a 2 or 3 step act.
 
Your conclusion is the biggest point of contention, which some users in here are conveniently ignoring. In AMD's case the voltages were OUT OF SPEC by AMD's own admission, and AMD fixed the issue with the motherboard makers while officially telling all affected owners "we got you fam". With Intel, the voltages ARE IN SPEC per their own guidance, and the real problem, in Intel's own wording, is that they screwed up the VID for the CPUs by allowing them to request too much voltage (keep in mind, they're always in spec, per their own documentation). There is only speculation as to HOW MUCH voltage can kill a Raptor Lake part, but one thing is certain: Intel has said they'll issue a patch (after several others in the past) to try and correct the issue.

This is one of those situations where Intel is not sorry about the mistake, but sorry they got caught and had to explain it openly instead of keeping it "hush hush". Remember, the first reports started pouring in around 2022.

I guess this will be my last post on the topic as it's just become a circlejerk for the people playing tough devil's advocate (not your case, SnN).

Regards.
 
I said from the beginning, pretty clearly, that if I used intels default (the same way puget does, they even list their settings in their CPU reviews btw, you can basically copy them) I should avoid 11th gen and everything amd like the plague, and go for 12 13 14th if stability is important. What part do you disagree with specifically?
Ryzen 7000 has the second best field failure rate of all of the tested chips.

Out of 7 tested processors the field failure rankings are:

Best) Intel 12th
2nd) Ryzen 7000
3rd) Intel 10th
4th) Intel 14th
5th) Intel 13th
6th) Ryzen 5000
7th) Intel 11th

The 11th-gen field failure rate has basically gone down over time to that of the 10th generation (less than 1 per year), though maybe this is because those machines are starting to get retired anyway. But the 13th and 14th generation field failure rates are still going at the clip of the infamous 11th generation, so their placement in the above rankings is likely to fall.

Maybe Ryzen 7000 will also continue to increase, but it's already better than Intel 14th by a decent two places, so is still probably a better bet than Intel 14th. It may fail early at a higher rate than Intel 14th (aka "shop failure"), but that's basically almost a DOA RMA. Once you've used it for a while it should be more likely to be stable than an Intel 13th or 14th, at least until the microcode update.
 
Legitimate question, how is that a good thing?

For the end consumer, you know, the guy that buys a chip off the shelf, field + shop failure rate is the same thing. In fact it's scary that a bigger percentage of Zen 4 (compared to Zen 3) don't even get a chance to degrade; they just ship complete duds straight out of AMD. Is a 4-5% failure rate better than a 2% failure rate nowadays?
 
This is all a bit strange; my 13900K does not crash in games and runs fine. Kinda panicking at the moment with all this stuff going on. Maybe I should move to AMD in future.
If you are happy at the moment then don’t worry.

If your PC becomes outdated or your CPU dies (outside of the extended warranty), whichever comes first, then buy what is good for you at that time.
Neither Intel nor AMD is significantly better overall today. AMD wins some battles, Intel wins other battles.
 
Legitimate question, how is that a good thing?
By that you mean "DOA RMA"? Because you aren't actually using it as your daily driver yet, hopefully. I mean it would be equally as bad if this was your replacement for a broken computer, but presumably most people have a functioning computer that they are still using and just purchased this processor for a non-urgent upgrade.

Once it's your daily driver a failure is far more serious.
For the end consumer, you know, the guy that buys a chip off the shelf, field + shop failure rate is the same thing. In fact it's scary that a bigger percentage of Zen 4 (compared to Zen 3) don't even get a chance to degrade; they just ship complete duds straight out of AMD. Is a 4-5% failure rate better than a 2% failure rate nowadays?
I'm kind of curious what Puget Systems considers a failure to be. But yes, I agree, basically all of these rates look elevated to me. I'd expect modern manufacturing quality control to have less than one in a thousand failure rates, even on complex electronics. At least for a company as experienced as Puget.
 
This is what I wound up doing. I think I set the voltage to 1.20v, tho. I ran Cinebench R23 a few times, changing only the voltage, from 1.10v up to 1.5v. The higher voltages created more heat, though I still only barely got it into the upper 80Cs. Yet performance-wise, the 1.20v I kept it on provided nearly the same excellent performance as the higher voltages, with an 8C drop to 79-80C. I thought that was very reasonable. The first or second BIOS update had the settings even lower (I can't recall exactly what they were, but probably 1.00v to 1.15v), but performance was down, as was heat still further.

I imagine the new settings in the new BIOSes that have come out are the result of Intel requiring the MB manufacturers to make the in-spec profiles the default. But for someone like myself, who has only a decent knowledge of building a PC and very little knowledge of anything that requires changing BIOS settings, the three profiles put me on alert that I needed to watch things. Just from those settings, the water-cooled option was pretty obviously a wide-open, restriction-free setting that had me thinking: THAT can't be correct.

EDIT: Just a quick note - the PC has never crashed. My son uses it daily. Literally no issues. But I think part of it is that I noticed the three options as the first boot completed and thought that the water cooled option had been selected. So my flags were up from the beginning to find out what that setting was and to change it if I needed to - which I knew I did based on it having no limits on power delivery.
 
I guess the stupid question is: what should an average user, who either doesn't OC or does it minimally, do?
It's not a stupid question, and quite frankly nobody has been given enough information yet to definitively know. Keeping to Intel specifications and the voltage low seem to be the most impactful things an end user can do. Lower power SKUs don't seem to have been affected as badly as the higher ones, but they're still listed as potential failures.

If you aren't having problems with your system I think the most prudent thing to do is wait until the update lands and hope that Intel does provide some way of identifying potential problem chips.
 
For the end consumer, you know, the guy that buys a chip off the shelf, field + shop failure rate is the same thing. In fact it's scary that a bigger percentage of Zen 4 (compared to Zen 3) don't even get a chance to degrade; they just ship complete duds straight out of AMD. Is a 4-5% failure rate better than a 2% failure rate nowadays?
Here again, we really don't have enough information to interpret that data the way you want to. Since no data was provided about how the AMD failures were distributed in time, we don't know if AMD had early problems with Ryzen 7000 CPUs failing out of the box that were later corrected, whether it's been a constant issue, or maybe there was a bad batch in there somewhere.

As for "shop failures" being scary, I emphatically disagree. Those basically mean you build a PC and it either fails to boot, or it fails during burn-in testing. Either way, there's clear evidence of a problem up front. Once you make it past the front lip of the Bathtub curve, you're left with a system that should be stable and reliable.

[Image: bathtub curve]


By contrast, degradation is insidious. You never know "you're in the clear", when it will strike, or whether a given crash or software bug is caused by it. I would absolutely hate that uncertainty, especially if I were a professional user, as most of their customers are. And when it does start to strike, you might end up wasting lots of time before you're able to conclude that it's actually the CPU. That's surely why they had to make that blog post - probably because they were getting swamped by messages from stressed out Raptor Lake users.

So, even if I'm a DIY builder, I'd much rather the failures be as heavily biased towards "shop failures" as possible. Also, that means they'll be covered by not only the manufacturer warranty, but even the return policies of some stores that are less hassle than dealing with the CPU vendor's RMA process.
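
The bathtub curve argument above can be made concrete with a toy model. The following is a minimal Python sketch, not anything Puget or Intel published: the function name `bathtub_hazard` and every parameter value are made up for illustration. It models the failure rate as an early "shop failure" term that decays quickly, a constant baseline, and a wear-out term that grows with age.

```python
import math


def bathtub_hazard(
    t: float,
    infant: float = 0.05,         # hypothetical early-failure rate at t = 0
    infant_decay: float = 2.0,    # hypothetical speed at which early failures die off
    baseline: float = 0.005,      # hypothetical constant random-failure rate
    wearout: float = 0.0005,      # hypothetical wear-out scale
    wearout_power: float = 3.0,   # hypothetical wear-out growth exponent
) -> float:
    """Toy failure rate at age t (arbitrary time units), shaped like a bathtub."""
    early = infant * math.exp(-infant_decay * t)   # front lip: failures caught up front
    late = wearout * t ** wearout_power            # back lip: wear-out / degradation
    return early + baseline + late


if __name__ == "__main__":
    # Print the toy failure rate at a few ages to show the bathtub shape.
    for age in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"age {age:4.1f}: hazard {bathtub_hazard(age):.4f}")
```

With these invented numbers the rate starts high, falls toward the baseline once the early failures are weeded out, and climbs again as the wear-out term takes over. Failures on the front lip are the ones burn-in testing catches.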
 
I'm kind of curious what Puget systems considers a failure to be.
Whatever it is (i.e. crash or "won't boot"), it must be something where simply swapping out the CPU resolves the issue. If they have a crash during burn-in testing, swapping the CPU is probably the first thing they try. If it still crashes, then maybe they put the original CPU on a new motherboard and swap that in.

Because the software is a known quantity to them, they don't have that additional element of uncertainty that most of us would have - they can pretty safely assume any failures are hardware failures and it's just a question of finding which component. However, this does cut both ways - if an OS or driver update introduces some slight instability in their burn-in test, would that start showing up as increases in their hardware failure rate? Do they actually go back and re-test components on which a crash was encountered, to make sure the failures are repeatable?

I agree, basically all of these rates look elevated to me. I'd expect modern manufacturing quality control to have less than one in a thousand failure rates, even on complex electronics. At least for a company as experienced as Puget.
Eh, it's hard to say what I'd have guessed, but probably north of 0.1%. If you look at Comet Lake (10th gen), the shop failure rate is indeed significantly less than 1%, so it's clear that such expectations aren't too far out there.
 
Does anyone know if retail boxed processors get better QC than OEM wholesale processors? I could see it happening that OEM is less stringently checked than retail as Intel and the OEM would be counting on OEM QC to find DOAs (e.g. what Puget Systems does). OTOH I can also think of reasons for Intel to use the same, or even higher, quality control on both sets of processors.
 
Given my experience in the Systems section of this forum, I have seen more than a handful of OP-confirmed dead-on-arrival CPUs and am not surprised by Puget Systems' stated failure rates. I have also seen about as many, if not more, CPUs that died after working for more than a few months. I would have guessed CPU failure rates to be in the low single-digit percentages just based on my experience here.
 
I was kind of thinking that "DOA" failure rates would be higher for hobbyists (especially first timers using things like liquid metal thermal pastes, and without dedicated grounding equipment) than for a commercial systems builder.
 
That is also a very good point. We can become very biased by our own anecdotal evidence, especially in a place like the Tom's systems forum, where people with problems concentrate and may give the impression that something is more or less widespread than it is on average. A good example of how this type of bias can lead to incorrect conclusions is review sites for food or products where only consumers write the reviews.

Yelp, for example, may show a restaurant as low rated. This may be because humans perceive the good and bad things that happen to them in a skewed manner. On average, people remember negative experiences better and are more personally impacted by them than by positive ones. This creates a negative feedback loop. If only people who have bad experiences post reviews for a restaurant, how useful a conclusion can you draw about how good said restaurant truly is from its Yelp rating? If only 5% of customers have some sort of bad experience but contribute 80%+ of the reviews, the rating of said restaurant is useless.

Because Tom's is a place where people that may have defective CPUs concentrate, like people with negative experiences as denoted above, it could lead people like me to believe that there are many more failures than there actually are across all CPUs.
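
To put rough numbers on the review example above, here is a small Python sketch. The function names are made up for illustration, and it assumes the simplest possible model: every unhappy customer is equally likely to post, and so is every happy one. It works out how much more likely an unhappy customer must be to post for 5% of customers to produce 80% of the reviews.

```python
def posting_ratio(bad_customer_share: float, bad_review_share: float) -> float:
    """How many times more likely an unhappy customer is to post than a happy one.

    Derived by dividing the odds of a review being bad by the odds of a
    customer being unhappy (both shares must be strictly between 0 and 1).
    """
    review_odds = bad_review_share / (1.0 - bad_review_share)
    customer_odds = bad_customer_share / (1.0 - bad_customer_share)
    return review_odds / customer_odds


def observed_bad_share(bad_customer_share: float, ratio: float) -> float:
    """Share of reviews that are bad, given the true unhappy share and posting ratio."""
    bad_posts = bad_customer_share * ratio    # relative volume of bad reviews
    good_posts = 1.0 - bad_customer_share     # relative volume of good reviews
    return bad_posts / (bad_posts + good_posts)


if __name__ == "__main__":
    ratio = posting_ratio(0.05, 0.80)
    print(f"implied posting ratio: {ratio:.1f}x")
    print(f"bad share of reviews at that ratio: {observed_bad_share(0.05, ratio):.2f}")
```

Under that model the unhappy 5% would need to post about 76 times as readily as everyone else, which is the same kind of skew a help forum gets from people who only show up when something is broken.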
 
Wrt grounding, protection diodes on the I/O have been standard for years. While they don’t prevent static damage, the mitigation is huge.

Liquid Metal scares me stupid, I’ll stick with conventional paste…
 
More often, laptop owners lack the knowledge or even the interest to troubleshoot. Either you know how to troubleshoot and search the web, or "it just crashes/hangs" and you ask your geek friend... But this time around, even if it's, say, a M/B or SSD failure, Intel will still be blamed for it because it's the now-famous 14th-gen CPU.
After a little more talk with him: it's a Lenovo, and the story sounds a little more like a battery or power supply failure. BestBuy said he should just ship it back to Lenovo; they wouldn't touch it. He has a local shop he'll talk to on Monday; they've done laptop repair on the premises.
 