News Intel denies reports that it identified a root cause for Core i9 crashing issues — investigation continues

TerryLaze · Jun 18, 2024

NinoPino said:
As I said before, I suppose Intel is aware of the operating details of its CPUs better than any of us. So if Intel's recommendations do not work, then there is a problem with the CPU (or another component).

What do you think Intel's recommendations are?! Because this topic is about Intel saying that they have not found the cause yet.

The table they released only shows the maximum settings they allow before calling it overclocking; they never said anything about that being a solution.

rluker5 · Jun 18, 2024

TheHerald said:
Around 0.08 V is the vdroop on an Apex Z690 + 13900K running CBR23 with everything stock (~330 watts). It gets even bigger in Prime95, but with more or less the same power draw (I'm thermal throttled).

I'm running a 13900KF with a Prime Z690 P, by the way, so it makes sense that my vdroop would be worse than yours, since you've got better VRMs. The default LLC on my board is 3, which is analogous to the default LLC setting on my Aorus Ultralite Z690i. I had just never seen vdroop that big before, probably because my 13900KF uses more power than any of my previous CPUs.

Combining that with the top clocks having the least voltage headroom is just a recipe for instability if it isn't addressed. And it pains me to see people recommending increasing volts significantly across all clocks by using Intel's failsafe settings when the top clocks are the problem, and that particular problem is only exacerbated by increasing volts and power draw across the board. If the chip is fine at low power draw and only shows instability at high power draw and clocks (e.g. shader compilation in some games), then raising the power draw by raising volts across the board probably isn't the best solution.
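
A rough sketch of the arithmetic behind the vdroop figures discussed above, assuming a simple linear load-line model (droop = current × effective load-line resistance) and that package power is roughly core voltage times current. The 0.08 V droop and ~330 W figures come from the quoted post; the 1.30 V set voltage and all function names are illustrative assumptions, not measurements from the thread.

```python
# Back-of-the-envelope vdroop arithmetic under a linear load-line model.
# ASSUMPTION: V_SET is made up for illustration; only the 0.08 V droop and
# the ~330 W power figure come from the thread.


def load_current_a(power_w: float, v_load: float) -> float:
    """Approximate core current, treating package power as V * I."""
    return power_w / v_load


def effective_loadline_mohm(vdroop_v: float, current_a: float) -> float:
    """Effective load-line resistance in milliohms implied by a given droop."""
    return vdroop_v / current_a * 1000.0


V_SET = 1.30     # assumed voltage requested at light load (hypothetical)
VDROOP = 0.08    # droop under Cinebench R23 load, from the quoted post
POWER_W = 330.0  # package power under load, from the quoted post

v_load = V_SET - VDROOP
current = load_current_a(POWER_W, v_load)
r_ll = effective_loadline_mohm(VDROOP, current)

print(f"voltage under load: {v_load:.2f} V")
print(f"approx. current:    {current:.0f} A")
print(f"implied load line:  {r_ll:.2f} mOhm")
```

Under this model the droop scales with current, so a chip that draws more power sees a larger droop at the same LLC setting, which is consistent with the observation above.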

rluker5 · Jun 18, 2024

NinoPino said:
If you are sure it's not burned, I trust you. After all, it is normal that a CPU works well for months under heavy overclock and progressively becomes more and more unstable; what else can it be if not a "problem that isn't understood"?

That sounds more likely to be thermal paste segregation leading to excessive temps, or perhaps vrm degradation due to a lapse in cooling.

NinoPino said:
As I said before, I suppose Intel is aware of the operating details of its CPUs better than any of us. So if Intel's recommendations do not work, then there is a problem with the CPU (or another component).

Maybe you have done fine tuning well calibrated for your specific sample, which Intel obviously cannot do for a large batch, but I am also sure that with Intel failsafe the instability should go away regardless; if this does not happen, then the CPU has a problem.

Intel's failsafe immediately degrades the performance of the CPU and does not directly deal with the apparent underlying issue. If the silicon is degrading, for example, due to too many volts and too much power, why would it be better to apply more volts and power with Intel's failsafe settings? If they fix the stability issue that would be strong evidence that the silicon can take even more of a beating than any of these degradation theorists think. I tried Intel failsafe and sometimes my cores would see over 1.6v. Not for long because that is utterly uncoolable and performance suffered greatly. I don't like to see over 1.4v on my 13900kf and I can't cool that much anyways.

I just want people to get good performance and stability, and Intel's failsafe is incapable of delivering both.

NinoPino said:
I am just curious: for how many hours did you stress test your CPU with full load on all cores?

Not more than 15 minutes for any particular test. That is how statistical bell curves work. If I have a failure in any program, that program goes onto my testing list. Hours-long testing is just a waste of time that is hard on the components of your system. Why would I pump 300 W into my room for hours, let the ambient temp climb to 45°C, and then conclude from the reduced cooling capacity that my system is unstable at an ambient of 25°C? If an adjustment to a known stable system is unstable, then that adjustment has an issue.

NinoPino said:
As I said before, do you really think you are smarter than Intel's engineers?
I do not want to be offensive; it is simply illogical.

I don't have to be smarter than Intel's engineers to not agree with the fix given to the lowest common denominator of pc users in terms of being able to diagnose and fix their pc issues. I just have to be smarter than the least smart among us.

NinoPino · Jun 18, 2024

rluker5 said:
That sounds more likely to be thermal paste segregation leading to excessive temps, or perhaps vrm degradation due to a lapse in cooling.

Yes, this is a possibility, like many other problems with cooling, PSU, etc., but I sincerely hope that anyone who replaces a $600 CPU has already checked all of this.

rluker5 said:
Intel's failsafe immediately degrades the performance of the CPU and does not directly deal with the apparent underlying issue. If the silicon is degrading, for example, due to too many volts and too much power, why would it be better to apply more volts and power with Intel's failsafe settings? If they fix the stability issue that would be strong evidence that the silicon can take even more of a beating than any of these degradation theorists think. I tried Intel failsafe and sometimes my cores would see over 1.6v. Not for long because that is utterly uncoolable and performance suffered greatly. I don't like to see over 1.4v on my 13900kf and I can't cool that much anyways.

I just want people to get good performance and stability, and Intel's failsafe is incapable of delivering both.

The extreme difference between Intel's recommended settings and the optimal settings hand-tuned by some users, which permit great gains in power consumption, IMHO confirms my opinion that Intel released CPUs at the overclocking limit to gain market share, and is now paying the price in the form of stability problems.

rluker5 said:
Not more than 15 minutes for any particular test. That is how statistical bell curves work. If I have a failure in any program, that program goes onto my testing list. Hours-long testing is just a waste of time that is hard on the components of your system. Why would I pump 300 W into my room for hours, let the ambient temp climb to 45°C, and then conclude from the reduced cooling capacity that my system is unstable at an ambient of 25°C? If an adjustment to a known stable system is unstable, then that adjustment has an issue.

I don't have to be smarter than Intel's engineers to not agree with the fix given to the lowest common denominator of pc users in terms of being able to diagnose and fix their pc issues. I just have to be smarter than the least smart among us.

When I overclock, I test stability with many hours of full-load stress (all threads at 100% load). If it passes an initial 2-3 hours of testing, I leave it running all night to be 100% sure.
By my standards, 15 minutes is too short to say it is stable.
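
The disagreement over test length can be made concrete with a simple model. The sketch below assumes that errors under a given workload arrive independently at a constant average rate (a Poisson process). That is an idealization: it ignores heat soak, workload variety, and slow degradation. The error rates used are hypothetical and do not come from the thread.

```python
import math


def detection_probability(errors_per_hour: float, test_hours: float) -> float:
    """Chance of seeing at least one error during the test under a Poisson model."""
    return 1.0 - math.exp(-errors_per_hour * test_hours)


# Hypothetical error rates: a clearly marginal setting vs. a nearly stable one.
for rate in (20.0, 0.1):
    short = detection_probability(rate, 0.25)   # 15-minute test
    overnight = detection_probability(rate, 8)  # 8-hour overnight run
    print(f"{rate} errors/h: 15 min -> {short:.1%}, overnight -> {overnight:.1%}")
```

Under these assumptions, a short test catches frequent errors about as reliably as a long one, while rare errors are likely to be missed by a 15-minute run and can still slip through an overnight one.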

bit_user · Jun 18, 2024

rluker5 said:
That sounds more likely to be thermal paste segregation leading to excessive temps, or perhaps vrm degradation due to a lapse in cooling.

Thermal paste pump-out or degradation should be resulting in higher temps at idle and lower load, which I'd hope people would've noticed.

Can't simply be VRM degradation, because people swap CPUs and find it's working again.

rluker5 · Jun 18, 2024

NinoPino said:
Yes, this is a possibility, like many other problems with cooling, PSU, etc., but I sincerely hope that anyone who replaces a $600 CPU has already checked all of this.

I would hope that as well, but i9s also come in prebuilts. Not everyone is as observant or as diligent as you.

NinoPino said:
The extreme difference between Intel's recommended settings and the optimal settings hand-tuned by some users, which permit great gains in power consumption, IMHO confirms my opinion that Intel released CPUs at the overclocking limit to gain market share, and is now paying the price in the form of stability problems.

On Asus the Intel failsafe SVID is the extreme outlier:

This is on my Prime Z690 P, 13900kf with everything else Asus default.

Compare adding 100+mv to your CPU to these sensible recommendations: https://www.tomshardware.com/pc-com...blame-other-high-end-intel-cpus-also-affected
There theoretically could be a few that have poor enough bins to need those crazy voltages but most, even having issues, should be able to get by with a far less detrimental fix.

NinoPino said:
When I overclock, I test stability with many hours of full-load stress (all threads at 100% load). If it passes an initial 2-3 hours of testing, I leave it running all night to be 100% sure.
By my standards, 15 minutes is too short to say it is stable.

I have to test several programs, as Intel's single-core behavior at, say, 6GHz is pretty far from its 5.5GHz all-core behavior. Some programs do a lot of popping around between the two if they aren't that heavily threaded. It is kind of hard to predict which ones will detect something, but I have had some game crashes, so I put those games, in a CPU-limited scenario, on my list. Also, I have pushed my cache too far in the past, and a single core-focused test won't expose that instability; that one seems to come out with either a variety of uses and/or memory tests. So long as temps have plateaued under load and the CPU has passed my variety of tests, I consider that good enough to reduce my failures to a couple per year in my less stressful use.

rluker5 · Jun 18, 2024

bit_user said:
Thermal paste pump-out or degradation should be resulting in higher temps at idle and lower load, which I'd hope people would've noticed.

Can't simply be VRM degradation, because people swap CPUs and find it's working again.

So if it were CPU degradation, what would be the cause?
I've always heard that too much voltage causes electromigration. But if the fix many are pushing is to drastically increase voltages, wouldn't that just make the problem worse if it actually exists? And if it doesn't make the problem worse, but increases stability against crashes, wouldn't that be strong experimental evidence disproving the degradation theory?

Edit: Also Intel CPUs need more volts for the same clocks at higher temps, just like Nvidia GPUs.

DeathRabit · Jul 11, 2024

Level1Techs hinted that this is a memory controller failure, probably due to silicon degradation. Data for the 13900K/KS/KF and 14900K/KS/KF shows that around 30% of users of these CPUs get crashes in games, and the percentage increases over time; on the other hand, older Intel CPUs and AMD do not have this issue.
Also, some server farms use these CPUs, and their failure rate is so bad that companies have started recommending a swap to the AMD R9 7950X to solve the issue. Currently, one company's 3-year support costs $1,280 for an Intel server versus $139 for AMD ones; this alone says it all.
Solution: running DDR5 RAM at 3400MHz fixes the stability issue most of the time.

https://forum.level1techs.com/t/intels-got-a-problem-13900k-ks-kf-14900k-ks-kf-crashing/213008/1
