If you are sure it's not burned, I trust you. After all, it is normal for a CPU to work well for months under a heavy overclock and then progressively become more and more unstable; what else could it be if not a "problem that isn't understood"?
That sounds more likely to be thermal paste separation leading to excessive temps, or perhaps VRM degradation due to a lapse in cooling.
As I said before, I suppose Intel knows the operating details of its CPUs better than any of us. So if Intel's recommendations do not work, then there is a problem with the CPU (or another component).
Maybe you have done a fine tuning, well calibrated for your specific sample, that Intel obviously cannot do for a large batch. But I am also sure that with Intel's failsafe the instability should go away regardless; if this does not happen, then the CPU has a problem.
Intel's failsafe immediately degrades the performance of the CPU and does not directly deal with the apparent underlying issue. If the silicon is degrading, for example, due to too many volts and too much power, why would it be better to apply more volts and power with Intel's failsafe settings? If those settings fix the stability issue, that would be strong evidence that the silicon can take even more of a beating than any of these degradation theorists think. I tried Intel's failsafe and sometimes my cores would see over 1.6 V. Not for long, because that is utterly uncoolable, and performance suffered greatly. I don't like to see over 1.4 V on my 13900KF, and I can't cool that much anyway.
I just want people to get good performance and stability, and Intel's failsafe is incapable of delivering both.
I am just curious: for how many hours did you stress test your CPU with full load on all cores?
Not more than 15 minutes for any particular test. That is how statistical bell curves work. If I have a failure in any program, that program goes onto my testing list. Hours-long testing is just a waste of time that is hard on the components of your system. Why would I pump 300 W into my room for hours, let the ambient temp climb to 45°C, and then conclude, because the cooling capacity has decreased, that my system is unstable at a 25°C ambient? If an adjustment to a known stable system is unstable, then that adjustment has an issue.
As I said before, do you really think you are smarter than Intel's engineers?
I do not mean to be offensive; it is simply illogical.
I don't have to be smarter than Intel's engineers to disagree with a fix aimed at the lowest common denominator of PC users in terms of being able to diagnose and fix their PC issues. I just have to be smarter than the least smart among us.