[SOLVED] Should I RMA my CPU?

Status
Not open for further replies.
I assembled this system about a year ago, with a few parts carried over from my previous system:

CPU: AMD Ryzen 5 3600
MoBo: MSI B450 Gaming Pro Carbon AC
GPU: NVIDIA Gigabyte RTX 2070 Super
Hard disk: Intel 660p
RAM: 2 * 8 GB Corsair LPX 3000
PSU: Corsair VS650 (bought in December 2017)

About a month ago, I started getting random app crashes (mostly Chrome or Plex) and BSODs. The app crashes were more frequent, roughly every 10 minutes, while I would get 2 or 3 BSODs a day.

Most of the errors were IRQL_NOT_LESS_OR_EQUAL, but I often got others such as MEMORY_MANAGEMENT, KERNEL_SECURITY_CHECK_FAILURE, SYSTEM_THREAD_EXCEPTION_NOT_HANDLED etc. All of them pointed to ntoskrnl.exe when viewed in BlueScreenView.

I ran MemTest86 for 4 hours with no errors, and also tried running one RAM stick at a time, but continued to get BSODs in each case, so I'm confident RAM is not the problem (I don't think both sticks would go bad suddenly at the same time).

I also tried reinstalling Windows, including on a different hard disk, but still got the same errors. chkdsk reported no errors either, so I'm confident that there are no issues with the hard disk. I also ran Driver Verifier for 24 hours, and there were in fact no BSODs during that time (though there were some app crashes). I also ran Prime95, which reported no errors.

I then realized that while Prime95 is running, none of my apps crash and there are no blue screens. The issues start as soon as I stop the test. Of course that is not how I want to run my system, so I looked into other CPU-related causes. I noticed that there were no crashes under high load, but when the system went idle, the CPU voltage would be high. I also changed the power plan from "Ryzen Balanced" to "Windows Balanced", and the crashes became less frequent. However, there were still crashes that correlated with high CPU voltage. Finally, I modified my power plan to use a maximum of 99% of my CPU instead of 100%, and I have had no app crashes or BSODs at all for 2 days (I verified with Windows Reliability Monitor). That said, even though the current workaround looks reasonably safe, I don't want to trade performance for stability, and I also fear that the crashes may come back in the future.
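For anyone who wants to reproduce the 99% cap without clicking through the Power Options dialog, here is a minimal sketch. It assumes the cap corresponds to the "Maximum processor state" setting, reached through the `powercfg` aliases `SCHEME_CURRENT`, `SUB_PROCESSOR` and `PROCTHROTTLEMAX`. The helper name `cap_commands` is made up for illustration, and the script only prints the commands rather than running them.

```python
def cap_commands(percent: int) -> list[list[str]]:
    """Build powercfg calls that cap 'Maximum processor state' on the active plan."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    commands = []
    # AC = plugged in, DC = on battery; a desktop only really needs the AC value.
    for flag in ("/setacvalueindex", "/setdcvalueindex"):
        commands.append(
            ["powercfg", flag, "SCHEME_CURRENT", "SUB_PROCESSOR",
             "PROCTHROTTLEMAX", str(percent)]
        )
    # Re-activate the current plan so the changed value is applied.
    commands.append(["powercfg", "/setactive", "SCHEME_CURRENT"])
    return commands


if __name__ == "__main__":
    # Print only; paste the lines into a Windows command prompt to apply them.
    for cmd in cap_commands(99):
        print(" ".join(cmd))
```

Passing 100 instead of 99 builds the commands that restore the default maximum.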

It could also be caused by overheating, but I've only ever had BSODs, never sudden shutdowns. My CPU temperatures go as high as 87°C when running stress tests, but usually stay around 60-70°C. The crashes also don't seem to be directly related to stress tests, so I suspect my CPU is mishandling high voltages rather than high temperatures.

I don't have any issues with the GPU, so I'm not sure it is a PSU fault, but I can't tell whether the faulty part is the CPU, MoBo or PSU. I don't think any other part could be at fault. Should I get my CPU replaced under warranty? And will the RMA be accepted, given that I don't have any "concrete proof" of a faulty CPU?
 
Solution
" CPU, MoBo or PSU. "

The thing is... out of these three things... I see quite a few more failures with GPUs and MBs than with CPUs... so I normally tend to shy away from blaming the CPU.

Have you checked your PSU voltages? You can use HWiNFO. They should remain within 5% of 12V, 5V and 3.3V, both at idle and under heavy load.
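To make the 5% rule concrete: it works out to roughly 11.4 to 12.6 V, 4.75 to 5.25 V and 3.135 to 3.465 V for the three rails. The sketch below checks observed min/max readings against those bands. The readings in `SAMPLE_READINGS` are invented for illustration and would need to be replaced with the min/max columns from your own monitoring tool.

```python
TOLERANCE = 0.05  # the 5% band mentioned above


def rail_limits(nominal, tolerance=TOLERANCE):
    """Return the (lowest, highest) acceptable voltage for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def out_of_range(readings, tolerance=TOLERANCE):
    """Return the nominal rails whose observed min or max left the band.

    readings maps nominal voltage -> (min seen, max seen).
    """
    bad = []
    for nominal, (lowest_seen, highest_seen) in readings.items():
        low, high = rail_limits(nominal, tolerance)
        if lowest_seen < low or highest_seen > high:
            bad.append(nominal)
    return bad


# Hypothetical readings, not taken from this thread.
SAMPLE_READINGS = {
    12.0: (11.88, 12.19),
    5.0: (4.96, 5.04),
    3.3: (3.10, 3.33),  # dips below the lower limit of the 3.3 V band
}

if __name__ == "__main__":
    for rail in out_of_range(SAMPLE_READINGS):
        low, high = rail_limits(rail)
        print(f"{rail} V rail left the {low:.3f}-{high:.3f} V band")
```

Keep in mind that software sensor readings are only approximate, so a borderline value is a hint to investigate further rather than proof of a bad PSU.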
 

King_V

The PSU could be the issue. It's nearly 3 years old, and it's not a great one.

The Corsair section of this writeup states the following:
The black and gray label VS series units are much better than the older orange and black label VS models, but they are still units you really only want to use with basic use office or internet browsing machines, or in a pinch, maybe a machine with a lower TDP slot powered card. Also, they are not a modern design, having an older group regulated platform which you can find plenty of in-depth information about if you do a search for "group regulated power supplies". These are better than any of the units down below in the wall of shame list, and better than the older VS and CX units, but don't assume that you can simply pair a graphics card that has a 550w recommendation with a 550w VS unit and not have any problems, because in all probability, you will. These units are not meant for use with high demand gaming systems. In a PINCH, for VERY short term use, they will work, but they are not going to last under the rigors of daily gaming loads.
 
I created my bootable installer using Rufus, and I have the latest BIOS. I tried updating all my drivers, and even rolled some of them back to the Microsoft ones, but none of that helped. My case is a Corsair Carbide Spec-02 Mid-Tower, with one front fan and one back fan in addition to the stock CPU cooler and GPU fans.

[Image: gEKwIsl.png]

Most of my voltages are within range. I got these min/max values after running some programs, running Prime95, and leaving the system idle for a while. Before I changed the power settings, the CPU voltage would go as high as 1.4 V and temperatures as high as 86°C. Avoiding that has stopped the crashes for now.
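If you log sensor readings to a CSV file, a short script can pull out the min/max values and the moments when the 1.4 V or 86°C levels mentioned above were reached. This is only a sketch: the column names `time`, `vcore` and `temp_c` are assumptions (real logging tools use their own headers, so rename accordingly), and the sample rows are invented.

```python
import csv
import io


def summarize(log_text, vcore_limit=1.4, temp_limit=86.0):
    """Report min/max values and the timestamps where a limit was reached."""
    rows = list(csv.DictReader(io.StringIO(log_text)))
    vcore = [float(row["vcore"]) for row in rows]
    temp = [float(row["temp_c"]) for row in rows]
    flagged = [
        row["time"]
        for row in rows
        if float(row["vcore"]) >= vcore_limit or float(row["temp_c"]) >= temp_limit
    ]
    return {
        "vcore_min": min(vcore),
        "vcore_max": max(vcore),
        "temp_max": max(temp),
        "flagged": flagged,
    }


# Made-up sample log, not data from this thread.
SAMPLE_LOG = """time,vcore,temp_c
12:00:00,1.10,62
12:00:05,1.41,71
12:00:10,0.95,58
12:00:15,1.32,86
"""

if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Comparing the flagged timestamps against the crash times in Windows Reliability Monitor is one way to check whether crashes really line up with voltage peaks.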
 
Shouldn't the maximum clocks for the 6 cores be higher than 3527 MHz?
 
Update: I tried a new PSU, but that did not solve the problem. I also read through a very similar issue that another user had, and decided that it was most likely a CPU issue after all: https://forums.tomshardware.com/threads/bsod-caused-by-cpu-voltage.3539176/

So I did an RMA and reset all power settings, and I have had no crashes whatsoever with my replacement CPU for the last 3-4 days!
 