[SOLVED] RAM is faulty, but pc works fine when not under load?

Feb 3, 2021
So I built a new PC (specs below) and it seemed to be working just fine. However, under load (gaming) I would either get a black screen and the PC would become unresponsive, OR I would get a blue screen that said something about a system service error and a kernel error. I tried some solutions, but none of them worked.

So I ran the Windows Memory Diagnostic tool built into Windows, which reported that my hardware was faulty. Then I used Memtest86 to test further, and those tests made it clear that my RAM is indeed faulty.

But the strange thing is that Windows 10 works fine. I can also still control the lighting on my RAM. When I get into a game, it runs smoothly at first, but after about 5 minutes it crashes.

So my question is: How does everything still work fine even though the RAM is clearly faulty? And should I just buy new RAM or could this be software related?

Specs:
Ryzen 5 2600
GTX 1080
16GB Corsair Vengeance Pro RGB (2x8GB) 3200MHz DDR4
Aorus B450 I Pro WiFi (Mini-Itx)
Corsair SF450 PSU (yes, I know it's not a lot, but the system should not draw more than about 300-350 W, so it's fine)
1TB Adata SX8200 Pro
Cooler Master NR200P
 
Solution
This is common behavior. It's not like the WHOLE stick of RAM is faulty. As long as the bad memory location(s) don't get accessed, everything works fine.

I'd added a second kit of RAM to my personal machine a while back. Everything worked fine (games, everything). Then a month or 2 later, I fired up Folding@Home and the machine would consistently crash in less than 10 minutes. I was confused. I typically ran Prime95 for 12-18 hours to test RAM, and I had run it for 19 hours without errors with this new kit installed. Once I'd discovered this fault using F@H, I removed everything except the new kit (2x4GB only) and ran Prime95 again. I got an error after 21 hours of testing.

Folding@Home has since been added to my list of initial stress tests for RAM and GPUs. Prime95 and Memtest have been removed from that list and relegated to dinosaur programs that aren't capable of efficiently testing today's massive quantities of system RAM. I use OCCT more now, since it has tests for many components in one program.
 
Solution
Something to consider: if the error always occurs at one or more specific locations in RAM, then the RAM is almost certainly bad. If the system does not touch that location until it is using more RAM, that would explain why it doesn't fail when not under load: it isn't really "not failing", it just isn't using enough RAM to hit the "bad" spot(s). In that case you would definitely need new RAM.
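A toy model of this idea, purely for illustration: memory is treated as a flat range of addresses containing one hypothetical bad address, and each workload is assumed to fill memory from the bottom up. Real operating systems hand out physical pages far less predictably, so the function name `touches_bad_address`, the fault location, and the workload sizes below are made-up assumptions, not measurements.

```python
# Toy model: memory as a flat range of addresses, one of which is faulty.
# ASSUMPTION: a workload occupies addresses [0, bytes_used) contiguously.
# Real OSes map physical pages unpredictably; this only illustrates the idea.

GIB = 1024 ** 3


def touches_bad_address(bytes_used: int, bad_addresses: set[int]) -> bool:
    """Return True if a workload using addresses [0, bytes_used) hits a bad one."""
    return any(addr < bytes_used for addr in bad_addresses)


# Hypothetical fault located a little past 9 GiB into a 16 GiB system.
bad = {9 * GIB + 123_456}

# Hypothetical memory footprints for three kinds of use.
workloads = [("desktop idle", 3 * GIB), ("web browsing", 6 * GIB), ("gaming", 11 * GIB)]

for label, used in workloads:
    status = "may crash" if touches_bad_address(used, bad) else "runs fine"
    print(f"{label:>12}: {used // GIB:>2} GiB used -> {status}")
```

In this model only the largest workload reaches the faulty address, which mirrors the "fine at the desktop, crashes a few minutes into a game" pattern described above.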

On the other hand, if the failures occur at random locations rather than at the same addresses each time (Memtest86 shows the address of each failure, so running it several times will tell you whether the same location is always hit or whether the locations are random), then there might be some other issue that only shows up as a RAM failure. For example, if the voltage supplied to the RAM is slightly low, the RAM might be fine at lower temperatures and lower power consumption, while operations that consume more power and/or produce more heat cause random failures. If that is the case, and nothing unusual is hurting heat removal (e.g., a layer of dust or cables blocking airflow over the RAM), then raising the RAM voltage by the smallest increment might make the issue simply go away.
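The "same address vs. random addresses" check can be sketched in a few lines of Python. This assumes you have copied the failing addresses from several Memtest86 runs into lists by hand. The function `classify_failures`, its "fixed"/"random"/"clean" labels, the threshold of two runs, and the example addresses are all hypothetical, and parsing Memtest86's actual report format is not shown.

```python
from collections import Counter


def classify_failures(runs: list[list[int]], min_repeats: int = 2) -> str:
    """Classify failing addresses collected over several memory-test runs.

    runs: one list of failing addresses per run (an empty list = clean run).
    Returns "clean" if no run reported errors, "fixed" if some address failed
    in at least `min_repeats` separate runs, otherwise "random".
    ASSUMPTION: min_repeats=2 is an arbitrary illustrative threshold.
    """
    if not any(runs):
        return "clean"
    # Count in how many runs each address failed; set() avoids double counting
    # an address that appears more than once within a single run.
    counts = Counter(addr for run in runs for addr in set(run))
    if max(counts.values()) >= min_repeats:
        return "fixed"
    return "random"


# Hypothetical results from three runs each.
same_spot = [[0x2A4F1C30], [0x2A4F1C30, 0x2A4F1C38], [0x2A4F1C30]]
scattered = [[0x01000000], [0x3F001000], [0x12345678]]
no_errors = [[], [], []]

for name, runs in [("same spot", same_spot), ("scattered", scattered), ("no errors", no_errors)]:
    print(name, "->", classify_failures(runs))
```

A "fixed" result points toward a bad module; a "random" result points toward the voltage, heat, or timing explanations discussed in this post.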

One more consideration: If you are using several sticks of RAM together with interleaved channels, and if otherwise perfectly good sticks of RAM have different latencies, then this could be an issue of timing. Disabling dual channel for example might solve this, although performance would go down.