Question: Demanding games are crashing to the desktop on my new build

Aug 24, 2023
25
4
35
Hello,
I built a new PC a month ago, and I'm currently experiencing issues with it. Demanding games crash to the desktop, usually accompanied by a "bug report" from the game I was playing.

What happens is that the game either freezes or the screen turns black for a few milliseconds, and then I'm brought back to the desktop. The Event Viewer sometimes shows an error, but sometimes it doesn't. I don't see any artifacts, and I've been testing with the GPU at stock settings. The problem seems to occur only while gaming. I've run multiple 3DMark stress tests and the Heaven Benchmark for over 40 minutes, and both were fine.
Examples of games I've tested:
  • Witcher 3 on Ray Tracing Ultra preset with DLSS Quality
  • Metro Exodus on Extreme (I used its benchmark for testing). It crashes after around the 5th cycle.
    • This one sometimes causes a BSOD
    • On the other hand, I was once able to run it without an error (30 loops)
  • Guardians of the Galaxy with max settings and RT (here it crashes less frequently)
What seems to (sometimes) help:
  • Lowering the in-game settings (such as turning off RT, applying a frame limit)
  • Undervolting the GPU (although it still crashes sometimes)
  • Generally, not pushing the GPU to its full potential (not allowing it to get too hot)
Here is the setup:
  • CPU: Ryzen 5 7600X
  • Motherboard: ASRock B650E PG RIPTIDE
  • RAM: G.Skill Trident Z5 Neo 32 GB (6000MHz, CL30)
  • GPU: EVGA RTX 3070 (used)
  • PSU: MSI MPG A850G PCIE5 850W
What I have tried:
  • Tested the GPU in a different PC. Given the symptoms, I was sure the issue was with the GPU, but it seems it's not. I put the RTX 3070 into my old rig (Ryzen 3600, Corsair 650W PSU, 16 GB DDR4) and could not get it to crash even once.
    • It ran smoothly through 10 loops of the Metro benchmark (and then 30 additional loops)
    • I also played Witcher for 30 minutes without a single crash.
    • After testing, I put it back into the new rig and it failed again after a few loops…
    • I used HWiNFO to log sensor data during both runs (the GPU hot spot and temperature were sometimes higher in the old rig, and it did not fail)
  • Tested the CPU
    • I ran Cinebench for 30 minutes with
  • Tested the RAM
    • 4h of Memtest86, 0 errors
  • Ran stress tests in OCCT
    • VRAM - 30 minutes, no errors
    • PSU - 2x30 minutes, no errors
      • The Metro Exodus benchmark ran successfully for 30 iterations right after that...
      • ... only to fail after 2-3 iterations two hours later, following a restart, with a BSOD and a reboot.
  • Checked cable connections in the new rig
    • Made sure the GPU is connected using two separate cables from the PSU.
    • Verified that motherboard and CPU cables are properly inserted.
At this point, I'm out of ideas. I suspect the PSU may be malfunctioning, but I can't find any clues in the HWiNFO logs. Maybe you guys will spot something.
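For comparing the sensor logs from the two rigs side by side, here is a minimal Python sketch. It assumes the logs were exported as CSV with one column per sensor. The column names and readings below are made up for illustration and will differ from what the logging tool actually writes, so adjust them to match the real file.

```python
import csv
import io
from statistics import mean


def sensor_summary(csv_text, sensors):
    """Return {sensor: (min, max, mean)} for the requested columns.

    csv_text is the content of a sensor log exported as CSV, with one
    column per sensor. Cells that are empty or not numeric are skipped.
    """
    values = {name: [] for name in sensors}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in sensors:
            try:
                values[name].append(float(row.get(name) or ""))
            except ValueError:
                continue  # blank cells, repeated headers, or text cells
    return {
        name: (min(v), max(v), mean(v))
        for name, v in values.items()
        if v  # leave out sensors with no usable readings
    }


# Hypothetical column names and invented readings, for illustration only.
old_rig = "GPU Hot Spot [C],+12V [V]\n88.0,12.10\n95.5,12.05\n"
new_rig = "GPU Hot Spot [C],+12V [V]\n86.0,12.08\n92.0,11.70\n"

wanted = ["GPU Hot Spot [C]", "+12V [V]"]
for label, text in (("old rig", old_rig), ("new rig", new_rig)):
    for sensor, (low, high, avg) in sensor_summary(text, wanted).items():
        print(f"{label:8} {sensor:18} min={low:.2f} max={high:.2f} mean={avg:.2f}")
```

Looking at the minimum of the voltage rails and the maximum of the temperatures in each log is usually quicker than scrolling through the raw rows, although a short dip between two samples would not show up in a log at all.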
 

SorryBella

Proper
Aug 23, 2023
153
35
120
Tested the RAM
  • 4h of Memtest86, 0 errors
This is fine for checking whether your RAM will run at stock JEDEC speed, but it can't really detect minor issues with XMP settings. If you have already tried stock speed then never mind, but if you haven't, try the OCCT RAM stress test. There's also TestMem5, but it is finicky to set up just for a quick RAM check.

 
Aug 24, 2023
25
4
35
Hey @SorryBella , thanks for taking the time to reply!

I'm really puzzled by this whole situation. Today, I ran the Metro Exodus benchmark three times in a row for 30 loops each. I was testing to see if the problem might be related to the SSD I use for games. So, I moved Metro to my OS SSD and set the Power Limit to 112% (max) in MSI Afterburner to catch any issues faster. Strangely, the problem didn't occur at all during these tests. It completed all 30 loops without any crashes, and the hot spot temperature reached 93.1 degrees C – this was during the first run.

During the second run, I moved Metro back to my gaming SSD while still keeping the power limit at 112%. Again, the problem never occurred, and the hot spot temperature reached 93.1 degrees C.

For the third run, I kept Metro on the gaming SSD but reset MSI Afterburner to its defaults (setting the Power Limit to 100%). Once again, the problem didn't occur, and the hot spot temperature reached 90.1 degrees C.

One thing that's different from yesterday is that I plugged the PC directly into the wall instead of using the extension cord. I'm not sure if it matters; maybe it's just a coincidence. (My old PC was also connected to this cord, and I had no problems with the tests.)

I still plan to run the OCCT RAM stress test, since Memtest86 was run with XMP enabled in the BIOS.
 
Aug 24, 2023
25
4
35
@SorryBella
Update: I ran the OCCT Memory test for 1 h with 100% memory, Auto instruction set and Auto threads. I was shocked to come back after an hour and see 7242 errors... Image: https://drive.google.com/file/d/1Y2Yso_LwE-m4gQaRsa3dxl9CO04oT_ot/view?usp=drive_link

I then ran it for 10 minutes at 80% load: 0 errors.

After that, I tried 10 minutes with the AVX2 instruction set. It started to fail almost immediately! Image: https://drive.google.com/file/d/1m3OUp6Y8GsernWUFz_CKz2rdVChT-Fr7/view?usp=drive_link
 

SorryBella

Proper
Aug 23, 2023
153
35
120
Update: I ran the OCCT Memory test for 1 h with 100% memory, Auto instruction set and Auto threads. I was shocked to come back after an hour and see 7242 errors...
Yep, called it. Sounds like your RAM is indeed unstable at the EXPO setting. While the E chipset line gets fewer BIOS updates than the normal chipset line, if you haven't updated to a BIOS with AGESA 1.0.0.7, I'd try that first before thinking about a RAM RMA.
 
Aug 24, 2023
25
4
35
Can this truly be the root cause of my game crashes? I have turned off EXPO in the BIOS, and the same AVX2 test finished without a single error!

I recently updated the BIOS and am running v1.28; the latest changelog mentions updating AGESA to ComboAM5 1.0.0.7b.
 
Aug 24, 2023
25
4
35
@SorryBella looks like my RAM is worse than I thought. With EXPO turned off, it initially passed 30 minutes of the AVX2 test and 30 minutes of the Auto test, but my game crashed again.

I ran the test again with some additional GPU load in the background (Heaven), and it started to generate lots of errors and crashed OCCT...

OCCT: https://drive.google.com/file/d/1p_NskrVH7ap_MbsnqxLstyVWwvByjNw0/view?usp=sharing
Event Viewer Error: https://drive.google.com/file/d/1miCkyKk5L4YB-EVFUS2H7Hu653_h33Ng/view?usp=sharing

Is there anything else I should test, or shall I proceed with an RMA? It looks clear to me that this RAM is broken.
 

SorryBella

Proper
Aug 23, 2023
153
35
120
So it still crashes at dead stock? Yeah, returning it to the shop you bought it from would be a good idea. With all voltages on auto it should at least function at JEDEC speed, but it seems that even then it's very random. The only other thing I can think of off the top of my head is turning off RAM retraining.
 
Aug 24, 2023
25
4
35
@SorryBella - first, thank you so much for all the invaluable help you are providing!

And yes, using the SSE instruction set and 95% load, it now fails even without the additional load in the background.
You can see the timings in the OCCT errors screenshot from my previous message.
 
Aug 24, 2023
25
4
35
@SorryBella, apologies for tagging you again... I decided to test the RAM one stick at a time. I was expecting only one of them to cause problems, but both show memory errors after a few minutes of testing. Is it possible that both are malfunctioning, or is the root problem still elsewhere?

One thing I want to mention: my 7600X runs in "eco" mode, with these settings applied in the BIOS:
 
Aug 24, 2023
25
4
35
Hello @Eximo, thank you for your reply! CPU IMC suggests a CPU problem, do I understand correctly? Can I conduct any other tests to narrow down the list of potential problematic parts?
 

Eximo

Titan
Ambassador
Well if you can get some different DDR5 to test with you can be more certain. But I was saying the CPU could very well be the problem.

AM4 Ryzen chips also had a habit of losing memory channels and only being able to run single-channel memory. And these CPUs are constructed similarly, with the CPU cores on one chip and the I/O and memory controller on a separate one.

If running JEDEC memory speeds helps, you can certainly do that temporarily.

It just depends on whether you want to buy another CPU or memory kit to test with, or take it to a shop to do the testing.

Doesn't rule out the motherboard though.
 
Aug 24, 2023
25
4
35
Nothing seems to be working. I tested two sticks together with EXPO, then without EXPO, then one stick at a time, and RAM errors keep showing up (but only in OCCT; I ran Memtest86 without a problem, as stated in the first post).
 
Aug 24, 2023
25
4
35
@Eximo @SorryBella - I performed one final test that I thought of. I moved the single stick from channel 3 to channel 1, but the flood of errors started again in OCCT less than 10 minutes into the test. I'm beginning to think there's nothing else to do but start swapping out hardware... What's your initial guess?
  • If both sticks are failing when inserted one at a time (and tested in different slots), is it possible that both could be damaged?
  • Perhaps it's a faulty CPU (like the CPU IMC issue that @Eximo mentioned).
  • Or could it even be the motherboard?
  • Could the PSU be the problem?
I was initially hopeful when I saw those RAM errors, but it seems like once again, anything might be causing them. :(
 
Aug 24, 2023
25
4
35
Hello @kerberos_20, thanks for jumping in!

So far I have tested stick 1 in slot 3 and then stick 1 in slot 1.

I also tested stick 2 in slot 3.

I will test the remaining combinations.
 
Aug 24, 2023
25
4
35
@kerberos_20 - I tried, and the PC does not boot (the red CPU and RAM LEDs light up on the motherboard) with:
- 1 stick is in slot 0 (closest to CPU)
- 1 stick in slot 2
- 2 sticks (slots 0 & 2)
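To keep track of which stick and slot combinations have been covered, a small bookkeeping sketch in Python might help. It assumes two sticks and four slots numbered 0 to 3, with slot 0 closest to the CPU, and it records only the single-stick outcomes reported so far in this thread. The function names are made up for this example, and the result is a way to organize the notes, not a diagnosis.

```python
from itertools import product

STICKS = ("stick 1", "stick 2")
SLOTS = (0, 1, 2, 3)  # assumed 0-based numbering, slot 0 closest to the CPU


def untested(results):
    """Single-stick combinations that have no recorded outcome yet."""
    return [combo for combo in product(STICKS, SLOTS) if combo not in results]


def never_passing(results, index):
    """Sticks (index=0) or slots (index=1) that were tested but never passed."""
    seen, passed = set(), set()
    for combo, outcome in results.items():
        seen.add(combo[index])
        if outcome == "pass":
            passed.add(combo[index])
    return sorted(seen - passed)


# Outcomes reported so far for single-stick OCCT runs.
results = {
    ("stick 1", 3): "errors",
    ("stick 1", 1): "errors",
    ("stick 2", 3): "errors",
}

print("still to test:", untested(results))
print("sticks that never passed:", never_passing(results, 0))
print("slots that never passed:", never_passing(results, 1))
```

If every stick and every slot ends up in the never-passed lists, the table cannot single out one part, which points toward something the combinations share, such as the CPU, the board, the PSU or the settings.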
 
Aug 24, 2023
25
4
35
try to reseat your CPU
Just finished reseating the CPU: I cleaned off the paste and applied it again. My hopes were high, as I thought one of the mounting screws for my Thermalright Peerless Assassin cooler had not been tightened properly. I gave the CPU a spin in Cinebench, and it finished the stability test without issues.

I ran the OCCT memory test after that. With 2 sticks (in slots 2 & 4), no EXPO and the BIOS on completely stock settings, it ran for 15 minutes (on Auto, 95% load).
After that I started the Heaven Benchmark in the background, and the OCCT test began throwing errors about 10 minutes later...

Is my testing procedure valid? Is it OK to add more GPU load using Heaven?
 
Aug 24, 2023
25
4
35
@kerberos_20, @Eximo I did one final test: I underclocked the RAM heavily, to 3600 MHz. It ran for almost 40 minutes with Heaven in the background but eventually started to fail... Is broken RAM my best bet for now? I can order new sticks and test whether that helps...

GQJ6m9h.png
 
Aug 24, 2023
25
4
35
@kerberos_20 @Eximo @SorryBella sorry to keep tagging you folks, but at this point I am really desperate :(
While waiting for the new RAM, I started running every possible test in OCCT and spotted that Linpack also throws an error from time to time. What could that mean?

kXnzOjp.png
 

Eximo

Titan
Ambassador
Not sure what the typical error rate is. DDR5 does have a form of ECC built in so it can carry on with minor errors.

Note it down, and see if it still does it when you test with the new memory.
 