brotherkill99

Reputable
Feb 2, 2019
7
2
4,515
I don't know what to do anymore.

System specifications:
-Graphics card: ASUS TUF GeForce RTX 4070 OC
-Processor: AMD Ryzen 7 5800X
-Motherboard: Gigabyte B550 AORUS ELITE
-RAM: G.Skill Trident Z RGB 16GB 4000MHz
-Storage: Gigabyte Aorus 1TB, Samsung 870 Evo 1TB
-Cooler: Arctic Liquid Freezer II 240mm
-Power supply: Corsair RM650x

Everything is running fine, temperatures are fine, and stress tests were done on all of the components. I'd also like to add that, based on everything I've written down below, I have not found a repeatable sequence of crashes; each time something was different.

Before I get into this, I would like to go over some past events. I first encountered this issue about a year to a year and a half ago. Some time after I purchased the PC (yes, it was pre-owned; at that time it had a Gigabyte Eagle 3070), I noticed that my RAM was not running at the advertised 4000MHz but at stock, meaning XMP was disabled, so I went into the BIOS and turned XMP on. At first the PC ran okay for a while, but then it started crashing, meaning it was unstable at 4000MHz. After some research I noticed that other people had the same issue and that AMD runs best at 3600MHz. So I settled for 3600MHz and it was fine, but after a while it started crashing again, so I turned XMP off and reset the BIOS, yet the issue was still there. Back then it was still a blue screen of death and not a black one.

After that I could not get a stable system; every time it went under load, a crash occurred. I troubleshot the PC for hours: reseating the RAM, resetting the BIOS (including taking out the battery), completely reinstalling Windows multiple times (I even tried Windows 10; I can't remember if I was running 11 or 10 at the time), toggling various Windows and BIOS settings, and swapping out the RAM and graphics card... nothing helped. Then, after a couple of days of troubleshooting, it started to work without crashing. The issue was gone just like that, on its own, but . . . . .

A couple of months later Diablo 4 came out. I tried playing it and it worked fine for a few days, then boom, a black screen of death. After that came constant black screens of death, and they didn't stay specific to Diablo 4 anymore; they started happening every time there was a slight load on the system. The crashes were random (sometimes it could run for a few hours, sometimes a few minutes). Once again I repeated all of the troubleshooting steps I listed earlier, and this time I also updated the BIOS to the latest version and updated the graphics card BIOS. Nothing helped.

Then I tried swapping out both the graphics card (for an old GTX card) and the RAM (for a single 8GB stick), which resulted in no crash, so I tried it with my own RAM too and there was no crash again. This led me to believe there could be something wrong with my graphics card (some of the black screens also had artifacts), so I decided to repaste it, which resulted in better temps but didn't help with the crashes. And after days of troubleshooting, boom, the issue disappeared on its own again and months went by (sometime in between I also turned XMP back on and set it to 3600, and it was running fine)... In the meantime I swapped the graphics card for a 4070 in case the old 3070 was faulty, and it ran fine until today. Now here I am with the issue again: black screen of death and nothing more, every time I launch any game. I reinstalled Windows, reset the BIOS settings, and changed out the RAM; same thing. At this point I don't even know anymore.

I'm glad to get any and every piece of advice I can. I was speculating that it could be the RAM, but after changing it, the only thing I can speculate is that it could be either the power supply (which seems to run fine) or a faulty main drive (which I just tested; it had no errors and runs at the advertised speed). One thing I have in mind to try is installing Windows 10 on my second drive (in case there is a bug with this specific combination of hardware on Win11), which would rule out my main SSD and Win11 if it still crashed.

I'm desperate, please help me.
 
EXACTLY which DIMM slots are your memory modules installed in? Starting at the CPU socket and working towards the edge of the motherboard, 1, 2, 3, 4, with 4 being closest to the edge of the motherboard.

What is the EXACT model of your memory kit? If you do not know, you can easily find out by doing this.

Open command prompt "as administrator" by right clicking on the command prompt link and selecting open as administrator.

Then, in command prompt, copy and paste the following string, then hit enter:

wmic MEMORYCHIP get BankLabel, DeviceLocator, MemoryType, TypeDetail, Capacity, Speed

Also, we need to know EXACTLY what motherboard BIOS version is currently installed. Please don't say "the latest", because about 75% of the time somebody says that, it isn't. Please give us the actual BIOS version that is currently installed.
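
For anyone who would rather collect the memory details in a script, here is a minimal Python sketch. It assumes `wmic` is still present on the system (it is deprecated and may be missing on newer Windows installs). The sample text is made up for illustration and is not output from this PC, and `parse_wmic_table` is a hypothetical helper name. It relies only on wmic printing a fixed-width table, so values that contain spaces are split by header column position rather than by whitespace.

```python
import re

def parse_wmic_table(text):
    """Split wmic's fixed-width table into one dict per row, using header offsets."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    # Each column starts where its header word starts.
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+", lines[0])]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(columns):
            end = columns[i + 1][1] if i + 1 < len(columns) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows

# Made-up sample (assumption): two 8 GiB modules; real values will differ.
WIDTHS = (14, 12, 15, 12, 7, 10)
SAMPLE_ROWS = [
    ("BankLabel", "Capacity", "DeviceLocator", "MemoryType", "Speed", "TypeDetail"),
    ("P0 CHANNEL A", "8589934592", "DIMM 1", "0", "2133", "16512"),
    ("P0 CHANNEL B", "8589934592", "DIMM 1", "0", "2133", "16512"),
]
SAMPLE = "\n".join(
    "".join(cell.ljust(width) for cell, width in zip(row, WIDTHS))
    for row in SAMPLE_ROWS
)

for module in parse_wmic_table(SAMPLE):
    gib = int(module["Capacity"]) // 2**30
    print(module["BankLabel"], module["DeviceLocator"], f"{gib} GiB", module["Speed"])
```

To use it on a real machine, capture the text printed by the command above and pass it to `parse_wmic_table` in place of `SAMPLE`.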
 
My initial impression is that a 650W PSU, even though it's a nice Corsair RMx, could be insufficient for either of the GPUs you have used in the PC depending on the age and potential sub-second peak power draw the GPUs could be asking for. The RAM kit at 4000mghz would not allow the memory controller on the CPU to have a 1:1:1 ratio with the memory clock, infinity fabric clock, and memory controller clocks unless you got a golden chip with the 5800x. The faster kits in XMP could be using between 1.45v and 1.55v which can make them hot enough, 45°C plus, that they then become unstable because of the heat. I would try setting the RAM to the default JEDEC speeds by taking them out of a DOCP / XMP profile. This will very likely eliminate that as a potential issue.
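
To make the ratio argument concrete: DDR memory transfers data twice per clock, so a DDR4-4000 kit needs a 2000 MHz memory clock, and 1:1 operation needs the Infinity Fabric clock to match it. A rough Python sketch follows. The 1900 MHz ceiling is an assumption (a commonly quoted limit for Ryzen 5000 chips that varies from sample to sample), and the function names are made up for illustration.

```python
# Assumption: a typical Infinity Fabric ceiling for Ryzen 5000; individual chips vary.
ASSUMED_FCLK_CEILING_MHZ = 1900

def memory_clock_mhz(transfer_rate_mts):
    """DDR moves data on both clock edges, so the clock is half the transfer rate."""
    return transfer_rate_mts / 2

def can_run_one_to_one(transfer_rate_mts, fclk_ceiling_mhz=ASSUMED_FCLK_CEILING_MHZ):
    """True if the fabric clock could match the memory clock under the assumed ceiling."""
    return memory_clock_mhz(transfer_rate_mts) <= fclk_ceiling_mhz

for rate in (3200, 3600, 4000):
    verdict = "1:1 plausible" if can_run_one_to_one(rate) else "1:1 unlikely"
    print(f"DDR4-{rate}: memory clock {memory_clock_mhz(rate):.0f} MHz, {verdict}")
```

Under that assumed ceiling, 3600 fits and 4000 does not, which matches the "golden chip" caveat above.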

Are you using a PCIe riser cable?
Do you have the GPU in the top x16 slot, and if so, do you have it set for PCIe 4.0 speeds in the BIOS?
Have you tried removing the Nvidia driver you have with DDU and manually installing the newest one without an internet connection?
 
keep it at JEDEC stock speeds for purposes of diagnosing the issue, as I have said already.
I don't see where you said that anywhere. And while you did say to do that, as "I" already said, they already tried that and still had issues then as well. If they still had issues with the default JEDEC configuration, then while I agree it's a good idea to keep it there until you get the problem resolved, that itself is unlikely to be the problem. It's a lot more likely that the problem is incorrect population, an incompatible kit, or maybe something like an incorrectly mounted CPU cooler or bent pins. Or even the BIOS version.

Not disagreeing with you, just saying, you didn't already say that. Not clearly at least.


brotherkill99

Okay, I'm going to reply to the thread and answer both of your sets of questions, but before that I'm going to tell you about some recent findings. As I stated in my original post, I was going to try to rule out the main drive and Windows 11 by installing Windows 10 on my secondary drive and trying to make it crash (I hadn't done that yet for some reason). I have wiped both disks, installed the same edition of Win 10 on each, and installed a game on each. Now the PC crashes every time something heavier is run off the Gigabyte NVMe SSD (my main drive before), and later it even crashed while testing with CrystalDiskMark (I even tried changing the slot in case the motherboard had a faulty one). My Samsung SATA SSD, on the other hand, didn't crash a single time; it completed the tests perfectly, and the only time it crashed was when I tried to run a game off the NVMe SSD. According to this, my main drive seems to be faulty, which would explain why I was unable to find a logical sequence of crashes: it was probably slowly dying, with the random crashes and the months of working in between, and now it has completely failed. The only weird thing is that reads and writes seem to work fine and perform according to specifications, except for the crash after testing with CrystalDiskMark. One thing that could support this claim is the information the NVMe's SMART controller reports, which I thought was glitched since it seems ridiculous, yet it isn't impossible since the PC was pre-owned, especially compared with my SATA drive, which I have had 1-2 years longer than the NVMe (View: https://imgur.com/a/KMo1g9E) (View: https://imgur.com/a/GsZhMpE)
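
Since the SMART numbers only appear in the screenshots, here is a hedged Python sketch of how they could be read as text. It assumes smartmontools is installed and that `smartctl -a -j <device>` is used to dump JSON. The field names are assumed to match smartctl's JSON output for NVMe drives and should be checked against a real dump. The sample values are invented and are not taken from the drive in this thread. `summarize_nvme_health` is a hypothetical helper.

```python
import json

def summarize_nvme_health(report):
    """Pull the wear and error counters out of a smartctl JSON report (a dict)."""
    log = report.get("nvme_smart_health_information_log", {})
    # NVMe reports data units in thousands of 512-byte blocks.
    tb_written = log.get("data_units_written", 0) * 512_000 / 1e12
    warnings = []
    if log.get("critical_warning", 0) != 0:
        warnings.append("critical warning flag is set")
    if log.get("media_errors", 0) > 0:
        warnings.append(f"{log['media_errors']} media errors logged")
    if log.get("percentage_used", 0) >= 100:
        warnings.append("rated endurance used up")
    if log.get("available_spare", 100) < log.get("available_spare_threshold", 0):
        warnings.append("spare blocks below threshold")
    return {
        "tb_written": round(tb_written, 2),
        "power_on_hours": log.get("power_on_hours"),
        "unsafe_shutdowns": log.get("unsafe_shutdowns"),
        "warnings": warnings,
    }

# Invented sample values (assumption), only to show the shape of the data.
SAMPLE_JSON = """
{"nvme_smart_health_information_log": {
    "critical_warning": 0, "available_spare": 100, "available_spare_threshold": 10,
    "percentage_used": 12, "data_units_written": 2000000,
    "power_on_hours": 9000, "unsafe_shutdowns": 150, "media_errors": 3}}
"""
summary = summarize_nvme_health(json.loads(SAMPLE_JSON))
print(summary)
```

A clean summary would not prove the drive is healthy; it is one more data point next to the crash behaviour described above.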
--------------------------------------------------------------------------------------------------------------------------------------
EXACTLY which DIMM slots are your memory modules installed in? Starting at the CPU socket and working towards the edge of the motherboard, 1, 2, 3, 4, with 4 being closest to the edge of the motherboard.

What is the EXACT model of your memory kit? If you do not know, you can easily find out by doing this.

Open command prompt "as administrator" by right clicking on the command prompt link and selecting open as administrator.

Then, in command prompt, copy and paste the following string, then hit enter:

wmic MEMORYCHIP get BankLabel, DeviceLocator, MemoryType, TypeDetail, Capacity, Speed

Also, we need to know EXACTLY what motherboard BIOS version is currently installed. Please don't say "the latest", because about 75% of the time somebody says that, it isn't. Please give us the actual BIOS version that is currently installed.
- DIMM slots 2 and 4 are populated
- The exact memory kit is Trident Z RGB DDR4-4000 CL18-22-22-42 1.35V, F4-4000C18D-16GTZRB
- I was pretty sure I had installed the right BIOS version, but apparently not. The latest currently out for the B550 AORUS ELITE (rev. 1.0), which is my model, is F17c (View: https://imgur.com/a/pvjphno), and the one I currently have installed, which was the newest at the time I updated, is F15d (View: https://imgur.com/a/YFGaHP5). Yet that BIOS is listed under the B550 AORUS ELITE V2 (rev. 1.0/1.1) (View: https://imgur.com/a/NqIgas1) and not under the AORUS ELITE (rev. 1.0), meaning I chose the V2 rev. 1.0 download. It's working fine with no issues, but might it still be a good idea to update to the latest correct version?
--------------------------------------------------------------------------------------------------------------------------------------
My initial impression is that a 650W PSU, even though it's a nice Corsair RMx, could be insufficient for either of the GPUs you have used in the PC depending on the age and potential sub-second peak power draw the GPUs could be asking for. The RAM kit at 4000mghz would not allow the memory controller on the CPU to have a 1:1:1 ratio with the memory clock, infinity fabric clock, and memory controller clocks unless you got a golden chip with the 5800x. The faster kits in XMP could be using between 1.45v and 1.55v which can make them hot enough, 45°C plus, that they then become unstable because of the heat. I would try setting the RAM to the default JEDEC speeds by taking them out of a DOCP / XMP profile. This will very likely eliminate that as a potential issue.

Are you using a PCIe riser cable?
Do you have the GPU in the top x16 slot, and if so, do you have it set for PCIe 4.0 speeds in the BIOS?
Have you tried removing the Nvidia driver you have with DDU and manually installing the newest one without an internet connection?
- I have not been using XMP during these tests; the BIOS is at default settings.
- I'm not using a PCIe riser cable.
- The GPU is in the topmost slot, closest to the CPU.
- I completely reinstalled Windows and wiped the drives multiple times while testing.
--------------------------------------------------------------------------------------------------------------------------------------
I don't see where you said that anywhere. And while you did say to do that, as "I" already said, they already tried that and still had issues then as well. If they still had issues with the default JEDEC configuration, then while I agree it's a good idea to keep it there until you get the problem resolved, that itself is unlikely to be the problem. It's a lot more likely that the problem is incorrect population, an incompatible kit, or maybe something like an incorrectly mounted CPU cooler or bent pins. Or even the BIOS version.

Not disagreeing with you, just saying, you didn't already say that. Not clearly at least.
- Population is correct, the kit is compatible, the CPU cooler is correctly mounted with 31°C at idle, and the CPU passes all benchmarks. I covered the BIOS at the top.
 
First thing I'd do is update to F17b. It's been more than two months since that was released so if there were any problems with the fact that it's a beta BIOS it would have been pulled by now. I'm surprised it hasn't been re-released with a stable BIOS version number already.

Then, I'd do a hard reset of the BIOS afterwards. And then, go in and reconfigure any custom settings you need to, including enabling XMP, but THEN, after enabling XMP and restarting, go directly back into the BIOS and set the memory frequency to 3600MT/s, because it's pointless to try and run it at 4000MT/s given the Infinity Fabric penalty you would incur. Then save your settings and exit the BIOS. See how it does.

Technically, your memory kit model is not listed as compatible on the G.Skill website OR on the QVL list for your motherboard, so technically that might be (and often IS) the problem. It is not a matter of it being the "right KIND of memory", as in DDR4, DDR5, etc., nor is it technically a matter of the memory frequency, especially if you manually configure it for 3600MT/s. Usually it is simply a matter of a given board, or board and CPU combination, not liking the composition of a given memory kit, due either to the ICs (memory chips) used, the number of ranks, the number of rows, or some aspect of the timings configuration that the memory module wants to run at but the board doesn't like, or that the board likes but the memory module doesn't work well with.

And for the record, because it was misused twice in this thread and I like for people to know correct terminology when dealing with things, memory speed is not specified in MHz. It is specified in MT/s, which is megatransfers per second, and they are not the same thing.
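
The difference is easy to show with numbers. DDR memory transfers twice per clock cycle, so the MT/s figure is double the actual clock in MHz. A small Python sketch follows; the function names are illustrative only, and the bandwidth figure is the theoretical peak for one 64-bit channel.

```python
def clock_mhz_from_transfer_rate(mts):
    """DDR = double data rate: two transfers per clock cycle."""
    return mts / 2

def peak_bandwidth_mb_s(mts, bus_width_bits=64):
    """Theoretical peak per channel: transfers per second times bytes per transfer."""
    return mts * bus_width_bits // 8

for kit in (3600, 4000):
    print(
        f"DDR4-{kit}: {kit} MT/s, "
        f"{clock_mhz_from_transfer_rate(kit):.0f} MHz clock, "
        f"{peak_bandwidth_mb_s(kit)} MB/s peak per channel"
    )
```

So a kit sold as "3600" runs an 1800 MHz clock, which is why quoting the headline number in MHz is off by a factor of two.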


BIOS Hard Reset procedure

Power off the unit, switch the PSU off and unplug the PSU cord from either the wall or the power supply.

Remove the motherboard CMOS battery for about three to five minutes. In some cases it may be necessary to remove the graphics card to access the CMOS battery.

During that five minutes while the CMOS battery is out of the motherboard, press the power button on the case, continuously, for 15-30 seconds, in order to deplete any residual charge that might be present in the CMOS circuit. After the five minutes is up, reinstall the CMOS battery making sure to insert it with the correct side up just as it came out.

If you had to remove the graphics card you can now reinstall it, but remember to reconnect your power cables if there were any attached to it as well as your display cable.

Now, plug the power supply cable back in, switch the PSU back on and power up the system. It should display the POST screen and the options to enter CMOS/BIOS setup. Enter the bios setup program and reconfigure the boot settings for either the Windows boot manager or for legacy systems, the drive your OS is installed on if necessary.

Save settings and exit. If the system will POST and boot then you can move forward from there including going back into the bios and configuring any other custom settings you may need to configure such as Memory XMP, A-XMP or D.O.C.P profile settings, custom fan profile settings or other specific settings you may have previously had configured that were wiped out by resetting the CMOS.

In some cases it may be necessary when you go into the BIOS after a reset, to load the Optimal default or Default values and then save settings, to actually get the hardware tables to reset in the boot manager.

It is probably also worth mentioning that for anything that might require an attempt to DO a hard reset in the first place, IF the problem is related to a lack of video signal, it is a GOOD IDEA to try a different type of display connection, as many systems will not work properly for some reason with DisplayPort configurations. It is worth trying HDMI if you are having no display, a lack of visual ability to enter the BIOS, or no signal messages.

Trying a different monitor as well, if possible, is also a good idea if there is a lack of display. It happens.
 
kit is compatible
Kit is NOT technically compatible. It is not listed on the G.Skill memory configurator as compatible, and that is the real killer here, and is also not listed on the QVL list for your motherboard. There are kits that are SIMILAR in model, but when it comes to memory kits even a single digit being different can make the difference between actually working on that board or not. And even when the model is the same, sometimes it still won't work because a lot of these memory manufacturers change the composition of the kit at some point along the way so memory kit "X" six months or a year ago when it was tested on a given board that worked and was listed as compatible THEN might NOT be six months or a year later if the manufacturer decides to change the ICs (Memory chips) used on that module, or makes other changes, and they DO do this from time to time to accommodate changes in what is available to them or what is more cost effective.

Perfect example seen here:



Now that doesn't necessarily mean a kit won't work, but an unlisted kit, or one that is listed but may have had changes to its composition, might require some additional tweaks to the frequency, voltage or timings in order to get it to work. In that event, I'd recommend reading and giving the recommendations at the following link a try. The process for tightening timings or finding configuration settings that don't cause any problems with a kit that doesn't want to play nice is basically the same as for overclocking, but without the increase in configuration frequency.

 

brotherkill99

Thanks for your help and for informing me. I'll do a BIOS update and everything you've said, but before that I'd just like to ask whether you think the drive is faulty in this case, given the latest testing I did and the SMART controller information... Should I also run the same test with both SSDs again after updating the BIOS and everything? Thanks again.
 
If there is a problem that exists when the M.2 drive is installed but does not exist when it is not installed, then there has to be a problem with the drive OR you don't have all of the correct drivers installed. If you have not already done so, I would download and install ALL of these drivers which come directly from the support page for your motherboard. Drivers obtained for the integrated components on your motherboard (Chipset, network adapters both LAN and WiFi, Bluetooth, Audio, etc.) should ONLY ever come directly from the board manufacturer. Not from Windows update. Not from some "driver updater". Not from some other website. In some cases it is fine to use the chipset, AIO or graphics drivers from the AMD website since the chipset originates from them anyhow, and in some cases they will actually have a newer driver for the chipset or graphics available than what the board manufacturer has.

First though, let's be sure we are talking about EXACTLY the right board.

This is the B550 Aorus Elite, right? Not the B550 Aorus Elite AX or WiFi or anything else. JUST the Aorus Elite. And it is revision 1.0?
 

brotherkill99

I have successfully updated the BIOS to F17c and am now doing the hard reset, and yes, this is the B550 Aorus Elite rev 1.0. While I tested the SSDs, both were connected to the PC; they were not disconnected. Both had a clean Windows install with a game, and default BIOS settings... After running a game on the SATA drive the PC did not crash; after booting to the NVMe and running the game, it crashed. I tested this with other apps too, with the same result. The SATA install crashed only if I tried to run the game from the NVMe. If the NVMe were faulty, this would explain the random crashes that happened over time even after eliminating so many components from the list. There are also the screenshots of the SMART controller data, which could be glitched but point in this direction... It would seem that the drive was in the process of failing and has now more or less failed, because the timing of the crashes is not random anymore; they happen within a matter of seconds of launching a game. Should I test it the same way after the BIOS update and setting up XMP? If it crashes in the same way, would that confirm that the drive has partly failed and is the reason for the crashes (partly failed, since it is still detected and can sometimes be written to and read from; it crashed after doing a benchmark with CrystalDiskMark using 4GiB files)?

After I've done so many troubleshooting steps and swapped out so many components, I can now finally recreate the crash at will. Doesn't this point to the NVMe?
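
One way to make the comparison between the two drives less dependent on a particular game is a repeatable write-and-verify loop pointed at each drive in turn. Below is a minimal Python sketch, assuming Python is installed on the test system. `write_and_verify` is a hypothetical helper and not a replacement for CrystalDiskMark. Because the operating system may serve the read-back from its cache, a pass here does not prove the drive is healthy; a crash or a mismatch on one drive but not the other is the interesting result.

```python
import hashlib
import os
import tempfile

def write_and_verify(directory, size_mib=64, passes=3, chunk_mib=4):
    """Write random data to a temp file in `directory`, read it back, compare hashes."""
    chunk = chunk_mib * 1024 * 1024
    path = os.path.join(directory, "drive_check.tmp")
    try:
        for _ in range(passes):
            written = hashlib.sha256()
            with open(path, "wb") as f:
                remaining = size_mib * 1024 * 1024
                while remaining > 0:
                    block = os.urandom(min(chunk, remaining))
                    f.write(block)
                    written.update(block)
                    remaining -= len(block)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push the data to the device
            read_back = hashlib.sha256()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(chunk), b""):
                    read_back.update(block)
            if written.digest() != read_back.digest():
                return False
        return True
    finally:
        # Always clean up the test file, even after a mismatch or an error.
        if os.path.exists(path):
            os.remove(path)

# Small demo in a throwaway folder; point `directory` at each drive for a real test.
with tempfile.TemporaryDirectory() as demo_dir:
    print("verify ok:", write_and_verify(demo_dir, size_mib=8, passes=1))
```

For a real comparison, run it once against a folder on the NVMe and once against a folder on the SATA drive, with a larger `size_mib` and more passes.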
 
Yes, pretty much if it does this at will with the NVME drive installed, and does not do it with another drive installed, then it almost has to be either the drive or the board. But not 100%. It could also be a bent pin on the CPU that directly affects the lanes used by the M.2 drive.

But I'd likely RMA the drive and go from there.
 

brotherkill99

Yes, pretty much if it does this at will with the NVME drive installed, and does not do it with another drive installed, then it almost has to be either the drive or the board. But not 100%. It could also be a bent pin on the CPU that directly affects the lanes used by the M.2 drive.

But I'd likely RMA the drive and go from there.
I didn't check with another M.2 drive, but I tried installing the M.2 drive in the second slot on the motherboard, with the same result... The system worked fine before with the M.2, and the CPU was never taken out of the socket by me, meaning it's highly unlikely the pins are bent. After I enable XMP I'll do the test again, and if it fails the same way, I'll just replace the drive and go from there as you said... Only time will tell, after all. Thanks again for all the help, and I'll update you once I do the tests again.
 
I don't see where you said that anywhere. And while you did say to do that, as "I" already said, they already tried that and still had issues then as well. If they still had issues with the default JEDEC configuration, then while I agree it's a good idea to keep it there until you get the problem resolved, that itself is unlikely to be the problem. It's a lot more likely that the problem is incorrect population, an incompatible kit, or maybe something like an incorrectly mounted CPU cooler or bent pins. Or even the BIOS version.

Not disagreeing with you, just saying, you didn't already say that. Not clearly at least.
I clearly say to keep the RAM at default JEDEC speeds, in the quote below, for the purposes of eliminating the issues I detailed above it, though that list is not comprehensive of all the potential RAM issues it would help mitigate. I certainly could have been more specific about what to do, the purpose of doing it, and how it can help with fixing the issue. He kept enabling and then disabling DOCP / XMP over the time he was intermittently having an unknown issue, so I felt it prudent to clarify: for the purposes of diagnostics, keep the speeds at default. Also, I write mghz out of a poor habit; it's good to have clarity!
I would try setting the RAM to the default JEDEC speeds by taking them out of a DOCP / XMP profile. This will very likely eliminate that as a potential issue.
 