[SOLVED] System no longer boots with 4 sticks of RAM

jdk09

Reputable
Oct 27, 2019
25
1
4,535
Hello,

I could use (err, need) some help with a boot issue that I think is RAM related, but I'm not sure. Any help would be much appreciated:

System: Other than the XMP profile, nothing is/was overclocked
Motherboard – Gigabyte Z390 Aorus master
CPU – Intel i9-9900KF
Cooler – Corsair h150i
RAM – Corsair vengeance RGB pro 3600 4x8 GB (XMP enabled)
GPU – OLD: MSI RTX 2070 Super (Armor OC); NEW: MSI RTX 3080 Ti Gaming X Trio
PSU – Seasonic prime ultra gold 850W
Hard drives – Adata XPG SX8200 pro 1TB (boot/OS on this) and Adata XPG SX8100 2TB
Windows 10

I built this system (except for the GPU upgrade) in late 2019. Everything worked well and the system was stable. About 5 days ago I got very lucky and was able to get an RTX 3080ti from a local shop. I put the 3080ti in, no problems, computer runs well. About 3 days ago, after probably ~8 hours of use with the new card I had a sudden crash/restart while in a game. It didn’t freeze or CTD, just a sudden restart and boot failure message. I thought it might have been overheating since the side of the case was pretty hot and I’d been playing for a longer period of time with the new card than I had done to that point. The machine also isn’t in the best location for airflow. On the restart it put me into the BIOS and said there had been a boot failure. I loaded optimized defaults and re-enabled the XMP profile for my RAM. System booted fine, I made a minor adjustment to help airflow for the case, played with some fan curves to cool more aggressively and went on. For the next 2 days no issues at all, I kept an eye on temps and everything stayed low, GPU was only in low-mid 70s C at full load. No issues with CPU temps either. Last night, I shut off the computer (I shut it off every night).

Today, I turn the computer on and it won’t boot. Power starts, motherboard shows me cycles of error codes C2-C4, 2C-4C, CC, eventually settles on C1 then shuts down or restarts. Other times it doesn’t even cycle long it just clicks on and off within 1-2 seconds in an endless loop. Nothing shows on the monitor at all. First I google the error codes, seems it may be memory, though others suggested GPU – which seemed plausible since that’s what I upgraded recently. Most of the codes I saw weren’t in the manual so that didn’t help me. So here’s what I tried (and I know it probably wasn’t the perfect sequence).

1 – Powered off, unplugged machine, returned power and booted – Exact same problem, no change
2 – Unplugged machine, reset CMOS using the handy button Gigabyte gave me, powered back up – Exact same problem again
3 – Powered down, unplugged all peripherals and removed 2 DIMMs from A1 and B1, leaving A2 and B2 in place – when I power it up the machine cycles doing memory training and then boots fine. I do not see the “C” error codes at any point in the memory training, and the motherboard settles on A0 after going through its boot.
4 – So next I swap the DIMMs: I remove A2 and B2 and put the original A1/B1 sticks back in A1 and B1 (to hopefully test both the sticks and the slots) – Power machine on and it does memory training and boots fine. So at this point it seems the machine can boot with 2 sticks in either configuration and that I probably don’t have a bad stick of RAM. At this point I make sure my motherboard's software is up to date in Windows and run Windows update, just in case. I restart, go to BIOS and load optimized defaults again, just in case. I DO NOT enable the XMP profile.
5 – Power down, put A2 and B2 back in, so back to 4 sticks – and the problem of cycling, C error codes and no monitor input is back, just the same as before.
6 – Remove A1 and B1, computer starts normally
7 – plug peripherals back in, boots fine

So that’s where I’m at: 2 sticks in A2 and B2 and working apparently normally; it also works with the other 2 sticks in A1 and B1. Any ideas on why I can no longer run with 4 sticks of RAM? The sticks are all from the same package and had been working since late 2019. The only system change was the new video card, but the problem didn’t develop right after the upgrade, it took a few days. There was one crash that I thought might have left the memory in a weird state or altered a setting, but I’ve reset the BIOS and RAM to default. I don’t think it’s a lack of power; 850 W should be plenty, esp. since it’s a boot issue, not a problem with the GPU under load. Ideas?
 

InvalidError

Titan
Moderator
Boot the PC with only one set of DIMMs installed, load the slowest DIMM timings, put the other DIMMs in and see if that works.

If that fails, something has gone wrong and your system is no longer stable with all DIMMs loaded.

If it works, you can try loading faster profiles until you find the fastest one that still boots.

Nvidia's RTX3070 and up are known to give many PSUs a hard time due to sharp load transients. It is quite possible that the new GPU is generating more noise than your other components can tolerate with fully loaded DIMM slots.
 

jdk09

Well, I did try the first thing, loading with 1 set (2 sticks) of DIMMs. Doing this, I tried all 4 sticks, in all 4 slots, in various configurations of A1/B1 or A2/B2. Every DIMM combo is stable as a 2 stick setup, but 4 do not work. I just can't guess what has "gone wrong." It worked last night, I turned the PC off without anything seeming wrong, and then it can't boot today.
 

InvalidError

Well, I did try the first thing, loading with 1 set (2 sticks) of DIMMs. Doing this, I tried all 4 sticks, in all 4 slots, in various configurations of A1/B1 or A2/B2. Every DIMM combo is stable as a 2 stick setup, but 4 do not work. I just can't guess what has "gone wrong." It worked last night, I turned the PC off without anything seeming wrong, and then it can't boot today.
Did you try loading the SLOWEST PROFILE using one pair of DIMMs so you can get into BIOS to manually load a profile BEFORE adding the second pair? That would give the setup what should be its highest chance of success.
 

jdk09

With 2 DIMMs installed I went to BIOS, loaded the optimized defaults, left XMP disabled and the settings all on default. So the frequency was at 2133. Works fine with 2, gives me the error codes and can't boot with all 4. The only other frequency option is the XMP frequency of 3600. I don't think it can go lower; am I wrong, is there a way to lower it more?

There is an 'enhanced stability' option under a 'memory enhancement settings' option, but it seemed to me that using the RAM's native settings would be more stable than Gigabyte's tweaking. What do you think?

For what it's worth, after I ran the machine for 15 minutes without issue I did go back and re-enable XMP (still 2 DIMMs) and it's been running fine for a couple hours now at the 3600 frequency with 2 sticks.
 
You have chosen OC RAM at 3600MHz in a (4x8) combination and that is way more than the IMC is officially capable of.
I recommend you return what you have for a kit at 2666MHz, CL16 (2x16) 32GB selected from the MB QVL.
Any XMP profile would not work, as an OC at 3600MHz would require manual tuning of voltage and timings.
 

InvalidError

With 2 DIMMs installed I went to BIOS, loaded the optimized defaults, left XMP disabled and the settings all on default. So the frequency was at 2133. Works fine with 2, gives me the error codes and can't boot with all 4. The only other frequency option is the XMP frequency of 3600. I don't think it can go lower; am I wrong, is there a way to lower it more?
2133MT/s is the standard minimum for DDR4. If that doesn't work, the chances of anything else working are quite low. You could still try your luck with other profiles just in case you get lucky. As a last-ditch effort, you could try loading the XMP 3600 timings but then lower the memory frequency somewhere in the 2133-2666 range to give the memory the most generous margins within reason. If even that doesn't work, I doubt there is much that can be done about it besides putting the RTX2070 back in just to verify that the setup still works in its last-known-working form.
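
To put rough numbers on why XMP timings at a lower frequency leave more margin: absolute CAS latency in nanoseconds is the CL cycle count multiplied by the clock period, and a DDR4 clock runs at half the MT/s figure. The sketch below assumes CL18 for the XMP 3600 profile and CL15 for the default 2133 profile. Neither value is stated in this thread, so read the real ones from the BIOS or the kit label before drawing conclusions.

```python
def cas_latency_ns(cl_cycles: int, transfer_rate_mts: int) -> float:
    """Absolute CAS latency in nanoseconds for a DDR memory setting."""
    # DDR moves two transfers per clock, so clock (MHz) = MT/s / 2
    # and one clock period (ns) = 2000 / MT/s.
    return cl_cycles * 2000 / transfer_rate_mts


# ASSUMED timings for illustration only; they are not taken from the thread.
XMP_CL = 18
DEFAULT_CL = 15

settings = {
    "XMP 3600, CL18": cas_latency_ns(XMP_CL, 3600),
    "default 2133, CL15": cas_latency_ns(DEFAULT_CL, 2133),
    "XMP timings at 2133": cas_latency_ns(XMP_CL, 2133),
    "XMP timings at 2666": cas_latency_ns(XMP_CL, 2666),
}

for name, ns in settings.items():
    print(f"{name}: {ns:.2f} ns")
```

Under those assumed values, keeping the XMP cycle counts while dropping to 2133 gives each access more time in nanoseconds than either stock profile, which is the "generous margins" idea.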

You have chosen OC RAM at 3600MHz in a (4x8) combination and that is way more than the IMC is officially capable of.
I recommend you return what you have for a kit at 2666MHz, CL16 (2x16) 32GB selected from the MB QVL.
OP put that system together in 2019, I think he's way past his return window.
 
Solution

Karadjgne

Titan
Ambassador
I'll bet money the issue is none of the above, but a power supply problem. While the Seasonic Prime is an excellent PSU with excellent protective circuitry, it wasn't designed with the expectation that it would have to deal with the massive transient spikes the Ampere cards seem to be capable of, hitting well above any rated TDP levels before the card pulls them back down to normal operating levels. It's those spikes which will trip the protective circuits in the PSU.

It's not a design flaw; the PSU acts exactly as intended. It's not Seasonic's fault that the Ampere cards' spikes are crazy high.
 

InvalidError

OP needs to test his RAM using Memtest86 and if any errors after 4 passes then an RMA is justifiable as RAM has a lifetime warranty.
OP's problem is that his PC quit booting due to a "memory error" after upgrading his GPU from an RTX2070 to an RTX3080Ti. His PC still works fine with either half of the 4x8G kit installed but gets no boot whatsoever with all four installed, despite working fine until the GPU upgrade.

For OP to use memtest86, his system would need to be bootable. Currently, it isn't with the RTX3080Ti installed.

The RTX3070 and up are notorious for wreaking havoc on PSUs and causing all sorts of issues with many high-quality PSUs. OP's memory issue is most likely excess noise from the RTX3080Ti making its way to the IMC and/or DIMMs as I have already mentioned in #2, hence my suggestion of checking that everything still works with the RTX2070 in #7.
 

jdk09

Thanks for the ideas. I guess I will switch back to the 2070 tomorrow and see if that fixes the problem. I almost hope it doesn't fix it, with how hard it was to get the 3080ti......If the 3080ti is causing the problem with spikes, why would it have run fine for 4 days before suddenly hitting a brick wall and becoming un-bootable? If switching back to 2070 works, is there a way around the spike problem? Is there a psu out there that handles it better?
 

InvalidError

If the 3080ti is causing the problem with spikes, why would it have run fine for 4 days before suddenly hitting a brick wall and becoming un-bootable?
Components have their greatest parameter drift over their first couple days of operation, so this could basically be the break-in period on the GPU and the PSU needing to cope with noise it had never been exposed to before.

If it was my PC, I'd be really tempted to bodge a bunch of low-ESZ capacitors on a PCIe extension cable (or maybe just ram their leads into the connector's back) and see if that helps smooth things out.

If you are using a single PCIe cable from the PSU to feed the GPU to reduce cable clutter, maybe try to use two separate cables. If you already were using two, maybe try using a single 2x(6+2) cable instead - not recommended for long-term use, just to see whether there may be a way to work around the PSU-GPU combo sending the system over the stability cliff by changing net power supply impedance to the GPU.
 

jdk09

Well I tried a few more things, and it isn't any more clear to me. I'm hoping someone much more knowledgeable than me can give me guidance.

Since my last post, I tried (all with XMP disabled):

1 - Changed from using 2 cables from the PSU for the GPU's 3x 8-pin connectors to 3 cables, so one cable for each 8-pin. I also updated my BIOS to the most current version and uninstalled an MSI GPU fan tweaking program which I'd installed shortly before the problems began. Didn't think that would be likely to affect boot, but just in case.... This resulted in no change: system is fine with 2 sticks of RAM, can't boot with all 4 in, gives the same cycling of CC, C2-C4, 2C-4C, settles on C1 and shuts down. I did notice that on a successful boot with 2 sticks of RAM the first few codes to flash are the same C codes, but they quickly progress to other codes and eventually A0. So I think that at least tells me that the failure is very early in the boot sequence, and I think the cycling is the system re-trying to boot before eventually giving up. The monitor also never gets to the Aorus logo (doesn't have any input actually) so I guess that's more evidence that the problem is early.

2 - So then I swapped out the video cards and put the 2070 super back. First I booted with 2 sticks of RAM, no problem. Then I put all 4 back in and I got the exact same sequence of error codes and boot failure! I was not expecting this. If it matters, all 4 sticks RGB light up while the computer is trying to boot.

3 - Just for fun I put the 3080ti back in and tried to boot with the 4 sticks and of course got the same boot failure. Took 2 sticks out and system boots fine.

So I noticed that the C codes are a normal part of the boot sequence, the first ones to flash. I'm hoping that helps someone solve the problem.

I also was thinking back to the events leading to the initial failure and remembered a detail that I didn't put in the original post. On Monday, before I had these problems, I had the 3080ti and 4x8 sticks of RAM, running fine. I had a crash while playing a game. At the time I thought the GPU had overheated. The crash was from game straight to black screen and re-boot. It put me directly into BIOS and said there was a boot failure. I clicked on enable XMP and exited. I cannot recall if I actually clicked 'load optimized defaults' or not at that time. But the computer seemed fine after this and I kept using it without a problem for more than a day. Usually I shut the computer down at night, but I didn't that night (Mon) and it was in sleep mode. The next day (Tues) I had no problems, I did shut the computer down that night. Then, Wed morning the computer wouldn't start. So from Mon pm when I had a crash, the next time it had to boot from powered-down state was when this problem started, maybe the crash caused this problem? Or is there a setting I don't know about that may have changed in the crash to allow the machine to only be able to boot with 2 sticks of RAM?

All of this points me away from the 3080ti or the PSU being the direct problem since I couldn't boot with 4 sticks of RAM and the 2070 either, where that had been a stable system for me for a long time. I'm definitely beyond my knowledge of the interplay between components, if anyone can help I'd really appreciate it!
 

InvalidError

At this point, it is down to three possible components:
1- the CPU's IMC having failed in such a way that it can only manage one DIMM per channel on one channel
2- a motherboard problem with the memory VRM or elsewhere causing issues with four DIMMs loaded
3- one of the DIMMs having failed in a way that causes issues when all four DIMMs are installed

Additional tests you can do based on this would be removing one DIMM and seeing whether the PC boots with each combination of 3 out of 4 DIMMs installed. If it boots regardless of which single DIMM is removed and what channel it is on, then both channels on the CPU are fine, all of the memory is likely fine too and something on the motherboard appears to have gone wrong.

If the PC boots but only when one specific channel has only one DIMM on it regardless of which two DIMMs you try to put in it but you can use the DIMM on either slot on that channel, then the IMC has probably gone weak on that channel and cannot manage two DIMMs anymore. If failure to boot only happens when one specific slot is involved, that could be a broken trace on the motherboard, bad contact in the CPU socket or busted control line from the CPU.

If the PC can boot with two DIMMs on one channel only when one specific DIMM is left out but works with any other combination of two DIMMs on either channel, the DIMM that must be left out for the computer to boot is likely bad.
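
As a rough aid for working through those checks, the leave-one-out runs can be listed mechanically. This is only a sketch of the test plan: the slot names come from this thread, and the function name is made up for illustration.

```python
# Slot -> memory channel, using the A1/A2/B1/B2 slot names from this thread.
SLOT_CHANNEL = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}


def leave_one_out_plan():
    """Return the four 3-DIMM configurations, one per emptied slot."""
    plan = []
    for empty in SLOT_CHANNEL:
        populated = [slot for slot in SLOT_CHANNEL if slot != empty]
        single = SLOT_CHANNEL[empty]
        # The channel that lost a DIMM runs one DIMM; the other still runs two.
        plan.append({
            "empty_slot": empty,
            "populated": populated,
            "single_dimm_channel": single,
            "two_dimm_channel": "B" if single == "A" else "A",
        })
    return plan


for step in leave_one_out_plan():
    print(
        f"leave {step['empty_slot']} empty, populate {', '.join(step['populated'])}: "
        f"channel {step['two_dimm_channel']} still carries two DIMMs"
    )
```

Recording boot or no boot next to each line makes it easier to see whether failures track a channel, a slot, or a particular DIMM.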
 

jdk09

I understand most of that, not all, but most. I want to tell you what I've done and see if it was enough to narrow the problem.

I'm going to name the DIMMs from left to right W, X, Y, Z, which have been in slots A1, A2, B1 and B2, respectively. Since these problems began a few days ago I have kept each DIMM only in its respective slot so I don't get them mixed up. Just now, I removed 1 of the 4 DIMMs and booted, trying all combos of a 3 DIMM setup (but not moving W, X, Y or Z in between slots). I was going to vary that next, but my last attempt scared me.

I moved left to right
-Try 1 excluded DIMM W, leaving slot A1 open - boots fine
-Then removed X, slot A2 is open (W is back in slot A1) - boots fine, but BIOS suggests I reinstall to occupy slot A2 and B2 for dual-channel
-Next I removed Y from slot B1 (A1, A2 and B2 are occupied) - cannot boot, same sequence of error codes and eventual shut down as with prior failures
-Finally, I remove Z from B2, with W, X and Y in their places - First attempt it cycles through more C codes than it has been doing on successful boots, but does get to the Aorus logo, where it hangs for about a minute before I power the system down. I give it a minute and then power up again. This time it boots successfully (without the excessive C codes) but it does a disk check and puts me in auto-repair with a choice of restart or 'advanced options.' I choose restart, it restarts and boots fine, I'm writing this with those 3 sticks in place.

I have a feeling I need to repeat this exercise with the DIMMs put in reverse order, from Z --> W and sequentially exclude one at a time. Is that right?
 

InvalidError

From the look of it, all problematic configurations have one thing in common: A1A2 being occupied by WX.

Next thing I'd try is swapping WX and YZ to see if the behaviours follow the DIMMs. If they do, then the DIMMs or motherboard may be problematic. If the behaviour remains the same, then it may be a weakness in the CPU's IMC on A-channel. If the behaviour is something new, then I'd lean toward the motherboard being the problem.
 

jdk09

I've done a bunch of testing with different combinations. Each time I thought I was starting to see a pattern, something didn't behave like I predicted. Can anyone else make sense of this? I organized all the attempts I've made here. XMP is disabled for all of this, WXYZ are the 'names' for each RAM stick, and - means empty:

In order of trial
A1(W) A2(X) B1(Y) B2(Z) – Fail (original configuration when problem started)
A1(-) A2(X) B1(-) B2(Z) - Good
A1(W) A2(-) B1(Y) B2(-) - Good

3 stick
A1(-) A2(X) B1(Y) B2(Z) - Good
A1(W) A2(-) B1(Y) B2(Z) – Good (but BIOS prompted me to reinstall in a better configuration) – I retried this config towards the end of testing and it booted without that BIOS suggestion, so I think that by cancelling the suggestion I told it not to tell me this again, since it should have come up in some other non-ideal configs.
A1(W) A2(X) B1(-) B2(Z) - Fail
A1(W) A2(X) B1(Y) B2(-) – More C codes during boot, hangs on Aorus logo → power down → restart → Disc check but successful boot
A1(-) A2(Z) B1(W) B2(X) - Fail
A1(Y) A2(-) B1(W) B2(X) - Fail
A1(Y) A2(Z) B1(-) B2(X) - Good
A1(Y) A2(Z) B1(W) B2(-) – Good

2 stick (and one 4 stick)
A1(-) A2(-) B1(Y) B2(Z) – Stuck on Aorus logo, error code 64, eventual manual shutdown
A1(Y) A2(Z) B1(-) B2(-) - Good
A1(Y) A2(Z) B1(W) B2(X) - Fail
A1(-) A2(-) B1(W) B2(X) – Fail
A1(W) A2(X) B1(-) B2(-) - Fail
 

InvalidError

Conclusions:
1- all four DIMM slots are working at least as far as one DIMM per channel is concerned
2- both memory channels can handle two DIMMs per channel when using the YZ pair
3- boot fails whenever WX are together on the same memory channel regardless of 2/3/4 total DIMMs

Looks like the problem got narrowed down to the WX pair, everything else checks out. It may be worth trying memtest86 in one of the bootable WX configurations to see whether there may be a bad memory cell causing boot crashes depending on where it lands in the memory map based on memory configuration.

Last thing you could try would be (WY or WZ) and (XY or XZ) pairs to figure out which one of WX causes the pair to fail.

That said, you shouldn't need to play DIMM musical chairs to get it to work at default frequency and timings, so I'd say it is time to look into an RMA.
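
For anyone who wants to re-check conclusion 3 against the trial list, here is a minimal sketch. Each layout string lists the DIMM in A1, A2, B1, B2 with "-" for an empty slot. The "odd" label is a simplification introduced here for the two runs that hung on the Aorus logo rather than failing early; it is not a term from the trial notes.

```python
# Trials as posted: DIMM in A1, A2, B1, B2 ("-" = empty slot) and outcome.
# "odd" (label chosen here) marks runs that hung on the Aorus logo.
TRIALS = [
    ("WXYZ", "fail"), ("-X-Z", "good"), ("W-Y-", "good"),
    ("-XYZ", "good"), ("W-YZ", "good"), ("WX-Z", "fail"),
    ("WXY-", "odd"), ("-ZWX", "fail"), ("Y-WX", "fail"),
    ("YZ-X", "good"), ("YZW-", "good"),
    ("--YZ", "odd"), ("YZ--", "good"), ("YZWX", "fail"),
    ("--WX", "fail"), ("WX--", "fail"),
]


def wx_share_channel(layout: str) -> bool:
    """True when W and X sit on the same channel (A = first two slots)."""
    channels = (set(layout[:2]), set(layout[2:]))
    return any({"W", "X"} <= channel for channel in channels)


def check(trials):
    """Split trials into those matching 'WX together means fail' and the rest."""
    consistent, exceptions = [], []
    for layout, outcome in trials:
        predicted = "fail" if wx_share_channel(layout) else "good"
        bucket = consistent if outcome == predicted else exceptions
        bucket.append((layout, outcome))
    return consistent, exceptions


consistent, exceptions = check(TRIALS)
print(f"{len(consistent)} of {len(TRIALS)} trials match the rule")
for layout, outcome in exceptions:
    print("does not fit cleanly:", layout, outcome)
```

The only runs that do not fit cleanly are the two logo hangs, which stalled later in the boot than the hard failures.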
 

jdk09

Thanks for all the interpretation and guidance. My fingertips are sore from flipping all those tabs on the motherboard over and over again. If it seems really likely the problem is RAM, I'll start an RMA. Sort of lucky in a way, definitely the easiest thing to replace.

I'm relieved it doesn't seem to be the new video card directly, but I do wonder what triggered the problem. Seems to have begun after the first cold boot following a significant crash that triggered a 'boot failure' message. I wonder if something in the RAM went bad, caused the crash, was stable enough to keep going in a post-boot environment, but was then unable to boot from cold. Would that make any sense?
 

InvalidError

I wonder if something in the RAM went bad, caused the crash, was stable enough to keep going in a post-boot environment, but was then unable to boot from cold. Would that make any sense?
A bad solder joint, broken trace or other continuity defect could cause "only boots when cold/warm" type issues, with different thermal expansion coefficients and stresses causing contact to intermittently come good or go bad as conditions change.

It wouldn't explain why the WX pair work fine when split between separate memory channels but not together while the YZ pair works fine in any configuration, especially when all four came as a single kit.
 

jdk09

I couldn't resist more tinkering with how much I've done, so I put X in A2 and then tried W, Y and Z in B2 successively, all worked.

So I thought maybe all 4 could work together if I separated W and X to opposite channels. Tried A1(Z) A2(X) B1(Y) B2(W) - it failed.

I took Y out, reasoning that if it worked, it would implicate W sharing a channel with anything as the problem. If it didn't work it would suggest X. This configuration worked: A1(Z) A2(X) B1(-) B2(W). So it looked like W sharing either channel with any other DIMM was causing the failure. I reviewed my notes on all the other trials I did:

With 2 exceptions, any time W shared a channel, the boot failed. Any other configuration, where W was either not installed or was on its own channel, worked.

The exceptions:
-2 stick setup with Y in B1 and Z in B2 froze on the Aorus logo and I eventually manually shut down. I didn't re-try. This should have worked, of note Y on A1 and Z on A2 did work just fine, as do either dual channel Y/Z combo.

-The other exception was the oddest trial I did. With A1(W) A2 (X) B1(Y) B2(-) the boot got to Aorus screen after a longer period of C code cycling on the motherboard LEDs, it froze and I shut it down, but when I powered back up it worked, but did prompt the disc repair. I was able to get into windows and it seemed normal, however. This arrangement shouldn't have worked since W shared a channel with X.

In the end, it isn't quite perfect, but I tried all 20 combinations of DIMM arrangements and the rule held on 18, and both exceptions were sort of anomalies where the hangup occurred a little further into the boot sequence than the other times.
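
Here is a small sketch of that tally for anyone who wants to re-check it. Layout strings list the DIMM in A1, A2, B1, B2 with "-" for an empty slot. The "odd" label (for the two runs that hung on the Aorus logo) and the function names are simplifications made up for this sketch.

```python
# All 20 distinct layouts from the thread: DIMM in A1, A2, B1, B2 ("-" = empty).
# "odd" (label chosen here) = hung on the Aorus logo instead of failing early.
TRIALS = {
    "WXYZ": "fail", "-X-Z": "good", "W-Y-": "good", "-XYZ": "good",
    "W-YZ": "good", "WX-Z": "fail", "WXY-": "odd", "-ZWX": "fail",
    "Y-WX": "fail", "YZ-X": "good", "YZW-": "good", "--YZ": "odd",
    "YZ--": "good", "YZWX": "fail", "--WX": "fail", "WX--": "fail",
    "-X-W": "good", "-X-Y": "good", "ZXYW": "fail", "ZX-W": "good",
}


def w_shares_channel(layout: str) -> bool:
    """True when W has any other DIMM next to it on the same channel."""
    for channel in (layout[:2], layout[2:]):
        if "W" in channel and channel.replace("W", "", 1) != "-":
            return True
    return False


def wx_together(layout: str) -> bool:
    """True when W and X sit on the same channel (the earlier hypothesis)."""
    return any("W" in ch and "X" in ch for ch in (layout[:2], layout[2:]))


def mismatches(rule):
    """Layouts whose outcome differs from 'rule true means fail, else good'."""
    return [
        layout for layout, outcome in TRIALS.items()
        if outcome != ("fail" if rule(layout) else "good")
    ]


print("W shares a channel:", mismatches(w_shares_channel))
print("W and X together:  ", mismatches(wx_together))
```

The "W shares a channel" rule leaves only the two logo hangs unexplained, while the narrower "W and X together" rule also mispredicts the A1(Z) A2(X) B1(Y) B2(W) failure.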

I'll start an RMA, but just curious if anyone has seen a case where a DIMM will work in a dual channel pairing, but is not able to share a channel?

Thanks again for all the help InvalidError!
 

InvalidError

I'll start an RMA, but just curious if anyone has seen a case where a DIMM will work in a dual channel pairing, but is not able to share a channel?
If you have two DIMMs running dual-channels, there is only one DIMM per channel and that is the configuration that provides the best stability especially at higher clocks.

Having two DIMMs on one channel doubles the load on the CPU's memory control/address lines since there are twice as many DRAM dies connected to the channel's command/address bus. It also adds wiring stubs that can interfere with signal integrity. When you power the system and see the BIOS looping through a bunch of memory-related codes, the BIOS is attempting to find a combination of low-level memory parameters that will make the bus stable. The more stuff is on each channel/bus, the more difficult that is. Especially if one of the DIMMs is flaky.
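
As a back-of-the-envelope illustration of the doubling, assume each 8 GB DIMM is single-rank with eight x8 DRAM dies. That layout is an assumption here, not something confirmed for this kit.

```python
# ASSUMPTION: each 8 GB DIMM is single-rank with eight x8 DRAM dies.
DIES_PER_DIMM = 8


def dies_on_command_bus(dimms_on_channel: int, dies_per_dimm: int = DIES_PER_DIMM) -> int:
    """DRAM dies hanging off one channel's shared command/address lines."""
    return dimms_on_channel * dies_per_dimm


for dimms in (1, 2):
    print(f"{dimms} DIMM(s) per channel: {dies_on_command_bus(dimms)} dies on the bus")
```

Whatever the real die count is, going from one to two DIMMs per channel doubles it, and that is the extra load memory training has to cope with.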
 