Question G.SKILL RAM failed twice in one month: bad luck, other component, BIOS/AGESA?

Oct 27, 2024
Hi all, I'm new here, and while I have been tinkering with PCs all my life, I now have an issue that has me both stumped and worried.

To summarize: my RAM failed, and within a week of receiving a replacement kit through RMA it failed again in the same way in the same slot.


Background

I have been running this system without problems since early 2021:
  • Motherboard: MSI MAG B550 TOMAHAWK
  • CPU: AMD Ryzen 9 5900X
  • RAM: G.SKILL Trident-Z NEO RGB F4-3600C16D-32GTZN (32 GB total, 3600 MHz, 16-16-16-36, kit of 2)
  • PSU: Seasonic PRIME TX-850, 850 W, 80+ Titanium
  • GPU: MSI SUPRIM-X RTX 3080 10GB
  • Case: Fractal Design Define R5, 3 intake fans, 1 exhaust fan
  • CPU cooler: Noctua NH-U14S w/ Thermal Grizzly Kryonaut underneath, 2 fans, one on each side
  • OS: Windows 11 Pro 23H2
I'm not an overclocking kind of guy, so CPU/RAM settings have been on the defaults the whole time, except that I turned on the XMP profile for 3600-16-16-16-36 RAM operation, and did some IOMMU / DMA protection tidbits to get the "enhanced security" stamp of approval in Windows 11.

RAM is installed in recommended A2 and B2 locations (2nd and 4th from CPU socket).


RAM kit failure 1

Last month the RAM module in slot B2 failed. Symptom: Windows desktop froze solid after having just turned on the system getting ready to play. Needed a hard shutdown. After that it would not POST again, the EZ Debug LED on my board stuck on the CPU LED.

After some inspection of the connections it turned out one of the RAM modules would not POST anymore in XMP mode. The other module still worked fine on its own. A day later the affected module would also not POST anymore in the default 2133 MHz setting.

Removal of all RAM would cause a DRAM LED, so the affected module was clearly holding up the CPU somehow.

My conclusion: bad RAM. RMA it to G.SKILL, receive brand new RAM, go on with my life.


RAM kit failure 2

The fresh RAM (produced this month!) lasted less than a week. This time it was a BSOD, again right after starting the system and running some Windows updates, nothing special. I rebooted a few times into the BSOD and ended up at Windows system repair, which I didn't start for data safety, as I wanted to run Memtest86+ on the RAM first. Lo and behold, at 99% on that test the entire system froze with artifacts on the screen, and from that point on the module from slot B2 was stuck on the CPU LED again. What are the odds?

The other module works fine, also on its own in the cursed slot B2, all tests succeed.

Now I'm really scared to use my spare F4-3600C16D-32GTZNC (looser SK Hynix version of the same kit) which is the only working kit I have left.


Question/speculations

So is this a really bad coincidence or is my system eating RAM now? My possible theories:
  • I have really bad luck, RMA again, forget about it.
  • The motherboard is blowing it up (VRMs?). In that case: why only that specific slot and not all? Visual inspection of the board does not reveal any blown components or traces.
  • The CPU is blowing it up. Isn't that just signaling while voltage is provided by the motherboard?
  • The PSU is blowing it up. This thing is top of the line and about 3 years old, how would I test that?
  • Beta BIOS 7C91vAI1 is blowing it up. The first failure happened the day after I did the update to get AGESA 1.2.0.Cc to fix Sinkclose, as while I don't care about overclocking I do want my security. But in that case: all board brands have pushed this update by now so I would expect a lot more complaints of bad RAM.
Some more stats:
  • CPU Core Voltage (SVI2 TFN): 0.9V-1.48V idle, it settles at around 1.36V on full load
  • CPU SOC Voltage (SVI2 TFN): 1.081V
  • CPU Infinity Fabric speed: 1800 MHz (1:1 with RAM)
  • RAM voltage: 1.364 V
  • CPU temp: 37°C idle, 65°C full load
  • GPU temp: 40°C idle, 76°C full load
So this all seems to be well outside dangerous territory.

I'm now using the remaining module in B2 to see if it eventually fails as well, but that might be tomorrow, or next week, ...
I'll RMA this kit as well soon, but does anyone have an idea how I might start isolating this without possibly going through a RAM kit every time I change a single thing? In a couple of days I'll have the chance to test the modules in another board (Asus TUF B550-PLUS).
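To keep track of what I change between runs, I could write the combinations down as a simple test matrix. A minimal sketch, assuming one module is tested at a time; the module labels are just my own names for the sticks I have left, and the slot names on the Asus board are assumed to match. Which modules are worth risking in which slot remains a judgment call.

```python
from itertools import product

# Hypothetical labels for the hardware mentioned in this post.
boards = ["MSI MAG B550 TOMAHAWK", "Asus TUF B550-PLUS"]
slots = ["A2", "B2"]  # assumption: both boards label the slots this way
modules = ["GTZN survivor", "GTZNC spare 1", "GTZNC spare 2"]
profiles = ["default 2133", "XMP 3600"]

# One single-module test per combination, so a failure can be pinned
# to a board, a slot, a module, or a memory profile.
plan = [
    {"board": b, "slot": s, "module": m, "profile": p, "result": None}
    for b, s, m, p in product(boards, slots, modules, profiles)
]

for step in plan:
    print(f'{step["board"]:<24} {step["slot"]}  {step["module"]:<14} {step["profile"]}')
```

Filling in `result` after each Memtest86+ run would show at a glance whether failures follow the slot, the board, or the module.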

Thanks a lot!
 

Really not shocked to see an MSI motherboard with RAM problems; at this point it's becoming a meme.

The reason your RAM is failing is that it's running at a raised voltage of 1.364V.

It should be 1.35V. Make sure the RAM voltage in the BIOS is set to 1.35V.

I personally avoid G.SKILL.

If you must buy new RAM, go with Kingston/Patriot/PNY.


Update your BIOS to the latest version in case it's a BIOS bug.
 
Oct 27, 2024
If I manually set the voltage to 1.35V it stays at 1.364V, if I lower it to 1.34V it becomes 1.356V. Not sure about the logic behind that 🤔 Latest BIOS.
Nevertheless this has been working like this for years until now, and I thought it is generally accepted these dies are able to handle up to 1.4V-1.45V without issues, especially Samsung?
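To put numbers on that offset, here is a minimal sketch using only the two set/read-back pairs above. The 1.4V ceiling is just the lower end of the range I mentioned, a rule of thumb and not a spec value.

```python
# (set, read-back) DRAM voltage in millivolts, from the two data points above.
readings_mv = [(1350, 1364), (1340, 1356)]

# Assumption: lower end of the 1.4V-1.45V range, a forum rule of thumb.
ASSUMED_CEILING_MV = 1400

def summarize(readings, ceiling_mv):
    """Return the offset and remaining headroom for each set/read-back pair."""
    rows = []
    for set_mv, read_mv in readings:
        offset_mv = read_mv - set_mv
        rows.append({
            "set_mv": set_mv,
            "read_mv": read_mv,
            "offset_mv": offset_mv,
            "offset_pct": round(100 * offset_mv / set_mv, 2),
            "headroom_mv": ceiling_mv - read_mv,
        })
    return rows

summary = summarize(readings_mv, ASSUMED_CEILING_MV)
for row in summary:
    print(row)
```

So the read-back sits about 14-16 mV (roughly 1%) above whatever I set, which still leaves some margin below that assumed ceiling.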
 

triplex1

Prominent
Jun 2, 2024
All the hardware failures I've had in the years I've been working on and building computers were with MSI motherboards, most of the time with the BIOS provided by MSI itself (I'm not even talking about beta versions), until I gave up on MSI and never touched their boards again.
Some memory kits, like the G.SKILL Trident-Z NEO, have had issues, but I'm pretty sure it's your motherboard that can't support them.
Before doing another RMA, try them in another system.
 

That sounds like the motherboard isn't working correctly; mine sits bang on 1.35V.

Personally, I wouldn't trust it to be accurate if it can't get a manually set voltage right. I'd toss the board.

Yes, 1.4-1.45V is fine, but MSI motherboards are generally crap boards. I've had 3 fail, so I would put money on it being the board, not the CPU, at fault. Going any higher than what the manufacturer suggests generally carries some risk.
 