Minidump - memory_corruption on a brand new build

tnebrs

Dec 13, 2016
Hello everyone,

I've been struggling with frequent FAULTY_HARDWARE_CORRUPTED_PAGE blue-screen crashes on a brand new build, which is my work computer. I have a fair bit of experience diagnosing these things, but I seem to have hit a brick wall and I'm hoping this is the best place to find help.

I was initially receiving a variety of stop errors (BSODs), which after some analysis I traced to my UEFI firmware undervolting the CPU: whenever it went into "Turbo mode", there wasn't enough power to cope. I have raised the voltage to the stock VID and set all my RAM timings and voltages to what they should be. I also fully reinstalled Windows and reinstalled my low-level drivers individually in the recommended order, in case the original installation hadn't completed correctly. These steps resolved the majority of the blue-screen issues but left me with this one. It strikes roughly once every couple of weeks under heavy use (Adobe suite software and many Chrome tabs. I'm a tab addict.)

N.B. A little note on the RAM: it doesn't all match. I am currently running 24GB of Corsair Vengeance LPX DDR4, but due to a supplier issue this is split into 2 matched sticks of 8GB 2400, installed in (what I believe to be) the correct dual-channel slots, and one stick of 8GB 2666. All of them are set to the speed of the 2400 sticks, and the rated voltage is the same.

Things I have done:

Clean fresh install of Windows 10 Anniversary Edition, updated to latest stable build.
Installed drivers in the correct order, including IRST and AHCI drivers (I use an SSD, but not in RAID)
Set the VID, VDIMM and all stated RAM timings to manufacturer recommended in the UEFI.
Disabled fast startup
Ran a Driver Verifier check on all non-Windows drivers with everything apart from memory limitation and DDI compliance settings checked.
Removed the mis-matched RAM stick and tried to run for a week.

Things I have not done:

A complete memtest86 or WMD (Windows Memory Diagnostic) run. WMD did not complete a single pass after a 14-hour stint (on all 24GB, mind), and I haven't attempted it again since. I will probably run 8 passes on one stick at a time after writing this post, as so far all my minidumps point to memory corruption. It just seems strange that this would happen on brand new memory at stock voltages. I will update with any findings.

My most recent minidumps are below; the last two or three are from Driver Verifier. WinDBG reads DRIVER_VERIFIER_DETECTED_VIOLATION (c4) and "Probably caused by : memory_corruption".

System Specs:

DXdiag & MSInfo: https://we.tl/YAx39VaMGq
Part numbers for the RAM:
CMK16GX4M2A2400C16 x2
CMK8GX4M1A2666C16 x1

Minidumps: https://we.tl/AUni37OcDz

Thank you!

tnebrs

 
Solution
Generally, you will want to update the BIOS to the current version, then run memtest to confirm that the memory timings are set correctly. If memtest reports any memory errors, you need to fix the problem, generally by inspecting all of the primary and secondary memory timings. Often the BIOS will use the wrong default command rate for some memory modules, i.e. 1T rather than the 2T that some modules require. That ends up causing bit corruption that Windows will bugcheck on.
You either have to set the timings to suit the slowest module or lower the memory clock rate so that the timing windows in the electronics do not violate the chips' timing requirements.
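
To make the "timing window" idea concrete, here is a rough sketch that converts a timing given in clock cycles into nanoseconds. It assumes the C16 suffix in the part numbers from the question means CAS latency 16, and the function name is only illustrative. The values actually programmed on a given board may differ, so treat this as arithmetic, not a diagnosis.

```python
def cycles_to_ns(transfer_rate_mts, cycles):
    """Convert a memory timing from clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the real clock
    frequency in MHz is half of the advertised MT/s rating.
    """
    clock_mhz = transfer_rate_mts / 2
    period_ns = 1000.0 / clock_mhz  # one clock period in nanoseconds
    return cycles * period_ns


# Assumed CAS latency of 16 cycles at each module's rated speed.
cas_at_2400 = cycles_to_ns(2400, 16)  # roughly 13.3 ns
cas_at_2666 = cycles_to_ns(2666, 16)  # roughly 12.0 ns

# Command rate at DDR4-2400: 1T gives the module half the time that 2T does.
cmd_1t = cycles_to_ns(2400, 1)
cmd_2t = cycles_to_ns(2400, 2)

print(f"CL16 @ 2400: {cas_at_2400:.2f} ns, CL16 @ 2666: {cas_at_2666:.2f} ns")
print(f"Command rate @ 2400: 1T = {cmd_1t:.3f} ns, 2T = {cmd_2t:.3f} ns")
```

The same cycle count means a shorter window at a higher clock, which is why lowering the clock rate or raising the cycle counts both give the chips more time.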

If you have just one module with errors, you can sometimes put it in the slot closest to the CPU, as it tends to be a few nanoseconds faster than the other slots.

Generally, a memory dump from a system with memory timing problems will show 1-bit corruptions in the data. The newer Windows debugger will flag them so you don't spend too much time trying to debug hardware timing problems as a software problem.
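
As an illustration of what a 1-bit corruption looks like, the sketch below XORs an expected value with the value actually found and lists which bits differ. The two example values are made up for this purpose and are not taken from the posted minidumps; the helper names are hypothetical too.

```python
def flipped_bits(expected, observed):
    """Return the positions (0 = least significant) of bits that differ."""
    diff = expected ^ observed  # XOR leaves a 1 wherever the values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_single_bit_error(expected, observed):
    """True when exactly one bit differs, the typical hardware bit-flip pattern."""
    return len(flipped_bits(expected, observed)) == 1


# Hypothetical values: a pointer-like value and the same value with one bit cleared.
expected = 0xFFFFF80002345678
observed = 0xFFFFF80002345478

print(flipped_bits(expected, observed))
print(is_single_bit_error(expected, observed))
```

A value that is off by exactly one bit is hard to explain as a software bug, since a stray write would usually change whole bytes, and that is what points the finger at hardware.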
 