Question WHEA errors in OCCT after CPU upgrade of a Xeon/ECC RAM system, no overclock

Cyber_Akuma

Distinguished
Oct 5, 2002
456
12
18,785
I am actually not even sure if this is a CPU problem, a RAM problem, or something else. I am assuming CPU only because I just installed a new one, but I am not 100% sure.

I have a Dell Precision T3610. About a month ago I upgraded its RAM to an 8x16GB DDR3 ECC configuration. I ran several RAM and memory tests and got no errors.

Yesterday I replaced its Xeon E5-1620 v2 CPU with a used E5-2667 v2 I got on eBay (not overclocked... pretty sure this system doesn't even let me overclock). I again ran Memtest86 (which took about 10 hours) and got no errors, then I ran the latest Memtest86+ (which is finally out of beta) overnight and got no errors.

Then I booted into Windows and ran several 10-30 minute Prime95 tests in CPU-focused, RAM-focused, and combined stress-testing configurations, half with and half without Furmark also running... no errors.

So just to be thorough, I then got the latest OCCT and ran a RAM test on all of my RAM... no errors. I then ran the CPU test on Extreme... and that's when I noticed it got a WHEA error.

According to the Event Viewer, it was "Event 47: A corrected hardware error has occurred. Component: Memory. Error Source: Unknown Error Source". The details were mostly 0 in every field, apart from a listed Physical Address. I then ran OCCT's 2021 Linpack test with its default 2GB of RAM usage and had no issues. I ran it again, set to use as much of my RAM as possible, and again got a WHEA 47 error. According to the details, both errors seemed to happen at the same physical address. I immediately ran the same test again, expecting an error at the same point and memory address... but the third time it passed with no errors.
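For anyone wanting to check whether the corrected errors keep landing on the same physical address, here is a minimal Python sketch. It assumes the Physical Address values have been copied by hand from the Event 47 details into a list. The function names and the sample values are made up for illustration and are not the addresses from this system. A repeated address only hints at one memory location being involved; it does not by itself say which DIMM that is.

```python
from collections import defaultdict


def group_by_address(events):
    """Map each physical address to the labels of the runs that reported it."""
    grouped = defaultdict(list)
    for label, address in events:
        grouped[address].append(label)
    return dict(grouped)


def repeated_addresses(events, min_hits=2):
    """Return the addresses reported at least min_hits times, sorted."""
    grouped = group_by_address(events)
    return sorted(addr for addr, hits in grouped.items() if len(hits) >= min_hits)


# Placeholder values only: replace with the run label and the Physical Address
# field copied from each WHEA Event 47 entry.
sample_events = [
    ("run 1", 0x1000),
    ("run 2", 0x1000),
    ("run 3", 0x2000),
]

for addr in repeated_addresses(sample_events):
    print(hex(addr))
```

If the same address shows up across separate runs, that is worth mentioning when asking for help, since it points away from purely random one-off events.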

Is this something to be worried about? Can this used CPU I got possibly be damaged and it survived all of those tests but then crapped out during a random test on OCCT? Is it even the CPU or the RAM that's at fault here?

I tried Googling about this, and most of the answers I got are that one should not be getting any WHEA errors whatsoever... but almost all of those were in regards to people overclocking their CPUs and the OC being unstable. The usual advice was to turn down the OC and/or increase voltage, neither of which I can do since I am not OCing and the BIOS does not let me adjust any such settings. All of these were in regards to consumer CPUs/RAM as well.

I did however run into a forum post from another user who had a Xeon/ECC system, and they were told that correcting those errors is what ECC RAM is supposed to do. So does that then mean my system is fine? Or is this still a cause for concern? Would it even be my CPU or my RAM in this case? I find it hard to believe that my RAM passed days of testing when I installed it a month ago, as well as the PC being on 24/7 for that whole month without any errors, and then with the new CPU all those tests still passed but a single OCCT CPU test managed to catch a possible defect in either my RAM or my CPU.

On the other hand though, now that I am checking my Event Log I see that there were a ton of "Event 2: WHEA-Logger" entries while I was doing the Prime95 testing (Prime95 itself never showed any errors though). The event log just says "A corrected hardware error has occurred" and to check the data section for details, which was pretty sparse anyway.

The only other times I can find WHEA errors in my event log, which are all Event 2s, are around the time I installed that new RAM about a month ago and did Prime95 stress testing on it. I didn't do additional CPU stress testing at the time, since back then I still had the same CPU I had been using for nearly two years.

Here are screenshots of the errors: https://imgur.com/a/lWaRti9
 

Cyber_Akuma

I didn't look too closely, but since the previous CPU didn't give me issues and the pins are on the mobo I would assume they aren't bent.

Also, if they were bent, wouldn't it give an uncorrectable WHEA error instead of a corrected WHEA warning?
 
Sorry, I guess I am not explaining myself correctly. I am talking about looking at the contacts on the CPU itself. The gold can sometimes be scraped off and expose little pins. A bad CPU can throw any kind of error, so this is just to rule it out and move on to the next item. A corrected error means the firmware fixed the detected hardware error before it went to a BSOD. Uncorrected means the hardware error couldn't be fixed and it went to a BSOD. Unfortunately, the hardware causing the problem is not being listed. The only other thing I can suggest is to reduce the amount of RAM and retest, or go one stick of RAM at a time to rule out a bad or troubled stick.
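The "reduce the amount of RAM and retest" idea can be sketched as a halving search, which needs fewer test rounds than going one stick at a time. This is only an illustration under strong assumptions: exactly one stick is at fault, the errors reproduce every time that stick is installed, and the board accepts the reduced configurations. The function name isolate_stick, the DIMM labels, and the shows_errors callback are all hypothetical; in practice "shows_errors" is a person pulling sticks and rerunning the stress test.

```python
def isolate_stick(sticks, shows_errors):
    """Halve the installed set until a single suspect stick remains.

    Assumes exactly one stick is at fault and that shows_errors(subset)
    returns True whenever that stick is among the installed subset.
    """
    suspects = list(sticks)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        # Test with only the first half of the current suspects installed.
        if shows_errors(half):
            suspects = half
        else:
            suspects = suspects[len(half):]
    return suspects[0]


# Hypothetical labels for an 8-stick system; "DIMM6" is an invented culprit
# used only to exercise the function.
sticks = [f"DIMM{n}" for n in range(1, 9)]
result = isolate_stick(sticks, lambda installed: "DIMM6" in installed)
print(result)
```

Since the error in this thread did not show up on every run, a single clean pass would not clear a set of sticks; each round would need several runs before trusting a "no errors" result. If the errors follow the CPU or a slot rather than a stick, this search will not find anything.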