Dual PSU system unexpected shutdowns

Status
Not open for further replies.

jpsh

Mar 8, 2016
System Specs
Motherboard: ASUS Z9PE-D8 WS
RAM: 8x Samsung M393B2G70BH0-YK0 (16GB DDR3 1600 registered ECC RDIMM)
Processor: 2x Intel Xeon E5-2680 (v1 Sandy Bridge EP) 130w TDP
Video Card: 4x ASUS GTX 780 Ti 3GB DC2
PSU: 2x Seasonic Prime 1300w Gold Rated
Surge Protector: Tripp Lite LC1800 Voltage Regulating Surge Protector

I've been having issues with the system powering down under load for a while now, and I'm trying to get to the bottom of it once and for all.

I basically have it narrowed down to the motherboard, CPUs, or RAM. I feel like the RAM is least likely, and the CPUs are pretty unlikely too, since I've been able to keep the system up for a solid 5 days with both CPUs at 100% load. Below is what I've tried.

PSU
I originally had 2x Lepa G1600 Gold PSUs, so I replaced them with the 2 new Seasonic PSUs listed above.

I'm using 4 cables on each PSU (2 cables per video card), as recommended for video cards with a TDP over 225w. For the PCIe connectors, 1 cable splits into 2x 8-pin connectors.
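
As a rough sanity check on that cabling, the nominal PCIe power limits (75 W from the slot, 75 W per 6-pin, 150 W per 8-pin) can be added up per card. This is only a sketch: it assumes each card takes two 8-pin connectors and draws roughly 250 W, which is the reference GTX 780 Ti board power and not a measured figure for these DC2 cards. The function name is made up for the example.

```python
# Nominal PCIe power limits in watts.
SLOT_W = 75
CONNECTOR_W = {"6pin": 75, "8pin": 150}

# Assumption: reference GTX 780 Ti board power, not measured on the DC2 cards.
ASSUMED_CARD_DRAW_W = 250


def card_power_budget(connectors):
    """Nominal watts available to one card: slot power plus its aux connectors."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


# Assumption: each card is fed by two 8-pin connectors.
budget_w = card_power_budget(["8pin", "8pin"])
headroom_w = budget_w - ASSUMED_CARD_DRAW_W

print(f"Nominal budget per card: {budget_w} W")
print(f"Headroom over assumed draw: {headroom_w} W")
```

If those assumptions are close, each card has comfortable headroom on paper, so the cabling by itself is an unlikely cause.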

Surge Protector
I also wanted to rule out the surge protector, so I got a brand new Tripp Lite LC1800. I've also tried plugging directly into the outlet.

Video cards
I can boot the system with no problems and keep the GPU under load with one card installed. I get shutdowns under video load with a second card installed, in any combination of 2 PCIe slots for the first CPU. With 3 or 4 cards installed, about half the time (50/50) it won't let me boot into Windows unless I use safe mode or uninstall the nvidia drivers.

Drivers
I used DDU to uninstall the nvidia drivers (Display, HD Audio, 3D Vision, PhysX). After that I was able to perform a regular boot with all 4 cards in, so I installed nvidia drivers 388.43 from Windows Update and restarted, and the system powered off before it got to the welcome screen. I rinsed and repeated with the following driver versions, with no difference:
390.77
387.92
385.12
381.65

BIOS
When I started I had BIOS version 5601 installed. I tried the latest version, 5802, with no difference, then tried the previous version, 5701, also with no difference.

Hard drive
I ran chkdsk /x /r (I have an HDD for boot, since the BIOS takes 30 seconds to POST anyway) and it found no errors.

Add2PSU adapter
I tried a different dual PSU adapter and was able to boot into Windows with the nvidia drivers installed, and got really excited. I could put load on any individual card and it was fine, but when I put load on all cards it was an instant shutdown.

Video cards + PSUs in another system
I am able to run all 4 cards at max load using the new Seasonic PSUs on an old system without issues, using nvidia drivers 390.77. That proves to me the video cards and PSUs are good.


Other things I've noticed
I've also noticed, using a wattmeter, that I can never manage much over 400w of system draw without it crashing. Also, when powered off completely the system still draws about 12w, whereas my old system, using both new PSUs, only drew 0.1w when powered off.
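
To put the 400w figure in context, here is a back-of-the-envelope estimate of what the system would draw under load compared with what the PSUs are rated for. The CPU TDP and PSU ratings come from the specs above. The 250 W per GPU (reference GTX 780 Ti board power) and the 100 W for everything else are assumptions, and adding the two PSU ratings together is a simplification, since in a dual PSU setup each unit feeds its own share of the components.

```python
# Figures taken from the system specs above.
CPU_TDP_W = 130
CPU_COUNT = 2
GPU_COUNT = 4
PSU_RATED_W = 1300
PSU_COUNT = 2
OBSERVED_CRASH_W = 400  # wattmeter reading at the wall around the crash point

# Assumptions, not measurements.
ASSUMED_GPU_W = 250    # reference GTX 780 Ti board power
ASSUMED_OTHER_W = 100  # motherboard, RAM, drives, fans


def estimated_full_load_w(gpus_loaded):
    """Rough DC draw with both CPUs and the given number of GPUs fully loaded."""
    return CPU_COUNT * CPU_TDP_W + gpus_loaded * ASSUMED_GPU_W + ASSUMED_OTHER_W


# Simplification: treat the two PSUs as one pool of capacity.
total_capacity_w = PSU_COUNT * PSU_RATED_W

for n in range(GPU_COUNT + 1):
    print(f"{n} GPU(s) loaded: ~{estimated_full_load_w(n)} W estimated")

print(f"Combined PSU rating: {total_capacity_w} W")
print(f"Crash point vs combined rating: {OBSERVED_CRASH_W / total_capacity_w:.0%}")
```

On these assumptions the shutdowns happen at a small fraction of what the PSUs are rated for, which fits with the same PSUs running all 4 cards fine in the other system.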

What I'm going to try:
I have a known good 4x4GB DDR3 RAM kit I could test with to rule the RAM out, but I'm not sure about mobo vs CPU. I can run the system for 5 days straight with both CPUs at 100% utilization with zero crashes, but I suppose the PCIe controller portion of a CPU could be bad? Or should I just replace the motherboard? Any thoughts?
 
I've wondered what would happen if the memory controller went bad on a newer processor that has the equivalent of the north bridge on the chip. I don't think you have that type of setup, looking at the chipset of the motherboard and based on the age of the equipment. Looking at the picture, it looks like you have a fairly standard north bridge type of chipset. At any rate, the old system takes a beating with all that power and hardware. I do wonder if you have blown out the north bridge/chipset of the motherboard.

*****
Wikipedia says about the chipset:

The chipsets contain a 'memory controller hub' and an 'I/O controller hub', which tend to be called 'north bridge' and 'south bridge' respectively. The memory controller hub connects to the processors, memory, high-speed I/O such as PCI Express, and to the I/O controller hub by a proprietary link. The I/O controller hub on the other hand, connects to lower-speed I/O, such as hard discs, PCI slots, USB and Ethernet.
*****

You have narrowed it down to the motherboard or a single part of a processor. It seems logical to me that more could have gone wrong with the motherboard. Anyway, if the board isn't the issue, you could turn around and resell it "New OOB" if it's new, or just keep it just in case, if you are attached to the computer.

It's 100% up to you what you do, of course...
 


I haven't narrowed it down any further than mobo, RAM, or CPU(s), and unfortunately I have been unable to test this week. Hopefully I'll get a chance tonight. The other night I did take a good look at the motherboard and couldn't find any capacitors that looked off, although the motherboard has all solid capacitors.

That is an interesting thought though.

Tonight I plan on running IPDT (Intel Processor Diagnostic Tool), which I just learned about last night. I also realized that one of the onboard USB dongles is loose, so I'm going to put a piece of index card in it to make sure it's not causing a short. If that doesn't help, I'll try my other RAM kit, because I really don't feel like running memtest on 128GB. And if that doesn't work, I guess I'm getting a new motherboard.

I really appreciate the thoughts AtlBo, thanks!
 