Random reboots on custom server

mmd123

Oct 3, 2014
So... I had to rebuild my storage server into a 4U server chassis. It was working perfectly before (albeit mounted to a wood table with nuts and bolts), and as soon as I installed it into the server chassis, all sorts of weird crap started happening. Here's what's going on, and I can't find any rhyme or reason to it:

Half the time the system will POST just fine, I can get into the BIOS just fine, and on several occasions I can even get it to fully boot into a live CD environment where it SEEMINGLY operates just fine.
The other half of the time, it will POST fine and load into the BIOS, and then either randomly power cycle for no apparent reason while sitting in the BIOS (this happens about half the times it reaches the BIOS), or power cycle in the middle of the full memory POST test, seemingly somewhere between the 4GB and 8GB marks regardless of which memory I have installed.

For starters, here is the hardware I'm TRYING to run, followed by all the hardware I've tested and the troubleshooting steps I've taken:

Dual LGA 1366 socket Tyan motherboard (Tyan S7012GM4NR / S7012GM4NR-B)
Dual LGA 1366 six-core Xeon CPUs at 2.8GHz (they worked in a past build with a prior motherboard, so I know they are on the hardware support list)
80GB of DDR3 ECC memory at PC3-10600 speed, made up of four 8GB sticks and twelve 4GB sticks similar to these, for a total of 16 DIMMs (apparently some of these are in there as well, since they're in my eBay purchase history and this is the only DDR3 system I've bought parts for online at all...)
2U server heatsinks and 1U full-copper heatsinks (I thought it might be an overheating issue, but I noticed no difference either way)
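As a quick sanity check on the memory line above, the DIMM count and stick sizes add up to the stated total:

```latex
\underbrace{4 \times 8\,\text{GB}}_{32\,\text{GB}} + \underbrace{(16 - 4) \times 4\,\text{GB}}_{48\,\text{GB}} = 80\,\text{GB}
```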

As for the hardware I've swapped while troubleshooting:

I tried all the add-on cards installed, all of them removed, and one card at a time, to find out whether an add-on card I can't replace is causing these intermittent issues. No difference with any combination of cards, or with no cards at all.

LSI SAS 9210-8i SAS/SATA HBA
(flashed to IT mode for use as an HBA for FreeNAS, NOT RAID mode)
SAS expander to multiply the ports from the HBA card above for FreeNAS
Two of these 10Gb fiber cards for point-to-point connections to the clients I access this thing from


I tried all the RAM, none of the RAM (to see whether it even detects that RAM is present or missing), single sticks per memory bank across 6 different pairs, and single sticks on their own. Again, none of these variations made any difference I could notice.

I tried two separate motherboards (the exact same make and model, just two physical boards), thinking the first was a bad board (ESD damage or something), and even updated the newer board to the latest BIOS. No change.

I swapped the 2.8GHz six-core CPUs I'm trying to get working (they did work on the last board, before the new chassis) for dual 2.26GHz quad-core Xeons, and ran all these same exact steps with both sets of CPUs. I've been trying to get this danged thing working again for over 2 weeks now. I've exhausted my 20+ years of hands-on experience, so I consulted a friend, and he said to post on a site like this, because I'm at a total and complete loss.

I tried two totally separate power supply setups. The first is the exact pair I had this thing working perfectly with, with the dual six-cores and all the same hard drives, on that wood board (yes, I know, two power supplies rigged together in one server is not a good idea, but I'm broke, I had them, and they worked by isolating the main board components on one supply and the storage drives on the other, so I did what I had to and could afford). The second is a SilverStone 1.5kW power supply, which has MORE than ample output for everything, confirmed with room to spare by the OuterVision power supply calculator: over 5 amps of headroom on every rail and then some.

Ablecom SP762-TS SuperMicro PWS-0050 760W 3 x SP382-TS Redundant Power Supply
SilverStone 1.5kW power supply
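The per-rail headroom check described above boils down to subtracting each rail's estimated load from its rated current. Here is a minimal Python sketch of that arithmetic. The rail names and every amperage figure in it are hypothetical placeholders, not datasheet ratings for the SilverStone or Ablecom/Supermicro units and not real loads for this build; for a real check, substitute the numbers from the PSU label and the calculator's output.

```python
# Minimal sketch of a per-rail PSU headroom check.
# ALL amperage figures below are hypothetical placeholders, not real
# datasheet ratings or measured loads for any PSU in this build.

RATED_AMPS = {"+3.3V": 25.0, "+5V": 25.0, "+12V": 120.0}          # hypothetical
ESTIMATED_LOAD_AMPS = {"+3.3V": 12.0, "+5V": 15.0, "+12V": 60.0}  # hypothetical


def rail_headroom(rated: dict[str, float], load: dict[str, float]) -> dict[str, float]:
    """Return spare amps per rail: rated current minus estimated load."""
    return {rail: rated[rail] - load.get(rail, 0.0) for rail in rated}


def all_rails_have_margin(rated: dict[str, float], load: dict[str, float],
                          min_spare_amps: float = 5.0) -> bool:
    """True if every rail keeps at least `min_spare_amps` of headroom."""
    return all(spare >= min_spare_amps
               for spare in rail_headroom(rated, load).values())


if __name__ == "__main__":
    for rail, spare in rail_headroom(RATED_AMPS, ESTIMATED_LOAD_AMPS).items():
        print(f"{rail}: {spare:.1f} A spare")
    print("5 A margin on every rail:",
          all_rails_have_margin(RATED_AMPS, ESTIMATED_LOAD_AMPS))
```

The 5 A default mirrors the margin mentioned above; raise `min_spare_amps` to test a stricter threshold.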


I completely cleaned off and replaced the thermal paste on all the CPUs. The old paste had dried on and was so crusty it had to be "cracked" off the heatsinks and CPUs; I replaced it with a more liquid paste that came with a water-cooling component I bought for another system ages ago. That made a ONE-TIME difference: the very next boot after the repaste went fine. After that first boot, however, the symptoms went right back to the usual issues, and I haven't been able to get back to that one-time fix since.


So... I'm at a total loss here despite 20+ years of first-hand experience, and I want to scream and rip my hair out because I have literally no idea what is going wrong, or why, when it was working fine mounted directly to a wooden shelving unit from Walmart, versus a $200 4U server chassis that this stuff is designed to work inside of. Short of a total rebuild with a completely new set of these exact parts, which I can't afford, I have tried every troubleshooting step I can think of, or have heard of or seen others try.

Please... I'm at my wit's end here. Somebody please help!

I need to figure out why this is happening, and I need help troubleshooting what's going wrong so I can pinpoint the issue and fix it.