i7 980X disaster!!

sartannis

Apr 29, 2013
Hey everyone,

Nearly 3 years ago I put together a high-end system for a customer who wanted to use it for 3D and video editing.

The system is as follows

Intel i7 980X CPU
Asus P6T WS Revolution
12 GB DDR3 1066 MHz RAM (3x4GB)
LG Blu-ray DVDRW drive
2TB Seagate Barracuda hard drive
Asus GTX 470 video card
850W tru power Thermaltake power supply

This system has been crashing, in a very intermittent way, since it was built.

The two most frequent blue screens are Stop 0x00000101 (secondary processor error) and Stop 0x0000009C (machine check exception). They usually occur on shutdown, and almost always after the system has been running for a long period of time.
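For anyone following along, the two stop codes above map to the symbolic bug check names CLOCK_WATCHDOG_TIMEOUT (0x101) and MACHINE_CHECK_EXCEPTION (0x9C). Below is a minimal Python sketch of a lookup for them. The table covers only the two codes from this thread, and the helper name `describe_stop_code` is made up for illustration:

```python
# Symbolic names Windows uses for the two bug check codes in this thread.
# This table is deliberately tiny; it is not a full bug check reference.
STOP_CODES = {
    0x00000101: "CLOCK_WATCHDOG_TIMEOUT",
    0x0000009C: "MACHINE_CHECK_EXCEPTION",
}


def describe_stop_code(code_text):
    """Turn a string such as '0x0000009C' into '0x0000009C: NAME'.

    Hypothetical helper for illustration only.
    """
    code = int(code_text, 16)  # base 16 accepts the '0x' prefix
    name = STOP_CODES.get(code, "unknown (not in this small table)")
    return f"0x{code:08X}: {name}"


if __name__ == "__main__":
    for text in ("0x00000101", "0x0000009C"):
        print(describe_stop_code(text))
```

Both codes point toward hardware-level faults, not application bugs, which fits the symptoms described here.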

The problem is that the system can go for a week, sometimes even longer, without an issue. The issue then becomes worse for a time, then almost disappears... this makes troubleshooting a nightmare. So far the motherboard has been RMA'd and replaced, the CPU has been replaced, the video card has been replaced, and the power supply has been replaced (I tried a 1500W supply just in case). The BIOS is up to date, and the Windows 7 64-bit installation is fresh (the hard drive was also replaced). I have tried older drivers as well as current drivers, to no avail.

I have tried adjusting the CPU Vcore after doing some research on Stop 0x00000101. This actually seems to fix that problem, but now the system is back to the 0x0000009C errors instead.

The RAM has been replaced at Intel's insistence with RAM matching what they recommended in speed, voltage and even brand... no change.


So far, over the past 2.5 years, I have spent over 30 hours on this system and hundreds replacing parts... Our company has lost way too much on this system. Now the manufacturer's warranty is about to expire on all of the hardware.

I have been putting together customer systems for this company for 10 years and I've been in IT for fifteen. This is the first time I have run into a problem that I just CANNOT fix 🙁 Any help or suggestions that anyone can provide would be greatly appreciated!!!

 


I have in fact run memtest: no errors over more than 30 passes. The RAM has also been replaced several times (most recently with a different brand and a slower clock speed).
 
You've covered everything. I do 3ds Max stuff as well, and I'm putting my money on the application he's using: maybe some conflict with the 980X, or with the GPU when it uses its CUDA cores. It could also be some rogue driver he may have installed alongside 3ds Max. Because you changed all the parts, I really think it's a software conflict with the hardware you're using. I know your pain too, not being able to repeat the crash so that you can at least identify the issue.

Because I think it is software, maybe install only one thing at a time as the person is using it, and don't install anything until it's needed. Perhaps this will help you track down what the issue is, or at the very least eliminate the possibility that it is software.
 


I wish it were software. I have actually left the system with no programs running for a couple of days, gone to reboot it, and had the error come up. The blue screen can also happen when using the internet or video editing software, or even during a cold boot. Most often it happens during shutdown, though... This also happens after a completely fresh load of Windows 7, with no third-party drivers installed whatsoever.
 


I have actually removed all the hardware from the enclosure with no change to the errors
 


The CPU has been replaced; I thought that as well. And no, I haven't run Linux at all on this system.
 


Yes, the CPU has been replaced. I will look into the link you provided for the machine check errors. Thank you.
 
I had an i7-980X for a few years and I feel it was one of the better CPUs that I have owned. I had a few issues with it, but it was always a motherboard issue, possibly a motherboard/BIOS issue, that was eventually fixed by moving to an Asus Rampage III Extreme board.
When you say that the motherboard was replaced, was it with the same model? It may be that the P6T WS is the issue and that the motherboard design itself is at fault.
 


I think that is the best possibility. The replacement was the same model, and I have wondered if it's some kind of incompatibility despite the fact that both Intel and Asus state that it's fully compatible. I have not been able to find a different 1366 mainboard through our suppliers, and it's doubtful my boss will want to foot the bill for an entirely different board. Might not have a choice though.
 
Have you tried swapping the CPU for a (supported) Xeon and some ECC RAM?
I for one would not have built a workstation-class machine without ECC RAM.

Also check the HDDs for remapped sectors or other SMART errors. I had a problem pretty similar to yours once and it turned out to be the HDD.
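One way to script that SMART check is to read the attribute table printed by `smartctl -A` (from smartmontools) and flag non-zero raw counts on the sector-related attributes. This is a rough Python sketch, not a definitive tool: it assumes smartmontools is installed, the device path is whatever your system uses, the function names are made up for illustration, and the sample rows are invented to show the format (they are not from the machine in this thread).

```python
import subprocess

# Attributes commonly associated with remapped or failing sectors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Invented sample in the layout `smartctl -A` prints; not real data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10 columns;
        # the header row and other text are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


def suspicious(attrs):
    """Names of watched attributes whose raw count is above zero."""
    return sorted(name for name, raw in attrs.items() if raw > 0)


def read_smart(device):
    """Run `smartctl -A <device>`; usually needs admin/root rights."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


if __name__ == "__main__":
    print(suspicious(parse_smart_attributes(SAMPLE)))
```

A clean result here does not prove the drive is healthy; it only rules out the most obvious warning signs.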
 


Thanks for the reply. This system is used for video editing and 3D rendering as well as gaming. Nothing professional as far as the 3D and editing go. The mainboard does not support ECC and a Xeon is not appropriate for this system. In fact the customer could have gone with a standard i7 CPU and been better off. He chose to spend $1000 on this CPU against my recommendations. If we were to bring in a Xeon CPU, a workstation-class mainboard and ECC RAM, I would A: be stuck with all his old components, or B: end up with a Xeon CPU, mainboard and ECC RAM that I would likely not be able to sell to anyone else. We do deal with some business-class stuff, but it's all small business of 5-20 PCs max with entry-level server systems (I live in a city with a population shy of 50,000).

In the end the problem is that all the hardware has been replaced, all of it is supposed to be compatible, and it just doesn't work properly. I am fairly certain at this point that it comes down to a design issue between the mainboard, CPU and/or video card... It's frustrating that the cheap systems do not have these issues, yet the "cutting edge" stuff (which this was three years ago) is far more prone to these problems. It's like buying a $200,000 car that stalls and dies every 500 km for no reason. Just silly IMO.
 
Do me a favor and disable virtual memory, and see if your problem presents itself less often. That's what clued me in to my nightmare problem being the HDD despite it passing several tests. I went from crashes that were sometimes hourly and mostly daily to weekly, just by disabling virtual memory.