My High-End HP Z440 Workstation crashes while playing High/Low end games. Please help.

Masaqre

May 20, 2015
Hello,

I have been experiencing crashes with my desktop for a while now while playing high-end, and sometimes even low-end, games. However, the problem is more frequent with high-end games.

I have read on these forums and on other websites that it might be Memory or Power related.

I am running memtest86 as we speak, but the RAM is new and I am not expecting it to be bad.

Specs:
Brand: HP Z440 Workstation
EVGA 4GB GDDR5 Geforce GTX 980 SC
32 GB DDR4 2133 MHz HP* RAM
Intel Xeon E5-1650 v3 3.50 GHz
700W HP Gold 90%

The games run fine at 60+ FPS with no frame drops, then all of a sudden the game freezes, my monitor goes black and tells me it lost the connection. Then I have to kill the PC with the power button in order to be able to boot it again. The problem is also VERY inconsistent: sometimes it crashes straight away and sometimes it doesn't crash at all, with the same games in the same areas with the same settings.

Solutions i have tried:
1. I have downloaded the EVGA overclocking tool in order to clock my Superclocked card down by 50 MHz. This looked like it had solved the problem, but since I started playing The Witcher 3: Wild Hunt it has started to show itself again. I also created a custom fan profile for the card, since the fans did not seem to turn on by themselves. The temperature of the card averages 64 degrees Celsius in very demanding games, with fan speeds up to 60%.

Please help,
Bart

PS: I will be actively watching this post during the upcoming days. And if there's an expert out there who needs to guide me through a process, I will be more than glad to add him or her on Skype or something similar.
 



Bart,

This is a puzzle, because the symptoms seem to show various signs of driver conflict, registry problems and faulty memory hardware. I do not think it is power supply related.

1. Because the memory hardware is the most fundamental, I would suggest trying to eliminate that first as a possibility. The first action would be to restore the system to the original configuration and try to duplicate the conditions of earlier failures. Also review the new memory's type and model carefully, especially if it is not HP-supplied, as the model could be correct but with different timings. The memory configuration is also very particular, so note the sequence of placement if the module sizes vary. It's even possible that mixing single- and double-rank RAM can have an effect. It does seem as though you would be seeing error messages, though.

2. Review the drivers installed on the system in case there is a residual driver from an original GPU, assuming the GTX 980 SC is a substitute. If the system originally had, for example, a Quadro, there could be a conflict with the GeForce driver. If this system had a different GPU originally and you still have it, put the original back into the system as a test for this possible driver conflict and do a "clean install" (which removes previous drivers) of the driver.

The fan situation is interesting, but I think the system may be resistant to software fan controllers; they certainly don't work on Dell Precisions. The temperature of 64C on the GTX 980 doesn't seem extraordinary to the point of thermal shutdown. On a Dell Precision T5400, I saw the Quadro 4000 at 95C during rendering and the system fan speed was not dramatically increased. It was increased, but I did not hear the change. Workstations are typically somewhat ruggedized for long, high-performance running scenarios.

Just a couple of ideas to try and narrow the problem a bit.

Let us know what happens and we'll hope it's the simplest thing!

Cheers,

BambiBoom

HP z420 (2015) > Xeon E5-1660 v2 six-core @ 3.7 /4.0GHz > 16GB DDR3 ECC 1866 RAM > Quadro K2200 (4GB) > Intel 730 480GB > Western Digital Black WD1003FZEX 1TB> M-Audio 192 sound card > Logitech z2300 > Linksys AE3000 USB WiFi > 2X Dell Ultrasharp U2715H (2560 X 1440) > Windows 7 Professional 64 >
[ Passmark Rating = 4918 > CPU= 13941 / 2D= 823 / 3D=3464 / Mem= 2669 / Disk= 4764]

Pending upgrade: HP /LSI 9212-4i PCIe SAS /SATA HBA RAID controller, 2X Seagate Constellation ES.3 1TB (RAID 1)

HP z420 (2013) > Xeon E5-1620 four core @ 3.6 /3.8GHz > 24GB DDR3 ECC 1600 RAM > AMD FirePro V4900 (1GB) > Seagate 500GB > Linksys WMP600N WiFi
[Passmark system rating = 2372 / CPU = 9001 / 2D= 712 / 3D= 1353/ Mem= 2261 / Disk= 712]

Dell Precision T5500 (2011) > Xeon X5680 six -core @ 3.33 / 3.6GHz, 24GB DDR3 ECC 1333 > Quadro 4000 (2GB ) > Samsung 840 250GB /WD RE4 Enterprise 1TB > M-Audio 192 sound card> Linksys WMP600N PCI WiFi > Windows 7 Professional 64> HP 2711x (1920 X 1440)
[ Passmark system rating = 3339 / CPU = 9347 / 2D= 684 / 3D= 2030 / Mem= 1871 / Disk= 2234]

Pending upgrade: PERC H310 PCIe SAS /SATA RAID controller, 2X WD Black 1TB (RAID 1) (Converts disk system from 3Gb/s to 6Gb/s)




 
1. The original memory was 2x8 GB DIMMs of 2133 HP memory. I ordered two more of exactly the same DIMMs and added them to the machine; although they looked a bit different, the model number was the same. They have been placed in DIMM slots 1, 3, 6 and 8, as instructed in the HP Z440 manual. (I said Kingston memory, but I made an error. It is HP memory.) If I'm correct, the memtest should show if there are any problems with the current memory configuration. Also, I have checked the timings and they were either the same or close enough to be compatible. My machine has also been running fine apart from these crashes.

(Be aware, my knowledge about compatibility comes from reading lots of forum posts and informational posts in general. So tell me if I'm wrong about something I write.)

2. I will do this as soon as memtest is done. It has been going for a solid hour and a half now.

As for the temperature, I have the fans running on a custom profile, but even without the program the card has only ever reached a maximum of 76 degrees. In my opinion this should not be enough to fry a card.

Thanks for your answer!
 


The fans go to rest mode, so way down. Also, I have updated the BIOS of my PC now. It seemed to be fixed, but after 3 hours and 45 minutes of play it crashed again. However, this time the computer rebooted and showed me this screen: http://imgur.com/Oix0b2w

I believe this might confirm that it's the graphics card?
 


That indicates a communication issue between the card and motherboard. Power issues generally cause the fan to ramp up all the way if it makes it past driver initialization. Stopping would usually mean the card. Check Event Viewer: go into Windows Logs, look at both System and Application, and let me know what you find.
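If clicking through Event Viewer is tedious, here is a minimal Python sketch that pulls the same kind of entries from the System log with the built-in Windows tool wevtutil. The choice of event IDs is my assumption, not something established in this thread: 41 (Kernel-Power, system rebooted without a clean shutdown) and 6008 (previous shutdown was unexpected). The function name is made up for the example.

```python
import subprocess
import sys

# Assumed event IDs of interest (not confirmed anywhere in this thread):
# 41   = Kernel-Power, system rebooted without cleanly shutting down
# 6008 = EventLog, the previous system shutdown was unexpected
UNEXPECTED_SHUTDOWN_IDS = (41, 6008)


def build_wevtutil_command(event_ids=UNEXPECTED_SHUTDOWN_IDS, count=10):
    """Build a wevtutil command that lists the newest matching System events."""
    id_filter = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    query = f"*[System[({id_filter})]]"
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{query}",               # XPath filter on the event IDs
        f"/c:{count}",               # maximum number of events to return
        "/rd:true",                  # newest events first
        "/f:text",                   # human-readable text output
    ]


if __name__ == "__main__":
    command = build_wevtutil_command()
    if sys.platform == "win32":
        result = subprocess.run(command, capture_output=True, text=True)
        print(result.stdout)
    else:
        # wevtutil only exists on Windows, so just show what would be run.
        print("Run this on the affected Windows PC:", " ".join(command))
```

Comparing the timestamps it prints against the times of the crashes should show whether Windows logged anything right before each one.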
 


It mentions an unexpected shutdown at the moments of the reboots, and this warning shows up from time to time:

The driver \Driver\WUDFRd failed to load for the device WpdBusEnumRoot\UMB\2&37c186b&1&STORAGE#VOLUME#_??_USBSTOR#DISK&VEN_GENERIC-&PROD_USB____CRW-CF#MD&REV_1.00#201312261010&0#.

 


Bart,

It's important progress to see the specific error message. This refers to a fatal error (Completion Timeout) on PCIe Slot 2.

On Pg. 4 of the z440 manual:

Slot 2:
PCI Express Gen3 x16
Full height, Full-length (with extender)

> and this is indeed the Primary GPU slot.

Looking into the nature of this error, I believe it may be useful to try going into the BIOS and setting Slot 2 to run at PCIe v2.0, at least as a test. The setting may also be labeled: "5GB/s".

This is a bit conjectural on my part, but it seems to be the most promising solution so far, thanks to the specific error message that gives a cause and location. Would you like to try setting Slot 2 to PCIe 2? I can't see how setting Slot 2 to PCIe v2 could damage anything, and it can always be set back if unsuccessful.

Here's hoping the whole thing is only a few keystrokes!

Cheers,

BambiBoom
 


Try removing anything not necessary to operate the PC.
 


Hey Bambi,

I believe I've seen the same solution posted somewhere on the web already. However, the GTX 980 is a PCI-E 3.0 card, so this solution wouldn't work. http://www.evga.com/Products/Product.aspx?pn=04G-P4-2981-KR

Greetings,
 
Putting your graphics card slot into PCIe 2.0 mode WOULD work. PCIe 3.0 offers roughly 2x the per-lane bandwidth of PCIe 2.0 (8 GT/s with 128b/130b encoding versus 5 GT/s with 8b/10b encoding), and EVERY PCIe 3.0 device is backwards compatible.

Putting a PCIe 3.0 x16 slot into PCIe 2.0 mode would essentially limit the bandwidth to about the same as PCIe 3.0 x8, and from consumer CPUs/chipsets with dual-GPU setups we can see that for most gaming workloads, the bandwidth penalty you take from essentially cutting your GPU's interconnect to the system in half is negligible.
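For anyone who wants to check the numbers, here is a small Python sketch of the arithmetic. It uses the published per-lane signalling rates and line encodings for PCIe 2.0 and 3.0 and ignores packet/protocol overhead, so the results are theoretical one-direction maximums and not what a game would actually see. The function and table names are just made up for the example.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),      # 5 GT/s, 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 8 GT/s, 128b/130b encoding
}


def link_bandwidth_gb_per_s(generation: str, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in GB/s.

    Ignores packet/protocol overhead, so this is an upper bound.
    """
    rate_gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_per_s * efficiency * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for generation, lanes in [("3.0", 16), ("3.0", 8), ("2.0", 16)]:
        bandwidth = link_bandwidth_gb_per_s(generation, lanes)
        print(f"PCIe {generation} x{lanes}: {bandwidth:.2f} GB/s")
```

PCIe 2.0 x16 comes out at 8.00 GB/s and PCIe 3.0 x8 at about 7.88 GB/s, so the two are nearly equivalent, and both are roughly half of PCIe 3.0 x16 (about 15.75 GB/s).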