This is a long story, but you need the history and I need your great minds to collaborate with me here. Please bear with me. I'm posting in the graphics card section because that was the issue beforehand, and like the title says, what are the odds?
I have a custom build that runs excellently - when it runs. UserBenchmarks at 98% and 97%...
I'm seeking your help because I'm having some deja vu issues and I don't know if it's me- am I going crazy, do I have the worst luck in the world, did we replace the wrong parts? Or is it something else?
To top it off, I think I am up against a deadline, timewise (June 30th).
Here is my build:
Case- Phanteks Enthoo Pro M SE
PS- Thermaltake Smart Pro RGB 750W Zero Fan
MOBO- MSI X99 Gaming Pro Carbon LGA 2011-3 (refurb)
RAM- Corsair Vengeance LPX 32GB (8x4GB) Quad-Channel DDR4 2400MHz C14
CPU- Intel Xeon E5-1620 v4 Broadwell-EP Quad Core CPU @ 3.5GHz
Cooler- Enermax Liqmax II 240 "Front/Pull" config
NVMe- MyDigitalSSD 240GB 80mm BPX Pro m.2 PCIE
GPU- MSI GeForce GTX 1070 Armor OC 8GB
SSDs- 1x240GB Patriot, 1x240GB HyperX Cloud 9, 1x120GB Kingston SSDNow! v300
Storage - WD Passport USB 3.0
So I set everything up and converted the internal drives to GPT. I clean installed Windows 10, and I believe everything is in GPT/UEFI mode. I installed all of MSI's (somewhat old) drivers, configured BIOS settings, ran XMP profiles, got all my software running, and things were great.
Then things started going downhill.
It would, rarely, not POST.
Sometimes it would POST, but not boot (freeze during the "circling circles" loading screen).
Oftentimes it would boot, but crash with a white or pink screen after a random amount of time - regardless of activity; it could be gaming, YouTube, Microsoft Word, etc.
Eventually it became predictable: every morning I would boot it up, wait for it to get to the desktop, watch it crash, then hard power off (just tap the power button once) and boot it up again.
Oddly, it would run perfectly fine after that initial crash... but that's no way to live... not after $1100 of blood, sweat, and tears.
The Pagefile was enabled, size was large enough, but crash dumps weren't being written out.
Event Viewer showed lots of critical events in the past hour, then Kernel-Power "the computer rebooted but was not shut down cleanly" or something to that effect. The only other events close to that were Event ID 14 nvlddmkm... I read a lot about this and tried some troubleshooting. I do use Steam Link hardware to stream my PC from the bedroom to the living room. Crashes seem to happen regardless of whether I'm streaming or using the computer "locally".
Desperate for answers, I reverted to some (age-old) drivers. They definitely reduced performance in games and benchmarks by about 20%... but they seemed to relieve, at least temporarily, the crashing and the nvlddmkm error. But then the issue started back up again. I tried updating drivers; it only made the issue more prevalent. I tried updating the card's video BIOS, same issue. I tried checking and tweaking all sorts of settings - from power profiles to hardware encoding/decoding settings in browsers to modifying the TDR values... sfc, dism, etc. I finally decided to RMA the card.
I sent it in to MSI and in its place popped in an old R7 370 4GB. While the card was much slower (GTA V now playing at 15 fps hahaha), it worked with absolutely no issues. No reboots, no freezes, nothing.
So the GTX 1070 card was definitely bad, right? Or is it the software (drivers)?
I noticed, though, with the R7 in, that my NVMe drive was idling at temps around 70°C. It must be thermal throttling under load, I thought. So I tried to update the firmware, and it was unsuccessful. I contacted the company and they RMA'd it for me. They sent me back the Pro version, and it had a firmware update, so I went ahead and did that. Much better temperatures, so I have peace of mind with that.
By the time I was done with that, I had gotten back from MSI a replacement GTX 1070 Armor OC 8GB (Grade B). It was a refurb.
Now I'm taking no chances, right? A new NVMe, up-to-date firmware... I did a clean install of the latest Windows 10 the Media Creation Tool would give me. I used SDI to install only the latest, best-matching drivers, with no overlap or redundancy. Although I did let Windows 10 auto-detect and install the video card drivers (by mistake; I forgot to unplug the network cable).
So... history repeats itself...
Get all my settings tweaked, programs installed, backups created... life is grand! Like I said, benchmarked at 98% and 97%... but now things are happening again. The same sorts of things as before.
I tried to update the video card drivers, but they wouldn't install. I did some research and figured out that Windows Update had installed the DCH version of the drivers. No biggie, I will leave them on the DCH ones and set all my settings as before: maximum power, no power management for PCI, fan curves, etc.
Now problems are starting to arise.
It always POSTs, and maybe once or twice it has not booted to the desktop. But now, once at the desktop... I noticed my mouse lags a little sometimes (4K @ 60Hz resolution), and it looks like the video card is working harder than the previous one I had (shows 6 or 7% GPU use idling at desktop; the RMA'd card was 0-3%). Sometimes it looks like the CPU is in use, then it will clear up and not be an issue.
I can game in 4K, GTA V with a steady 45 fps (up to 70-plus depending on what's happening). But now, like before, I will get a random reboot. I could be watching YouTube, or streaming content to the Steam Link (set up in my living room so we can watch TV shows, movies, etc.) and it just reboots. A crash dump isn't written out, and I see a couple of Event ID 14s in the event log (not as frequently as before, though).
I loaded BIOS optimized defaults, and fearing the video card again, I downloaded Superposition https://benchmark.unigine.com/superposition to stress test it. I also downloaded MSI Afterburner and GPU-Z. I let Afterburner do its smart overclock and it created a nice curve, topping out around 2000MHz core clock. I then loaded up GPU-Z and ran Superposition. It ran through multiple passes successfully. The sensors in GPU-Z all seemed perfectly normal, except for PerfCap Reason. From what I understand, this is the thing that's holding you back, what's capping your performance. Mine said something like PwrRel or VolRel... the tooltip said something to the effect of "reliable power". I'm not sure what that means or if it is even significant.
So here I am... confused as to what to do next. That's why I'm asking you all. What do you think? I can do any troubleshooting you want and provide logs and screens and videos... any help or guidance would be appreciated.
I don't want a system that has to crash and reboot daily in order to be "stable", especially after spending hard-earned money on it. I don't want to keep RMAing random parts (having to pay to ship them is very annoying), and I don't know if there's something I could be missing... is my logic bad? The issues went away with an AMD card... they resumed on a clean install... is it NVIDIA's terrible drivers? This Event ID 14 nvlddmkm issue has a long history... could MSI have given me two faulty cards in a row? Could it be my power supply isn't up to it? My motherboard is a refurb from MSI also, could that be it?
Please let me know what information you need and what steps to take. "Help me Obi-Wan Kenobi, you're my only hope..."