Hi, I'm also having this problem, and it's been very frustrating. I actually RMA'ed my GPUs thinking they were the culprit, but not a week after the replacements arrived the problem began again. It's infuriating, as I will be in the middle of work and the PC will suddenly lock up.
I've noticed the problem for me tends to be worse if I work a full day in UE4, Substance Painter, or any other GPU-intensive program and then leave my PC on overnight. That seems to lessen the occurrences a little bit. A little bit being key there. I was usually experiencing these crashes every other day, sometimes daily; after a lot of tweaking they happen much less often now, but they still do happen.
Another thing that helped a little bit was to raise the SoC voltage slightly, along with frequently restarting the GPU driver (Win+Ctrl+Shift+B). I'm not sure which made the most difference, but if I were a betting man I'd say it's the frequent restarting of the driver. If someone else wants to try both of these and see whether they experience fewer crashes, maybe we can find out.
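If anyone does want to test the two tweaks separately, here is a rough sketch of how the results could be compared. It assumes you keep a hand-written log of how many days you ran each configuration and how many crashes you saw. The configuration labels and the numbers in `trials` are made-up placeholders, not measurements from my system.

```python
from collections import defaultdict

# Hypothetical hand-kept log: (configuration label, days run, crashes seen).
# Labels and numbers are placeholders only, not real results.
trials = [
    ("stock", 4, 3),
    ("soc_voltage_only", 4, 2),
    ("driver_restart_only", 4, 1),
    ("stock", 2, 1),
]

def crashes_per_day(trials):
    """Return {config: crashes per day}, pooling repeated trials of a config."""
    days = defaultdict(int)
    crashes = defaultdict(int)
    for config, n_days, n_crashes in trials:
        days[config] += n_days
        crashes[config] += n_crashes
    # Skip configurations with zero logged days to avoid dividing by zero.
    return {c: crashes[c] / days[c] for c in days if days[c] > 0}

if __name__ == "__main__":
    # Print configurations from lowest to highest crash rate.
    rates = crashes_per_day(trials)
    for config, rate in sorted(rates.items(), key=lambda kv: kv[1]):
        print(f"{config}: {rate:.2f} crashes/day")
```

With crashes this irregular, a few days per configuration probably won't say much, so longer runs of each setting would make the comparison more trustworthy.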
I've run through all the basic troubleshooting steps, like testing the cards in another PC, testing this system with an older AMD GPU, and all the hardware seems to be OK. Just found this thread, so at least I now know I'm not alone in this.
Now, if I'm not mistaken, some motherboards with multiple PCIe slots route some of those slots through chipset lanes instead of CPU lanes. Has anyone tested a slot that passes through the chipset versus one wired directly to the CPU?
I've been thinking of trying a single card in each PCIe slot to see if that makes a difference.
CPU: AMD Threadripper 3970X
Motherboard: Asus Zenith II Extreme
RAM: 256 GB Trident Z Neo
GPU(s): 2x Nvidia Titan RTX
Motherboard BIOS: Latest
Drivers: Up to date
Device manager event:
Event ID 14/nvlddmkm
0d02(31c8) 00000000 00000000
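For anyone who wants to track how often this event shows up, here is a minimal sketch that pulls the timestamps of Event ID 14 entries from nvlddmkm out of an event log exported as CSV. The column names (`Source`, `Event ID`, `Date and Time`) are my assumption about what the export looks like, so check them against the header row of your own file and pass different names if they differ. The function name is just something I made up for illustration.

```python
import csv
import io

def nvlddmkm_event14_times(csv_text, source_col="Source",
                           id_col="Event ID", time_col="Date and Time"):
    """Return the timestamp strings of rows that are Event ID 14 from nvlddmkm.

    The default column names are assumptions about the exported header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    times = []
    for row in reader:
        # Short rows yield None for missing columns, so fall back to "".
        source = (row.get(source_col) or "").strip().lower()
        event_id = (row.get(id_col) or "").strip()
        if source == "nvlddmkm" and event_id == "14":
            times.append(row.get(time_col) or "")
    return times

if __name__ == "__main__":
    # Tiny made-up sample in the assumed layout, not a real log.
    sample = (
        "Level,Date and Time,Source,Event ID,Task Category\n"
        "Error,1/2/2021 10:00:00 AM,nvlddmkm,14,None\n"
        "Information,1/2/2021 10:05:00 AM,Kernel-General,16,None\n"
    )
    for stamp in nvlddmkm_event14_times(sample):
        print(stamp)
```

To use it on a real export, read the file into a string first, for example with `open(path, encoding="utf-8-sig").read()`. The timestamps are left as plain strings because their format depends on the system locale.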
Could you please watch the video below and tell me whether that is the same thing you are experiencing?
If yes, please raise this issue with both Nvidia and AMD.
Also, does setting the power management in NVCP to high performance help? That seems to be the only workaround at this time, and it seems to be working for everyone who's tried it. I personally haven't done it because of the extra heat and power consumption, not to mention that it is just a workaround and things should work at stock settings. Given your use case, you may be OK with setting it to high performance.
Just letting you know.