Question [Hang/Freeze/Crash] - Event ID 14 nvlddmkm, AMD+NVIDIA

Jul 2, 2020
2
0
10
0
Hi All.

Same issue with the hardware below. New PC build, only 2+ months old.
  • Intel i9-9900K
  • RTX 2080 TI EVGA XC Ultra
  • Asus ROG Strix Z390-F Gaming
 

Diceman_2037

Distinguished
Dec 19, 2011
15
1
18,515
0
OK, did you watch the video I posted? Like I said, the system isn't hanging. I could still use the PC every time, just everything was moving slowly, and once, closing MSI Afterburner even solved the issue. Two things: if it's a hardware defect, why am I able to constantly stress both the GPU and CPU without an issue, and why did it get resolved by closing MSI AB?
That is common behavior associated with I/O hangs that result from bus errors/retries or from the device no longer responding properly to lspm commands.

Wouldn't a hardware defect with the controller be more apparent, like causing instabilities often rather than once every few weeks to months?
Not since PCIe uses AER.

The bus will correct everything it can through FEC and inevitably fall back to packet resends.

Consider how slow downloading a file can be on a faulty router via TCP, where TCP requests the packets to be resent.
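
To put rough numbers on the resend analogy above, here is a minimal sketch. It assumes each send fails independently with a fixed probability, which is a simplification and not how PCIe replay or TCP actually behave in detail. The function names `expected_transmissions` and `effective_throughput` are made up for illustration.

```python
def expected_transmissions(error_rate: float) -> float:
    """Average number of sends needed per packet when every send fails
    independently with probability `error_rate` (simple geometric model)."""
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    return 1.0 / (1.0 - error_rate)


def effective_throughput(raw_rate: float, error_rate: float) -> float:
    """Useful data rate left over once resends are accounted for."""
    return raw_rate / expected_transmissions(error_rate)


# Illustrative numbers only: a link with a raw rate of 100 units/s.
for p in (0.0, 0.1, 0.5, 0.9):
    rate = effective_throughput(100.0, p)
    print(f"error rate {p:.0%}: about {rate:.1f} of 100 units/s is useful data")
```

The model ignores timeouts and added latency, so a real link with heavy retries would likely feel worse than these figures suggest. The system keeps working, just slowly, which matches the "everything was moving slow" description.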
 
May 26, 2020
7
0
10
0
What do you advise, then, for those hitting this Event ID 14 nvlddmkm error with an AMD CPU and NVIDIA GPU combo?

Wait for updates, RMA the processor right away, or RMA something else?

I don't know what to do with this problem and I'm getting lost in all the possible solutions.
 

Aravind92

Honorable
Apr 1, 2014
616
5
11,015
20
Gotcha, I will wait for it to happen again then. Since the AGESA 1.0.0.5 update I've not had the issue as such, but Event Viewer recorded an nvlddmkm error on the 3rd of June that was not accompanied by the stuttering behaviour. I was using the PC at the time and did not notice anything, so I was surprised to see the error there. I will hold off for a little while to see if BIOS updates help; if not, CPU RMA then. Can you tell me how we can explain the issue to AMD, though? Will they immediately accept an RMA?

And hey, thank you for taking the time to explain it here. I guess people whose boards have not been receiving updates should RMA the CPU without waiting, yeah? Will that fix the issue? How likely is it to get a second CPU with the issue?

EDIT: Look at my comment below.
 
Last edited:

Aravind92

Honorable
Apr 1, 2014
616
5
11,015
20
So, I was just googling the issue and found a thread on the NVIDIA forums; I'm pretty sure this is the guy I spoke with on Reddit. BTW, he said it happened on AGESA 1.0.0.6 as well, and he has now switched to an Intel PC and given his Ryzen board, RAM and CPU to someone else. It looks like he actually exchanged his CPU and motherboard with the retailer after the issue started. Diceman, even you seem to have commented on it. I am at a loss now. I had RMAing the CPU and board as my last-ditch option, and now that looks like a bust, as it happened after he switched both. It looks like it's a software/BIOS issue, or maybe he got a faulty PCIe controller on two of his CPUs in a row. How likely is that?

Maybe we should be looking into this in another way. I really don't know after this.


And a quote from another Reddit user, from his conversation with NVIDIA support:

I contacted with NVIDIA and they told me that is an incompatibility issue that they're aware of it and it's more common with RTX series but it's gonna be hard to fix since it's on AMD's part and should be resolved with a BIOS fix from their part, they told me that it's because the use of virtual threads on multithreading in ryzen that doesn't link well with NVIDIA processes

Now this makes some sense, as people have mentioned that disabling SMT does solve the issue.

What do you make of this, Diceman? When I contacted them about the matter, they said it had been forwarded to the research team or something along those lines, but I am sure nothing came of it.
 
Last edited:
Jul 3, 2020
3
0
10
0
After the AGESA updates the issue is still there. Before, when PCIe crashed it never recovered and made the system unusable; now it just goes to a quick black screen or app crash and recovers back to normal. Ryzen 3000 has PCIe issues and AMD just keeps doing workarounds. The past WHEA errors were also related to PCIe, where AMD decided to hide the errors to stop users from complaining, and now these AGESA updates just reset or recover PCIe but do not fix the problem.
 
Jul 3, 2020
2
0
10
0
I have the same issue, BUT only since updating the NVIDIA drivers from 446.14 to 451.48.
Stutters on 446.14. Black screen on 451.48.

Motherboard: MSI MPG X570 Gaming Edge WiFi
CPU: Ryzen 3600
GPU: KFA2 RTX 2070 Super Black Edition
BIOS: Latest beta with AGESA ComboV2 1.0.0.2

An AMD RX 570 GPU works in my setup with no GPU issues, but the Wi-Fi sometimes disappears. With the NVIDIA GPU I have no Wi-Fi issues (sic!).

P.S. Sorry for my English :)


Edit:
Before the driver update I had a different error code.
Before:
\Device\Video3
0cec(3098) 00000000 00000000

After:
\Device\Video3
0d02(31c8) 00000000 00000000
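
For anyone wanting to compare these Event ID 14 payloads across driver versions or machines, here is a small sketch that splits the two lines into numbers. It assumes every value is hexadecimal and that the layout is always `xxxx(yyyy)` followed by zero or more words. The field names `code` and `subcode` are my own labels, since I am not aware of public documentation for what the driver means by them.

```python
import re
from typing import NamedTuple, Tuple


class Event14Data(NamedTuple):
    device: str             # e.g. \Device\Video3
    code: int               # value before the parentheses (label is an assumption)
    subcode: int            # value inside the parentheses (label is an assumption)
    words: Tuple[int, ...]  # trailing values, assumed to be hexadecimal


_DATA_RE = re.compile(r"^([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)((?:\s+[0-9a-fA-F]+)*)$")


def parse_event14(device_line: str, data_line: str) -> Event14Data:
    """Parse the device line and data line copied from an Event ID 14 entry."""
    match = _DATA_RE.match(data_line.strip())
    if match is None:
        raise ValueError(f"unrecognised Event 14 data line: {data_line!r}")
    words = tuple(int(word, 16) for word in match.group(3).split())
    return Event14Data(
        device=device_line.strip(),
        code=int(match.group(1), 16),
        subcode=int(match.group(2), 16),
        words=words,
    )


before = parse_event14(r"\Device\Video3", "0cec(3098) 00000000 00000000")
after = parse_event14(r"\Device\Video3", "0d02(31c8) 00000000 00000000")
print(hex(before.code), hex(before.subcode), "->", hex(after.code), hex(after.subcode))
```

This only makes the values easier to diff. It says nothing about what a given code means.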
 
Last edited:
Jun 14, 2020
7
1
10
0
So far, after changing the power settings, I have not had a single crash, but then again, that could just be a coincidence as the crashes are very sporadic

I just got my replacement EVGA 2080 Super via Advanced RMA, so I will probably keep it on high power for a few weeks and then turn it back down and see what happens

Funnily enough the new card has solved the EVGA fan grinding issue

I feel like next time I might just go Intel again...
 
Last edited:

Aravind92

Honorable
Apr 1, 2014
616
5
11,015
20
Sorry, I am not getting it. Nothing happened the last time the error was recorded in Event Viewer: no app crashes, no black screen, nothing.

Either way, are you saying it will not be fixed by changing the CPU, and that all Ryzen 3000 CPUs come with a defective PCIe controller? One thing that baffles me is why it doesn't happen when the PC is under load. And why does it get fixed by setting NVIDIA's power management mode to Prefer Maximum Performance?
 
Jul 3, 2020
3
0
10
0
The problem is related to PCIe power saving. Normally the PCIe link goes into a low-power mode when there is not enough load. There are ways to stop this switching: forcing maximum performance in the NVIDIA Control Panel, having more than one monitor (which also increases load), or turning off browser hardware acceleration to stop the GPU/PCIe link from bouncing between power saving and full power.
 

Aravind92

Honorable
Apr 1, 2014
616
5
11,015
20
It just happened again after 2 months. Based on Event Viewer it also seems to have happened on 3rd June 2020, but it did not stutter then; now it did. I was starting to watch a live stream on Discord when it started stuttering. This is becoming a real annoyance, and not being able to confirm what is causing it is the worst part. Going to update Windows, the NVIDIA driver and the BIOS now. Let's see.
 
Jun 14, 2020
7
1
10
0
Question for you all, what browsers are you using?

I don't know if it's an unrelated issue, but since turning the power settings back to optimal I am having performance issues in Firefox but not Chrome. Googling "Firefox nvidia" seems to show people having driver crashes only when running Firefox

EDIT: I think I have some actual reproducible problems now

I use Blue Iris NVR software, and using the WebUI sometimes the video will jerk down to 5fps and then back up to 35fps (Well above double the FPS of the feed!) every second causing horrible playback

When I look in X1 to see the clocks, it's going from 300MHz all the way to 1650MHz every second, which is directly in line with the jerking

If I load up the Blue Iris WebUI in Chrome, it sits at around 400-500MHz constantly and has no issues with playback

With the power settings turned to max in the NVIDIA control panel, the issue goes away entirely
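
If anyone wants to check for the same clock bouncing without watching a monitoring tool by eye, here is a rough sketch. It assumes `nvidia-smi` is installed and on the PATH. The helper names and the 1000 MHz threshold are arbitrary choices for illustration, not anything official.

```python
import subprocess
from typing import Sequence


def read_graphics_clock_mhz() -> int:
    """Read the current graphics clock of the first GPU via nvidia-smi.
    Assumes nvidia-smi is on the PATH; raises if the call fails."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=clocks.gr", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.splitlines()[0].strip())


def count_swings(samples_mhz: Sequence[int], threshold_mhz: int = 1000) -> int:
    """Count jumps between consecutive samples of at least threshold_mhz."""
    return sum(
        1
        for previous, current in zip(samples_mhz, samples_mhz[1:])
        if abs(current - previous) >= threshold_mhz
    )


# Made-up samples shaped like the two cases described above.
bouncing = [300, 1650, 300, 1650, 300]
steady = [400, 450, 500, 480]
print(count_swings(bouncing), count_swings(steady))
```

To use it on a live system, call `read_graphics_clock_mhz()` once per second for a minute or so, collect the values in a list, and pass that list to `count_swings`. A high count under a light load would be consistent with the jerking described here, though it would not prove the cause.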
 
Last edited:
Reactions: fluidz
Jul 3, 2020
2
0
10
0
Using Chrome. I don't have these issues, only Event 14 sometimes.
 

Aravind92

Honorable
Apr 1, 2014
616
5
11,015
20
I am using Firefox as well.
 

RacAtat007

Distinguished
Aug 8, 2012
194
4
18,695
2
SOLVED WITH BIOS UPDATE

Specs -
CPU: R7 3700X
GPU: Gigabyte RTX 2060 6GB
Mobo: ASUS AM4 TUF Gaming X570
RAM: 16 GB Corsair Vengeance 3200 (2x8)


I was having the exact same issue as described. My PC would randomly stutter really badly and then usually freeze, forcing me to power down and restart. This happened once or twice over a few weeks but seemed to get more common over time, with no cause I could track down. I tried changing anything I could think of with no luck, but after a BIOS update it seems to have been fixed. It's been about 3 weeks without the issue happening at all. If you have the same board as me, I'm currently on BIOS ver 2203 with no issues. Hope this helps
 
May 18, 2020
19
0
10
0
One of these fixed it for me

Updating the X570 I AORUS PRO WIFI (rev. 1.0) to the latest BIOS, F20b

GPU set to maximum performance in NVIDIA settings

and

PCIe power-saving management turned off
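
For the PCIe power saving part, the Windows setting can also be changed from a script instead of the Power Options dialog. This is a sketch under assumptions: it relies on the `powercfg` aliases `SCHEME_CURRENT`, `SUB_PCIEXPRESS` and `ASPM`, and on index 0 meaning "Off" for Link State Power Management. Please verify both with `powercfg /aliases` and `powercfg /query` on your own system before running anything. The function names are made up for illustration.

```python
import subprocess
import sys
from typing import List


def pcie_link_power_off_commands(scheme: str = "SCHEME_CURRENT") -> List[List[str]]:
    """Build powercfg commands that set PCI Express Link State Power
    Management to index 0 (assumed to mean Off) on AC and battery power."""
    return [
        ["powercfg", "/setacvalueindex", scheme, "SUB_PCIEXPRESS", "ASPM", "0"],
        ["powercfg", "/setdcvalueindex", scheme, "SUB_PCIEXPRESS", "ASPM", "0"],
        ["powercfg", "/setactive", scheme],  # re-apply the scheme so the change takes effect
    ]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False on Windows."""
    for command in commands:
        if dry_run or sys.platform != "win32":
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Calling `apply(pcie_link_power_off_commands())` only prints the commands. Passing `dry_run=False` from an elevated prompt on Windows would actually run them. The GPU power mode from the NVIDIA settings is a separate option and is not touched by this.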
 
Jun 14, 2020
7
1
10
0
It's been a while since I installed the card that came back from RMA, and I'm declaring it fixed for me (knock on wood)

So the fix was a new card
 
Jul 16, 2020
2
0
10
0
I'm also having this problem, and it's been very frustrating. I actually RMA'ed my GPUs thinking they were the culprit, but not a week after the replacements arrived the problem began again. It's infuriating, as I will be in the middle of work and the PC will suddenly lock up.

I've noticed the problem for me tends to be worse if I work a full day in UE4, Substance Painter or any other GPU intensive program and then leave my PC on overnight. That seems to lessen the occurrences a little bit. A little bit being key there. I usually experience these crashes every other day, sometimes daily, but after a lot of tweaking the crashes happen much less often now, but they still do happen.
Another thing I've found that helped a little bit was to raise the SoC voltage slightly, along with frequently restarting the GPU driver (Win+Ctrl+Shift+B). I'm not sure which made the most difference, but if I were a betting man I'd say it's the frequent restarting of the driver. If someone else wants to try both of these and see whether they experience less crashing, maybe we can find out.

I've run through all the basic troubleshooting steps, like testing the cards in another PC, testing this system with an older AMD GPU, and all the hardware seems to be OK. Just found this thread, so at least I now know I'm not alone in this.

Now, if I'm not mistaken, some motherboards with multiple PCIe slots will have some of those slots using chipset lanes instead of CPU lanes, has anyone tested using a slot that passes through the chipset vs directly into the CPU?
I've been thinking of trying a single card in every PCIe slot and see if that makes a difference.

Specs:
CPU: AMD Threadripper 3970X
Motherboard: Asus Zenith II Extreme
RAM: 256 GB Trident Z Neo
GPU(s): 2x Nvidia Titan RTX
Motherboard BIOS: Latest
Drivers: up to date

Device manager event:

Event ID 14/nvlddmkm

\Device\00000192
0d02(31c8) 00000000 00000000
 

Aravind92

Honorable
Apr 1, 2014
616
5
11,015
20
Hi Guys,

I've contacted both NVIDIA's and AMD's support on this matter. AMD responded saying it has been sent to their engineering team to research.

With NVIDIA, I got the case escalated to 2nd level. The 1st-level agent hardly understood the issue and honestly wasn't very helpful, just giving me the generic troubleshooting steps. But the 2nd-level agent has been working on and researching the matter.

Could you guys please contact them too, so that both companies are aware that it is a widespread issue and that swapping hardware components doesn't help? With NVIDIA, please have it escalated.
 
Reactions: fluidz
