Question: NVMe drive acting weird, DPC_Watchdog_Violation and more?

Skrybe

Prominent
Jun 22, 2021
35
4
545
Apologies in advance, this will be long because I want to provide as much info as possible.

I tried to copy my Witcher 3 GOG setup folder from my Samsung 970 EVO 1TB NVMe SSD (boot drive) to my Samsung 860 QVO 4TB (data drive). The folder is about 49GB. When I tried to copy it, Windows threw an error about one of the .bin files (not able to read, I think) and gave me an option to skip, which I took. Not long after, the copy speed dropped to about 1 MB/s, and after about a minute the PC crashed with a DPC_Watchdog_Violation.

I let it sit for about 5 minutes to give it a chance to "collect some error info" but it never progressed past 0% so I rebooted. On reboot the PC just hung at a black screen for more than a minute before finally coming up with an error about no available boot drives, please reboot and set one in the bios. (Sorry I didn't capture exact wording).

Checking in the BIOS, my NVMe drive had vanished and my first boot device was the 860 QVO (which is not a boot drive). I checked all the settings I could find and there was no NVMe drive. I tried CSM on/off and changed the boot options from legacy to UEFI first, and none of the changes helped at all. Every time I rebooted it wouldn't detect the NVMe drive.

I pulled the NVMe drive out of the PC to have a look at it for visible damage but couldn't see any. I left the drive sitting on my desk for a couple of hours while I arranged to try the drive in a friend's PC. I put the drive back into my PC so I could take it to my friend's, tried another boot, and lo and behold it booted into Windows just fine.

I immediately ran Samsung Magician and it says the drive health of all my drives is good and the temps are normal. I can't really tell what I should be looking for in the SMART info to see if something is going wrong. But the "Critical Warning" value is 0 which seems good at least. There are values in the "Media Errors" and "Number of Error Information Log Entries" but I don't know whether they're problematic. I can post the SMART values here if it's of use.

I thought I'd copy files off the NVMe drive to my QVO to make sure they're backed up in case the drive actually dies, and I hit copy errors again on several files (video files of around 100 MB each that I shot on holiday). I skipped them and the rest of the copy finished correctly. I went back and checked those files and they all play just fine in VLC. I tried copying them again individually and they copied just fine. I haven't tried copying the Witcher installers again because I'm a bit worried I'll hose my PC again if I do. I will give it a try after posting this though.

Checking the system event log, I can see a bunch (several dozen) of errors: "The device, \Device\Harddisk5\DR5, has a bad block." Harddisk5 is the NVMe drive. Looking at the times, they correspond to the copying I just mentioned. Going back further, there seem to be about a dozen similar errors at the time I got the DPC_Watchdog_Violation. The last entry before the crash appears to be a bad block error. The next entry in the log is several hours later and corresponds to when I managed to successfully boot it again.
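For anyone wanting to tally these events rather than eyeball them, here is a minimal sketch in Python. It assumes the System log has been exported from Event Viewer to plain text or CSV and is fed in line by line; the function name and the sample lines are illustrative only and are not taken from the actual log.

```python
import re
from collections import Counter

# Matches the message quoted above and captures the Harddisk number.
BAD_BLOCK = re.compile(r"The device, \\Device\\Harddisk(\d+)\\DR\d+, has a bad block")


def count_bad_blocks(lines):
    """Return a Counter mapping disk number -> number of bad block events."""
    counts = Counter()
    for line in lines:
        match = BAD_BLOCK.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample lines, not real log data.
    sample = [
        r"Error,disk,7,The device, \Device\Harddisk5\DR5, has a bad block.",
        r"Information,Kernel-General,16,Some unrelated event.",
        r"Error,disk,7,The device, \Device\Harddisk5\DR5, has a bad block.",
    ]
    for disk, n in count_bad_blocks(sample).items():
        print(f"Harddisk{disk}: {n} bad block event(s)")
```

Grouping by disk number makes it easy to confirm that every event points at the same drive.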

I've run chkdsk on the NVMe drive and it says "No errors found". I've also just run SFC, which "did not find any integrity violations".

In terms of hardware, I've had the NVMe drive for 2 years and never had problems with it. I've had the QVO drive for well over a year, again with no problems. I've got Kingston 64GB (2x32GB) HyperX Fury 3200MHz DDR4 RAM and have had it for more than 6 months. I recently upgraded the BIOS on the mobo (Asus Crosshair VII Wifi) so that I could use a Ryzen 5800X on it, but that's been running just fine for a few weeks now. I also switched out a GTX1070 for an Asrock Phantom Gaming D RX6900XT, and again that's been running happily for a few weeks. The PSU is "only" a Corsair AX750i, but even at peak I've never seen power draw over 620W (joy of digital monitoring), and that's when benchmarking or gaming. I'm running Windows 10 64bit, which is up to date. No overclocking, though I am running the DOCP settings for my RAM.

So what should my next step be? I'm trying to figure out whether my NVMe drive is failing or whether there is another hardware problem. Or maybe it's a Windows or driver problem. I suppose it's possible that the crash was just a random thing, but the fact that the NVMe drive disappeared from the BIOS for about two hours, and that it's showing some bad blocks now that I've got it back, worries me. I had an old SATA SSD die on me and basically you get no warning. So I'm trying to determine whether I should buy a new drive or try to deal with Samsung about warranty on this one.
 
Might have just been a bad connection with the slot.
Run the samsung diags as a test.
Run your normal stuff and watch the temps.
Check if trim is enabled in the OS.
Set optimize to run on a schedule.
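As a rough sketch of the TRIM check, the Windows command `fsutil behavior query DisableDeleteNotify` reports a value of 0 when TRIM is enabled. The Python below wraps that command and parses its output; the helper names are made up for illustration, and the exact output wording can differ between Windows versions, so treat the parsing as an assumption rather than a guarantee.

```python
import re
import subprocess
import sys


def parse_trim_status(output: str) -> dict:
    """Map filesystem name -> True if TRIM is enabled (DisableDeleteNotify = 0)."""
    status = {}
    pattern = r"(?:(\w+)\s+)?DisableDeleteNotify\s*=\s*([01])"
    for match in re.finditer(pattern, output):
        # Older Windows versions print no filesystem prefix; label that "default".
        filesystem = match.group(1) or "default"
        status[filesystem] = match.group(2) == "0"
    return status


def query_trim_status() -> dict:
    """Run fsutil (Windows only, may need an elevated prompt) and parse the result."""
    result = subprocess.run(
        ["fsutil", "behavior", "query", "DisableDeleteNotify"],
        capture_output=True, text=True, check=True,
    )
    return parse_trim_status(result.stdout)


if __name__ == "__main__":
    if sys.platform == "win32":
        for fs, enabled in query_trim_status().items():
            print(f"{fs}: TRIM {'enabled' if enabled else 'disabled'}")
    else:
        print("fsutil is only available on Windows.")
```

Note the inverted logic: the setting is named "disable delete notify", so 0 means TRIM is on.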
 

Skrybe

I tried running the Magician diagnostics when I managed to get it to boot again. You can't run diagnostics on the NVMe drive; it's not supported. The QVO drive was fine though.

Temps are all fine. Trim is enabled and has been for as long as I can remember. Optimise is set to weekly already.

I hope you're right about it just being a bad connection, though I can't see how considering I'd been using it for two years in that slot. But fingers crossed maybe that's all it was. Any further ideas on things to check would be appreciated.
 
Some reading.
https://www.samsung.com/semiconduct...ng_Magician_6_2_0_Installation_Guide_v1.2.pdf
 

Skrybe

Just realised there is actually a 6.3 version out now (update check in magician wasn't detecting it). I'll install that and post results shortly.

Thanks for your help.
 

Skrybe

OK, SMART results for the NVMe drive.

S.M.A.R.T. 6/24/2021
Model Name: Samsung SSD 970 EVO 1TB
Serial Number: S467NF0K600236F
Drive Type: NVMe
Result:

| Byte | Description | Raw Data | Status |
|---|---|---|---|
| 0 | Critical Warning | 0x0 | OK |
| 2:1 | Temperature (C) | 36 | OK |
| 3 | Available Spare | 0x58 | OK |
| 4 | Available Spare Threshold | 0xa | OK |
| 5 | Percentage Used | 0x2 | OK |
| 47:32 | Data Units Read | 0x37070f3 | OK |
| 63:48 | Data Units Written | 0x6648d30 | OK |
| 79:64 | Host Read Commands | 0x5b700d0c | OK |
| 95:80 | Host Write Commands | 0x9a139a06 | OK |
| 111:96 | Controller Busy Time | 0x1652 | OK |
| 127:112 | Power Cycles | 0xa4 | OK |
| 143:128 | Power On Hours | 0x2f83 | OK |
| 159:144 | Unsafe Shutdowns | 0x2b | OK |
| 175:160 | Media Errors | 0x5a | OK |
| 191:176 | Number of Error Information Log Entries | 0x137 | OK |
| 195:192 | Warning Composite Temperature Time | 0x0 | OK |
| 199:196 | Critical Composite Temperature Time | 0x0 | OK |
| 201:200 | Temperature Sensor 1 | 0x135 | OK |
| 203:202 | Temperature Sensor 2 | 0x13d | OK |
| 205:204 | Temperature Sensor 3 | 0x0 | OK |
| 207:206 | Temperature Sensor 4 | 0x0 | OK |
| 209:208 | Temperature Sensor 5 | 0x0 | OK |
| 211:210 | Temperature Sensor 6 | 0x0 | OK |
| 213:212 | Temperature Sensor 7 | 0x0 | OK |
| 215:214 | Temperature Sensor 8 | 0x0 | OK |
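Since Magician shows most of these raw values in hexadecimal, a quick conversion makes them easier to judge. The Python below is only a sketch: it copies a handful of rows from the results above and assumes the usual NVMe conventions that one data unit is 1,000 blocks of 512 bytes and that the per-sensor temperatures are reported in kelvin. The helper names are made up for illustration.

```python
# Raw hex values copied from the 970 EVO results above.
RAW = {
    "Available Spare": "0x58",
    "Percentage Used": "0x2",
    "Power On Hours": "0x2f83",
    "Unsafe Shutdowns": "0x2b",
    "Media Errors": "0x5a",
    "Number of Error Information Log Entries": "0x137",
}

# Assumption: an NVMe "data unit" is 1,000 blocks of 512 bytes.
DATA_UNIT_BYTES = 512 * 1000


def decode(raw_hex: str) -> int:
    """Convert a hex string such as '0x5a' to an integer."""
    return int(raw_hex, 16)


def data_units_to_tb(raw_hex: str) -> float:
    """Convert a data-unit counter to decimal terabytes."""
    return decode(raw_hex) * DATA_UNIT_BYTES / 1e12


def kelvin_to_celsius(raw_hex: str) -> int:
    """Convert a sensor reading in kelvin to whole degrees Celsius."""
    return decode(raw_hex) - 273


if __name__ == "__main__":
    for name, raw in RAW.items():
        print(f"{name}: {decode(raw)}")
    print(f"Data Units Read: {data_units_to_tb('0x37070f3'):.1f} TB")
    print(f"Data Units Written: {data_units_to_tb('0x6648d30'):.1f} TB")
    print(f"Temperature Sensor 1: {kelvin_to_celsius('0x135')} C")
    print(f"Temperature Sensor 2: {kelvin_to_celsius('0x13d')} C")
```

Converted this way, Media Errors 0x5a is 90 and the error log entry count 0x137 is 311. As far as I understand the NVMe definition, the Media Errors counter tracks unrecovered data integrity errors detected by the controller, so a non-zero value would be consistent with the bad block events in the Windows log even though Magician marks the row as OK.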
 

Skrybe

And SMART results for the 860 QVO.

S.M.A.R.T. 6/24/2021
Model Name: Samsung SSD 860 QVO 4TB
Serial Number: S4CXNF0M419126E
Drive Type: SATA
Result:

| ID | Description | Threshold | Current Value | Worst Value | Raw Data | Status |
|---|---|---|---|---|---|---|
| 5 | Reallocated Sector Count | 10 | 100 | 100 | 0 | OK |
| 9 | Power-on Hours | 0 | 96 | 96 | 17195 | OK |
| 12 | Power-on Count | 0 | 99 | 99 | 54 | OK |
| 177 | Wear Leveling Count | 0 | 97 | 97 | 19 | OK |
| 179 | Used Reserved Block Count (total) | 10 | 100 | 100 | 0 | OK |
| 181 | Program Fail Count (total) | 10 | 100 | 100 | 0 | OK |
| 182 | Erase Fail Count (total) | 10 | 100 | 100 | 0 | OK |
| 183 | Runtime Bad Count (total) | 10 | 100 | 100 | 0 | OK |
| 187 | Uncorrectable Error Count | 0 | 100 | 100 | 0 | OK |
| 190 | Airflow Temperature | 0 | 77 | 48 | 23 | OK |
| 195 | ECC Error Rate | 0 | 200 | 200 | 0 | OK |
| 199 | CRC Error Count | 0 | 100 | 100 | 0 | OK |
| 235 | POR Recovery Count | 0 | 99 | 99 | 9 | OK |
| 241 | Total LBAs Written | 0 | 99 | 99 | 78829502697 | OK |
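For comparison with the NVMe drive, the two large raw values here can be turned into more familiar units. This is only a sketch, assuming the Total LBAs Written counter is in 512-byte sectors (the usual convention, but not confirmed for this drive); the function names are illustrative.

```python
# Assumption: each LBA counted by attribute 241 is a 512-byte sector.
LBA_BYTES = 512


def lbas_to_tb(lbas: int, lba_bytes: int = LBA_BYTES) -> float:
    """Convert a count of written LBAs to decimal terabytes."""
    return lbas * lba_bytes / 1e12


def hours_to_days(hours: int) -> float:
    """Convert power-on hours to days."""
    return hours / 24


if __name__ == "__main__":
    total_lbas_written = 78829502697  # raw value of attribute 241 above
    power_on_hours = 17195            # raw value of attribute 9 above
    print(f"Total written: {lbas_to_tb(total_lbas_written):.1f} TB")
    print(f"Powered on: {hours_to_days(power_on_hours):.0f} days")
```

Under that assumption the QVO has written roughly 40 TB, and every error-related attribute (5, 179 to 187, 199) has a raw value of 0.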