Question Since upgrading to NVMe SSD, frequent system failures (BSOD), long wake-up from hibernation

Feb 16, 2019
9
0
10
Hi,

I previously ran my computer with Windows 7 on a 240 GB Sandisk SSD Plus (SATA), with an Intel i7 6700K CPU, an Asus Maximus VIII Hero motherboard and 4x4 GB of RAM. It generally worked fine, and I made extensive use of hibernation, usually going weeks or even months without a full reboot. A few months ago, I installed a Samsung 950 Pro NVMe / PCI-Express SSD (purchased used, but perfectly operational according to HD Sentinel), and since then I've had several issues:
– Frequent system crashes / BSODs (after 2 to 10 days of hibernation / wake-up cycles) with no explanation, no error or warning in Event Viewer and no particular pattern, except that the display sometimes flickers briefly a few seconds beforehand. Sometimes the error on the blue screen is “PAGE_FAULT_IN_NONPAGED_AREA”, but the last one was “PFN_LIST_CORRUPT” (it's displayed very briefly, so it's tricky to note the information or reach for a camera before the reboot).
– Waking up from hibernation takes far longer than it used to (I expected it to be blazingly fast!): about 5 minutes on average, while boot time seems about normal and waking from sleep is almost instantaneous.
– SATA devices are no longer recognized automatically, or at least not consistently: sometimes I insert an HDD (I have several hot-swap bays in my case) and have to manually scan for new devices in Device Manager before the drive letter pops up. But then another HDD that was already connected sometimes disappears, and I have to unplug it, plug it back in and scan again, hoping that it won't make yet another one disappear... (Besides the SSD, I currently have 5 HDDs connected; all of them are fine according to HD Sentinel.)
[=> That third problem was partly solved when I upgraded the Intel Rapid Storage Technology driver: newly connected SATA drives are now recognized, but sometimes I still can't unmount them for safe removal, even though no particular process is accessing them.]

Other than that, I haven't noticed a spectacular performance improvement; responsiveness feels pretty much the same as before. (Only when I do something special, like searching for a keyword across the entire system partition with WinHex, does it go noticeably quicker, and even that is not a night-and-day difference.) The only significant upside is that I now have a free SATA port for an extra HDD... (That is one of the reasons why I installed a PCI-E SSD in the first place, and also one of the reasons why I chose that motherboard even though I'm not interested in current video games: I tend to distrust external enclosures, which can make drives run too hot and often have flimsy contacts and a poor-quality power supply. But if that is the only improvement, it's not worth the 80€ I paid for it.)

If I remember correctly, I cloned the Sandisk SATA SSD to the Samsung PCI-E SSD with GParted from a Lubuntu live system. I had to tinker with the UEFI settings quite a few times; I don't remember exactly what I changed, only that it was a huge hassle. Since I didn't have the drive's documentation, I didn't even know that specific drivers were required; I installed the most recent version, and there is no warning in Device Manager. Since I installed the drive at the end of the summer, I figured that some of those issues might be caused by overheating (it reached a maximum temperature of 55°C on September 5th), but that can't be a factor now: according to HD Sentinel it hasn't exceeded 49°C since October 25th, and there's no correlation between the dates of the highest temperatures and the dates of the system failures.

So, what can I do to troubleshoot and fix those issues, short of going back to the SATA SSD?
Did I do something wrong, or something generally not advised? In particular, is cloning a Windows system with GParted expected to work, or are there known issues? Are there specific known issues when cloning to an NVMe SSD?
Are there specific known issues with this particular SSD model on this particular motherboard?
Could it be something else entirely? (On another forum there were suggestions about defective memory modules or an insufficient / defective power supply, but neither kind of problem is likely to have appeared precisely when I installed the new drive...)

Thanks!
 

Colif

Win 11 Master
Moderator
Can you set up Windows to create a small memory dump (minidump) at the next BSOD - see here. After the next BSOD, go to C:\Windows\Minidump,
copy that file to Documents,
then upload the copy from Documents to a file sharing web site and share the link here, and I will get someone to convert the file into a format I can read.
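The copy step above could be scripted. Here is a minimal sketch in Python, assuming the dumps are `.dmp` files in `C:\Windows\Minidump` (the usual default location); the helper name `copy_latest_minidump` is made up for illustration, and reading that folder normally requires administrator rights.

```python
import shutil
from pathlib import Path


def copy_latest_minidump(minidump_dir, dest_dir):
    """Copy the most recently modified .dmp file to dest_dir.

    Returns the path of the copy, or None if no dump file was found.
    """
    src = Path(minidump_dir)
    if not src.is_dir():
        return None
    # Sort by modification time so the newest dump comes last.
    dumps = sorted(src.glob("*.dmp"), key=lambda p: p.stat().st_mtime)
    if not dumps:
        return None
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the file's timestamps, which helps when matching a dump to a crash date.
    return Path(shutil.copy2(dumps[-1], dest))


if __name__ == "__main__":
    # Assumed default locations; adjust if Windows lives on another drive.
    copied = copy_latest_minidump(r"C:\Windows\Minidump", Path.home() / "Documents")
    print(copied if copied else "No minidump found")
```

If the function returns `None` after a crash, that in itself is a useful data point: no dump is being written at all.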
 
Feb 16, 2019
I made the required change but got this warning:
https://www.cjoint.com/c/IBqlLLY6FxL
---------------------------
Propriétés système
---------------------------
Windows n’a pas pu enregistrer les données qui auraient facilité l’identification des erreurs système, car votre fichier d’échange actuel est désactivé ou est inférieur à 1 mégaoctets. Cliquez sur OK pour revenir dans la fenêtre des paramètres pour la mémoire virtuelle, activer le fichier d’échange et affecter à la taille une valeur supérieure à 1 mégaoctets, ou bien cliquez sur Annuler pour modifier la sélection de vidage mémoire.
---------------------------
OK Annuler
---------------------------
Yet there is a swap file on C: with a current size of 13399MB (set to “géré par le système” / “managed by the system”, disabled on all other drives connected).
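One way to cross-check what Windows itself thinks the paging-file configuration is: the setting is stored in the `PagingFiles` registry value under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management`, as strings of the form `path min max` (sizes in MB). The sketch below reads and parses it. It assumes that `0 0`, or an entry without sizes, means "managed by the system"; the function names are illustrative only, and this only inspects the setting without claiming to be the cause of the warning.

```python
import sys


def parse_paging_entry(entry):
    """Split a PagingFiles entry such as 'C:\\pagefile.sys 0 0' into its parts."""
    parts = entry.rsplit(None, 2)
    if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
        path, min_mb, max_mb = parts[0], int(parts[1]), int(parts[2])
    else:
        # No explicit sizes given (assumed to mean system-managed).
        path, min_mb, max_mb = entry.strip(), 0, 0
    return {
        "path": path,
        "min_mb": min_mb,
        "max_mb": max_mb,
        "system_managed": min_mb == 0 and max_mb == 0,
    }


def read_paging_files():
    """Read the PagingFiles value from the registry (Windows only)."""
    import winreg  # imported here so the parser above stays usable on any OS

    key_path = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "PagingFiles")
    return list(value)


if __name__ == "__main__" and sys.platform == "win32":
    for raw in read_paging_files():
        print(parse_paging_entry(raw))
```

On a cloned system it may be worth checking that the drive letter in the entry is really the volume Windows booted from.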
 

Colif

Win 11 Master
Moderator
---------------------------
System Properties
---------------------------
Windows could not save data that would have made it easier to identify system errors because your current swap file is disabled or is less than 1 megabyte. Click OK to return to the virtual memory settings window, enable the swap file and set the size to a value greater than 1 megabyte, or click Cancel to change the dump selection.
---------------------------
OK Cancel
---------------------------

hmmm

Can you go to Disk Management and take a screenshot, upload it to an image sharing web site and show the image here? Was the NVMe drive the only drive installed when you put Win 10 on?
 
Feb 16, 2019
Disk Management screenshot :
https://www.cjoint.com/c/IBrecK0TZDL

As I wrote in the first post, it's a Windows 7 system, and I cloned it from a SATA SSD; it was not a new install. I don't remember exactly how I proceeded, but most likely I removed all the other drives for the operation, as I'm aware that cloning partitions is always somewhat risky.

When there's a failure / BSOD, the machine is usually in light use (Firefox, writing...) or even idle (the most recent one happened yesterday morning while I was asleep). It's not correlated with a particularly intense workload: there are typically several programs open at the same time, but CPU usage is usually around 5%. It could also happen while the system was still on the SATA SSD, but generally only after weeks or months without a reboot, certainly not at this frequency.

Again, the other problem is that waking up from hibernation is very slow, and I have no idea whether the two problems are related. When I turn on the computer, the Windows logo animation plays very slowly (~2 FPS), then stays on screen for 4-5 minutes while the storage activity LED flickers continuously, then the password screen suddenly appears and I can access the desktop. It's puzzling, because at this drive's read rate (at least 1500 MB/s) it should take about 8 seconds to load the 12 GB hiberfil.sys into memory, and about 1 minute to read the whole 100 GB system partition (which has no reason to happen during startup as far as I know; even CHKDSK only reads key filesystem structures). Is there any way to find out what is happening and why it's taking so long?
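The estimate in the paragraph above can be written out explicitly. This is only back-of-the-envelope arithmetic using decimal units (1 GB = 1000 MB) and the figures quoted in the post; `transfer_seconds` is an illustrative helper, and a sustained sequential read at the full rate is an assumption.

```python
def transfer_seconds(size_gb: float, rate_mb_per_s: float) -> float:
    """Time in seconds to read size_gb (decimal GB) at a sustained rate in MB/s."""
    return size_gb * 1000.0 / rate_mb_per_s


READ_RATE_MB_S = 1500.0  # "at least 1500 MB/s", as quoted in the post
HIBERFIL_GB = 12.0       # size of hiberfil.sys, as quoted in the post
PARTITION_GB = 100.0     # size of the system partition, as quoted in the post

if __name__ == "__main__":
    print(f"hiberfil.sys: {transfer_seconds(HIBERFIL_GB, READ_RATE_MB_S):.0f} s")
    print(f"whole partition: {transfer_seconds(PARTITION_GB, READ_RATE_MB_S):.0f} s")
```

Both figures are far below the 4-5 minutes observed, so the raw read speed of the drive alone does not explain the delay.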
I found warnings about this slow wake-up time in the Event Viewer, under Diagnostics-Performance:
https://www.cjoint.com/c/IBreJ762lxL
In case it's relevant, here's what I see in the Details tab for the latest warning:

- System
- Provider
[ Name] Microsoft-Windows-Diagnostics-Performance
[ Guid] {CFC18EC0-96B1-4EBA-961B-622CAEE05B0A}
EventID 300
Version 1
Level 1
Task 4003
Opcode 36
Keywords 0x8000000000010000
- TimeCreated
[ SystemTime] 2019-02-16T20:05:48.308000000Z
EventRecordID 1232
- Correlation
[ ActivityID] {038F5C70-F800-0006-6063-84588DC5D401}
- Execution
[ ProcessID] 1716
[ ThreadID] 8224
Channel Microsoft-Windows-Diagnostics-Performance/Operational
Computer Gabriel-PC
- Security
[ UserID] S-1-5-19
- EventData
StandbyTsVersion 1
StandbyAppCount 48
StandbyServicesCount 24
StandbyDevicesCount 184
StandbyStartTime 2019-02-16T13:12:25.381475800Z
StandbyEndTime 2019-02-16T13:13:07.256475800Z
StandbySuspendTotal 41875
StandbySuspendTotalChange 0
StandbySuspendQueryApps 0
StandbySuspendQueryAppsChange 0
StandbySuspendQueryServices 0
StandbySuspendQueryServicesChange 0
StandbySuspendApps 2841
StandbySuspendAppsChange 0
StandbySuspendServices 10
StandbySuspendServicesChange 0
StandbySuspendShowUI 0
StandbySuspendShowUIChange 0
StandbySuspendSuperfetchPageIn 28
StandbySuspendSuperfetchPageInChange 0
StandbySuspendWinlogon 1
StandbySuspendWinlogonChange 0
StandbySuspendLockPageableSections 0
StandbySuspendLockPageableSectionsChange 0
StandbySuspendPreSleepCallbacks 0
StandbySuspendPreSleepCallbacksChange 0
StandbySuspendSwapInWorkerThreads 0
StandbySuspendSwapInWorkerThreadsChange 0
StandbySuspendQueryDevices 0
StandbySuspendQueryDevicesChange 0
StandbySuspendFlushVolumes 115
StandbySuspendFlushVolumesChange 0
StandbySuspendSuspendDevices 4011
StandbySuspendSuspendDevicesChange 0
StandbySuspendHibernateWrite 34864
StandbySuspendHibernateWriteChange 9664
ResumeStartTime 2019-02-16T19:58:25.669891300Z
ResumeEndTime 2019-02-16T20:05:44.374918900Z
StandbyResumeTotal 438705
StandbyResumeTotalChange 127427
StandbyResumeHibernateRead 438085
StandbyResumeHibernateReadChange 127439
StandbyResumeS3BiosInitTime 0
StandbyResumeS3BiosInitTimeChange 0
StandbyResumeResumeDevices 620
StandbyResumeResumeDevicesChange 0
StandbyRootCauseDegradationGradual 0
StandbyRootCauseImprovementGradual 0
StandbyRootCauseDegradationStep 114688
StandbyRootCauseImprovementStep 0
StandbyIsDegradation true
StandbyIsTroubleshooterLaunched true
StandbyIsRootCauseIdentified true
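A few of the fields above can be cross-checked against each other. The sketch below copies five values from the event and compares the wall-clock gap between `ResumeStartTime` and `ResumeEndTime` with the counters. The millisecond unit is inferred from that comparison, not taken from documentation, and the 12 GB image size is an assumption borrowed from the hiberfil.sys size (Windows may well read less than the full file, so the throughput is an upper bound).

```python
from datetime import datetime


def parse_event_time(stamp: str) -> datetime:
    """Parse an Event Viewer timestamp such as 2019-02-16T19:58:25.669891300Z."""
    whole, _, frac = stamp.rstrip("Z").partition(".")
    # datetime only keeps microseconds, so truncate the fraction to 6 digits.
    micro = int(frac[:6].ljust(6, "0"))
    return datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(microsecond=micro)


# Values copied from the event details above.
EVENT = {
    "ResumeStartTime": "2019-02-16T19:58:25.669891300Z",
    "ResumeEndTime": "2019-02-16T20:05:44.374918900Z",
    "StandbyResumeTotal": 438705,
    "StandbyResumeHibernateRead": 438085,
    "StandbyResumeResumeDevices": 620,
}


def resume_wall_clock_s(event) -> float:
    """Seconds elapsed between ResumeStartTime and ResumeEndTime."""
    start = parse_event_time(event["ResumeStartTime"])
    end = parse_event_time(event["ResumeEndTime"])
    return (end - start).total_seconds()


def effective_read_mb_s(event, image_gb: float = 12.0) -> float:
    """Upper-bound read throughput, assuming the whole image_gb file was read."""
    read_s = event["StandbyResumeHibernateRead"] / 1000.0  # counter assumed to be in ms
    return image_gb * 1000.0 / read_s


if __name__ == "__main__":
    print(f"wall clock: {resume_wall_clock_s(EVENT):.1f} s")
    print(f"effective read rate: {effective_read_mb_s(EVENT):.1f} MB/s at most")
```

The timestamps and `StandbyResumeTotal` agree to within a millisecond, and `StandbyResumeHibernateRead` plus `StandbyResumeResumeDevices` adds up to the total, so practically the whole resume (a bit over 7 minutes) is spent in the hibernation-file read phase rather than in device re-initialization.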
 
Feb 16, 2019
So... it happened again tonight... It used to piss me off, but sadly I'm getting used to it... é_è
I was watching a video when a pop-up warned that Firefox had stopped working, the video froze, and bam, a split second later came the sodding BSOD. It was very quick, but I could read that the error name was “SYSTEM_SERVICE_EXCEPTION”.
There's no new file in Minidump, and nothing in the Event Viewer.
Any clue?
 
Feb 16, 2019
“Can you upload the screenshots to Imgur? As the links you gave are flagged for malware on Chrome and I've never heard of the website.”
Damn, that's right, same with Firefox... I've used it for years and it always worked fine until today... I must be cursed... ‘O_O
And I don't like going to Imgur, as it always displays a gazillion “meme” image previews to which I'm attracted like a moth to a neon light...
These should work:
“Run a memtest and see if it finds any errors”
I'll try Memtest later – it has to run when no system is loaded, right?

[NerdyComputerGuy] “also did you basically clone the OS and files from a HDD to an SSD and then use the SSD in the same computer?”
[winlog666] “did you copy the OS?”
Again, I cloned the partitions from a SATA SSD to a PCI-E SSD using GParted (the “system reserved” partition, the 100 GB Windows 7 partition and another 100 GB partition for temporary storage, all kept at the same sizes), then removed the SATA SSD and booted Windows from the PCI-E SSD from then on.
Yes, it's otherwise the same computer (except for the storage devices, which are moved and removed quite regularly).

One thing I haven't mentioned before: the hibernation process itself (copying RAM to the SSD and turning off the computer) is very fast, as it should be. Only the wake-up process (turning on the computer and copying hiberfil from the SSD to RAM) is abnormally slow.
 
Feb 16, 2019
error-gif-corbeau.gif


Well, that's frustrating... 😕
 
Feb 16, 2019
So... no one here has any more clues? Is there another forum that would be better suited to this particular issue?

I still haven't done the memory test (I tried earlier this morning with Memtest on a small SD card, but the pesky USB card reader wasn't recognized and the Windows session resumed instead; now I have other stuff to do), but I doubt memory is the culprit.
 
Feb 16, 2019
So:
I ran the Memtest analysis for one complete pass and it found no errors. Is that enough to conclude that the memory sticks are fine? On another forum they recommend running the full analysis for 8 passes, and in this guide they recommend testing each stick and each slot independently... that seems a bit overkill to me...

Also:
A few days ago, on Feb. 22, I saw the early signs of an impending system failure (subtle instability of the display, i.e. shaking / flickering in some areas of the screen; I don't always see this, sometimes it just crashes right away). I quickly initiated hibernation, and it worked. The next morning, when I resumed the session, everything seemed fine except that Firefox had shut down (I don't know whether it shut down before the system went into hibernation or right after the session was resumed). I made two backups with 7-Zip with no sign of trouble (usually, files saved right before a failure turn out empty, or at least the part that should have been added shows up as “null” bytes when I open the file later on). But when I opened Firefox, it froze as soon as the “Recover session” screen appeared. As it wouldn't close, I tried to kill the process in Task Manager, but the Task Manager window froze as well. I tried to open the Start menu, but the taskbar froze; then every open window that I could still bring to the foreground froze as soon as I tried anything, even simply hovering over the address bar with the mouse pointer. And yet the mouse and keyboard were still responsive. Since I could do absolutely nothing at that point, I had to reboot.
Then on Feb. 23 I had another system failure with an error I hadn't seen before: BAD_POOL_CALLER. On Feb. 24, Firefox crashed (which does not happen often), and I decided not to restart it, to see what would happen. It turns out that I haven't had another system-wide failure since then; I only started Firefox again earlier today. So, could Firefox itself be a major cause of those frequent system failures? If so, I would expect it to have been reported by many users with a similar setup... I currently have version 65.0.1, 64-bit. I usually don't install proposed updates right away: I wait at least until I need to restart Firefox for other reasons (like a crash of the application or a system failure...), and I sometimes postpone the download by clicking “later” when the nagging pop-up appears. Could running a slightly outdated version have some bearing on this kind of issue? I have many tabs in my main profile (~200), but usually only 3-4 are active simultaneously, and normally the others shouldn't even be loaded into memory. Right now there are 7 Firefox processes running, using from 30 to 500 MB each.

I still have nothing new in the “Minidump” folder, even though the option appears to be enabled (each time I disable it and re-enable it, I get the same warning I copied earlier). Does anyone know how to solve this? If not, where could I find the exact wording of that warning in the original English text, so that I can do a Web search with better odds of finding relevant information?

If no one has any clue whatsoever, which other forum(s) would you recommend for this kind of issue?

Thanks.


A collection of BSOD pictures:
2018/07/28

2018/09/17

2018/11/04

2018/11/05

2018/11/08

2018/11/08 (2)

2018/11/08 (3)

2018/11/08 (4)

2018/11/18

2018/12/23

2018/12/24

2019/01/07

2019/01/14

2019/01/28

2019/02/11

2019/02/13

2019/02/19

2019/02/21

2019/02/23
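To see whether the crashes follow any rhythm, the gaps between the picture dates listed above can be computed. This is a rough sketch that assumes each picture corresponds to one crash on that date and that no crash went unphotographed (which may well not hold); `gaps_in_days` is an illustrative helper, and several pictures taken on the same day count as a single day.

```python
from datetime import date

# Dates copied from the picture list above (the four 2018/11/08 entries included).
BSOD_DATES = [
    "2018/07/28", "2018/09/17", "2018/11/04", "2018/11/05",
    "2018/11/08", "2018/11/08", "2018/11/08", "2018/11/08",
    "2018/11/18", "2018/12/23", "2018/12/24", "2019/01/07",
    "2019/01/14", "2019/01/28", "2019/02/11", "2019/02/13",
    "2019/02/19", "2019/02/21", "2019/02/23",
]


def parse_day(stamp: str) -> date:
    """Turn 'YYYY/MM/DD' into a date object."""
    year, month, day = (int(part) for part in stamp.split("/"))
    return date(year, month, day)


def gaps_in_days(stamps):
    """Days between consecutive distinct crash dates, in chronological order."""
    days = sorted({parse_day(s) for s in stamps})
    return [(later - earlier).days for earlier, later in zip(days, days[1:])]


if __name__ == "__main__":
    gaps = gaps_in_days(BSOD_DATES)
    print("gaps (days):", gaps)
    print("shortest:", min(gaps), "longest:", max(gaps))
```

Comparing these gaps with the dates of driver updates, software installs or hardware changes might show whether anything lines up.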


End of Memtest analysis (2019/02/25) :
 
Feb 16, 2019
Well, it seems like no one gives a damn at this point, but: I had no sodding BSOD for about 10 days, between Feb. 24th and March 6th. On Feb. 22nd I updated Malwarebytes Anti-Malware, which activated a 14-day trial period during which the background protection features were active. Could that be related to those issues, or is it most likely a coincidence?

Now I'm gonna sit by the front porch, grab a beer and FHRITP (just a test, to see if anyone is still reading ! ;^p).