Question How to log temperature and/or RAM usage to diagnose system crashes?

itm

Distinguished
Apr 10, 2004
215
2
18,695
My Windows 10 system has become extremely unstable over the last week or so. It usually runs 24/7, but in the last week it has been unresponsive when I returned to it in the morning - both displays were blank, and the system would not respond to Ctrl-Alt-Delete or any other keyboard input. I can still ping it from other machines on the LAN, and can also access shared drives on it from other machines. I've noticed that the machine had been running at 98-99% RAM usage most of the time, despite having 32GB of RAM and not many apps being open, so I suspect that the crash is being caused by the system running out of RAM. The machine is Ryzen 7 2700X / Gigabyte B450 Aorus Elite. The other possibility is that it has been overheating.
Does anyone know of a way of regularly writing RAM usage and/or system temperature to a log file, so that I can see what the situation was around the time of each crash?
 

itm

I run regular malware scans and nothing is being flagged at the moment. Task Manager usually shows Google Chrome as the largest consumer of RAM - e.g. it's using 1.5GB at the moment (and overall system RAM usage is 60% of the 32GB available).
What's making it hard to diagnose is that the system is crashing when it's dormant (i.e. overnight), which is why I want to log items to a file that I can refer to after I reboot.
 

Ralston18

Titan
Moderator
Look in Reliability History and Event Viewer for error codes, warnings, and even informational events.

Reliability History is much more user friendly and presents a timeline (i.e., "overnight") format that can be very revealing.

Right clicking any given entry may provide additional information about what happened. That additional information may or may not be helpful.
 
HWiNFO64 will log its sensor readings to a .CSV file that you can then use to make charts and graphs with Excel or a similar application.
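If a scripted alternative is preferred, periodic logging of RAM usage to a CSV file could look roughly like the sketch below. It is only an illustration, not how HWiNFO64 works: the function names `read_ram_percent` and `log_samples` and the column names are made up for this example, and the RAM reading assumes the third-party `psutil` package is installed (if it is not, the value is left blank). Temperature is left out because reading it from a script is much more hardware-dependent. Each row is flushed to disk as soon as it is written, so the last readings should survive a hang.

```python
import csv
import os
import time
from datetime import datetime


def read_ram_percent():
    """Return RAM usage in percent, or None if psutil is not installed."""
    try:
        import psutil  # third-party package (assumption: pip install psutil)
    except ImportError:
        return None
    return psutil.virtual_memory().percent


def log_samples(path, sampler=read_ram_percent, interval_s=60.0, count=None):
    """Append timestamped readings to a CSV file.

    count=None keeps logging until the script is stopped.
    """
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "ram_percent"])
        written = 0
        while count is None or written < count:
            stamp = datetime.now().isoformat(timespec="seconds")
            writer.writerow([stamp, sampler()])
            # Push the row to disk immediately so it is not lost in a crash.
            f.flush()
            os.fsync(f.fileno())
            written += 1
            if count is None or written < count:
                time.sleep(interval_s)
```

Calling `log_samples("ram_log.csv", interval_s=60)` would add one row per minute until the script is stopped; the file name and interval here are arbitrary choices.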
 

itm

Many thanks for the recommendations. I have never come across Reliability History before. It shows a daily pattern of system failures starting from the date of the Windows updates on October 21st. I tried uninstalling the main Windows feature update which was installed on that day, but the system crashed again the following morning and the update just re-installed itself. Microsoft's update policy is pretty infuriating sometimes.
I also tried removing the Security Update which was installed on that day but Windows was not able to remove it.
However, the daily failures don't consistently point to one particular application. Here's what was reported at the time of the crash on each day:
  • Day 1 of problems (@15:05) - The program chrome.exe version 95.0.4638.54 stopped interacting with Windows and was closed. To see if more information about the problem is available, check the problem history in the Security and Maintenance control panel.
  • Day 2 of problems (@15:06) - OneDrive.exe stopped working - APPCRASH
  • Day 3 of problems (@01:57) - MoUSO Core Worker Process stopped working - Problem Event Name: BEX64
  • Day 4 of problems (@03:23) - nvcontainer.exe stopped working.

There was also at least one Hardware Error in the log for each of the days that I had problems, but the technical details didn't shed much light on it. This is what it says for each:
A problem with your hardware caused Windows to stop working correctly.
Problem Event Name: LiveKernelEvent
Code: 144
Parameter 1: 3003
Parameter 2: ffffcf0bac72b6b0
Parameter 3: 40010000
Parameter 4: 0
OS version: 10_0_19043
Service Pack: 0_0
Product: 256_1
OS Version: 10.0.19043.2.0.0.256.48
Locale ID: 2057

The timing of the hardware errors does not correspond to the times of the system crashes, however. Most of them were reported at the time that I rebooted the machine.

I've installed HWiNFO64. As far as I can see I can save a snapshot of the system at a point in time (using the Report option), but I don't see any way of maintaining an ongoing log that I can use to retrospectively diagnose a system failure after the event. Is this possible?
 

itm

So I think I might be homing in on the problem... memory usage seems to leap whenever a large file copy operation takes place. For example, copying about 850MB of files from the local disk to a LAN share caused a memory increase of 20% (>6GB), and this memory was not released when the copy completed. As I have overnight backups scheduled, that might explain why the machine was always dead first thing in the morning.
So my first thought was antivirus software. I had AVG (free edition) installed, so I uninstalled it. I also had Panda Dome installed (but disabled), so I uninstalled that as well. When I retried exactly the same file copy operation it did not cause a significant increase in RAM usage, so I'll see how things are for the next 24 hours.....
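For anyone wanting to spot this kind of leap in a log rather than by watching Task Manager, a rough sketch is below (a 20% rise on 32GB is about 6.4GB). It assumes a CSV with `timestamp` and `ram_percent` columns, which are names chosen for this example; a log exported from a monitoring tool would have its own column names and would need adapting. The sample rows and the 15-point threshold are invented purely for illustration and are not readings from this machine.

```python
import csv
import io


def find_jumps(rows, column="ram_percent", threshold=15.0):
    """Return (timestamp, before, after) for each rise of at least
    `threshold` percentage points between consecutive samples."""
    jumps = []
    prev = None
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, ValueError, TypeError):
            continue  # skip blank or malformed readings
        if prev is not None and value - prev >= threshold:
            jumps.append((row.get("timestamp", ""), prev, value))
        prev = value
    return jumps


# Made-up sample data for illustration only.
SAMPLE = """timestamp,ram_percent
2021-10-25T01:00:00,41.0
2021-10-25T01:01:00,42.5
2021-10-25T01:02:00,63.0
2021-10-25T01:03:00,62.0
"""

jumps = find_jumps(csv.DictReader(io.StringIO(SAMPLE)))
for ts, before, after in jumps:
    print(f"{ts}: {before:.1f}% -> {after:.1f}%")
```

Lining up the reported timestamps with the scheduled backup times would show whether the overnight copies are what push usage up.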
 
