Hello fellows - Charles here again.
I appreciate the comments, but I should clarify a few of my earlier points. I do acknowledge the thought you both have put into this.
1. As a point of reference, I am a VERY experienced hardware engineer, having worked on ultrahigh-reliability designs in many places, including NASA, Aerojet/Rocketdyne, Sierra Nevada, and Honeywell. It is very much in my wheelhouse to know and limit failure modes, and as a consultant I've gotten very good at the craft. They hire me to tell them why it hurts and how to fix it. In military and aerospace designs, they work from the assumption that a cosmic-ray error is going to happen and plan for how to deal with it. This is why they require MIL-SPEC parts, with added features like rad-hard processes or redundant configurations. The higher the altitude, the better the chance of a soft failure; electronics on planes see data failures more often than they do on the ground.
What is the most reliable type of transistor for radiation? A simple P-channel MOSFET. The carriers are holes, i.e., the absence of electrons, so there is nothing for gamma rays to hit.
2. As many of your observations show, there are many sources of failures, including the power supplies (differential and common-mode noise, regulation, conducted noise), AC line quality (yep, I've got a UPS), and even user/assembler issues such as ESD damage or supply-chain damage (shipping and warehousing: vibration, humidity, forklift handling, etc.). A computer is very complex, and whether it runs correctly depends on everyone who played a part in it and/or touched it: concept engineering, design, manufacturing, and distribution. Did the person at Best Buy drop it on the floor right before you bought it? They undoubtedly added micro-cracks to the solder joints. Oh, I hate BGAs.
3. OMG, the number of problems I have found that come back to the power system. My earliest was a Xeon configuration in which the HDDs would just die. The failures seemed to happen right after power-up. I put a scope on it and found the PSU's 12V rail was overshooting to nearly 20V at power-up. PITA. How about power-line transients? And as CPU and memory voltages get lower, motherboard power quality is going to be an even bigger factor.
4. I am not implying that ECC is the end-all, be-all fix, but I find it aggravating that a technology that has been around since at least 1990 is not the de facto standard in our computers now (29 years later). The hold-off is only about wanting more profit; if most computers shipped with ECC now, it would be at cost parity. I found the Wikipedia page for ECC entertaining. I did not know Cray originally left it out of their earliest parallel systems and had to add it in later. Who knew?
https://en.wikipedia.org/wiki/ECC_memory
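For anyone who wants to see the core trick without wading through the article, here is a toy Hamming(7,4) encoder/decoder. It is purely an illustration of single-error correction - real ECC DIMMs classically use a wider SECDED code over each 64-bit word, and the bit layout and helper names here are my own:

```python
# Toy Hamming(7,4): 4 data bits protected by 3 parity bits.
# The syndrome computed on read points directly at any single flipped bit.

def encode(d):                       # d = list of 4 data bits
    c = [0] * 8                      # positions 1..7 (index 0 unused)
    c[3], c[5], c[6], c[7] = d       # data bits
    c[1] = c[3] ^ c[5] ^ c[7]        # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]        # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]        # parity over positions with bit 2 set
    return c[1:]                     # 7-bit codeword

def decode(cw):                      # cw = 7 received bits
    c = [0] + list(cw)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 = clean, otherwise the flipped position
    if syndrome:
        c[syndrome] ^= 1             # correct the single-bit error
    return [c[3], c[5], c[6], c[7]], syndrome

data = [1, 0, 1, 1]
codeword = encode(data)
codeword[4] ^= 1                     # simulate a cosmic-ray flip at position 5
recovered, position = decode(codeword)
print(recovered == data, "- corrected a flip at position", position)
```

The memory controller does this same kind of check in hardware on every read, which is why a single upset gets quietly fixed instead of corrupting your data.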
5. I never never overclock my main machines, and am very careful about heat and dust buildup. I have overclocked some old machines just for fun, though. What's that smell?
Short of issues caused by my tinkering, such as overclocking, I have never had a BSoD on any of my home PCs that I have built for myself or family. My grandmother currently uses an Intel NUC that I built and mounted to a monitor and it has been working great.
Hmm. You, my friend, have won the Wintel lottery. I learned to be a backup-and-save fiend in the W95/W98/W2000 days. When W98 first came out, the FAT32 driver had a bug. By the time I figured out there was a problem, it had blitzed my file - and the OS.
The vast majority of people do not use RAID either, and those in the mainstream who do typically use RAID 0, even though it's pointless with SSDs these days. RAID 1 only protects from hardware-level failure, not potential corruption due to a missed bit, so it's not even in the same league as ECC, which has dedicated hardware for that purpose.
ReFS is still new and still is not bootable. The vast majority of mainstream users are still on NTFS. Hell, I would bet most servers still use NTFS, as the cost and hassle of migrating to ReFS is currently not worth the benefits it has over NTFS.
Just because a technology like RAID has limitations does not mean it should be ignored. RAID has been around for a long time also, and yes, it has a lot of barnacles on it. That is why Sun/Oracle started making ZFS and MS started making ReFS. Both are intended to catch hardware-level soft errors using per-block checksums and data redundancy, plus features such as copy-on-write (COW) that get around RAID controllers needing batteries to retain unwritten data through a power failure. And batteries fail a lot also, so I call that a "D'oh!". They should use EDLCs (supercaps), which have an insane lifetime. Just don't get them hot.
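To make the checksum idea concrete, here is a minimal sketch - my own toy illustration, not ZFS or ReFS internals - of storing a per-block checksum at write time and verifying it on every read, so a silently flipped bit is caught instead of being handed back to the application:

```python
import hashlib
import os
import random

BLOCK_SIZE = 4096

def write_block(data: bytes):
    """Return the (data, checksum) pair as it would be stored on disk."""
    return data, hashlib.sha256(data).digest()

def read_block(data: bytes, stored_checksum: bytes) -> bytes:
    """Verify the block before handing it back; complain if it no longer matches."""
    if hashlib.sha256(data).digest() != stored_checksum:
        raise IOError("checksum mismatch: silent corruption detected")
    return data

# Write a block, then simulate bit rot by flipping one bit in one copy.
block, checksum = write_block(os.urandom(BLOCK_SIZE))
corrupted = bytearray(block)
corrupted[random.randrange(BLOCK_SIZE)] ^= 0x01

read_block(block, checksum)                  # the clean copy reads fine
try:
    read_block(bytes(corrupted), checksum)   # the rotted copy is caught
except IOError as err:
    print(err)
```

With redundancy (mirrors or parity) on top, the filesystem can then fetch a good copy and rewrite the bad block, which is what a ZFS scrub does.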
I think there are several 'NIXes that can boot from ZFS natively, such as FreeBSD / FreeNAS and Ubuntu. MS has had some setbacks on ReFS, which arrived far later than the file-system overhaul they originally promised (remember WinFS, back in the Longhorn days?). BTW: I have been using RAID for at least 15 years. And I still have my old files from 1989. Everything before that was, well, pre-bedroom-fire days ...
And yes, most BSoDs are due to drivers. Ever wonder why hardware for the Mac typically costs more? Same for iOS app development. Apple is very good at keeping garbage out of their customer universe.
You're confusing an issue of life-safety with the (often minor) inconvenience of computer unreliability.
If you are only using a PC for games, then it is only an inconvenience - although when it happened to my mom, it was a hassle for me because she was 600 miles away. But if you corrupt a Ph.D. dissertation or kill a presentation on your laptop as you fly to a trade show, it can be a calamity. I think you overlooked my point that these features in autos are now standard. I believe having ECC in PCs should be standard too. It is very old technology.
IIRC, Microsoft tried to make ECC memory mandatory for Windows Vista-qualified PCs. I'm guessing they got too much push back from OEMs.
MS did that because they found they were getting blamed for BSoD failures that were hardware-caused and not from the OS. I remember a report around then showing that in one generation of CPUs (not saying who!) the memory corruption was happening in cache, not actually in memory. That really bites.
-Chas