Newb question about SSD reliability

I'm spec'ing a new Windows PC to replace my current eight-year-old one. The current one has a 1TB SSD (75% full) for software and a 4TB two-HDD RAID 1 (44% full) for files. The vendor for the new system recommends a 2TB SSD for software and an 8TB SSD for files. Both SSDs are NVMe PCIe Gen4 M.2. Eight years ago, SSDs weren't recommended for files due to longevity issues. The vendor says that current SSDs will last as long as the PC, even in relatively high write-frequency applications like file disks, as long as you don't exceed 80% of capacity. Is this true? I've never had to replace a drive in my RAID in eight years. Can I expect the same reliability from a single SSD? Any other issues I should be aware of? Thanks.
 
"Eight years ago..." ?

My house systems have been SSD only for over a decade.
Exactly none of them have died due to longevity issues.


SSDs do have a theoretical limit on write cycles.
In the warranty, they will list some number, along with the years the warranty is valid.
300-600-1200 TBW. Something like that.

It is HIGHLY unlikely you will reach that warranty TBW in normal use. Or even ever.

My current C drive (1TB Samsung 980 Pro) has been in 24/7 use for 22,912 power-on hours. Almost 3 years.
The warranty TBW number is 600TBW.

In those 22,912 power-on hours, it has written 33.8 TB. So about 5% of that 600 TBW.
Extrapolated out, at this use rate, this 980 Pro will reach the 600TBW value in approx 2075.
I think I'll be safe.
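
If anyone wants to check the arithmetic, here's a quick sketch using the numbers above (Python just for the math):

```python
# Extrapolating TBW from the figures in this post.
power_on_hours = 22_912      # ~2.6 years of 24/7 use
written_tb = 33.8            # TB written so far
warranty_tbw = 600           # 1TB 980 Pro rating

tb_per_year = written_tb / (power_on_hours / (24 * 365))
years_left = (warranty_tbw - written_tb) / tb_per_year

print(f"~{tb_per_year:.1f} TB written per year")
print(f"~{years_left:.0f} more years to reach {warranty_tbw} TBW")
```

However you round it, that's decades away.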

People make a big hooha about "SSDs wearing out."
But that is like having a car warranty of "5 years, 2 million miles". And you drive 15,000 miles per year.


Now... let's talk about this RAID, and what you're doing about a real backup situation.
 
Thanks for the detailed reply. The RAID isn't really for backup but for convenience in the event of a disk dying. I still do a full system backup every night to an external HDD. In fact, that's one of the things compelling the new system. It takes about seven hours to back up 2.6 TB via my USB 3.1 port to the HDD. It does it between 11:30 p.m. and 6:30 a.m., outside my normal work window, so it's not a problem. It will be eventually, though, as the file only gets bigger. I suspect the HDD RAID is the main bottleneck and switching to an SSD in the new system will allow at least a 3x improvement in speed.
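
For what it's worth, here's the throughput math behind that suspicion (a rough sketch, assuming decimal terabytes):

```python
# Effective backup throughput from the figures above.
backup_tb = 2.6
hours = 7

mb_per_s = backup_tb * 1e6 / (hours * 3600)  # 1 TB = 1e6 MB (decimal)
print(f"~{mb_per_s:.0f} MB/s")
# ~103 MB/s is typical HDD territory, well under the ~500 MB/s
# that USB 3.1 (5 Gb/s if Gen 1) can carry, so the RAID looks
# like the limiter rather than the USB port.
```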
 
It takes about seven hours to back up 2.6 TB via my USB 3.1 port to the HDD.
There are MUCH better ways to do this.

You don't need to back up the entire drive every night.
Incremental is your friend.

I use Macrium Reflect.
A Full drive Image, followed by a series of Incrementals.
The incremental is ONLY the changes since the last Incremental.

Literally, a minute or two per drive.

My system has 6x drives (all SSD), approx 3.5-4TB total consumed space.
Each drive on its own schedule, every 30 minutes from 1AM to 4AM.
The Incremental on each takes only a minute or two.

This goes to a folder tree on my NAS, onto an HDD, across the regular gigabit LAN.
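
Macrium works at the disk-image level, but the core idea is easy to see at the file level: copy only what changed since the last run. A minimal sketch of the concept (not Macrium's format; the paths and state file are made up):

```python
# Minimal file-level "incremental": copy only files modified since
# the last run. Real imaging tools track changed disk blocks instead.
import os, shutil, time

SRC = r"D:\files"            # hypothetical source folder
DEST = r"E:\inc-backup"      # hypothetical destination
STATE = os.path.join(DEST, ".last_run")

os.makedirs(DEST, exist_ok=True)
last_run = float(open(STATE).read()) if os.path.exists(STATE) else 0.0

copied = 0
for root, _dirs, files in os.walk(SRC):
    for name in files:
        src = os.path.join(root, name)
        if os.path.getmtime(src) > last_run:          # changed since last run?
            dst = os.path.join(DEST, os.path.relpath(src, SRC))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied += 1

with open(STATE, "w") as f:                           # record this run
    f.write(str(time.time()))
print(f"{copied} changed files copied")
```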
 
Every NAND chip in an SSD has a limited number of writes in its lifetime.
The first SSDs were 40 MB in size, so the total number of available writes was very small compared to today's drives, measured in terabytes.
Early drives did have an issue with longevity.
No longer an issue today.

The rationale for the 80% rule was that with little free space left, the remaining NAND blocks had to be rewritten more frequently to make room for new data.
No longer applicable today, and the magic number is now over 90%.
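
For intuition on why free space mattered: the controller erases in whole blocks, so the fuller the drive, the more still-valid data it must relocate for every block it reclaims. A toy model (uniform random writes, greedy garbage collection; real controllers with overprovisioning and smarter firmware do considerably better, which is why the threshold has moved):

```python
# Toy write-amplification model: if the drive is a fraction u full,
# a reclaimed block holds ~u valid pages per page freed, so each
# user write costs roughly 1/(1-u) physical NAND writes.
for pct_full in (50, 80, 90, 95):
    u = pct_full / 100
    print(f"{pct_full}% full -> ~{1 / (1 - u):.0f}x write amplification")
```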

As to reliability, no moving parts is a plus with an SSD. Even RAID 1 exposes you to perils such as fire, inadvertent deletion, ransomware, viruses and such.
The best protection is at least an EXTERNAL backup.
 
I'm on Macrium Reflect, too. I tested incremental a number of years ago (might not have been on Macrium), and it was fine if I ever wanted to restore my entire system. There have been times, however, when I've wanted to restore a single file, and I found it difficult to locate a particular file in an incremental backup. It seemed like I had to know which increment the latest version was in, if it wasn't in the initial backup. Maybe there's a way to scan the main + increments as a single backup, but I couldn't find it when I did the test.

At the time, my backups were much smaller, so it was relatively quick and easy to just back up everything, and I stuck with it. Even if I switched to incremental now, the initial would still take seven hours, just not every night.
 
Yes, the initial takes a while.
Once.

And yes, you have to sort of know when the file in question was last modified or created, to find that Incremental.
There is also Differential, as opposed to Incremental: ALL the changes since the initial Full Image.
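
The tradeoff between the two shows up at restore time. A little sketch (the file names are made up, just to illustrate the chains):

```python
# Which backup files a restore to day N needs, under each scheme.
def restore_chain(scheme: str, day: int) -> list[str]:
    if scheme == "incremental":
        # the full image plus every incremental up to the target day
        return ["full.img"] + [f"inc-{d}.img" for d in range(1, day + 1)]
    # differential: the full image plus that day's differential only
    return ["full.img", f"diff-{day}.img"]

print(restore_chain("incremental", 5))   # small files, but a long chain
print(restore_chain("differential", 5))  # two files, but the diff grows daily
```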
 
Aren't you supposed to do a new initial periodically, so you don't end up with hundreds of incrementals? If so, how often?
In Macrium, you can set a specified number of Incs.

Mine is set to 30. So, 1 month.
[Screenshot: Macrium Reflect retention settings, incrementals set to 30]
 
People make a big hooha about "SSDs wearing out."
But that is like having a car warranty of "5 years, 2 million miles". And you drive 15,000 miles per year.
On the other hand, if I’m backing up from an internal SSD to an external HDD, I’m not solving my backup speed problem. Since I back up 2.6 TB every night, switching to an external SSD would solve the speed problem, but I’d blow through the external SSD’s TBW pretty quickly (< five years), wouldn’t I? Any suggestions (other than switching to incremental or differential backups)?
 
Changing to an external SSD is still going through the USB bus. Slow.

Backing up the whole thing every night IS the problem. There is really no reason to do that.
The vast majority of that data does not change from day to day. That is the reason for Inc and Diff.

In addition, the whole drive copy every night does not give you any depth. 30 days of Incrementals means you can recover to any state in the last 30 days.
Your procedure gives you...yesterday.
 
SATA can be hot-swapped. Use an internal SATA drive in a drive sled and it will be just as speedy as it would be mounted internally. If you are imaging, that's mostly sequential, so an HDD is plenty fast enough.

Ideally, rotate different sleds each night and bring one of them offsite (such as back home with you) in case of fire.
 
I've noticed very compact sleds for SATA SSDs. Is there typically room for a heatsink in those? Am I likely to need one?
 
While PCIe hotplug is a thing, it is generally only used in enterprise-class hardware, as it requires BIOS support. Theoretically it is possible for a card to stay powered in the PCIe slot (which takes care of the BIOS enumeration part) with a slot to plug M.2 drives into the back, but I'm not aware of any such devices.

There is one subset of PCIe that is required to support hotplug: Thunderbolt 3, 4, and 5. There are commercial NAS-like disk enclosures that plug into Thunderbolt rather than Ethernet.
 
USB is fine, if you have the CPU to spare for it. But if you ever plug that enclosure into a NAS or router, you're likely to be really disappointed. At least UASP is supposed to reduce CPU usage by up to 80%, but it's mostly a Windows-only thing. There is a ridiculously small list of hardware that can do UASP in Linux.

As to your original question, the JEDEC spec for consumer SSDs says they are supposed to retain data for at least a year unpowered at elevated temperatures, even after the rated TBW has been written to them. How long will a brand-new drive written to only once last? 10 years, maybe? Nobody knows, because there is no spec for it. But most still consider SSDs unsuited for long-term offline/unpowered cold storage, though it should be just fine if you power it up once a year.

The old SSDs from 15 years ago were the most durable, because they were 50nm and SLC, so rated for 100,000 P/E cycles per cell.
34nm MLC dropped this to 5,000 cycles, and 25nm MLC further to 3,000. By the time the 840 EVO on 19nm TLC dropped life to 1,000 cycles, a firmware update to rewrite the cells more frequently (further using up the limited cycles) was found to be needed to avoid data corruption, as the NAND proved leakier than expected. 3D NAND reset the clock, since it effectively used a larger process size at first, but it has since been miniaturized back into the 20nm range too. And now even QLC (yuck).

What's changed to allow this? The controllers now have far more computing power and can perform far better error correction and detection of weakening cells, which can be seamlessly remapped to spare area to hide failing cells from you (a situation that used to just brick the SSD). And the cost has come down so far that they allow truly enormous capacities, which make using up those fewer cycles less likely. I mean, those old SLC drives were only up to 64GB... for $800 ($1,175 in today's money).
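
A rough endurance estimate shows why capacity offsets the lower cycle counts: rated TBW scales roughly as capacity × P/E cycles ÷ write amplification. Using the cycle counts above (the write-amplification factor of 2 is just an assumed round number):

```python
# Rough endurance: TBW ~ capacity * P/E cycles / write amplification.
def tbw(capacity_gb: float, pe_cycles: int, write_amp: float = 2.0) -> float:
    return capacity_gb * pe_cycles / write_amp / 1000   # in TB written

print(f"64 GB SLC @ 100k cycles: ~{tbw(64, 100_000):,.0f} TBW")
print(f"1 TB TLC  @   1k cycles: ~{tbw(1000, 1_000):,.0f} TBW")
# ~500 TBW for the 1TB TLC drive: right in the ballpark of the
# 600 TBW warranty figure quoted earlier in the thread.
```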

The other thing about SSDs is it's way harder to recover data from them. If you accidentally delete a file from an HDD, it will be there until the space is overwritten, but the garbage collection in modern SSDs is so efficient that it will be permanently gone before you can find your undelete program and run it. And of course there's no Recycle Bin on an external USB drive.