News: Unpowered SSD endurance investigation finds severe data loss and performance issues, reminds us of the importance of refreshing backups

I have a 10-year-old OCZ SATA III SSD that I've had off for 3 years. When I saw this, I took it and the M.2 NVMe that housed the OS I had paired with said drive and powered them on. To my surprise, nothing was gone. No corruption. No issues. Granted, mine were improperly stored in the laptop I put them in and placed on a shelf, but there you have it.

I have 2 more like it. Not powered on in 3 or more years. I'll let those sit longer.
 
I had a *powered* hard drive a while back that started having issues, but only on older files. Turns out the file system (and disk) were well over 10 years old and the drive did nothing to 'freshen up' the data, so more recently written data was fine and really old files had a few errors. (I ran some utility or other to read and rewrite the entire drive, then it was fine again.)
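
(If you want to do the same thing on Linux today, a non-destructive read-and-rewrite pass with badblocks should do it. To be clear, that's just my suggestion, not necessarily the utility I used back then, and the device name below is a placeholder.)

  # Unmount the filesystem first; /dev/sdX is an example device name
  sudo umount /dev/sdX1
  # -n: non-destructive read-write test -- reads each block, writes test patterns,
  #     then restores the original contents, so every block gets rewritten
  # -s: show progress, -v: verbose
  sudo badblocks -nsv /dev/sdX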

I went through some IDE drives where I did a read test, then zeroed them, and the REST of my drives -- including one where last use was close to 20 years ago -- were fine. (One had stiction or a stuck bearing or something; it didn't spin up until I gave it a thwack. But THEN it ran fine. This was like an 80MB -- yes MB, not GB -- drive though, so probably close to 35 years old. And who knows how long it was sitting.)

Cold storage is tricky! I'd 'trust' HDD over SSD to not spontaneously lose the data, but the bearings could stick (stiction), and if it's a modern high capacity drive the helium could leak out (my 16TB drive I have now... not for cold storage, for live storage... doesn't have a SMART measure for estimated helium remaining, so I'm REALLY hoping it gives warning signs when its time is due rather than just dropping dead... if indeed that's what happens.). The SSD is more prone to loss but the HDD has all those moving parts to have something go wrong with.

It sounds like if that's the goal, tape is still a good way to go. And make sure to use a standard format so you can still find a compatible drive in many years if your old one fails (or if they've changed connectors and your old drive will no longer hook up -- past examples: SCSI tape drives, or those ones that'd hook to the floppy controller. Good luck plugging that in now).
 
Gosh! I'd be a LOT more inclined to get my reliability and lifetime data from a source like BackBlaze.com, which rates both hard drives and SSDs used in a data center environment. Right now they report on a total of 300K+ drives and ~27M drive days. You need big numbers and lots of data to properly understand reliability and failure rates. That said, the info here is interesting, because drives kept unpowered is NOT something that most sources report on.
--Ed--
 
I've been storing things on unpowered SSDs for years and have yet to have any issues that wouldn't arise even if they were powered. Mostly bit rot on things like audio files. I also keep a large HDD with storage, check it every year or so and have had zero issues with it. I was rather surprised to learn that optical media is actually a fairly poor way to keep archived info. Years ago we were told they "lasted forever" and then you find out about oxidation...
 
I've been storing things on unpowered SSDs for years and have yet to have any issues that wouldn't arise even if they were powered. Mostly bit rot on things like audio files. I also keep a large HDD with storage, check it every year or so and have had zero issues with it. I was rather surprised to learn that optical media is actually a fairly poor way to keep archived info. Years ago we were told they "lasted forever" and then you find out about oxidation...
Same here. I have several SSDs with old backups and files on them and have never had any corruption or other problems after several years without powering them on. I recently took a look at several of them that hadn't been powered on for at least 3 years, and everything seemed to be just fine.

And in my opinion, this performance degradation thing is quite misleading. I mean, of course if the drive is filled with corrupted files and errors the performance will take a hit, but just format it and you'll have it back like new. You won't make me believe that SSDs that have been on a store shelf for more than 2 years should all be thrown away because of some "performance degradation" for not being powered on.
 
Note that the failures came after the drives had been written FAR beyond the factory TBW rating.

And then left sitting for 2 years.

I see. Apparently, I missed the part that mentioned they used the drives beyond the expected number of R/W cycles and then created a situation to make an article about. Thanks for the clarity on this shoddy excuse for a subject.
 
I see. Apparently, I missed the part that mentioned they used the drives beyond the expected number of R/W cycles and then created a situation to make an article about. Thanks for the clarity on this shoddy excuse for a subject.
Written far beyond... has been the case in every article or report that says "Unpowered SSDs lose data after <whatever>!"

Used, abused, then left on a shelf for years.
 
The article is actually milder / less scary than it should be.

My wife had a laptop (used IBM/Lenovo Thinkpad) which originally had an HDD, which I upgraded to an SSD (Plextor PX-512S2C), back in 2017. Windows 7 OS install, some applications, and then light usage (roughly 2 TB written, total, meaning average of 4x writes to each block, not allowing for write amplification). And, we're talking MLC, not TLC.

Laptop was "retired" 3 years later (replaced by a brand new Lenovo Thinkpad), in early 2020. Was powered down.

In late 2023, roughly 3.5 years later, she accidentally deleted a file on the new laptop, and powered up her retired laptop, expecting to be able to copy the file over.

Nope. Laptop refused to boot. I got involved ("Honey..."). Removed SSD from laptop, connected to a desktop, and started running chkdsk /r.

Two and a half days later, chkdsk finally finished: several files were corrupt, but, after returning the SSD to the laptop, it would boot, and the desired file was intact.

I ended up getting an industrial/server 512G SSD replacement, with rated "5 years powered down," copied everything which survived over to it (and made a backup), put it in her laptop, made it bootable, and plan on powering up her laptop at least every 2 years, and re-running chkdsk /r, just in case something similar happens again. After reading this, I powered it up and ran chkdsk: it ran quickly, with no problems.

(She's not good with restoring from backup, and doesn't see why she should not be able to just power up the old laptop.)

She does have a point: Compare this with the first laptop I bought her: a Gateway Solo from 1999. It has an HDD, was powered down for at least 12 years, and booted / ran chkdsk with no problem (we tried it when we discovered the problem with her Thinkpad's SSD).

So, yeah, light usage, MLC, good brand, and uncorrectable "bit rot" after only 3.5 years powered down. I did not expect that.

Would definitely like to see more articles about retention performance of powered down SSDs. While ECC has gotten better (more correctable) over the years, the proliferation of TLC and especially QLC (just try to find an MLC drive, these days) must have had a negative impact.
 
Another thing to consider is that *most consumer SSDs do not proactively refresh the weak NAND pages even when powered on, so there is no difference in the SSD being powered on versus powered off for data retention.

I see a lot of people missing this and assuming the phenomenon only affects cold storage SSDs.
 
I see. Apparently, I missed the part that mentioned they used the drives beyond the expected number of R/W cycles and then created a situation to make an article about. Thanks for the clarity on this shoddy excuse for a subject.
They didn't "create a situation to make an article about". They tested both the heavily written SSDs, and some that weren't so they'd have multiple data points. Even the ones that were written to once showed that data correction was taking place, they already had flipped some bits, just in a quantity that the error correction could still correct for.


The article is actually milder / less scary than it should be.

So, yeah, light usage, MLC, good brand, and uncorrectable "bit rot" after only 3.5 years powered down. I did not expect that.
Indeed. I would rather (without paying the $$$$$$$ of the enterprise SSDs) have a reliable SSD (a good amount of TBW, and good data retention) than have them get faster and faster while maybe losing your data. I'm still using almost entirely HDD storage, because the couple of times I dabbled with SSDs my write load was heavy and I burned them out in like 3-6 months. (These were some cheapies... like $20 128GB SSDs or whatever... but I mean, really... 3-6 months.)


Another thing to consider is that *most consumer SSDs do not proactively refresh the weak NAND pages even when powered on, so there is no difference in the SSD being powered on versus powered off for data retention.
Oof. I've had that occur on a few HDDs years ago, but as far as I know all HDDs for years have rewritten sectors on a weak read, and most run scans that, over time, go over the whole disk and rewrite weak sectors (the ones that don't do that automatically will do it on request through smartctl or whatever your equivalent is in Windows).
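
(On Linux, for example, you can kick one off by hand -- the device name is a placeholder, and whether weak sectors actually get rewritten during the test is up to the drive's firmware:)

  # Start an extended offline self-test, which reads the entire surface
  sudo smartctl -t long /dev/sdX
  # Check the results log once it finishes
  sudo smartctl -l selftest /dev/sdX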
 
I have a 10-year-old OCZ SATA III SSD that I've had off for 3 years. When I saw this, I took it and the M.2 NVMe that housed the OS I had paired with said drive and powered them on. To my surprise, nothing was gone. No corruption. No issues. Granted, mine were improperly stored in the laptop I put them in and placed on a shelf, but there you have it.
Older drive models used NAND chips with larger cells and fewer bits per cell. This made them generally less susceptible to this problem.

If I understand correctly, the shift to 3D NAND basically involved two different cell designs. I think those were charge-trap and floating gate. Intel & Micron went one way, while Samsung and SK Hynix went another. The design used by the Korean firms turned out to be better, with Micron eventually switching over to it. I don't know about Intel/Solidigm, but I only bought one of their NAND-based drives since then, and that was a datacenter model with enough overprovisioning that I don't really care.
 
I had a *powered* hard drive a while back that started having issues, but only on older files. Turns out the file system (and disk) were well over 10 years old and the drive did nothing to 'freshen up' the data, so more recently written data was fine and really old files had a few errors. (I ran some utility or other to read and rewrite the entire drive, then it was fine again.)
Early drives were less likely to do background maintenance, because it was less of an issue. If you had an older OS on it, you might have to manually enable Trim (unless the drive is so old that it doesn't even support it).

One thing I'd caution people against is naively charging in with a block-level operation to "freshen" an old drive. I made that mistake and almost killed a drive. What apparently happened is that some NAND cells hadn't been written since either early in the drive's service or maybe since it was manufactured. Those were the blocks that failed. When the drive discovered the errors, it relocated the blocks it could recover, thus burning through its reserve of spare capacity. Once I noticed this, I halted the operation in time to see that the reserve capacity was almost gone.

Then, what I did was to manually run Trim (Linux command: fstrim, which is safe to use on mounted filesystems). After that, I reran badblocks and it had no more errors. What fstrim does is to tell the drive's firmware which blocks don't hold any data. Apparently, doing this also causes the drive to rewrite them (probably with zeros or something) and that's why subsequently reading them no longer caused any errors.
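
(Concretely, on Linux that sequence looks roughly like this -- the mount point and device name are placeholders:)

  # Check whether the device advertises TRIM/discard support
  lsblk --discard /dev/sdX
  # Trim all unused space on the mounted filesystem (safe while mounted)
  sudo fstrim -v /mnt/data
  # Then re-check the surface; unmount first for a clean read-only badblocks pass
  sudo umount /mnt/data
  sudo badblocks -sv /dev/sdX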

BTW, this whole issue affects USB sticks and SD cards, not just SATA SSDs. Some of those support Trim, so make sure to try trimming first, before doing a file-level (preferably) or block-level operation to freshen them.

I'd 'trust' HDD over SSD to not spontaneously lose the data,
Don't. I think they're not guaranteed to retain data in cold storage for any longer than the warranty period. I think retention is affected by temperature, so basement storage is much better than attic, although if your basement is very damp then maybe double-bag the drives with a fresh silica gel packet.

if it's a modern high capacity drive the helium could leak out
Fortunately, most drives don't contain helium. I think that's mainly used to improve energy efficiency, but it's a waste of a non-renewable resource. I sort of doubt loss of helium in your drive will affect its operation, but I don't know too much about this particular topic.

The SSD is more prone to loss but the HDD has all those moving parts to have something go wrong with.
With HDDs, what I worry about is an array failure. I run RAID-6, which means I can recover from up to 2 drive losses; in small arrays, that effectively mitigates the risk of mechanical failures. However, if the entire stack becomes corrupt, then you're still hosed.
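
(For anyone curious, a Linux software-RAID version of that setup looks roughly like this -- the device names and drive count are made up for the example:)

  # Create a 6-drive RAID-6 array; any 2 member drives can fail without data loss
  sudo mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]
  # Scrub periodically so latent sector errors get found and repaired from parity
  echo check | sudo tee /sys/block/md0/md/sync_action
  # Watch scrub/rebuild progress and overall array health
  cat /proc/mdstat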
 
I've been storing things on unpowered SSDs for years and have yet to have any issues that wouldn't arise even if they were powered. Mostly bit rot on things like audio files.
LOL, what? No, if the SSD hasn't exceeded its wear-out reserve, then keeping it powered should not allow for bit-rot of anything.

I also keep a large HDD with storage, check it every year or so and have had zero issues with it. I was rather surprised to learn that optical media is actually a fairly poor way to keep archived info. Years ago we were told they "lasted forever" and then you find out about oxidation...
All I'll say about that is not all optical media is equal. It also matters how you burn it and how you store it. I'd trust good optical media to sit in a vault for a lot longer than I'd trust an unpowered HDD, that's for sure.

Thanks for the clarity on this shoddy excuse for a subject.
It's not shoddy. Totally legit subject.

Used, abused, then left on a shelf for years.
Even the "fresh" drives experienced high error rates, but were still recoverable. That strongly suggests they were headed for the same fate, just not as fast.
 
Who uses NAND for cold storage?
The article's author mentioned he had a PC in his home that he left unpowered while he was away for 6 months. After this period, it was unreadable.

I think a lot of people might unintentionally stumble into a situation like that. Maybe kids away at college or someone studying or working abroad.

Others might simply put a SSD in an external enclosure and leave it in a drawer or on a shelf for too long. You might be aware that NAND uses electrical charge-storage to represent state, but the average user probably doesn't even think about how it works.
 
just format it and you'll have it back like new. You won't make me believe that SSDs that have been on a store shelf for more than 2 years should all be thrown away because of some "performance degradation" for not being powered on.
You have to use the SSD vendor's tool to do a low-level reformat, if you want any chance of restoring its reserve of spare blocks. Even then, if the NAND cells have suffered heavy wear, an old drive will wear out at an accelerated rate. It also might not retain written data as well, when left in an unpowered state.
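
(If you don't have the vendor tool handy, a whole-device TRIM on Linux at least hands every block back to the controller as "unused" -- it obviously destroys all data, and the device name is a placeholder:)

  # DESTROYS ALL DATA on the device: issues a discard/TRIM for the entire drive,
  # telling the controller that no block currently holds valid user data
  sudo blkdiscard /dev/sdX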
 
I ended up getting an industrial/server 512G SSD replacement, with rated "5 years powered down,"
I've never, ever heard of or seen such a thing. Not among server SSDs, at least. I guess I can imagine that for specialty industrial applications.

Many years ago, Intel used to specify the guaranteed power-off data retention of its storage products. This was back in the MLC era, and the spec was 90 days. In most circumstances, you'd get quite a lot longer than that, but it did shock me that it was so low.

In the past 5+ years, I've never seen this spec provided by them or any other datacenter SSD makers. They simply do not expect you to use those drives for any sort of cold storage. They're made to sit in servers running 24/7.

Would definitely like to see more articles about retention performance of powered down SSDs. While ECC has gotten better (more correctable) over the years, the proliferation of TLC and especially QLC (just try to find an MLC drive, these days) must have had a negative impact.
Agreed. Do note that early TLC drives weren't as good as newer ones. It wasn't really until QLC drives became fairly common that NAND manufacturers had improved their cell designs to the point where TLC drives were actually good. Basically, you want to be one level of cell density behind the cutting edge.

BTW, I expect PLC to be a horror story, with people losing data after as little as a couple of months.
 
Another thing to consider is that *most consumer SSDs do not proactively refresh the weak NAND pages even when powered on,
No, they've all been doing self-refresh for quite a long time. It's been standard for probably a decade, by this point.

This is one factor affecting idle power usage of SSDs. The controller isn't just passively sitting there.

If this was a big concern, enterprise would have come up with a rack solution to store drives and trickle feed them juice to maintain reliability.
No, because SSDs haven't been a cost-effective option for cold/near-line storage. Maybe that could be changing soon, due to HDD densities stalling out, but there so far hasn't been a good reason for them to use SSDs like that.
 
An article I'd like to see is one examining how long it takes an SSD to do a background refresh. In particular, is it enough to turn on a machine frequently, for short amounts of time? Like, when the SSD is self-refreshing, does it bother to remember where it left off? Most likely, that's something that varies by brand and model.

I think the only way to ensure it does a self-refresh is via SMART or by manually doing a filesystem or block-level refresh.
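
(One crude check on Linux: do a full sequential read of the drive and watch the throughput -- badly decayed pages tend to read much slower because the controller is doing heavy ECC retries. The device name is a placeholder:)

  # Read the whole device and watch the reported speed; sustained slow stretches
  # on old data are a hint that it hasn't been refreshed
  sudo dd if=/dev/sdX of=/dev/null bs=1M status=progress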
 
No, they've all been doing self-refresh for quite a long time. It's been standard for probably a decade, by this point.
*most consumer SSDs do not automatically refresh NAND cells that have decayed from leakage over time. This has been proven by measuring the gradual read-speed reductions on "old" data on SSDs. There do seem to be exceptions to this, though: Crucial MX500s seem to be one of the only consumer drives that proactively refresh NAND pages. Also, a large portion of enterprise SSDs implement a temporal refresh algorithm.

Almost all consumer SSDs do have read-disturb refresh algorithms implemented, but that won't keep the data from decaying over time unless it is incessantly read over and over.

This is probably the most informative thread on the subject out there at the moment:
https://forum.level1techs.com/t/ssd-data-retention/205692
 
*most consumer SSDs do not automatically refresh NAND cells that have decayed from leakage over time. This has been proven by measuring the gradual read-speed reductions on "old" data on SSDs. There do seem to be exceptions to this, though: Crucial MX500s seem to be one of the only consumer drives that proactively refresh NAND pages.
I don't know if you or whoever you were quoting got confused, but even the Crucial M500 (from 2013) had self-refresh. There was a bug in one version of their firmware, where it didn't run if you had ASPM enabled, so they advised people either to disable ASPM or upgrade their firmware.

The kind of densities modern SSDs have wouldn't be possible if they didn't self-refresh.
 
