[SOLVED] Preventing stored SSDs from losing data

Page 3 - Tom's Hardware forums

tmpc1066

Distinguished
Oct 29, 2016
When storing an SSD with data on it, my understanding is that it must be powered up periodically to prevent data loss. The questions I can't seem to find a definitive answer to are whether it needs to be connected to a computer when powered up, and whether it actually needs to be read by the computer. If so, must the entire drive be read?

I also read one source that indicated that the housekeeping firmware in some SSDs will automatically read all of the data and rewrite it if it finds that it is weak. This same source indicated that this will vary from manufacturer to manufacturer. Anybody know anything about this or where I can find definitive answers?

You may wonder why I'm using SSDs instead of HDDs. The reason is that these aren't normal backups. They are drop in replacement boot drives containing authorized licensed installs of music software I own. It's an insurance policy against the software not being supported someday or the installers not running on the old operating systems I need for this software. I have backup computers as well.

By the way, these are Mac drives if it matters.

Am I nuts? You bet! o_O
 
Solution
When storing an SSD with data on it, my understanding is that it must be powered up periodically to prevent data loss. The questions that I can't seem to find a definitive answer to is whether it needs to be connected to a computer when powering up, and whether it actually needs to be read by the computer. If this is true, must the entire drive be read?

I guess I missed this thread and I see now you have had lots of answers and research, but I'll still give my quick (and educated) take:

When you power on the drive it pushes the firmware and boot code to SRAM and does a variety of tasks. For example, if you had a power loss event with data-in-flight the drive will determine this and restore existing lower page values before...
Well, I'm blown away by this info, Maxxify. I assume the link to NewMaxx on Reddit is you as well. I have been looking for detailed info on SSDs for a very long time. How did you come by all of this?

I am sure I will have questions, but it will take me a while to get through all of this to even know what to ask. So, thanks Maxxify.
Again, I'm blown away!!! o_O

Yep, that's me.

A good place to start is Google Scholar coupled with something like Sci-Hub (which has not indexed articles from 2021 onward, but is fine for anything older) or Unpaywall, etc. If you are good at narrowing your search (including using reasonable date ranges), you will eventually find articles and patents that cover these mechanics in some detail. I've posted many on my subreddit.

For example, Micron has multiple patents that deal with power loss events (and a white paper as well), describing the mechanism by which data-in-flight - when writing to TLC, for example - is handled after a power loss (using a differential memory device, in the case of the MX500). There are other resources that talk about other methods of restoring the original (lower page) values through redundancy, compression, etc. These operations are done on power-on and are repeated until completed (there is confirmation on writes, for example), which may well be the root cause of some "dead" drives. But I digress: the controller must be able to handle this without host/OS interaction, for obvious reasons.

But as stated, there's a translation/abstraction layer anyway, so it's clear the memory device is self-sustaining. TRIM, the example command I gave, is something done to improve efficiency through host information, which again is more prominent with NVMe, especially in enterprise, as I mentioned - for example, zoned namespaces (ZNS) and directives (e.g. the streams directive). However, with asynchronous I/O, the drive might acknowledge the host's I/O commands as completed when in reality it rearranges the data to improve efficiency (e.g. interleaving, reordering of I/O, etc.). So at the fundamental level a lot is handled by the drive itself.
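To make the "self-sustaining abstraction layer" point concrete, here is a minimal sketch of a flash translation layer (FTL) mapping table. This is purely illustrative - the class, field names, and behavior are my simplification; a real controller persists this table across power loss and tracks wear, ECC, and parity alongside it.

```python
# Minimal FTL sketch: flash cannot overwrite a page in place, so every
# host write is remapped to a fresh physical page and the old one is
# left stale for garbage collection. TRIM just drops a mapping so GC
# never has to copy that data forward.

class SimpleFTL:
    def __init__(self, num_pages):
        self.l2p = {}                   # logical page -> physical page
        self.free = list(range(num_pages))

    def write(self, lpn, store, data):
        ppn = self.free.pop(0)          # always program a fresh page
        old = self.l2p.get(lpn)         # previous physical page, if any
        self.l2p[lpn] = ppn
        store[ppn] = data
        return old                      # now stale, reclaimable by GC

    def trim(self, lpn):
        # Host hint: this logical page is no longer needed.
        return self.l2p.pop(lpn, None)

store = {}
ftl = SimpleFTL(num_pages=8)
ftl.write(0, store, b"boot")            # maps logical 0 -> physical 0
stale = ftl.write(0, store, b"boot-v2") # remap; physical 0 is now stale
ftl.trim(0)                             # mapping dropped, host-side "delete"
```

The host only ever sees logical page numbers; everything below the `l2p` table (placement, GC, refresh) is the drive's own business, which is why none of it requires a connected computer.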

Anyway, within Scholar you will find articles that discuss the start-up/boot process of an SSD with its SRAM, for example, and this includes the restoration of data. How this is specifically handled depends on the drive and firmware. More generally, though, you can find information on block-level wear tracking, details about ECC (e.g. LDPC) and RAID/parity, etc., from which you can extrapolate the rest.

For example, there are many articles covering different algorithms for managing wear-leveling - static vs. dynamic, say - but these share some basic principles, such as keeping dates for last access and last erase, the number of accesses, read-retry history (the LLR table), and so on, to determine the best time to rewrite data when balancing performance (read latency) against data retention (as few rewrites as possible). To give you a direct example: the MX500 has gotten flak on this forum for doing too many writes, while WD drives were recently accused of doing too few (stale data having huge performance penalties). One could say that in the former case endurance is being traded for performance, while in the latter endurance is the priority at the cost of performance. So in that way firmware design can impact how this is managed.
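The "basic principles" above can be sketched as a toy refresh heuristic. All field names and thresholds here are invented for illustration - real firmware weighs read-retry counts, temperature, LDPC stress, and more - but the shape of the trade-off is the same.

```python
# Hedged sketch of a data-refresh / static wear-leveling decision:
# rewrite a block when its data is old (retention stress) or has been
# read too many times (read disturb). Thresholds are made up.

def blocks_to_refresh(blocks, today, max_age_days=180, max_read_retries=8):
    """Return ids of blocks whose data should be rewritten."""
    stressed = []
    for b in blocks:
        age = today - b["last_written_day"]
        # Performance-minded firmware (the MX500 criticism above) uses
        # low thresholds and refreshes aggressively; endurance-minded
        # firmware (the WD criticism) tolerates older, slower reads.
        if age > max_age_days or b["read_retries"] > max_read_retries:
            stressed.append(b["id"])
    return stressed

blocks = [
    {"id": 0, "last_written_day": 10, "read_retries": 1},   # fresh
    {"id": 1, "last_written_day": 10, "read_retries": 20},  # read-disturbed
    {"id": 2, "last_written_day": 400, "read_retries": 0},  # stale
]
```

Tuning `max_age_days` down buys read latency at the cost of program/erase cycles; tuning it up does the reverse - exactly the MX500-vs-WD trade-off described above.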
 

Pc6777

Honorable
Dec 18, 2014
Some newer SSDs with multi-level cells might be worse for cold storage/archival than older ones. I would use hard drives if you just need cold storage; hard drives are cheaper and better for putting in a drawer for a few years and then picking back up. Plus, power outages can kill SSDs. I only use SSDs for hot storage.

If you want to be safe, rewrite everything from another drive with the same data every year or so. The data could last 5 years unpowered for all we know, but with this new multi-level cell technology and the uncertainty of flash storage, I would aim for a yearly rewrite (or every 6 months if you feel like it), and power the drive on at least every few months for a little while.

If it's not a lot of data and you can afford it, look into Verbatim M-DISC Blu-rays. They go up to 100 GB per disc, and they're about the most permanent data archival solution you can find short of enterprise-level stuff.
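The "rewrite everything every year or so" advice can be done host-side with a simple script: copy each file out and back so every block gets freshly programmed, verifying a checksum on the way. This is a sketch under my own assumptions (paths and the `.refresh` suffix are examples), not a vendor tool; because of the FTL remapping, even a same-drive copy lands on freshly programmed flash pages.

```python
# Refresh a file in place: copy to a scratch name, verify the copy is
# bit-identical, then atomically swap it over the original.

import hashlib
import os
import shutil

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def refresh_file(path):
    """Rewrite a file in place; raise if the fresh copy doesn't verify."""
    before = sha256_of(path)
    tmp = path + ".refresh"
    shutil.copyfile(path, tmp)        # forces a fresh write of every block
    if sha256_of(tmp) != before:
        os.remove(tmp)
        raise IOError("checksum mismatch refreshing " + path)
    os.replace(tmp, path)             # atomic swap on the same filesystem
    return before
```

Run it over the drive once a year (e.g. walk the tree with `os.walk` and call `refresh_file` on each file) and you also get a free integrity check of every byte you care about.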
 
Well, I'm blown away by this info, Maxxify. I assume the link to NewMaxx on Reddit is you as well. I have been looking for detailed info on SSDs for a very long time. How did you come by all of this?

I am sure I will have questions, but it will take me a while to get through all of this to even know what to ask. So, thanks Maxxify.
Again, I'm blown away!!! o_O

As an example of what I meant with my reply above, see these patents from SK hynix I posted earlier today on my subreddit.

Patent 10,936,421 details how to recover from a sudden power-off while programming pages. As it describes, the controller determines,

"when sudden power-off occurs, whether there is a high probability of a program disturb of unselected pages sharing a word line coupled to a selected page among the pages in rebooting, and output a command to perform an over-write operation for programming data in the selected page or skip the over-write operation, based on a result of the determination.”

A given word line in TLC actually contains three pages (three bits per cell): the lower and upper pages, or least/center/most significant bits. So this patent says that, basically, you determine how far you were into the programming sequence - with the latter stages refining the MSB, for example (the data-in-flight), which has a narrower voltage threshold - and then, if necessary, you over-write (rewrite) the lower (already-written, data-in-place) pages. This occurs after the drive reboots, and the patent states it is done by the "controller configured to control ... the memory device to perform an over-write operation." More detail is given in the patent, but it's clear this is done without host interaction.

In fact, there is another patent listed there (10,915,256) that deals with the mapping table on power loss as well. Considering many DRAM-less NVMe SSDs utilize a host memory buffer (HMB) for mapping, this is one place where they might be a bit less reliable, because of the extra latency/overhead of going to system memory, for example.
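The decision described in patent 10,936,421 can be modeled roughly like this. The stage names, risk score, and threshold are my assumptions for illustration; the patent's actual criteria are more involved.

```python
# Rough model of sudden-power-off recovery on a TLC word line: figure
# out how far the three-page program sequence got, and over-write the
# already-programmed (data-in-place) pages only if disturb is likely.

PAGES = ("LSB", "CSB", "MSB")   # three pages share one TLC word line

def pages_to_overwrite(completed_stages, disturb_risk, threshold=0.5):
    """Pages the controller should rewrite on reboot after power loss."""
    if completed_stages >= len(PAGES):
        return []               # programming finished cleanly; nothing to fix
    if disturb_risk < threshold:
        return []               # patent allows skipping the over-write
    # The interrupted page (data-in-flight) is restored separately;
    # here we rewrite the pages that were already in place.
    return list(PAGES[:completed_stages])
```

For example, a power-off during the MSB pass with high disturb risk means rewriting the LSB and CSB pages, while a low-risk event lets the controller skip the over-write entirely - all on power-on, with no host involved.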
 
If you want to store your data offline for a long time, go with a good old HDD.

I don't know how long an HDD can hold data - some technical sheets say 10 years - but I have a couple of HDDs with Windows 2000 and XP installations that I didn't touch for almost 15 years. When I connect them to a computer, everything works just fine.
 

InvalidError

Titan
Moderator
I don't know how long an HDD can hold data - some technical sheets say 10 years - but I have a couple of HDDs with Windows 2000 and XP installations that I didn't touch for almost 15 years. When I connect them to a computer, everything works just fine.
Older HDDs have low enough density that nearby bits have very little influence on each other, which allows bits to survive a very long time without being rewritten. The newest high-density drives, on the other hand, have to use fancy new techniques like HAMR to achieve stronger magnetization of much smaller magnetic domains, so that bits stick around long enough to be viable. You probably don't want to leave an SMR drive powered down for multiple years at a time either.

Nothing is permanent unless you have backups of backups. (Edit: and periodically check those backups' integrity.)
 