SSD endurance: In the real world, does it matter?

williamvw

Distinguished
Feb 4, 2009
144
0
18,680
On Tom’s Hardware, we see charts like this one detailing SSD endurance:

[SSD endurance chart]


This is great stuff, and it’s very useful for helping to establish deployment expectations and TCO estimates. But I've also never personally heard of someone in the SMB world wearing out an SSD. Ever.

I was speaking with a CTO yesterday who's in charge of about 60 seats, and he was telling me that he's held off on bringing SSDs into his org because of endurance worries, even though he's putting up a new SQL server that gets a lot of daily writes and would benefit from SSD speeds. "I mean, I've got 12-year-old HDDs still in service," he told me. "Is an SSD gonna do that for me? I really don't know."

I couldn't answer him. So I'm asking you guys. Are your business SSDs getting hammered enough to raise actual endurance concerns? If so, what apps are generating all those petabytes within the SMB segment?

Thanks!

 
I doubt you'll get an answer over on this side of the forum. For the most part we get questions from novices, gamers, DIYers, and enthusiasts; we very rarely have IT questions.

Have you tried the IT Pro section of the forum?
 
This subject interests me as well; if you happen to create a thread in another section, please link it in this thread too.
Otherwise, I'm the recent owner of a Samsung 840 EVO 120GB SSD; I'll post endurance results in a couple of years.
 

USAFRet

Titan
Moderator
Independent SSD endurance test:
http://techreport.com/review/26058/the-ssd-endurance-experiment-data-retention-after-600tb

600TB of data written to multiple drives: Kingston, Samsung, Corsair, Intel.
After 600TB written, only a couple of them are seeing reallocated cells.

For a real-world, consumer-level comparison, my current boot drive, a Kingston HyperX 3K 128GB, has had less than 4TB total written to it in almost 18 months.

How much data do you expect to be written daily/monthly/yearly?
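
If it helps, here's a minimal back-of-the-envelope sketch of that math in Python. The 600TB figure is just what the drives in that test had absorbed, and the daily write rate is an assumption roughly in line with my boot drive above, so treat the numbers as illustrative, not a spec:

```python
# Rough SSD lifetime estimate from daily host writes.
# Both inputs below are illustrative assumptions, not vendor ratings.

def years_until_worn_out(endurance_tb, writes_gb_per_day):
    """Years needed to reach a given write-endurance figure."""
    writes_tb_per_year = writes_gb_per_day * 365 / 1024
    return endurance_tb / writes_tb_per_year

# ~600TB (what the TechReport drives had absorbed) at ~7GB/day
# (roughly 4TB over 18 months, like the boot drive above):
print(f"{years_until_worn_out(600, 7):.0f} years")   # ~240 years
```

Even if you cut the endurance assumption in half, the answer is still measured in decades for that kind of usage.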
 


Well, in the real world, you can have a database server with gigabytes or terabytes written each day. So yes, the endurance of the SSD selected would be very important. That is why Intel used SLC in its "enterprise" drives, and offers MLC-HET today. However, before purchasing a new storage system for a high-traffic workload, you will have to carefully examine how much data is being written daily on average, as well as peak write volumes. That will help you determine what you actually need as far as endurance goes in an SSD-based storage system.
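
To put that in concrete terms, here's a small sketch of working backwards from measured writes to a required endurance rating; the daily write volume, peak factor, drive capacity, and refresh cycle are all hypothetical placeholders:

```python
# Sketch: working backwards from measured write volume to a required
# endurance rating. The numbers are placeholders, not real measurements.

def required_tbw(avg_gb_per_day, peak_factor, service_years):
    """Total bytes written (in TB) the drive must tolerate."""
    return avg_gb_per_day * peak_factor * 365 * service_years / 1024

def required_dwpd(tbw, capacity_gb, service_years):
    """Drive-writes-per-day rating implied by that TBW."""
    return tbw * 1024 / (capacity_gb * 365 * service_years)

# Hypothetical database server: 200GB/day average, 2x peak headroom,
# 5-year refresh cycle, 800GB drive.
tbw = required_tbw(200, 2.0, 5)
print(f"{tbw:.0f} TBW, {required_dwpd(tbw, 800, 5):.1f} DWPD")   # 713 TBW, 0.5 DWPD
```

You would then compare the resulting TBW/DWPD figure against the vendor's rating for the drives you are considering.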

Enterprises also would like to get 4-5 years of use out of their generally high-dollar purchase. Now, I don't know who expects to keep something for 12 years; that's a bad anecdotal example, if you ask me. Enterprises all have hardware refresh cycles in their budgets; I've never heard of one that realistically thought it could keep equipment going past 12 years (barring mainframes).
 

bakrob99

Reputable
Feb 18, 2014
12
0
4,510
My PNY 128GB SSD ran out of the ability to write after 35 months of use. I am an active trader and receive every tick (transaction) on the E-mini S&P and NQ futures, plus others, every trading day. I guess the number of ticks requiring data storage would be on the order of 250,000 per day x 240 days per year x 3 years, or 180 million, plus the normal writes created by Windows 7. Not sure if this is a lot, but the drive was only 70% full yet was unable to write any more data (subsequently replaced under warranty by PNY - great!). In the meantime I am using a Samsung EVO 120GB SSD and expect it to last 2-3 years as well. Replacing a drive at this price to gain performance is a reasonable trade-off for me.
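
For a rough sense of scale, here's a quick estimate of the raw tick volume described above; the bytes-per-tick figure is purely an assumption (the post doesn't say how each tick is stored):

```python
# Rough estimate of the raw tick data volume described above.
# Bytes-per-tick is an assumption (timestamp + price + size + symbol).

ticks_per_day = 250_000
trading_days_per_year = 240
years = 3
bytes_per_tick = 64          # assumed record size, not from the post

total_ticks = ticks_per_day * trading_days_per_year * years   # 180 million
total_gb = total_ticks * bytes_per_tick / 1024**3
print(f"{total_ticks:,} ticks ~ {total_gb:.0f} GB of raw tick data")   # ~11 GB
```

Even at a generous record size, the raw ticks alone only come to tens of gigabytes over three years, which suggests most of the wear came from how the data was written (small random updates, OS and application overhead, write amplification) rather than the sheer volume.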
 


wlp3333

Honorable
Feb 25, 2013
25
0
10,540
I think that database systems might use RAM caches that can sometimes avoid writes to disk, or pool disk writes, which might make SSDs last longer. This topic needs more investigation.
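
For what it's worth, here's a toy sketch (not any particular database engine) of that write-pooling idea: buffering updates in RAM and flushing them in batches means far fewer, larger writes actually reach the SSD:

```python
# Toy illustration of write coalescing: buffer updates in RAM and flush
# them in batches, so the SSD sees a few large sequential writes instead
# of a stream of tiny ones.

class BufferedWriter:
    def __init__(self, path, flush_every=1000):
        self.path = path
        self.flush_every = flush_every
        self.buffer = []
        self.flushes = 0

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with open(self.path, "a") as f:
            f.write("\n".join(self.buffer) + "\n")   # one large append
        self.buffer.clear()
        self.flushes += 1

w = BufferedWriter("updates.log")
for i in range(10_000):
    w.write(f"update {i}")
w.flush()
print(w.flushes, "flushes instead of 10,000 individual writes")   # 10 flushes
```

Real database engines do something similar with their page caches and write-ahead logs, but the effect on the drive is the same: batched sequential writes instead of many tiny ones.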
 

dgingeri

Distinguished
I'm an admin in a test lab for enterprise level storage products. As such, I caretake for a lot of storage products of various ages, but we don't have any valuable data on them. I've learned a lot over the last 4 years about reliability and data integrity.

If a CTO is bragging about having 12-year-old storage, he should probably be fired for putting company data in such a vulnerable situation. Storage products in general should be replaced once they get past the "bathtub curve." In the first few months, the reliability of storage products hits a low point, then begins to rise after about 3 months; for some reason, drives just tend to fail in the beginning more often than later. After that 3-month point, they become very reliable and tend to have a failure rate of less than 0.1% per year until just before the 5-year point. At about 4.75 years old, failure rates begin to rise again, though they don't reach the same level as the beginning until about year 7. After 7 years, the failure rate skyrockets, as does the cost of replacement.

At 12 years, the likelihood of a single drive failure in a given year is pretty big, around 30%. The likelihood of a dual drive failure is also pretty high, around 27%, since a failure of one drive starts a rebuild onto a hot spare (if it was configured right when it was set up), which stresses the rest of the drives. At the 12-year point it's almost safer not to have a hot spare, because the rebuild process is so stressful that another drive failing becomes a near certainty.
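
To put rough numbers on that kind of risk, here's a simplified calculation; the failure rate, array size, and rebuild window are illustrative assumptions, and it treats failures as independent, which understates things, since rebuild stress on aged drives is exactly what makes the second failure so likely in practice:

```python
# Illustrative arithmetic for aging-array risk: given a per-drive annual
# failure rate (AFR), what are the odds of losing a drive in an 8-drive
# RAID 5 set, and of a second drive dying during the rebuild window?
# AFR, array size, and rebuild time are assumptions, not measurements.

afr = 0.30            # assumed per-drive annual failure rate for very old disks
drives = 8
rebuild_days = 2      # assumed time to rebuild onto the hot spare

p_any_failure_per_year = 1 - (1 - afr) ** drives
p_second_during_rebuild = 1 - (1 - afr * rebuild_days / 365) ** (drives - 1)

print(f"P(at least one failure per year)  ~ {p_any_failure_per_year:.0%}")   # ~94%
print(f"P(second failure during rebuild) ~ {p_second_during_rebuild:.1%}")   # ~1.1% (naive)
```

The naive independence model makes the second failure look rare; the whole point of the experience above is that rebuild stress on worn drives pushes that number far higher.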

I currently have about 30 RAID arrays with 500GB and 750GB drives that are hitting the 7-year point. I have had to rebuild them repeatedly because of multiple drive failures. About 2-3 times per week, one drive fails, the rebuild process kills another drive, and the whole RAID 5 set is useless. Since they are test units and don't hold any real data, I don't mind the extra work; it keeps me busy and gives the testers some storage they can use when it is working. When one fails like this, they simply move to other storage, copy the test files over, and keep testing. This has shown me, though, that as a systems admin I never want to keep using storage past the 7-year point. My 40+ arrays with 1TB drives are just hitting the 6-year point, and I am having to change out about 5 drives per week with those, but I rarely have to rebuild from multiple drive failures. If I were a real IT admin, I would insist the company replace them now, before we have a catastrophic failure and lose all the data at once.

So, going back to your original question: the SSDs should last long enough to outlast your RAID trays themselves. They are about as reliable as other enterprise storage, but they fail more predictably, if that makes any sense. Instead of waiting for drive failure rates to hit that wall and cause a major loss of data, you can tell management, "These drives are coming into their last 10% of writes and have to be replaced before they can't be used anymore." So the question is: do you prefer the looming uncertainty of when your storage is going to fail, or the looming certainty of knowing it definitely needs to be replaced?

That's the difference between SSDs and hard drives in IT.
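
That kind of warning is easy to automate. Here's a minimal sketch that reads the drive's SMART data with smartctl; wear attribute names vary by vendor (Media_Wearout_Indicator on Intel, Wear_Leveling_Count on Samsung, "Percentage Used" on NVMe), and the column parsing assumes ATA-style output, so treat it as a template rather than something that works unmodified on every drive:

```python
# Sketch of SSD wear monitoring: read SMART attributes via smartctl and
# flag the drive when its normalized wear indicator drops low.
# Requires smartmontools and sufficient privileges.

import subprocess

WEAR_ATTRIBUTES = ("Media_Wearout_Indicator", "Wear_Leveling_Count")

def wear_remaining(device):
    """Return the normalized wear value (100 = new) if reported, else None."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if any(attr in line for attr in WEAR_ATTRIBUTES):
            return int(line.split()[3])   # VALUE column in ATA attribute output
    return None

value = wear_remaining("/dev/sda")        # device path is a placeholder
if value is not None and value <= 10:
    print(f"/dev/sda is in its last ~{value}% of rated writes - plan replacement")
```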
 

williamvw

Distinguished
Feb 4, 2009
144
0
18,680


That is a stellar perspective. Thank you so much for offering this. I will definitely find a way to pass this on. :)
 

blohm85

Reputable
May 16, 2014
70
0
4,640
I mean, in my opinion, it just depends on what you're using it for. If you're just using it for basic day-to-day stuff and not constantly adding and deleting files, a quality SSD like a Samsung will be a great upgrade to your rig.
 

dkulprit

Honorable
Nov 29, 2012
314
0
10,860
Here is testing on a couple of major brands. Depending on what you do, it doesn't really matter; I doubt you will write anywhere near what they are doing. This is a test of 1PB+. Some of the drives failed after 700TB of data, and that is a lot of data to write. On top of that, they are intentionally blasting these drives at a rate a typical home user would never come close to matching, which stresses them out more, so some of the drives that failed before the 1PB mark could possibly go beyond it in a real-world situation.

Now, don't get me wrong, I've had some terrible SSDs that didn't even come close to that before failing. But I bought them because they were cheap and I didn't need them to last.

So draw your own conclusions, but if you are a typical home user and you are blasting an SSD with more than 200TB of data, I'd be more concerned about what you are doing with so much data.

http://techreport.com/review/26523/the-ssd-endurance-experiment-casualties-on-the-way-to-a-petabyte
 

jasonc2

Reputable
May 26, 2014
19
0
4,520
I think most of the manufacturers' numbers don't matter so much as a) the quality of the device itself (if their working devices have a specified MTBF but 40% of their devices are DOA or early failures due to faulty hardware, then who cares about their numbers) and b) usage patterns.

In particular, something I see done sometimes, and probably about the worst thing you can do to an SSD, is populating a new SSD from a sector-by-sector clone of another drive without forcing a TRIM afterwards. Even if the filesystem you write to the drive has a significant amount of "free space," from the SSD's point of view you've filled up the entire drive, reducing its working pool to either zero or whatever mandatory over-provisioning space it happens to have. Every single drive I have seen set up this way has failed within 2 years of typical home use, or 6 months in heavy-write situations. I had 3 Crucial M4s; two were cloned this way and failed hard within 10 months after about 30TB of writes. One was not a clone and is going strong after 3 years with only a few reallocations and 100TB of writes.

It's such an easy pre-emptive thing to do, with such a huge benefit. If the filesystem is NTFS and the platform is Windows, one trick is to create a file that fills up the remaining free space on the drive and then delete it (with specialized tools you can do this with minimal writes). If the filesystem is ext3 or ext4 and the platform is Linux, running fstrim once on the device is all it takes.
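
If you'd rather script the Linux side, here's a minimal sketch; the mount points are placeholders, and it needs root plus the fstrim utility from util-linux:

```python
# Force a one-time TRIM pass on each mounted filesystem of a cloned SSD.
# Mount points below are placeholders; run as root.

import subprocess

for mountpoint in ["/", "/home"]:   # adjust to wherever the clone is mounted
    result = subprocess.run(["fstrim", "-v", mountpoint],
                            capture_output=True, text=True)
    print(result.stdout.strip() or result.stderr.strip())
```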
 

christinebcw

Honorable
Sep 8, 2012
472
0
10,960
I don't mind DOAs at all. I'd much rather have an immediate failure than wait days or weeks for some shoddy assembly to accumulate into one. The rebuild time on a DOA unit is zero for me. Gimme that any time.

The one thing we're studying is the greatly lowered utility costs of SSDs replacing HDDs. They don't quite pay for themselves in the first few months, but after a year? It's pretty amazing. Instead of throwing TVs out windows, we may start chucking A/C units out there.
 
