News Study Shows SSDs Are More Reliable than HDDs

naughty_mage

Commendable
Aug 23, 2018
3
1
1,515
0
what idiot did the math on this article? "Counting the annual failure rates of Blackblazes drives; all of the company's hard drives netted a failure rate of 10.56% "
Umm, 172k drives and a total failure count of 348 drives is only a fraction of a percent. You don't total up all the drives' average failure rate percentages; you take the total failed drives and divide by the total drives. As they show at the bottom: 0.85%, not bloody 10+%, geez.
 
Reactions: phenomiix6

chaz_music

Distinguished
Dec 12, 2009
62
9
18,565
5
what idiot did the math on this article?
Had to laugh at the quote. This is very typical of misunderstood engineering math. But the number in the article is correct. The trick wording is "annual failure rate" and how you calculate that. In reliability statistics, this is called FIT rate.

The drive population in the study is 172K units, but the population **age** is much older than 12 months. The average HDD age in the study is 49.63 months but the average SSD age is only 12.7 months. They are calculating FIT rate (failures in time). Using their numbers, I calculated a slightly larger FIT than they did. With FIT rate, you can also approximate MTBF (mean time between failures).
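A minimal sketch of the FIT/MTBF arithmetic described above, assuming the standard definitions (FIT = failures per 10^9 device-hours; MTBF = device-hours per failure). The drive count, average age, and failure count are the figures quoted in this thread; which failure window properly pairs with lifetime hours is exactly what is debated further down, so treat the plugged-in numbers as illustrative only:

```python
# Sketch of FIT-rate arithmetic, assuming the standard definitions:
# FIT = failures per 1e9 device-hours; MTBF = device-hours per failure.
# Drive count and average age are the figures quoted above; the
# 44-failure count is the Q1 2021 boot-drive figure cited later in
# the thread, used here purely for illustration.

HOURS_PER_MONTH = 365.25 * 24 / 12  # ~730.5 hours per month

def fit_rate(failures, device_hours):
    """Failures per billion device-hours."""
    return failures / device_hours * 1e9

def mtbf_hours(failures, device_hours):
    """Mean time between failures, in hours."""
    return device_hours / failures

# 1,669 HDD boot drives with an average age of 49.63 months
hdd_hours = 1669 * 49.63 * HOURS_PER_MONTH

print(f"FIT:  {fit_rate(44, hdd_hours):.0f}")
print(f"MTBF: {mtbf_hours(44, hdd_hours):,.0f} hours")
```

Note that MTBF is just the reciprocal of FIT scaled by 10^9, so the two functions always agree by construction.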

Another note: having spent a large portion of my career working with reliability, I always find it interesting that we miss the entire picture through "study bias". In this case, the design and failure modes are not being checked for any other possible differences that are also affecting the numbers. If most of these SSDs were actually SATA and SAS type SSDs, those also have the nice feature that they are totally enclosed, so the user does not touch the PCB.

In other words, the risk of ESD damage is much lower (static discharge).

Due to legacy HDD design structure and no industry interest in addressing the failure modes due to packaging, almost all rotating HDDs have their control PCB exposed, so the user/installer can inadvertently touch it. ESD damage is VERY pervasive in the electronics industry and is considered the highest single failure mode other than user error.

If they parse the data into which SSDs are enclosed SATA/SAS versus open-PCB M.2 types, they might actually see something very interesting. I would bet money that the M.2 SSDs have a higher annual failure rate than SATA type SSDs.
 

spongiemaster

Respectable
Dec 12, 2019
1,352
629
2,060
0
Had to laugh at the quote. This is very typical of misunderstood engineering math. But the number in the article is correct. The trick wording is "annual failure rate" and how you calculate that. In reliability statistics, this is called FIT rate.
You're right, but it was still an idiot that wrote the article. The company is Backblaze, yet the article calls them Blackblaze every single time, including the image credit to them.
 

Heat_Fan89

Prominent
Jul 13, 2020
229
86
670
3
The article brings up a good point. How many of those SSDs are being used primarily as boot drives? If that's the case, as the article points out, the SSDs should be more reliable. It would have been better to compare both the SSDs and HDDs under similar workloads to get a more accurate picture.
 

USAFRet

Titan
Moderator
Mar 16, 2013
144,679
8,675
175,340
22,579
The article brings up a good point. How many of those SSDs are being used primarily as boot drives? If that's the case, as the article points out, the SSDs should be more reliable. It would have been better to compare both the SSDs and HDDs under similar workloads to get a more accurate picture.
In the context of Backblaze use, "boot drive" is not a thing.
 

watzupken

Notable
Mar 16, 2020
474
198
870
1
I can agree with the fact that SSDs tend to be more reliable than HDDs. At least based on my hardware experience over two decades, the switch to SSDs is probably the best thing that happened for the PC. Not only does it make everything snappy and fast, I also haven't experienced many of the issues HDDs have, such as bad sectors, the need to defrag, and failing drives, over the decade since my first SSD, the Intel X25 G2, which by the way is still alive and healthy despite being around 10 years old. HDDs in the past could last a long time, i.e. I have drives that lasted more than a decade, but current ones are quite crappy, especially if you are buying cheap ones. So the downward race to cut costs accelerated the demise of the HDD.
 
The title seems misleading after you read the article.
Measuring failure rates between data sets whose ages are so drastically different will, of course, yield different results. Then you have the use case of the data sets (boot drive vs data drive)...:rolleyes:
 

Heat_Fan89

Prominent
Jul 13, 2020
229
86
670
3
The title seems misleading after you read the article.
Measuring failure rates between data sets whose ages are so drastically different will, of course, yield different results. Then you have the use case of the data sets (boot drive vs data drive)...:rolleyes:
Which is why I asked the question. Is it fair to compare a 1.5 yr old drive that's solely used as a boot drive vs a much older drive that's been pounded on a continual basis? I would say that the SSD would fare much better.
 
Reactions: alceryes

spongiemaster

Respectable
Dec 12, 2019
1,352
629
2,060
0
Which is why I asked the question. Is it fair to compare a 1.5 yr old drive that's solely used as a boot drive vs a much older drive that's been pounded on a continual basis? I would say that the SSD would fare much better.
None of the drives are boot drives. The age difference is a valid critique.

The one thing I'll say on hard drive reliability: every time I've had an SSD die, there was no warning and 100% data loss for the drive. When I've had mechanical drives fail, there are almost always warning signs that things are going south, and often I can get all the data off before replacing it. For that reason, I doubt I'll ever use any SSDs for long-term storage or backups.
 
The one thing I'll say on hard drive reliability: every time I've had an SSD die, there was no warning and 100% data loss for the drive. When I've had mechanical drives fail, there are almost always warning signs that things are going south, and often I can get all the data off before replacing it. For that reason, I doubt I'll ever use any SSDs for long-term storage or backups.
You should always have a backup, HDD or SSD, but I agree with this.

I had a very sneaky issue with a WD NVMe drive that turned out to be a failing controller (worked with WD support and they replaced it under warranty). Everything was reporting fine with the drive (including SMART), except that every once in a while the drive would just 'take a timeout'. It didn't become unrecognized or anything like that, just slowed to a crawl for 10-20 seconds. It was playing havoc with Windows and programs until I got it replaced.
 
Reactions: Phaaze88

naughty_mage

Commendable
Aug 23, 2018
3
1
1,515
0
"Blackblaze uses SSDs as boot drives alone in its servers, so these SSDs could also have less of a workload compared to the actual hard drives where are constantly being used to backup client data. " This refutes spongiemaster and USAFRet's statement that boot drives aren't a thing. Servers DO have boot drives to run the OS and store other critical server configuration information.
So SSDs that see VERY limited read/writes and consistently MUCH lower hours of operation cannot be compared to the POINT of the operation: the large mechanical drives holding all the data and performing all the read/writes.
The idiot writing the article also called it Blackblaze instead of Backblaze 9 times out of 10 if you count the image captions...they got it right ONCE!
 

spongiemaster

Respectable
Dec 12, 2019
1,352
629
2,060
0
"Blackblaze uses SSDs as boot drives alone in its servers, so these SSDs could also have less of a workload compared to the actual hard drives where are constantly being used to backup client data. " This refutes spongiemaster and USAFRet's statement that boot drives aren't a thing. Servers DO have boot drives to run the OS and store other critical server configuration information.
So SSDs that see VERY limited read/writes and consistently MUCH lower hours of operation cannot be compared to the POINT of the operation: the large mechanical drives holding all the data and performing all the read/writes.
The idiot writing the article also called it Blackblaze instead of Backblaze 9 times out of 10 if you count the image captions...they got it right ONCE!
I didn't say servers don't have boot drives.

That said, I was still wrong in my other comment. All the other Backblaze reports I had looked at in the past covered only data drives. However, all the drives in this particular comparison are boot drives, both SSDs and mechanical. These are the 2nd and 3rd sentences of the most recent quarterly report:


"Of that number, there were 3,187 boot drives and 172,256 data drives. The boot drives consisted of 1,669 hard drives and 1,518 SSDs."

Again, 2nd and 3rd sentences. Drive counts match those in the posted chart. So I don't know why this Tom's article has the following:

"Also, Blackblaze uses SSDs as boot drives alone in its servers, so these SSDs could also have less of a workload compared to the actual hard drives where are constantly being used to backup client data. "

Find some writers that care, THG. As of this post, the name of the company still hasn't been fixed in the article. This whole article is terrible journalism.
 
Reactions: TJ Hooker

TJ Hooker

Champion
Ambassador
Had to laugh at the quote. This is very typical of misunderstood engineering math. But the number in the article is correct. The trick wording is "annual failure rate" and how you calculate that. In reliability statistics, this is called FIT rate.

The drive population in the study is 172K unit, but the population ** age** is much older than 12 months. The average HDD age in the study is 49.63 months but the average SSD age is only 12.7 months. They are calculating FIT rate (failures in time). By using their numbers, I calculated a slightly larger FIT than they did. With FIT rate, you can also approximate MTBF (mean time between failures).
Err, none of that is relevant for the calculation that Backblaze did for the value of 10.56% failure rate for HDDs. For calculating that %, the HDD count is 1669 and the failure count is 44. The measurement period is 3 months (1/1/2021 - 31/3/2021). The failure rate for that period is 44/1669=2.64%. Converting that to an annualized basis: 2.64% x (12/3) = 10.56%. The age of the drives is irrelevant in this instance.
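The annualization arithmetic above can be sketched as follows. The counts are the ones quoted from Backblaze's Q1 2021 report; note that rounding the quarterly rate to two decimals before multiplying by 4 is what reproduces the published 10.56% exactly:

```python
# Annualized failure rate for the HDD boot drives, using the counts
# quoted above from Backblaze's Q1 2021 report.
hdd_boot_drives = 1669
failures = 44
months_observed = 3

# Failure rate over the 3-month window, rounded to two decimals
quarterly_pct = round(failures / hdd_boot_drives * 100, 2)

# Scale the quarterly rate up to a 12-month basis
annualized_pct = round(quarterly_pct * (12 / months_observed), 2)

print(f"Quarterly: {quarterly_pct}%  Annualized: {annualized_pct}%")
```

As the post says, the drives' average age never enters this calculation; only the drive count, failure count, and observation window do.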

The article misrepresented the statistic, saying it was the failure rate for all HDDs. In reality, it was just the HDDs that were used as boot drives during Q1 2021.

Edit: Although now I'm kinda curious what sort of calculations you did that ended up with similar results despite seemingly using completely different data...
 
Last edited:
