[SOLVED] 3rd Storage drive dying in the span of 11 months

slavi_asenov2002

Reputable
Apr 11, 2018
109
5
4,695
3
I made a post a couple of months ago about the same thing happening: same model drive, same everything.
https://forums.tomshardware.com/threads/drives-failure-in-the-span-of-3-months.3760384/#post-22683580

It has been exactly 11 months since my Seagate 2TB HDD died within 30 minutes, after a game started crashing and not responding.
As I described in the previous post, I got a 2TB Samsung 870 EVO as a replacement, and it died about 3-4 months after I bought it. The health suddenly dropped, the bad sectors kept multiplying, and within about an hour all the data stored there was unusable.
The images of HD sentinel on the 1st Samsung drive - https://prnt.sc/4VLp5pAlX_W2 , https://prnt.sc/x037D_CmhwQo
And same one for the replaced Samsung drive - https://prnt.sc/0yxC_rtzvHf9 , https://prnt.sc/yqTXurx0e7M0

The 1st Samsung drive was handed to the shop and later sent to a service center in another country, which determined that the cause of the failure was "uncertain" and sent me a replacement.
When I got it, I wondered what the cause could be and suspected my PSU, the PSU cables and so on. As I could not afford a new, better PSU at the time, I just swapped the power cables: the new drive went on the cable used by the ADATA SSD (which had no problems and still has none), and the ADATA went on the cable my failed drives had been on.
And boom, my second Samsung drive is dying as I write this. I have contacted the shop again and they will send it to the service center again. However, I am baffled, annoyed and, to be honest, angry at myself for not upgrading my PSU just because it "works".

As this is the third drive that has died in these 11 months, I am beginning to wonder what on earth is happening. I am assuming the PSU has deteriorated and cannot handle spikes, and during those spikes it kills something (in this case the SSD)?
Any ideas on possible causes?

I remember that today the electricity cut out unexpectedly a couple of times. I turned on my PC, the electricity stopped for 5 minutes, I waited another 10, turned it on, worked, it stopped for 5 minutes, I waited, and so on. That happened 4 times today. To be honest, it is not that uncommon, and I have one of those power strips with protection for my PC and monitors. I think it is worth noting.


System Specs:
MB: ASRock AB350M Pro4 - bought June 2018 new
CPU: Ryzen 5 2600 (now stock; it was previously overclocked for some time to 4.0 GHz at 1.26 V) - bought June 2018 new
GPU: RX 5700 XT (undervolted to 1.075 V) - bought December 2019 new
PSU: System Power 9 600W - bought June 2018 new
RAM: 2x8GB Corsair Vengeance 3200MHz - bought around 2020 new
Storage: 120GB ADATA SU650 - bought around 2020 new
240GB Kingston A400 - bought around 2020 new
2TB Samsung 870 EVO - replaced in May 2022

Edit: I am now using Samsung Magician's diagnostic scans:
For now the "short scan" does not detect anything bad,
the "short SMART self-test" - does not detect anything
the "Extended SMART self-test" - fails
After this scan I got "Failing LBA" message
the "full scan" - failed - no log

Image of the SMART scans:
https://prnt.sc/dRfxzq-cuA-s
 

Darkbreeze

Retired Mod
Have all three drives that have had problems been connected to the exact same motherboard SATA header?

Have all three drives with problems been using the exact same SATA power cable from the PSU?

Yes, I would get a better power supply. Not only is that not a terrific unit, and very low quality for a Be Quiet product, but it is also a year past its three year warranty period. That means an already not great product now has a fair number of miles on it, so any shortcomings it had when new are likely amplified now. Among those could very realistically be high ripple and questionable voltage regulation, and those are things that could have a real impact on directly connected devices like drives, which lack the filtering components that might help to minimize such impact for devices that receive their power through the motherboard.
 

slavi_asenov2002

For the first question, "Have all three drives that have had problems been connected to the exact same motherboard SATA header?": yes, the SATA data cable is the same and is connected to the same MB header.

For the second question, "Have all three drives with problems been using the exact same SATA power cable from the PSU?": the 1st failed drive (Seagate) and the 2nd failed drive (the first Samsung SSD) were using the same cable. This power cable was used only by the SSD. The currently failing SSD is using the same cable as the other 2 were (the ADATA SSD is on the other cable).

When I built my PC it was using a 1060, so I took this PSU knowing that it would be enough. I now see myself buying the Corsair RM850x 850W or the 850W Seasonic Focus GX. The Corsair one is cheaper, so I will probably get that, as I know it has some good reviews.

Your experience with the 870 Evo is far too common. Will there be a recall? Or a firmware "fix", as there was for the 840 Evo?

Samsung 870 EVO Not To Be Trusted!
https://goughlui.com/2022/08/20/notes-ssdraid-recovery-samsung-870-evo-not-to-be-trusted/
It is interesting indeed. I read the linked post and it puts me in a somewhat weird position, if you can call it that.
But the strangest thing of all is that I have 2 friends running the 870 EVO, the only difference being that theirs are 1TB. They have absolutely no problems, and here I am banging my head.


---

Another thing to note is that I ran the full scan and it stopped unexpectedly (no wonder). It did not give any log like the SMART tests did, but I opened HD Sentinel, and the funny thing is that with the other 2 drives the bad sectors would be in the hundreds by now, whereas here there are still 6. However, the "Problems occurred between the communication of the disk and the host 30409 times." message is the thing that has changed: the count is probably 50 times higher than earlier.
 

Darkbreeze

Retired Mod
I have two 870 EVO drives that have been in one of my machines since these drives released in 2021 and I've not had any problems with them.

Considering this specific case involves the same problem on multiple models, I'm doubtful that the model is the problematic factor here.

First thing I would do is get or use a different SATA power AND data cable. Maybe there is a problem with one of them. I've seen cables cause problems or even ruin drives before.

I'd also try using a different SATA header on the motherboard if possible. Could be there is a problem with that SATA circuit on the board. Obviously this isn't a high probability possibility, but it happens sometimes.

After changing these things, retest to see if you still have a problem. I've seen bad cables, low power delivery and bad headers show up as failed tests before.

The RM850x is an excellent choice.
 
I'm not saying it's not possible, but I can't see how a flaky 5V power supply could be responsible for the SSD failures. All the onboard supply voltages are generated by tightly regulated switchmode supplies. These produce 2.5V and 1.2V for the NAND flash chips. There is also a boost converter which steps up the 5V supply to 12V. This Vpp voltage is required during NAND programming.

https://forum.hddguru.com/viewtopic.php?f=10&t=42829

Basically, as long as the 5V supply doesn't drop low enough to trigger a power reset, the switchmode converters should maintain stable output voltages.
 

slavi_asenov2002


So basically:

The SATA data cable can be faulty.
The SATA power cable can be faulty(PSU).
The MB can be faulty (the SATA port on the MB).

Each one of them could be the cause of the bad sectors occurring on 3 separate drives. That is so... idk. What I did was move the data cable to the only free SATA port and move the SSD to the 2nd power cable (the one my OS SSD is on). I still doubt the power cables, simply because the other 2 SSDs are on a different cable and they do not have a problem. That still does not mean it is impossible. Even the MB can be faulty, can't it?
I also doubt the SATA data cable is the sole problem, just because I have never heard of a data cable causing bad sectors, whereas I have heard numerous times of SATA power cables doing that.
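
The elimination reasoning above can be written down as simple set logic. This is only a rough sketch: the component labels are hypothetical, the wiring of the failed drives follows the description earlier in the thread, and the cables and ports of the two healthy SSDs are assumed purely for illustration.

```python
# Hypothetical labels for the parts each drive depends on.
FAILED_WIRING = {"psu", "motherboard", "power_cable_1", "data_cable_1", "sata_port_1"}

failed = {
    "Seagate 2TB HDD": set(FAILED_WIRING),
    "Samsung 870 EVO (1st)": set(FAILED_WIRING),
    "Samsung 870 EVO (2nd)": set(FAILED_WIRING),
}

# Assumed wiring for the drives that show no problems.
healthy = {
    "ADATA SU650": {"psu", "motherboard", "power_cable_2", "data_cable_2", "sata_port_2"},
    "Kingston A400": {"psu", "motherboard", "power_cable_2", "data_cable_3", "sata_port_3"},
}


def shared_only_by_failed(failed, healthy):
    """Return parts used by every failed drive and by no healthy drive."""
    common_to_failed = set.intersection(*failed.values())
    used_by_healthy = set().union(*healthy.values())
    return common_to_failed - used_by_healthy


print(sorted(shared_only_by_failed(failed, healthy)))
```

Under these assumptions the first suspects are the power cable, data cable and SATA port that only the failed drives shared. The PSU and motherboard drop out of the result only because the healthy drives hang off them too; that does not clear them, since a fault can be confined to a single cable run or port.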

As my further actions, I should:
Change the data cable and change the SATA power cable.
Change the PSU and basically hope the MB is not the problem (like here https://forums.tomshardware.com/threads/can-a-faulty-motherboard-cause-ssd-failure.3741122/post-22558114). (Not that the MB is the more expensive part of those 2. The PSU is like 150 euros, whereas MBs like the GIGABYTE B450 AORUS ELITE v2 / MSI B450M PRO / ASRock Fatal1ty B450 Gaming K4 are all like 90 euros. Of course there is the MSI B450 GAM PRO CARBON MAX WF, which is 130 euros.)

Edit: I changed the SATA data cable about half an hour ago, and for about 20 minutes I have just been launching random games and playing each for 4-5 minutes. So far nothing has crashed, and the message "Problems occurred between the communication of the disk and the host 30814 times", which was already there before I powered off to rearrange the SATA cables, is still the same. I am going to run Samsung's full scan to see whether that number changes, since the first time I tried it the count went from about a thousand to 30k. I will give an update after the scan is complete.
 

Darkbreeze

Retired Mod
I'm not saying anything can, or cannot, be the cause of anything... with any certainty. What I AM saying is that for whatever reason I have seen EACH of these things cause problems in the past. Anybody who says otherwise is simply lacking in any real experience or simply hasn't encountered the same issues I have in the past. Or, they are full of crap. Not talking about fzabkar either, as I know he is highly knowledgeable when it comes to storage devices. But that still doesn't mean he's seen it all or that things that shouldn't be possible don't occasionally happen. They do, and often it is for reasons that we wouldn't have thought of, or that don't seem directly applicable to the problem you are seeing, but that are, regardless of either of those things, the cause.

As far as the power source killing drives, frack, are you kidding me? It happens ALL the time.

There are myriad ways in which a faulty or poor quality power supply can kill or incrementally damage a drive. What things "should" do aren't always what they "do" do. A PSU that lacks or has a faulty protection circuit or has extremely poor voltage regulation, or is bombarding a device with ripple over time, can all certainly be contributors. Much as for ANY device connected to a questionable power supply including motherboards, graphics cards, etc.

Sure, the motherboard COULD be the problem, but if we want to look at it from a logical viewpoint, and we always do, then a power supply that we know wasn't terrific when new, and is now past its warranty period (the manufacturer felt that unit could only be counted on to remain reliable for three years, and who am I to disagree with them about their own product), is a lot more likely to be the cause of any problems than a fairly decent motherboard that is not seemingly having any problems in regard to any other hardware or functionality. That still doesn't mean it can't be the board, or something else, but I'd prefer to play the odds in my favor. In this case the odds (and common sense) say replace a PSU that's "meh" with one that's good, even if you didn't have any problems happening yet at all, because one thing that is incontrovertible is that a bad or poor quality power supply WILL, WILL, stand a very good chance of incrementally killing connected hardware over time. Especially if it has high ripple and the connected devices use capacitors. Which is almost everything.
 

slavi_asenov2002

I ran the full scan. After the scan the number here "Problems occurred between the communication of the disk and the host 30814 times " has not changed, only the amount of the bad sectors has increased from 7 to 9.
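
The two counters in that report point at different parts of the chain, which a small sketch can make explicit. It assumes that HD Sentinel's "problems occurred between the communication of the disk and the host" figure tracks the drive's UDMA CRC error count (as discussed in this thread) and that "bad sectors" is a media-side count. The class, field and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Two counters read from a SMART tool at one point in time (hypothetical structure)."""
    link_errors: int  # host<->disk communication problems (assumed: UDMA CRC error count)
    bad_sectors: int  # sectors the drive has flagged as bad


def classify_change(before: Snapshot, after: Snapshot) -> str:
    """Report which counter grew between two snapshots."""
    link_worse = after.link_errors > before.link_errors
    media_worse = after.bad_sectors > before.bad_sectors
    if link_worse and media_worse:
        return "both link and media errors grew"
    if link_worse:
        return "link errors grew while bad sectors stayed the same"
    if media_worse:
        return "media errors grew while the link stayed quiet"
    return "no change"


# Numbers from the post above: counter unchanged at 30814, bad sectors 7 -> 9.
before = Snapshot(link_errors=30814, bad_sectors=7)
after = Snapshot(link_errors=30814, bad_sectors=9)
print(classify_change(before, after))
```

With the figures reported after the cable change, the link counter holds steady while the bad-sector count keeps climbing, which would suggest the new data cable stopped the communication errors but the damage on the drive itself is still progressing.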

My course of action will probably be to return the SSD again to the vendor and buy the RM 850x.
 

slavi_asenov2002

The Short and Extended SMART tests do not transfer any data over the SATA interface. Instead, they run wholly within the drive. Therefore, you should see no change in the UDMA CRC Error Count, as was indeed the case.
This was the full scan, not a SMART test. The short SMART test returns no errors, while the extended SMART test stops after finding a bad sector.
 
I'm not saying anything can, or cannot, be the cause of anything............with any certainty. What I AM saying is that for whatever reason I have seen EACH of these things cause problems in the past.
I'm just trying to get my head around this failure scenario.

In a 3.5" hard drive, the preamp on the headstack typically gets a +5V supply directly from the SATA power connector, sometimes via a small resistor and/or an LC filter. The filter would clean up some high frequency noise but would do nothing to alleviate sags or surges. The preamp also gets a stable, tightly regulated -5V supply which is derived from the +5V supply via a switchmode converter.

M.2 SSDs are supplied from +3.3V, and the onboard supplies are mostly generated by step-down switchmode converters. The one exception may be the NAND Vcc. This supply rail is sometimes derived from a switchmode step-down converter, but in other cases the SSD makes direct use of the 3.3V supply, albeit after filtering it with an LC filter.

I can see how an unstable supply could result in reading and writing problems in those cases where the storage elements are directly exposed to the +5V or +3.3V supply, but the present case does not appear to be one of these.

Furthermore, Samsung's SSDs have an e-fuse at the +5V input. These clamp the output voltage at less than 6V, so the rest of the electronics is protected from surges. Of course, if there is a sustained overvoltage, this may kill the e-fuse (and often does), and in some cases it may punch through the e-fuse and damage other components.

BTW, I understand that bad PSUs, and user error, can kill SSDs and HDDs. In fact I have written several tutorials and FAQs on the subject. Moreover, I typically see at least one case per week in various forums.
 

Darkbreeze

Retired Mod
I have no argument with any of that, and again, I'm also not saying that the power supply IS the problem. I am simply offering a suggestion in regard to the PSU given its probable low quality and age. Since we know power supplies are commonly the root of all evil, so to speak LOL, it seems like a good place to start, because, as you say as well, power supplies can kill SSDs and HDDs. Whether that has any relevance to THIS specific use case, clearly I can't say it does or does not with any certainty.

So, what do you think has killed two drives and is killing a third one?
 
The data recovery professionals will tell you that Seagate's drives are the least reliable of all the HDD manufacturers. Moreover, they say that when they start to fail, they fail relatively fast.

As for that thread at techpowerup.com, one user had 14 x 870 Evos, and 7 of them failed in quick succession. If you just read the first 3 or 4 pages, I believe you'll come to the conclusion that this model is a dud.

I agree that normally one would suspect the power source, but to me, the OP's case is a coincidence.
 

slavi_asenov2002

I find it strange that I have another old PC, used for at least 5-6 hours a day, with an old Seagate and a Toshiba HDD that have both been at around 0-10% health for 5-6 years and have so many bad sectors, and still I have no problems with them.

Of course I went ahead and bought the RM850x and sent the SSD to the vendor. I will see what comes next: whether they send a new one or, if there is an option to change the model, I would gladly not choose the 870 EVO. Still, with my "luck" with storage devices, I feel like I will run into another failing storage device within the next year😀

Btw, the SSD did not die like the other ones. Before I packed it for the vendor it was still running with no problem and everything would open. However, there were occasions when it would crash again and increase the "uncorrectable error count" from 20 to 35. So at least it did not die in the span of an hour like the other ones. But still, having to make a new post in such a short time is terrifying, to say the least, imo.
 
I still have a 40GB Seagate that has ~50 bad sectors and a 13GB Seagate with ~100 bads. Both were working for many years with these bad sectors, and I eventually retired both of them in working condition. Seagate hit a problem with their 7200.11 series, and they never really recovered their reputation after that.
 

Darkbreeze

Retired Mod
I have about fifteen Seagate drives of various types: HDDs, SATA SSDs including consumer and Iron Wolf models, and some NVMe M.2 SATA and PCIe drives as well. Most of these, other than the newer NVMe models, I've had for anywhere from a few to many years. They lived in my regular PCs for a while and have now been in my QNAP 9 bay NAS box for the last few years, and I've never had a single problem with any of them. In like 35 years of working on computers I've owned a crap ton of Seagate drives, and in all that time I've maybe had two out of dozens and dozens that ever died within the warranty period. Of those that outlived the warranty, there are none that I didn't simply end up retiring after six or seven years, either because I didn't feel compelled to trust them anymore or because by then they'd generally become grossly outperformed by whatever was more recent. I probably have ten of them that I've retired sitting in one of the drawers under my workbench, and I'm positive that if I pulled any one of them out they would still work completely fine.

That is not to say they don't fail. They do. ALL drives can and will fail, as you know. The question is never IF, it is always WHEN. And I have never personally seen ANY brand that, in my experience at least, has shown itself to be more prone to failure than any other brand EXCEPT for the Deathstar drives back in the day and some models of Sandisk SSD. With those, I and a bunch of other people around here all experienced multiple failures, likely from a bad production run of that model, when they had an enormous sale of them just a few months before the buyout by Western Digital.

So, IMO if you buy a quality model by a reputable brand you generally have about the same chance of a premature failure as any other quality model by any other reputable brand. And as far as the Backblaze data goes, pfffffttt. Whatever. Almost any drive tech will tell you their data is seriously flawed and generally not relevant to consumer use anyhow.
 
I have faith in BackBlaze's statistics. Their drives operate in server equipment, which presumably means that they live in an airconditioned environment and are powered by quality power supplies. A desktop environment would be comparatively harsh.

But most of all, BackBlaze has racked up millions of drive-hours of usage whereas each of us can only provide anecdotal, statistically irrelevant, individual experiences.

The most recent statistics would suggest that Seagate drives have a failure rate which is 10 times greater than HGST (the WDC models are actually rebranded HGST helium drives).

https://www.backblaze.com/blog/backblaze-drive-stats-for-q3-2022/
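
For context on the drive-hours argument, fleet reports of this kind are usually expressed as an annualized failure rate (AFR): failures divided by drive-years in service. Below is a minimal sketch of that calculation. The function name and all input numbers are hypothetical, chosen only to contrast a large fleet with a handful of drives on one desk.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return failures per drive-year of service, as a percentage."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical fleet: 10,000 drives observed for 90 days, 25 failures.
fleet_afr = annualized_failure_rate(failures=25, drive_days=10_000 * 90)

# Hypothetical home user: 3 drives observed for 330 days, 1 failure.
home_afr = annualized_failure_rate(failures=1, drive_days=3 * 330)

print(f"fleet AFR: {fleet_afr:.2f}%")
print(f"home AFR:  {home_afr:.2f}%")
```

A single failure in the small sample swings the rate by tens of percentage points, while one more or one fewer failure barely moves the fleet figure. That is the sense in which individual experiences carry little statistical weight, whatever one thinks of how well a data-center workload maps onto desktop use.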
 

Darkbreeze

Retired Mod
I don't. They run those drives 24/7 - 365, which consumers don't do, plus they run consumer drives in a commercial server environment that almost certainly never allows those drives to power or spin down like they would the majority of the time in a consumer PC and unlike enterprise or NAS type drives, those consumer drives were never meant to be run under those conditions.

I'll take the individual experiences that reflect real world consumer PC usage any day if we're talking about consumer drives in a consumer use case. When you talk about it often enough with 13 other moderators and a ton of veteran members, and assist on a daily basis with all manner of forum thread types for the last 8 years straight, you kind of get a really good feel for what everybody has experienced, which, taken as a whole, makes for something more than anecdotal evidence. But I also understand where you are coming from, as some do feel as you do. I just don't happen to in THAT particular regard.

Besides which, I've read a number of articles from very well respected people in this industry that indicate there are very good reasons to believe that Backblaze's data is flawed.
 
Here is a web site which, at face value, appears to support what you are saying:

https://www.hardware.fr/articles/962-6/disques-durs.html

There is no significant difference in the return rates for the various HDD brands. However, I wonder if the stats merely represent the DOA failure rates or the failure rates that occur during the retailer's window for product returns. If so, then the stats can't really be used as an indicator of long term reliability.
 

USAFRet

Titan
Moderator
Mar 16, 2013
161,032
13,270
176,090
24,450
And even then, the "stats" only count on a fleetwide basis.
They have nothing to do with the singular drive on a person's desk, no matter what make or model.
 
