News Samsung 990 Pro SSDs Report Rapid Health Degradation

Status
Not open for further replies.
I have a Samsung 970 EVO drive that got a bad block, and Windows write-protected it. It has since been replaced.

However, my reasoning was that I was using PrimoCache on a small partition on the drive, which might have worn it out quickly.

No idea if that actually had an effect. PrimoCache did give me a significant perf boost for my magnetic drive, but I'll have to change plans if it can kill an SSD. Planning on getting a dedicated $20 SSD for this later on a PCIe adapter, but that's another subject.

Unlikely. That's like saying a swap file will wear out your drive faster because it keeps writing to the same area.

Thanks to a process called wear leveling, it will not wear out faster. The first time you write to it, the data might land in NAND cell 123456789. Write to the same byte-stream offset again and it ends up in cell 234567890. Write it again and it ends up in cell 123459876. The controller remaps each write to a different physical location, which spreads wear across the whole drive and slows cell degradation.
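To make the remapping concrete, here is a toy sketch in Python. It is not how any real SSD controller works; the `ToyFlash` class, the 8-block size, and the "pick the least-worn free block" policy are all assumptions made up for illustration. It only shows that hammering one logical address does not hammer one physical block.

```python
class ToyFlash:
    """Toy flash translation layer (hypothetical, for illustration only).

    Every write of a logical block lands on the least-worn free physical
    block; the block holding the previous copy is erased and freed.
    """

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks
        self.mapping = {}                      # logical block -> physical block
        self.free = set(range(physical_blocks))

    def write(self, logical_block):
        # Choose the free physical block with the fewest erases so far.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical_block)
        if old is not None:
            # The stale copy is erased, which is what actually wears the block.
            self.erase_counts[old] += 1
            self.free.add(old)
        self.mapping[logical_block] = target
        return target


flash = ToyFlash(physical_blocks=8)
for _ in range(800):
    flash.write(logical_block=0)   # keep rewriting the same logical address

print(flash.erase_counts)
```

In this toy model the erase counts of the eight blocks never differ by more than one, even though every write targeted the same logical block. That is the same reason a swap file or cache partition does not burn a hole in one spot of the drive.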
 
Reactions: bit_user
But there doesn't seem to be any discussion of how the degraded drives are being used. Are they being used totally without heatsinks; or is there a motherboard heatsink? We know that heat can be a problem but no one reporting performance problems is saying whether or not their drives are mounted with heatsinks and what range of temperatures they normally operate at.

Quite true.

NAND actually writes more easily at high temperatures. When it's cold, you have to pump it more and harder to program a specific voltage level. But controllers and long-term data retention prefer colder temperatures.
 
Reactions: bit_user and dwd999
I believe it's nothing to worry about, as it's quite normal for the first few weaker cells to die early; afterwards, health should stay at 95-98% for a very long time.
If it keeps dropping, then we should be worried. I would expect fewer of them on a pricey drive, but, oh well.
 

Furbo

Prominent
Sep 17, 2022
Damn, my new 2TB 990 PRO is at 96% with 2173 GB of writes.

Meanwhile, my two 1TB 970 EVO Plus drives are at 100%, and they are a year old.
 
Reactions: bit_user
I've used Samsung SSDs exclusively for years now and never had any of these issues, nor have the folks I know. Having said that, it's entirely possible they had a bad batch that got shipped out without being caught. This happens to every manufacturer, so I always compare to how they've acted historically.
 
Reactions: cyrusfox
I've used Samsung SSDs exclusively for years now and never had any of these issues, nor have the folks I know. Having said that, it's entirely possible they had a bad batch that got shipped out without being caught. This happens to every manufacturer, so I always compare to how they've acted historically.
Same here, minus the exclusivity. Samsung SSDs are famously good, and for a reason. I suspect the same thing many others have speculated: a bad batch or batches.
 
Reactions: cyrusfox

Palorim12

Distinguished
Their excuse for not further addressing the issue seems like an attempt to avoid having to address issues with low-binned NAND that is being pushed to its limits by the controller.

A controller with good NAND management will have very reliable ways to monitor the NAND and measure its health before uncorrectable errors happen. Aside from that, correctable errors are common with all flash storage, as well as disk-based storage. The issue is that every company has its own secret sauce for NAND management, wear leveling, and tracking degradation from the target reference.
All the end user has to go on is the wear and health info the drive maker spits out in the SMART data.

I stopped getting Samsung NVMe drives after the 970 PRO, specifically due to their major downgrade in NAND, write endurance, and warranty.
It sounds more like they're not addressing it further because their RMAs are handled by a third party. In the Neowin writer's case, he's in Europe, so it's handled by a company called Hanaro, which most likely just has factory tools and instructions from Samsung on how to test the units. I replied to the OP on overclock, but I have experience with this type of support, as I used to do it for a different company. They have probably received all the info and reported it to Samsung. In the meantime, until Samsung tells them what to say, they can't acknowledge the issue or do anything beyond processing an RMA when a customer asks for one. If the drive passes testing under Samsung's factory test instructions, they send it back as NDF.
 

bit_user

Titan
Ambassador
My 970 EVO and 980 Pro are reporting 98% and 97% respectively. However, I've noticed that whichever one is the OS drive ends up having bit rot corrupt a few files here and there every month (after running scandisk and checksumming against my backup). It's always old data files that haven't been accessed in a long time. But when used as a storage drive instead of an OS drive, the issue goes away.
That should never happen unless a storage device is on its way out. Whether it's used as a storage drive or an OS drive should make no difference, unless there's some kind of firmware bug. Did you check that it's running the latest firmware? Unless I could confirm that it was a known issue addressed in a new firmware version, I would immediately replace the device.

The SSD's own controller is supposed to do a patrol scrub of the blocks, and rewrite any with too many correctable errors.

Speaking of which, I wonder what SMART reports about the drive. Does it indicate any uncorrectable reads? Because there are other ways to get bad data - it could be an interface-level error or even a problem on the host hardware or software.
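For anyone who wants to check this themselves, here is a rough Python sketch that pulls the relevant fields out of the JSON that smartmontools produces for an NVMe drive (for example from `smartctl --json -a /dev/nvme0`; the device path is an assumption). The `SAMPLE` values are made up for illustration and are not from anyone's drive, and treating health as 100 minus `percentage_used` is an assumption about how monitoring tools present the number, not a documented formula.

```python
import json

# Illustrative sample only: shaped like smartctl's NVMe JSON output,
# but the values are invented for this example.
SAMPLE = """
{"nvme_smart_health_information_log": {
  "available_spare": 100,
  "available_spare_threshold": 10,
  "percentage_used": 8,
  "data_units_written": 2554688,
  "media_errors": 0
}}
"""


def summarize(smart_json):
    """Extract the wear and error fields discussed in this thread."""
    log = json.loads(smart_json)["nvme_smart_health_information_log"]
    return {
        # Assumption: "health" is shown as 100 minus the percentage used.
        "health_percent": max(0, 100 - log["percentage_used"]),
        "available_spare_percent": log["available_spare"],
        # NVMe reports data units of 1000 x 512 bytes each.
        "host_writes_gb": log["data_units_written"] * 512_000 / 1e9,
        # Media and data integrity errors the drive could not correct.
        "media_errors": log["media_errors"],
    }


print(summarize(SAMPLE))
```

A non-zero `media_errors` count would point at the drive itself. A clean SMART log alongside corrupted files would make the interface or the host more suspect.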

On Linux, filesystems like BTRFS and OpenZFS keep their own checksums and will tell you when you're getting bad data - so, no need to manually check against backups.
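On a filesystem without built-in checksums, the manual version of that check can be sketched in a few lines of Python. This is only an illustration of the idea, not what Btrfs or OpenZFS do internally; the function names are hypothetical, and where you store the manifest is up to you.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """Return files whose current digest differs from the saved manifest,
    including files that have gone missing."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

You would call `build_manifest` once while the data is known to be good, save the result, and run `changed_files` later. Files you edited on purpose will show up too, so only old files that should not have changed are a sign of silent corruption.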

Maybe the controller does not parity check older files often enough for correction that they build up on unused data? If the controller only parity checks files that are used more often and less used files less often then I could see this happening.
No. NAND flash is based on charge-storage. Since the cell charge in modern NAND chips will decay much sooner than the warranty period of the drive, the controller must periodically scan & refresh the drive contents. A little bit like DRAM refreshes, but we're talking about days or weeks, rather than like microseconds.

This is also why you can't use modern SSDs for cold storage.
 

bit_user

Titan
Ambassador
I will say "drive life left" is a bit of a mystery, as there is no clear standard spec here. Is it based on spare cells left? Or on how quickly a NAND cell can be programmed to the appropriate voltage? Cold cells have to be pumped harder.
The SMART stats shown in post #25 include a stat called Available Spare. It would be nice to know what that was showing.

My guess is that the controller is getting a high rate of errors, during its background scrubbing process, forcing it to do lots of rewrites. This will elevate its internal write counters, even while the amount of host writes might remain fairly low. If I'm right, then the remaining life should continue to degrade even without any further host writes.

I believe it's nothing to worry about, as it's quite normal for the first few weaker cells to die early; afterwards, health should stay at 95-98% for a very long time.
You're assuming only a few blocks are bad, and that they'll get remapped to spare capacity. That might be correct, but it's not the only possible explanation.
 
Feb 1, 2023
I hope it's just a reporting error. My drive health is as follows:
Model             Health  Power On Count  Power On Hours  Total Host Reads  Total Host Writes
850 PRO 1TB       98%     1676            40880           N/A               40305 GB
960 PRO 1TB       96%     1049            24125           49239 GB          48934 GB
970 EVO Plus 2TB  100%    173             4715            3200 GB           4485 GB
990 PRO 2TB       92%     2               242             951 GB            1308 GB
(As of last night)

The 990 has fewer reads and writes than the others, yet it has the worst health percentage.
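One way to put a number on that gap is to divide each drive's total host writes by the health percentage it has lost. The short Python sketch below uses only the figures posted above; the function name is made up, and "GB written per 1% lost" is just a rough yardstick, since nobody outside Samsung knows exactly how the health value is calculated.

```python
# (model, reported health %, total host writes in GB), as posted above
drives = [
    ("850 PRO 1TB", 98, 40305),
    ("960 PRO 1TB", 96, 48934),
    ("970 EVO Plus 2TB", 100, 4485),
    ("990 PRO 2TB", 92, 1308),
]


def gb_per_percent_lost(health_percent, host_writes_gb):
    """Host writes per 1% of reported health lost, or None if none is lost."""
    lost = 100 - health_percent
    return None if lost == 0 else host_writes_gb / lost


for model, health, writes_gb in drives:
    rate = gb_per_percent_lost(health, writes_gb)
    if rate is None:
        text = "no wear reported yet"
    else:
        text = f"{rate:,.1f} GB written per 1% lost"
    print(f"{model:<18} {text}")
```

By this yardstick the 850 PRO and 960 PRO each absorbed well over 10,000 GB of host writes per percentage point, while the 990 PRO has lost a point for roughly every 164 GB.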
 
Reactions: bit_user
Feb 4, 2023
93% after only one month of use. What a joke Samsung quality has become!

samsung-990-pro-health.png
 