[SOLVED] Mysterious throttling in HDD during zero-fill

Status
Not open for further replies.

guferr

Distinguished
Nov 15, 2009
My 1 TB HDD (WDC WD10SPZX-24Z10) is showing weird throttling behavior while I'm trying to wipe it.

Its write speed drops dramatically in all regions after a long (> 10 GiB) zero-fill operation with dd in Linux (Mint 18, running from a thumb drive).

And, for some reason, if I wait a couple of minutes and try again, it goes back to its normal write speed.

Obviously, my first thought was thermal throttling, but hddtemp said it was only at 46 °C right after a 330 GiB fill (which I interrupted because it was taking absurdly long).

Anyway, I let it cool to the lowest temperature it would reach while idle, 40 °C, then made a bash script looping these commands, repeating 0.98 GiB fills at the beginning of the HDD:

Code:
hddtemp /dev/sda
nice -n -20 dd if=/dev/zero of=/dev/sda bs=209715200 count=5 oflag=direct

and put a cloth over the laptop where the HDD is located to make it heat up faster.

From 40 °C to 48 °C the write speed stayed stable, between 56 and 60 MB/s every time.
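
For anyone wanting to reproduce this, the loop described above could look roughly like the sketch below. The variable names (`DEVICE`, `PASSES`, `DRY_RUN`) and the `/dev/sdX` placeholder are assumptions added for illustration, not part of the original script. By default it only prints the commands, because the real dd call overwrites the start of the disk.

```shell
#!/bin/sh
# Sketch of the temperature-check + short-fill loop. DEVICE, PASSES and
# DRY_RUN are illustrative assumptions; with DRY_RUN=1 nothing is written.
DEVICE="${DEVICE:-/dev/sdX}"
PASSES="${PASSES:-3}"
DRY_RUN="${DRY_RUN:-1}"

# Either print the command line or actually execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# One pass = read the temperature, then do one 5 x 200 MiB direct-I/O fill.
fill_loop() {
    i=0
    while [ "$i" -lt "$PASSES" ]; do
        run hddtemp "$DEVICE"
        run nice -n -20 dd if=/dev/zero of="$DEVICE" bs=209715200 count=5 oflag=direct
        i=$((i + 1))
    done
}

fill_loop
```

Setting `DRY_RUN=0` and a real device name would run the fills for real, which needs root and destroys the data at the start of that device.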

That didn't make sense, so I let it cool down to 40 °C again and ran this command to do a single 19.5 GiB fill:

Code:
nice -n -20 dd if=/dev/zero of=/dev/sda bs=209715200 count=100 oflag=direct

this time without the cloth or anything else to hinder cooling.
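
As a quick sanity check, the fill sizes quoted here follow directly from the dd arguments (bs=209715200 is 200 MiB). A minimal sketch of the arithmetic; the helper name `fill_gib` is made up for illustration and it only computes numbers, it never touches a disk:

```shell
#!/bin/sh
# Hypothetical helper: size of a dd fill in GiB, given bs (bytes) and count.
fill_gib() {
    awk -v bs="$1" -v count="$2" 'BEGIN { printf "%.2f\n", bs * count / (1024 ^ 3) }'
}

fill_gib 209715200 5     # short fill used in the loop  -> 0.98
fill_gib 209715200 100   # single long fill             -> 19.53
```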

At the very end the temperature was only 44 °C, and I reran the same script as before.

Now the speeds varied from 20 to 22 MB/s, while the temperature stayed constant at 44 °C.

I let it run for over 20 minutes, and the speed never improved.

Then I stopped it for just 2 minutes. The temperature didn't even change (it was still 44 °C), and the write speed came back to 56-60 MB/s when I ran the script again.

Nothing here makes sense.

If this isn't thermal, why is it happening, and why does leaving it idle for 2 minutes bring the speed back to normal when a 10-20 second pause isn't enough?

It just feels like something thermal, but that makes no sense. Why would repeating 0.98 GiB fills over and over, with a cloth hindering cooling until it reached 48 °C, not make it throttle, while a single 19.5 GiB fill with nothing hindering cooling does?

Not to mention that this temperature is far too low for it to throttle.

I'm clueless.

Note that I used hdparm to disable the drive's power management and write cache, since this was going to be a zero-fill, and dd was using the "direct" output flag, so there's no cache to worry about; it should be a direct I/O process.

My laptop is an IdeaPad S145-15IWL 81S90008BR, with an Intel Core i5-8265U CPU, Nvidia MX110 GPU, 8 GB DDR4 RAM.
 
What are you using to wipe the drive, and how?

Does the drive really need to be wiped vs. just reformatting the drive?

There could be a number of reasons - update your post to include full system hardware specs and OS information.

How I'm wiping it is already absolutely clear. I said I'm using a live Linux Mint 18 on a thumb drive and that I used the dd command to wipe; I even copied the exact command I used into the post.

I'll add the rest of the hardware info then.
 
Sounds like a friction problem. You say multiple small files don't really affect it, but one large one does. If one of the stepper motors is running dry, with a small file it'll only get so warm before docking and dissipating its heat. With a large file, that doesn't happen. It's like rubbing your hands together for 2 seconds, then pausing: you can do that a bunch of times and never get hot palms. Do it once for 30 seconds and you'll feel it.

The stepper motors are mobile, so very little is directly attached to the frame, and casing changes or insulation don't affect them in a big way. 48 °C at the polling site is still within specs, but that's a single point, not reflective of other parts inside the casing. Just like motherboard temps might be 48 °C, but stick a finger on the SATA chipset and you'll feel what upper 90s feels like.

Drive is bunk, basically.
 

I didn't say files, I said "fills". They are separate I/O operations with virtually no interval between them. Each one took around 17 seconds, and as soon as one finishes the script starts the next immediately; it doesn't take even 0.1 s until the HDD starts writing again, so there's no time for it to cool down.

Also, if it could cool down during this script, then it would have cooled down when I ran it after the 19.5 GiB fill. But as I said, it kept going for 20 min and wouldn't speed up. I had to give it 2 minutes of rest and then rerun the script.

Plus: Stepper motor?

The actuator that moves the head isn't a stepper motor; it's a voice coil (steppers were only used in very, very old drives). The spindle motor that spins the platters is a separate brushless motor, and it always runs at constant speed; it doesn't change due to these operations, especially since I fully disabled power management, so it never spins down even when idle.

The actuator doesn't have much reason to heat during a purely sequential write operation either.

It's a zero-fill, not files being written in a file system which might be fragmented.

The head doesn't have to seek much; it's writing continuously during the zero-fill, not seeking between distant addresses.
 
"and put a cloth over the laptop where the HDD is located to make it heat faster. "

A thumb drive is not the same as a HDD.

And heating a HDD faster is just something that "does not compute" in my mind unless in some sort of frigid environment.

Will defer to @Karadjine. All is now out of my comfort zone.

P.S. Hardware specs?
 
"and put a cloth over the laptop where the HDD is located to make it heat faster. "

A thumb drive is not the same as a HDD.

And heating a HDD faster is just something that "does not compute" in my mind unless in some sort of frigid environment.

Will defer to @Karadjine. All is now out of my comfort zone.

P.S. Hardware specs?

Yes, a thumb drive is not the same as a HDD, and?

I'm not talking about a thumb drive, I'm talking about a HDD.

I think you didn't understand what I wrote. I said the OS (a live Linux) is running from a thumb drive. It has nothing to do with the rest.

I think that bit was pretty clear: the HDD I'm talking about is a WDC WD10SPZX-24Z10, as I said right at the beginning.

Also, I had already added the hardware specs well before your reply. They're at the end of my post.
 
Oh, hah, forgot about those. And yes, you are correct about the steppers: they were replaced in higher-density drives by voice coil actuators (which don't suffer from the alignment problems that steppers do), but those are different from the spindle motors, which spin the platters at a constant rate.

However, the voice coil actuators do suffer from amplitude issues, since they are constant voltage but variable current actuators. Can happen with the digital (controller) to analog (actuator) conversion.
 
It's an SMR model:

https://hddscan.com/blog/2020/hdd-wd-smr.html

The throttling occurs when the CMR cache becomes full.

Well, this explains some of the behavior I'm seeing, although I don't quite understand how this results in everything I'm seeing here.

I ran tests writing 128 MiB blocks to see how it behaved with repeated writes over the same block, alternating between zero-fills and one-fills. Again, it never slowed down.

Then I ran tests writing 128 MiB zero blocks to check whether sequential and alternating writes would behave differently.

By sequential I mean a script that automatically seeks to the next 128 MiB block and zero-fills it as soon as the previous one finishes; in the alternating test it switches between writing blocks in two different regions, 5 GiB apart.
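
To make the two access patterns concrete, here is a rough sketch of which 128 MiB block each pass would hit. How the alternating test advanced inside each region isn't spelled out above, so this assumes both regions move forward one block at a time; `GAP_BLOCKS`, the function names and `/dev/sdX` are illustrative only. The dd commands are printed, not executed.

```shell
#!/bin/sh
# Block index (in 128 MiB units) written on pass i of each pattern.
# GAP_BLOCKS=40 is 5 GiB / 128 MiB. Assumption: in the alternating pattern
# both regions advance one block per visit.
GAP_BLOCKS=40

seq_block() { echo "$1"; }
alt_block() { echo $(( ($1 % 2) * GAP_BLOCKS + $1 / 2 )); }

# Print (not run) the dd command for one block; /dev/sdX is a placeholder.
dd_cmd() {
    echo "dd if=/dev/zero of=/dev/sdX bs=134217728 count=1 seek=$1 oflag=direct"
}

# First four passes of the alternating pattern: blocks 0, 40, 1, 41.
for i in 0 1 2 3; do
    dd_cmd "$(alt_block "$i")"
done
```

With this layout the two regions only start to overlap after 40 blocks each, so a run of about 70 blocks in total stays clear of that.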

The results were pretty different.

In sequential writing there were 3 speed "plateaus", so to speak. It stayed around 85 MiB/s for the first 4 or 5 blocks, then fell to around 45 MiB/s, held that speed for 40 more blocks, and then fell to around 22 MiB/s, where it stays indefinitely.

In alternating writing there are only 2 speeds, and it manages to hold 85 MiB/s for many more blocks, about 70, then suddenly falls straight to 22 MiB/s, without the 45 MiB/s plateau.

For wiping up to 10 GiB, alternating writing is much, much faster than sequential (over 2 times faster), but for wiping the whole disk this isn't relevant.
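
A back-of-the-envelope estimate from those plateaus, as a sketch only: the block counts and speeds are the approximate figures quoted above, and the `write_time` helper and its "blocks:speed" format are invented for this example.

```shell
#!/bin/sh
# Seconds to write N blocks of 128 MiB given "blocks:speed" plateaus (MiB/s).
# The last plateau is applied to all remaining blocks.
write_time() {
    awk -v n="$1" -v spec="$2" 'BEGIN {
        k = split(spec, p, " ")
        left = n; t = 0
        for (i = 1; i <= k && left > 0; i++) {
            split(p[i], f, ":")
            b = (i == k || f[1] + 0 > left) ? left : f[1] + 0
            t += b * 128 / f[2]
            left -= b
        }
        printf "%.0f\n", t
    }'
}

write_time 80 "5:85 40:45 0:22"   # sequential, 10 GiB  -> 325
write_time 80 "70:85 0:22"        # alternating, 10 GiB -> 164
```

That comes out at roughly a factor of two for 10 GiB. Feeding in a whole 1 TB (about 7451 blocks) puts both patterns at around 12 hours, since nearly all the time is spent at 22 MiB/s either way.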

Is there any way I could optimize the algorithm to consistently keep a greater speed to wipe the whole drive?
 

I can't see how. The CMR cached regions eventually need to be flushed to the SMR regions to make way for new data. This normally happens during idle periods.

One thing you might like to experiment with is TRIM/UNMAP. These drives support this feature, so it would be interesting to see whether issuing TRIM/UNMAP commands after the CMR cache becomes full frees it up and allows the drive to continue writing at full speed. Note that after TRIM-ing, the drive returns zeros for the affected LBAs.
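
If you want to try that from Linux, one possible way is util-linux's blkdiscard. The sketch below only prints the commands; `/dev/sdX` is a placeholder, `trim_cmd` is a made-up helper, and whether a discard actually frees the cache on this drive is exactly the open question, not something this guarantees.

```shell
#!/bin/sh
# Prints (does not run) a blkdiscard call covering COUNT blocks of 128 MiB
# starting at block START. /dev/sdX is a placeholder device name.
trim_cmd() {
    block=134217728
    echo "blkdiscard --offset $(( $1 * block )) --length $(( $2 * block )) /dev/sdX"
}

# Check that the kernel sees discard support, then discard the first 10 GiB.
echo "lsblk --discard /dev/sdX"
trim_cmd 0 80
```

Discarding is destructive for the affected range, which is fine here since the goal is to wipe the drive anyway.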
 

Well, I found a solution, which I should have looked for earlier:

using the HDD's own security erase feature, through hdparm.

This drive supports enhanced security erase, and it seems it takes only 180 min.

It's a single ATA command rather than a long series of write commands, so the drive can do it in the fastest way possible.

It probably has an optimized internal procedure for this.
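
For reference, the usual hdparm sequence looks roughly like this. It's a sketch that only prints the commands: `/dev/sdX` and the password `p` are placeholders, `erase_plan` is a made-up helper, and the drive has to report "not frozen" in the Security section of `hdparm -I` before the erase will be accepted.

```shell
#!/bin/sh
# Prints the usual hdparm steps for an ATA enhanced security erase.
# Nothing is executed; the device and password are placeholders.
erase_plan() {
    dev="$1"; pass="$2"
    echo "hdparm -I $dev"   # check "not frozen" and the estimated erase time
    echo "hdparm --user-master u --security-set-pass $pass $dev"
    echo "hdparm --user-master u --security-erase-enhanced $pass $dev"
}

erase_plan /dev/sdX p

# Average rate implied by erasing 1 TB (10^12 bytes) in 180 minutes, in MB/s.
awk 'BEGIN { printf "%.1f\n", 1e12 / (180 * 60) / 1e6 }'   # -> 92.6
```

If the 180 min estimate holds, that works out to about 92.6 MB/s on average, versus the roughly 22 MB/s the dd fill settles at.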
 
I'm still trying to wrap my head around the why. I think I used military erase maybe once, donkey's years ago, on an HDD, because back then a 540 MB drive cost $500 and it was being sold.

I know that where my wife works, when an HDD gets old it's removed, the board is spiked with 50 V, and a giant magnet is left sitting on it for 10 minutes. After that, it's tossed.

So this entire procedure of securely erasing an HDD with zero-fills has me stumped. Interesting reading for sure, and I'm all for learning something new, but a 1 TB HDD can't be worth the trouble.
 