Question Toshiba MQ01ACF050 has high UDMA error count (C7) and high Write error rate (C8)?

Oct 17, 2023
I was given a Toshiba MQ01ACF050 (fw: AV001D) by a friend who no longer needed it. In its previous life it was the primary storage in a laptop. It is currently inside an Orico USB enclosure (JMS578) and serves as an additional offline backup. Ever since I got it, the drive has shown steadily increasing UDMA and write error counts. The counts increase whether the drive is connected via a SATA cable, sits in the enclosure, or is plugged into a laptop. ZFS and BTRFS do not complain. badblocks is clean.

Should I be worried about the drive? Below is the latest S.M.A.R.T data.

Code:
SMART Attributes Data Structure revision number: 128
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   050    -    0
  3 Spin_Up_Time            POS--K   100   100   001    -    1982
  5 Reallocated_Sector_Ct   PO--CK   100   100   050    -    0
  9 Power_On_Hours          -O--CK   090   090   000    -    4111
 12 Power_Cycle_Count       -O--CK   100   100   000    -    1354
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    239
192 Power-Off_Retract_Count -O--CK   100   100   000    -    204
193 Load_Cycle_Count        -O--CK   085   085   000    -    158150
194 Temperature_Celsius     -O---K   100   100   000    -    36 (Min/Max 21/62)
199 UDMA_CRC_Error_Count    -O--CK   100   100   000    -    51724950
200 Multi_Zone_Error_Rate   -O--CK   100   100   000    -    75346590
240 Head_Flying_Hours       -O--CK   094   094   000    -    2414
241 Total_LBAs_Written      -O--CK   100   100   000    -    11313106099
242 Total_LBAs_Read         -O--CK   100   100   000    -    12002295485
254 Free_Fall_Sensor        -O--CK   100   100   000    -    79
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

SMART Extended Comprehensive Error Log Version: 1 (64 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4110         -
# 2  Short offline       Completed without error       00%      4109         -
# 3  Short offline       Completed without error       00%      1313         -
# 4  Short offline       Completed without error       00%      1277         -
# 5  Short offline       Aborted by host               90%       763         -
# 6  Extended offline    Completed without error       00%       745         -
 
Those errors would bother me, even though the result appears to be benign.

You could try to narrow down the source of the errors by first performing a read intensive task, and then following up with a write intensive task. For example, you could read 10GB of data from the drive and record the before-and-after SMART values, and then write 10GB of data and do the same.
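One way to record the before-and-after values is to save the output of `smartctl -A` around each workload and subtract the raw values. The following is only a minimal Python sketch, assuming the attribute table layout posted in this thread; `parse_raw_values` and `diff_raw_values` are illustrative names, not part of smartmontools, and the sample rows are taken from the dd test in this thread.

```python
import re

# One attribute row of `smartctl -A` output. Assumption: the FAIL / WHEN_FAILED
# column is "-", as in every row posted in this thread; a row for a failing
# attribute would need a looser pattern.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+.*\s-\s+(\d+)")


def parse_raw_values(text):
    """Return {attribute_id: (name, raw_value)} for each attribute row found."""
    values = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            values[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return values


def diff_raw_values(before, after):
    """Return {name: increase} for attributes whose raw value changed."""
    changes = {}
    for attr_id, (name, raw_after) in after.items():
        if attr_id in before and before[attr_id][1] != raw_after:
            changes[name] = raw_after - before[attr_id][1]
    return changes


# Sample rows from the 5 GiB dd read test in this thread.
BEFORE = """\
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       52462286
200 Multi_Zone_Error_Rate   0x0032   100   100   000    Old_age   Always       -       75391455
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       12663629787
"""
AFTER = """\
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       52466521
200 Multi_Zone_Error_Rate   0x0032   100   100   000    Old_age   Always       -       75391514
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       12674116027
"""

print(diff_raw_values(parse_raw_values(BEFORE), parse_raw_values(AFTER)))
```

In practice the two strings would come from files captured before and after the read pass, and again before and after the write pass.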

To address write errors, you could try reseating connector PJ801 at the tail end of the PCB:

https://s.turbifycdn.com/aah/yhst-14437584971410/91707118-12.gif
 
I tried what you suggested. While paying closer attention to the SMART attributes over time I noticed a few things.

There is a slow background increase in attributes C7 (UDMA) and C8 (Write Error / Multi Zone Error?).
During read intensive workloads the attributes C7, C8 and F2 (Total_LBAs_Read) increase steadily.
When I stop the read workload, there is a jump in C8 while the others plateau. This happened on a long read test.
During write intensive workloads the attributes C7, C8 and F1 (Total_LBAs_Written) increase steadily.
The rate of increase is the same in either purely read or purely write.

Reseating the connector did not help. Do I need to clean it? I have isopropyl alcohol.

At this point I will treat the disk as if it will fail at any point. I am more curious than anything else. I am open to any suggestions or tests.

Thanks for the help!
 
I'm wondering whether Multi-Zone Error Rate or Write Error Rate is the correct definition for attribute 200/0xC8. Your observations don't seem to be consistent with your disc read/write activity, so I'm very confused.

I was wondering whether there was a problem in one of the SATA Tx/Rx pairs, in which case there would have been a noticeable difference in the number of errors reported in 199/0xC7 during reading and writing. Since you didn't observe such a difference, I can only imagine that your host and drive cannot communicate reliably at the current link rate. Assuming that your data cable is OK, I would try reducing the link rate by selecting a different SATA port, eg switch from 6Gbps to 3Gbps.

This is a real case of corrosion:
https://goughlui.com/2023/03/11/notes-when-sata-cables-go-bad-or-rogue/

I think reseating the HDA connector should have been enough to burnish the contacts, so IPA probably won't make any difference.
 
I tried it out in a different system at 3 Gb/s and unfortunately there is no difference in behavior. While searching for answers I came across this Reddit thread about a Toshiba drive displaying similar behavior.

https://old.reddit.com/r/DataHoarder/comments/o86eqw/udma_crc_error_count_and_multi_zone_error_rate/

Archive of deleted post with images:
https://web.archive.org/web/2021062...ma_crc_error_count_and_multi_zone_error_rate/

I guess since the normalized values haven't reached the threshold, the drive might be fine(?) and the firmware is reporting strange raw values.

I tried wiping down the SATA contacts and it did not make a difference.

Would be funny if the drive eventually fails due to the high Load_Cycle_Count.

Code:
Device Model:     TOSHIBA MQ01ACF050
Serial Number:
LU WWN Device Id:
Firmware Version: AV001D
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database 7.3/5387
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Disabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

Code:
SMART Attributes Data Structure revision number: 128
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       2023
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       4137
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1360
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       239
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       204
193 Load_Cycle_Count        0x0032   085   085   000    Old_age   Always       -       158360
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       36 (Min/Max 21/62)
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       52458111
200 Multi_Zone_Error_Rate   0x0032   100   100   000    Old_age   Always       -       75390663
240 Head_Flying_Hours       0x0032   094   094   000    Old_age   Always       -       2423
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       11351377075
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       12655527269
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       79
 
This drive has around 8000 reallocated sectors, but is still showing a normalised value of 100 for attribute 05:

https://a.allegroimg.com/original/1...GB-SATA-III-2-5-2sztuki-inne-Model-MQ01ACF032

Therefore, I don't think we can trust the 100 scores for 0xC7 and 0xC8. In fact these attributes also have huge raw values, so I'm wondering if they reflect some other parameters.

One thing that raises my suspicions is that there is no Seek Error Rate attribute. I wonder if these 0xC7 and 0xC8 attributes are seek counts and host I/O counts? In fact Seagate's Seek Error Rate attribute counts the seeks in the lower 32 bits and the seek errors in the upper 16 bits. ISTR that I had a Fujitsu drive that also behaved similarly.
 
How far does your curiosity extend? Would you be prepared to stick a capacitor, say 1nF, across the SATA Tx pair or across the Rx pair? This would cause a communications error but shouldn't harm your data (because the error is detected and corrected, I think). You could then see if there is a radical change in 0xC7, perhaps in the upper 16 bits.

One other test you could try would be to dd the entire drive to the NULL device and see how this affects the SMART attributes. By doing so, you would be avoiding writes and restricting the I/O to reading and seeking.

Still another possibility would be an extended SMART test which would test every sector internally, without any SATA I/O. This would probably run for about 1 or 2 hours. If 0xC7 were to increase during this internal test, then this would suggest that 0xC7 is not counting UDMA CRC Errors.
 
Posting the tests as I complete them. Read 5GiB from start of disk and toss into /dev/null. I cut out the attributes that didn't change.

Code:
sudo smartctl -A /dev/sdb
sudo dd if=/dev/sdb of=/dev/null status=progress count=10M
sudo smartctl -A /dev/sdb

Code:
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       52462286
200 Multi_Zone_Error_Rate   0x0032   100   100   000    Old_age   Always       -       75391455
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       12663629787

5248775168 bytes (5.2 GB, 4.9 GiB) copied, 43 s, 122 MB/s
10485760+0 records in
10485760+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 44.243 s, 121 MB/s

199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       52466521
200 Multi_Zone_Error_Rate   0x0032   100   100   000    Old_age   Always       -       75391514
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       12674116027

UDMA_CRC_Error_Count goes up by 4235
Multi_Zone_Error_Rate goes up by 59
Total_LBAs_Read goes up by 10486240
 
A 7200 RPM drive makes 120 revolutions per second. Assuming that the track-to-track seek time is 0, this means that the capacity of each track is about 1MB (= 122 MB/s / 120 revs per second). This in turn means that the number of tracks read is about 5200 (= 5.2GB / 1MB).

However, the track-to-track seek time is usually 1ms or thereabouts, while the time for one revolution is 8.33ms. This means that the actual number of tracks read is more like 4600 (= 5200 x 8.33 / 9.33).
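The estimate above can be written out as a short calculation. This is a sketch of the same reasoning, not a drive specification: `estimate_tracks` is an illustrative name, the byte count and the 122 MB/s rate come from the dd output, and the seek time is left as a parameter. Because it uses the exact byte count instead of the rounded 5.2GB, the results come out slightly higher than the rounded figures quoted above.

```python
def estimate_tracks(bytes_read, rate_bytes_per_s, rpm, seek_ms=0.0):
    """Estimate how many tracks a sequential read covered.

    Assumptions: one track holds what the drive transfers in one
    revolution, and each track then costs one revolution plus one
    track-to-track seek.
    """
    revs_per_s = rpm / 60.0
    rev_ms = 1000.0 / revs_per_s                  # 8.33 ms at 7200 RPM
    track_bytes = rate_bytes_per_s / revs_per_s   # about 1 MB per track here
    tracks_no_seek = bytes_read / track_bytes
    # Scale down by the share of time spent rotating rather than seeking.
    return tracks_no_seek * rev_ms / (rev_ms + seek_ms)


BYTES_READ = 5368709120   # from the dd output
RATE = 122e6              # 122 MB/s, as reported by dd

for seek_ms in (0.0, 1.0):
    tracks = estimate_tracks(BYTES_READ, RATE, 7200, seek_ms)
    print(f"seek {seek_ms} ms: about {tracks:.0f} tracks")
```

Comparing the result for a given seek time with the observed increase in 0xC7 (4235) shows how sensitive the match is to the assumed seek time.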

ISTM that the UDMA CRC Error Count is in sync with the track count, which would be in sync with the seek count if the test data were contiguous.

I think an extended SMART test would help to understand the "UDMA count". I'm expecting a difference of about 400000 after a full surface scan.

I'm finding numerous similar "problems" with other Toshiba models.

https://www.overclock.net/threads/ultra-dma-crc-error-count.1311851/ (MK3261GSYN)

https://www.dell.com/community/en/c...ma-crc-count-1219638/647f0764f4ccf8a8ded7b1b5 (MK5065GSX)

https://forums.tomshardware.com/threads/interface-crc-error-count-data.2162700/

https://forums.tomshardware.com/threads/interface-crc-error-count-its-not-the-cable.1586600/ (mk6461gsy)

https://forums.tomshardware.com/threads/increasing-ultradma-udma-crc-error-count.3758603/ (MQ01ABD100)

https://gathering.tweakers.net/forum/list_messages/1550317 (MK5061GSYF)

https://datarecovery.parts/mk5056gsyf-toshiba-donor-hard-drive-hdd2e71-lj001d.html (MK5056GSYF)
 
Your Google-fu is much better than mine. That's a lot of discussions I couldn't find.

Short test (~3m):
UDMA_CRC_Error_Count goes up by 8162
Multi_Zone_Error_Rate goes up by 2917
Nothing else changes.

Long test (~80m):
UDMA_CRC_Error_Count goes up by 434655
Multi_Zone_Error_Rate goes up by 5728

Looks like your hunch is right! Any idea what C8 is? If these numbers look suspect I can repeat the test.
 
It appears that both attributes increased by about 100 times. 5GB x 100 = 500GB, so that makes sense.
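As a rough check of the "about 100 times" figure, the increases from the two tests can be divided directly and compared with the ratio of drive capacity to bytes read. This is only a sketch using the numbers posted in this thread; the variable and function names are illustrative.

```python
# Raw-value increases reported in this thread.
READ_5GIB = {"0xC7": 4235, "0xC8": 59}        # dd read of the first 5 GiB
LONG_TEST = {"0xC7": 434655, "0xC8": 5728}    # extended self-test

BYTES_READ = 5368709120      # from the dd output
DRIVE_BYTES = 500107862016   # User Capacity reported by smartctl


def scale_factors(partial, full):
    """Ratio of the full-surface increase to the partial-read increase."""
    return {name: full[name] / partial[name] for name in partial}


ratios = scale_factors(READ_5GIB, LONG_TEST)
capacity_ratio = DRIVE_BYTES / BYTES_READ

for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.1f}")
print(f"capacity / bytes read: x{capacity_ratio:.1f}")
```

All three ratios land in the same range, which is what you would expect if both counters scale with the amount of surface covered rather than with SATA traffic.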

The fact that these tests run internally proves that neither attribute is related to SATA I/O. I'm betting that 0xC7 is a seek count, but 0xC8 has me stumped.

The only other test I can think of is an ATA secure erase. Once again this runs wholly within the drive, but it means that you will need to back up your data. I can understand if this is not an option for you (it wouldn't be for me). This test would be write intensive.

Still one more non-destructive test would be a short stroked read benchmark in HD Tune. This would tell us the size of each "serpentine segment".

How to determine number of heads using HD Tune:
http://www.hddoracle.com/viewtopic.php?p=1796#p1796

I'm hoping that an examination of the benchmark graph might give us more clues.
 
Does this work?

1EgpB8l.png


I don't mind doing a secure erase. Will try when I get the time. SMART attributes before and after is sufficient?
 
Your drive appears to have 2 heads. Each head has a "mini-zone" which spans about 91MB (= 1000MB / 11 zones). There are about 5500 such zones across the whole drive, assuming that the size remains consistent.

500000MB / 91MB = 5495

Looking at it another way, reading 5.4GB from the start of the disk would have traversed 59 mini-zones (= 11 x 5.4GB / 1000MB). That corresponds to the increase in 0xC8.
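The mini-zone arithmetic can be reproduced in a few lines. This is a sketch under the same assumptions as the text: 11 zones counted in the first 1000MB of the benchmark graph, a constant zone size across the drive, and decimal megabytes throughout. The names are illustrative.

```python
ZONES_IN_SPAN = 11     # zones counted in the benchmark graph
SPAN_MB = 1000         # span of the short-stroked benchmark
DRIVE_MB = 500000      # nominal 500GB drive
READ_MB = 5400         # the 5.4GB dd read from the start of the disk

# Size of one mini-zone, assumed constant across the whole drive.
zone_mb = SPAN_MB / ZONES_IN_SPAN

# Number of mini-zones on the whole drive and in the dd read.
zones_on_drive = DRIVE_MB / zone_mb
zones_in_read = READ_MB / zone_mb

print(f"mini-zone size: {zone_mb:.1f} MB")
print(f"mini-zones on drive: {zones_on_drive:.0f}")
print(f"mini-zones in 5.4GB read: {zones_in_read:.1f}")
```

Without rounding the zone size to 91MB first, the drive total comes out marginally above the 5495 quoted above, while the figure for the dd read still rounds to the observed increase of 59 in 0xC8.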

When the heads traverse the drive, they read 100 tracks, say, on head 0 (the first mini-zone), then switch to head 1 and read the next 100 tracks, then back again to head 0 and so on. It's a very tenuous idea, but I'm wondering if 0xC8 is counting the number of times that the heads switch between zones.

BTW, the following article explains what you are seeing in the graph:

HDD from inside: Tracks and Zones. How hard it can be?
https://hddscan.com/doc/HDD_Tracks_and_Zones.html

Each head is tuned to a different Variable Bits Per Inch setting which explains the different transfer rates (125MBps and 117MBps). The serpentine tracking is so-called because the heads move backwards and forwards like a sidewinder snake.

A secure erase would be nice, but not absolutely necessary. I suspect you will see the same numbers, which would then provide confirmation that neither attribute relates to reading or writing.
 
After more thought, ISTM that Toshiba is indeed monitoring zone switching, or head switching, events in attribute 0xC8.

Each head has slightly different physical characteristics, so each surface needs to be tuned to match. The result is that each surface is recorded with different VBPI and VTPI settings (Variable Bits/Tracks Per Inch).

The tracks on each platter would be recorded with slight eccentricity, resulting in a sinusoidal wobble. To account for this Repeatable Runout (RRO), a compensatory signal is injected into the track servo. WD refers to this as Rotational Acceleration Feedforward (RAFF).

When switching between tracks and heads, there is the problem of track skew and head/cylinder skew. To address this problem, the drive is low level formatted at the factory so that there is a slight offset between the sectors in adjacent tracks. This is to enable the head to arrive just-in-time at the target sector, otherwise a complete revolution is required for that sector to pass under the head again.

https://www.fujitsu.com/downloads/COMP/fcpa/hdd/discontinued/maa3182_prod-manual.pdf (page 42)

Therefore, head switching results in a resynchronisation of the track servo. To me it seems plausible that these events would be monitored for errors.

I expect that the lower 32 bits represent an event count, while the upper 16 bits probably store the error count. If so, then this would suggest that the drive is currently error-free.
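To illustrate the suggested layout, a 48-bit raw value can be split into its upper 16 bits and lower 32 bits. This is a sketch of the hypothesis only; Toshiba's actual encoding is not documented in this thread, and `split_raw` is an illustrative name.

```python
def split_raw(raw48):
    """Split a 48-bit SMART raw value into (upper 16 bits, lower 32 bits).

    Hypothetical interpretation from this thread: the lower 32 bits are an
    event count and the upper 16 bits an error count.
    """
    if not 0 <= raw48 < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    return raw48 >> 32, raw48 & 0xFFFFFFFF


# Raw values posted earlier in this thread for 0xC7 and 0xC8.
for name, raw in (("0xC7", 52458111), ("0xC8", 75390663)):
    errors, events = split_raw(raw)
    print(f"{name}: upper 16 bits = {errors}, lower 32 bits = {events}")
```

Both posted values are far below 2^32, so the upper 16 bits are zero in each case.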
 
Solution
Thank you for the detailed explanations. The resources in the previous two replies helped me better understand what is going on.

I am trying to use the fio utility to perform some tests. My first goal was to replicate the HD Tune Pro graph. I used the following job file and obtained the following plot:

Code:
[global]
ioengine=libaio
invalidate=1
ramp_time=0
iodepth=32
direct=1

[read]
bs=128k
offset=0
size=1GiB
filename=/dev/sdb
rw=read
write_bw_log

wlTU3gs.png


Ignoring the outliers, this is very similar to the HD Tune Pro graph I posted earlier. I ran multiple tests and the lowest difference in 0xC8 I observed was 13.
Next I tried to stay within the first 1/11th of the read test (size=60MiB) and recorded the minimum 0xC8 difference, which was 2.
Next I offset the test by 30MB so that it would cross the border, and the minimum 0xC8 difference recorded was 3.
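A possible way to make this comparison less manual is to count how often the bandwidth trace flips between the two plateaus and compare that count with the increase in 0xC8. The sketch below assumes the bandwidth samples have already been extracted from the fio log into a list of MB/s values (the log layout is not shown in this thread), and it uses a threshold of 121 MB/s, which is simply the midpoint of the 125 and 117 MB/s rates mentioned earlier. `count_rate_switches` is an illustrative name.

```python
from statistics import median


def count_rate_switches(samples_mb_s, threshold_mb_s, window=5):
    """Count how often a median-smoothed bandwidth trace crosses the threshold.

    The median filter is there so that isolated outliers do not register
    as plateau changes.
    """
    half = window // 2
    switches = 0
    previous = None
    for i in range(len(samples_mb_s)):
        lo = max(0, i - half)
        hi = min(len(samples_mb_s), i + half + 1)
        is_high = median(samples_mb_s[lo:hi]) >= threshold_mb_s
        if previous is not None and is_high != previous:
            switches += 1
        previous = is_high
    return switches


# Synthetic trace: fast plateau with one outlier, slow plateau, fast plateau.
trace = [125] * 4 + [60] + [125] * 5 + [117] * 10 + [125] * 10
print(count_rate_switches(trace, threshold_mb_s=121))
```

The window size is a judgment call: it has to be longer than a burst of outliers but much shorter than one plateau.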

This is not a very scientific test, but it looks like the head switch counter, as you suspected. I never knew how "vendor defined" the SMART attributes could be.

Thanks for the help!

On an unrelated note, does the SMART data point to any other issues I should look out for? The Load_Cycle_Count seems to be quite high. Is setting the APM to 245 and disabling standby the best I can do?
 
APM settings seem to be volatile, ie they don't survive a power cycle. Therefore, you will need to invoke this setting at boot time (using hdparm?) and possibly after every wake-from-standby event.

The Load_Cycle_Count appears to be suggesting that the drive is rated for a maximum of 1 million cycles. Personally, I would prefer to reduce the aggressiveness of the APM timer.
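The 1 million figure follows from the normalised value if you assume it falls linearly from 100 to 0 over the rated number of cycles. That linearity is an assumption, not something Toshiba documents in this thread, and `implied_rating` is an illustrative name.

```python
def implied_rating(raw_cycles, normalised, start=100):
    """Estimate the rated cycle count implied by a normalised SMART value.

    Assumption: the normalised value drops linearly from `start` to 0 as
    the raw count approaches the rated maximum.
    """
    used_fraction = (start - normalised) / start
    if used_fraction <= 0:
        raise ValueError("normalised value has not dropped yet")
    return raw_cycles / used_fraction


# Load_Cycle_Count from the latest SMART output: raw 158360, normalised 085.
print(f"implied rating: about {implied_rating(158360, 85):,.0f} cycles")
```

Since the normalised value is an integer, the estimate is only good to within a few percent, which is consistent with a round rating of 1 million cycles.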

I like your fio tool/script. Very nice.

FYI, here is Seagate's SMART spec:

http://t1.daumcdn.net/brunch/service/user/axm/file/zRYOdwPu3OMoKYmBOby1fEEQEbU.pdf
http://www.hddoracle.com/download/file.php?id=5129

http://t1.daumcdn.net/brunch/service/user/axm/file/Vw3RJSZllYbDc86ssL6bofiL4r0.pdf
http://www.hddoracle.com/download/file.php?id=5130

Note the structure of the Seek Error Rate. I suspect that your 0xC7 and 0xC8 attributes would be similar, apart from the logarithmic normalised values.
 
This appears to be your PCB:

https://ae01.alicdn.com/kf/H6ea7c04...ABD075-HDKCB16D2A01-MQ01ABF032-MQ01ABF050.jpg

Note the two white shock sensors, MT1 (?) and MT2, at the edges of the PCB. These components, together with their op amps (BD3851), would provide rotational vibration sensing (in the X and Y axes). This feature is normally only present in enterprise drives. That could explain why Toshiba chose to replace attribute 0x07 (seek error rate) with 0xC7 and 0xC8.

MT102 (adjacent to the TLS2605 motor controller) would provide basic shock sensing and is present on most desktop class drives.

IC604 is unpopulated on this PCB. It appears to be reserved for a tri-axis accelerometer. This component would normally be associated with the Free_Fall_Sensor SMART attribute (254). The idea is that the firmware can automatically retract the drive's headstack if it senses a freefall event. If this component is absent from your PCB, then I would be wondering how this attribute is sensed.

Digital accelerometers in 2.5" HDDs:
http://www.hddoracle.com/viewtopic.php?p=19497#p19497
 
However, the track-to-track seek time is usually 1ms or thereabouts, while the time for one revolution is 8.33ms. This means that the actual number of tracks read is more like 4600 (= 5200 x 8.33 / 9.33).

I found the following document which states that the Track-to-track Seek time is 2ms. Therefore, the number of tracks read would be 4205 (= 5200 x 8.33 / 10.3), which is very close to the observed result (4235).

https://static6.arrow.com/aropdfconversion/3c51ee49292300b0609cd52583f4a0c39a8aa5f7/mq01acf032.pdf