Hi,
After buying 2 new ST1000DM0003 drives with CC49 firmware, I decided to upgrade the firmware of the old ST31000528AS from CC38 to CC49, since I wanted to do some RAID testing and it seems good practice to match firmware versions.
The firmware upgrade went well (all other drives were unplugged, followed all instructions, etc). While I can't confirm 100% that the upgrade is the cause, it's the only major change that happened. The PC is used mostly for light OS testing (RAID, LVM, SMB, whatever), nothing disk-intensive. Well, not yet at least, and not for a while apparently.
The problem is, not long after the upgrade two things happened:
1) I suddenly lost all data on one of the partitions on that drive. At first I thought it was a Samba issue, since it is a remote share, then maybe filesystem corruption, PEBCAK, something. I did manage to recover (almost) all data.
2) Ever since, starting right after POST and before any OS is loaded, the disk makes constant noise (as when it's seeking or scanning, like when you copy a large file or let the antivirus have a go at it). It seems to be the heads seeking. It won't stop.
This is an excerpt of the output of a few commands I've been running:
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.12
Device Model: ST31000528AS
Serial Number:
LU WWN Device Id: 5 000c50 027db4afd
Firmware Version: CC49
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Apr 17 11:23:18 2014 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
...
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 113 099 006 - 50643073
3 Spin_Up_Time PO---- 095 095 000 - 0
4 Start_Stop_Count -O--CK 099 099 020 - 1274
5 Reallocated_Sector_Ct PO--CK 047 047 036 - 2181
7 Seek_Error_Rate POSR-- 075 060 030 - 35143152
9 Power_On_Hours -O--CK 079 079 000 - 18750
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 638
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 099 000 - 1
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 071 051 045 - 29 (Min/Max 22/29)
194 Temperature_Celsius -O---K 029 049 000 - 29 (0 11 0 0)
195 Hardware_ECC_Recovered -O-RC- 026 018 000 - 50643073
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 178838143258596
241 Total_LBAs_Written ------ 100 253 000 - 1457922426
242 Total_LBAs_Read ------ 100 253 000 - 1552877542
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
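For anyone reading the table above, here is a rough Python sketch of how the normalized VALUE can be compared against THRESH for the prefailure (P) attributes. It only uses a few rows copied from the table. The helper names (`parse_rows`, `headroom`) and the assumed starting value of 100 are illustrative assumptions, not something smartctl defines, and vendors do not all normalize the same way, so treat the result as a hint rather than a verdict.

```python
import re

# A few rows copied from the attribute table above (not the full table).
SMART_ROWS = """\
1 Raw_Read_Error_Rate POSR-- 113 099 006 - 50643073
5 Reallocated_Sector_Ct PO--CK 047 047 036 - 2181
7 Seek_Error_Rate POSR-- 075 060 030 - 35143152
10 Spin_Retry_Count PO--C- 100 100 097 - 0
197 Current_Pending_Sector -O--C- 100 100 000 - 0
"""

# Columns: ID, name, flags, VALUE, WORST, THRESH, FAIL, first number of RAW_VALUE
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([A-Z-]{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)

# Assumption: normalized values start at 100 when the drive is new.
# Some attributes and vendors use 200 or 253 instead.
ASSUMED_NOMINAL = 100


def parse_rows(text):
    """Parse smartctl attribute rows into dictionaries; skip non-matching lines."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue
        rows.append({
            "id": int(m.group(1)),
            "name": m.group(2),
            "flags": m.group(3),
            "value": int(m.group(4)),
            "worst": int(m.group(5)),
            "thresh": int(m.group(6)),
            "raw": int(m.group(8)),
        })
    return rows


def headroom(rows, nominal=ASSUMED_NOMINAL):
    """Return (name, fraction of headroom left, raw) for prefailure attributes.

    The fraction is (VALUE - THRESH) / (nominal - THRESH), capped at 1.0,
    and the list is sorted so the attribute closest to its threshold is first.
    """
    out = []
    for r in rows:
        if r["flags"][0] != "P" or r["thresh"] >= nominal:
            continue
        frac = (r["value"] - r["thresh"]) / (nominal - r["thresh"])
        out.append((r["name"], min(1.0, frac), r["raw"]))
    return sorted(out, key=lambda t: t[1])


for name, frac, raw in headroom(parse_rows(SMART_ROWS)):
    print(f"{name}: {frac:.0%} of headroom left, raw={raw}")
```

With these rows, the attribute that ends up first in the list is Reallocated_Sector_Ct (VALUE 047 against THRESH 036).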
...
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 18746 -
# 2 Extended offline Aborted by host 90% 18742 -
# 3 Extended offline Interrupted (host reset) 90% 18742 -
# 4 Short offline Interrupted (host reset) 00% 18741 -
# 5 Short offline Completed without error 00% 18653 -
Some tests just keep showing 90% remaining, even if I let them run for twice the estimated amount of time.
This shows at boot: "Incorrect metadata area header checksum on /dev/sdd1 at offset 4096"
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 1 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
# hdparm -w /dev/sdd
/dev/sdd:
resetting drive
HDIO_DRIVE_RESET failed: Inappropriate ioctl for device
# hdparm -B /dev/sdd
/dev/sdd:
APM_level = not supported
# hdparm -Z /dev/sdd
/dev/sdd:
disabling Seagate auto powersaving mode
HDIO_DRIVE_CMD(seagatepwrsave) failed: Input/output error
# hdparm -M /dev/sdd
/dev/sdd:
acoustic = 208 (128=quiet ... 254=fast)
# hdparm -M 208 /dev/sdd
/dev/sdd:
setting acoustic management to 208
acoustic = 208 (128=quiet ... 254=fast)
The only thing that keeps it from making noise is spinning it down (standby) via hdparm -y /dev/sdd
So, some questions:
1. Did the firmware screw it up? Can I flash the previous (CC38) version back?
2. Is the drive dying?
3. Was it a mistake to buy 2 new ST1000DM0003 drives (especially now that Samsung's disk division has been acquired; Samsung would have been my second choice, but now I wonder)?
The system has always run fine with an old Maxtor, which still works, and with this troublesome Seagate. It was only after I bought the new ones and upgraded the firmware that it all went belly-up.
Thanks for reading. Any suggestions or technically founded opinions would be appreciated.