Question HDD only works a few hours after boot/reboot

Jan 21, 2021
One of the three HDDs (Western Digital WD3003FZEX Black 3TB SATA 6Gb/s 7200RPM 64MB Cache 3.5in Hard Drive) in a RAID 0 array is failing. At first it disappeared from the OS. It reappeared after I re-plugged the cables, but it is not stable.

The drive works for a few hours after a boot/reboot: mdadm can read its serial number, and ddrescue read about 90GB of data at an average speed of 2 MB/s. Once the problem appears, the serial number comes back empty and ddrescue reads 0 bytes, with no successful reads.

Code:
I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
          Port2 : /dev/sdd (WD-WMC5D0D9X6D8)
          Port3 : /dev/sde (WD-WMC1F0EARDW0)
          Port1 : /dev/sdc ()

The motherboard is an ASUS X99-E USB 3.1, the RAID 0 was set up with Intel RST, and the I/O controller is still in RAID mode in the BIOS. However, the BIOS now shows the array as consisting of only WD-WMC5D0D9X6D8 and WD-WMC1F0EARDW0; the broken drive is listed as a non-member drive. The system must have somehow detected the drive failure and removed it from the array automatically.

I would like to clone the failing drive. Should I keep rebooting the machine? Is there a command that reboots/resets just the HDD rather than the whole system? Is there any other workaround? Thank you.
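
For reference, a minimal sketch of one thing that may be worth trying on Linux instead of a full reboot: remove the hung disk from the kernel's SCSI layer through sysfs and then rescan its port. The disk name (sdc) is taken from the output above; the host number (host0) is only an assumption and has to be looked up first, for example with `ls -l /sys/block/sdc`. The helper names and the DRY_RUN switch are made up for illustration, and there is no guarantee that a drive that has stopped responding comes back after a rescan.

```shell
#!/bin/sh
# Sketch only: drop a hung SATA disk from the kernel and rescan its port
# without rebooting. DISK and HOST are assumptions; verify them first
# (the target of `ls -l /sys/block/sdc` contains the hostN of the disk).
DISK="${DISK:-sdc}"
HOST="${HOST:-host0}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print what would be written

sysfs_write() {
    # $1 = value to write, $2 = sysfs file
    if [ "$DRY_RUN" = "1" ]; then
        printf 'echo "%s" > %s\n' "$1" "$2"
    else
        printf '%s\n' "$1" > "$2"
    fi
}

reset_disk() {
    # Tell the SCSI layer to forget the device ...
    sysfs_write 1 "/sys/block/$DISK/device/delete"
    # ... then ask the host adapter to rescan all channels/targets/LUNs.
    sysfs_write "- - -" "/sys/class/scsi_host/$HOST/scan"
}

reset_disk
```

By default the script only prints the two writes it would perform; running it as root with DRY_RUN=0 actually performs them. If ddrescue is given a mapfile (third argument, e.g. `ddrescue -d /dev/sdc sdc.img sdc.map`), a later run with the same arguments resumes from where the previous one stopped, so each working window adds to the same image.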
 
Have you tried changing which SATA port it is plugged into?
Intel RST will pick up the drive wherever it was moved to and still assemble the array.

Also try pointing a fan at it in case it's a heat-related issue. It might get you more time.

Yes, I did, but it did not help; the drive still could not be read.

It doesn't seem to be a heat issue, since SMART shows a normal temperature of 33°C.