Ruined motherboard by secure erasing an SSD

dr-claw

Oct 15, 2013
Hello. I was attempting to perform a secure erase of an SSD in preparation for re-installing Windows on it. I searched Google for "linux secure erase ssd" (since it made sense to me to boot off of my Linux drive and perform the secure erase from Linux). I found this page:

http://www.unixmen.com/secure-erase-your-ssd/

I followed the instructions, and it turned out my SSD was in fact frozen, so I unplugged the SATA cable and then plugged it back in, while the PC was running, and it was still frozen. So I unplugged the power cable to the SSD and plugged it back in, also while the computer was running, and that gave the desired output of "not frozen". I then proceeded with the "hdparm" Linux command to secure erase the SSD. It appeared to have worked, as the output of 'fdisk -l' showed no partitions on the drive, when previously there were.
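For readers following along, here is a minimal sketch of how the "frozen" / "not frozen" state can be read without touching any cables. It assumes the Security section of `hdparm -I` output prints either a `frozen` line or a `not frozen` line, which is the usual layout but may differ between hdparm versions. The function name `frozen_state` and the device name `/dev/sdX` are placeholders, not anything from the linked tutorial.

```shell
# Classify the security "frozen" state from `hdparm -I` output read on stdin.
# Assumption: the Security section contains a line reading either
# "frozen" or "not frozen", surrounded only by whitespace.
frozen_state() {
    report=$(cat)
    if printf '%s\n' "$report" | grep -Eq '^[[:space:]]*not[[:space:]]+frozen[[:space:]]*$'; then
        echo "not frozen"
    elif printf '%s\n' "$report" | grep -Eq '^[[:space:]]*frozen[[:space:]]*$'; then
        echo "frozen"
    else
        echo "unknown"
    fi
}

# Typical read-only use (replace /dev/sdX with the real device):
#   sudo hdparm -I /dev/sdX | frozen_state
```

This only reads identification data from the drive; it does not change the security state.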

So I rebooted, inserted the Windows CD, and proceeded to install Windows on the now "fresh" SSD. Everything appeared to be going OK until I got to the stage where it takes forever to do all the Windows updates. It kept crashing. I kept rebooting and it kept crashing. So I rebooted to Linux, which also crashed randomly. After a lot of trial and error and fist pounding, I figured out that the system was unstable if any drive was plugged into any SATA port. Even when booted off an external USB disk it eventually crashed, although it was much more stable.

I tried things including, but not limited to, resetting the BIOS settings to fail-safe defaults, removing the CMOS battery and leaving it out for 1 hour, and flashing to an older BIOS version and then reflashing the latest one. $150.00 later, I had a new motherboard; I transferred my CPU, RAM, etc. over to it, and everything works fine.

It could be the Linux "hdparm" command that caused the problem. The manual page does say:

Most of these are VERY DANGEROUS and can destroy all of your data!
Due to bugs in older Linux kernels, use of these commands may even
trigger kernel segfaults or worse. EXPERIMENT AT YOUR OWN RISK!

I was willing to deal with potential data loss greater than the one disk I was planning to erase, and had everything backed up. But it looks like I experienced the "or worse" part of that warning. While I suppose it is possible that this particular "hdparm" command:

Code:
sudo hdparm --security-erase NULL /dev/sda

is what caused the problems, I think it is more likely I caused the damage by blindly following along with these instructions:

To do this, unplug the SATA cable from the SSD’s backport and then plug it back in, while your computer is still running.

One thing I did not think of, at the time, was that my BIOS settings had the hot plug feature set to disabled for all SATA ports. This is my best guess as to what caused the problem, but I'm not entirely certain. Which brings me to the advice I wish I had before I attempted this:

1. Never, ever, ever unplug anything connected to your motherboard while it is powered up for any reason under any circumstances. It's not worth it. Power it down completely and switch off the power supply each and every time, no matter what component is being attached or removed.

2. Do you have an older PC lying around? If so, use that one for these "VERY DANGEROUS" commands.

3. Try to find a different way to get your drive "not frozen" other than randomly unplugging and plugging back in cables while it is running. Does your SSD manufacturer have software that will accomplish this? It might be some annoying bloatware, but if you're going to re-install Windows anyway ...

4. If you must unplug / plug in a SATA drive to your motherboard while it is running, make sure the hot plug feature is enabled in your BIOS settings.
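The precautions above could be sketched as a small pre-flight check that only prints the erase commands instead of running them. This is a hedged illustration, not a vetted procedure: the function name `erase_plan`, its three arguments, and the throwaway password `p` are all hypothetical. The `--user-master`, `--security-set-pass` and `--security-erase` options are real hdparm options, but whether this sequence is right for a given drive is something to confirm in the hdparm manual page and the SSD vendor's documentation.

```shell
# Print (never run) the secure-erase commands, and only if basic checks pass.
# Arguments (all supplied by the caller; nothing is probed here):
#   $1 target device, $2 frozen state ("frozen" or "not frozen"),
#   $3 the disk the running system booted from.
erase_plan() {
    target=$1; state=$2; boot_disk=$3
    if [ "$target" = "$boot_disk" ]; then
        echo "refusing: $target is the disk the system booted from" >&2
        return 1
    fi
    if [ "$state" != "not frozen" ]; then
        echo "refusing: $target reports '$state'" >&2
        return 1
    fi
    # "p" is an arbitrary temporary password chosen for this sketch.
    echo "hdparm --user-master u --security-set-pass p $target"
    echo "hdparm --user-master u --security-erase p $target"
}

# Example: erase_plan /dev/sdb "not frozen" /dev/sda
```

Printing the commands first gives a chance to double-check the device name before anything destructive happens.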

My questions are:

1. Does anybody have any other thoughts as to what might have caused the random crashes?

2. Does anybody have any suggestions about how I might revive this motherboard? The motherboard is an ASUS Z-87 Plus and was running the latest BIOS update at the time.

Thanks.
 
Solution
I am sorry to hear about the problems you have run into, but what you did is entirely your fault, and no amount of advice can replace ensuring that you inform yourself before attempting something like this.

First of all, never, ever, blindly follow instructions on some random website (especially when the tutorial has obvious grammar and spelling mistakes). You need to make sure you understand everything yourself before attempting anything like this.

And from what you said, you blindly used a tool even though it warned you about there being consequences! Hdparm is community developed and relies on information reverse engineered from hardware to use hardware features that most of the time are not documented well, if at all, by hardware manufacturers. The commands are also designed to do things like destroy data, and some of the features it uses may not play well with older Linux kernels. The developers gave you fair warning of this. Many people (myself included) use hdparm every day, including for SE features, without issue.

Now about the advice you wanted to have:
1) If hot-swap is enabled, it would be fine. The fact that you did this without understanding if it is actually OK to do, or when it is OK to do, means you did not take the time to fully research the task at hand.

2) "VERY DANGEROUS" commands was a fair warning that what you were doing can cause problems, data loss, and many, many headaches if used by someone without adequate knowledge. This does not mean that the commands themselves are dangerous, just that people can do stupid things with them. The gun does not decide to shoot its user in the foot.

3) Since you refer to the drive as being "not frozen" it is quite apparent to me that you don't know what the "not frozen" term is referring to. There is a security bit on SSDs that prevents certain features from being executed, which is typically set by the BIOS during POST. Clearing this bit requires some "tricks", like doing a hot swap. Unfortunately, this all depends on your exact hardware configuration and SSD manufacturer, and no method is going to work in every case.

4) This should have been your first advice, and also the first thing you googled when the instructions you found asked you to do this.

In response to your questions:
1) It sounds like your SATA controller is having hardware failures. I would have attributed this originally to a memory problem, but damaged traces on a signalling path or output circuitry can compromise signal integrity to your storage device and lead to random failures. You can verify this further using an SSD test program from a Linux live CD.

2) Revive the motherboard? You may be able to get away with purchasing a cheap PCIe SATA controller, assuming that it is the on-board SATA controller that has failed.
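As a rough complement to a dedicated test program, the kernel log can be scanned for SATA link trouble. The sketch below is an assumption-laden illustration: the function name `sata_error_count` is made up, and the patterns are a guess at common libata messages (exceptions, failed commands, link resets), so a count of zero does not prove the controller is healthy, and a high count could equally point to a bad cable or drive.

```shell
# Count kernel log lines on stdin that look like SATA link errors.
# Assumption: the patterns below cover typical libata messages;
# they are not an exhaustive or authoritative list.
sata_error_count() {
    grep -Eic 'ata[0-9]+(\.[0-9]+)?: .*(exception|hard resetting link|failed command|link is slow)' || true
}

# Typical use from a live system:
#   sudo dmesg | sata_error_count
```

Comparing the count with a drive on the on-board ports against the same drive on an add-in controller would help narrow down where the fault lies.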

I am sorry if I sounded harsh in criticizing what you did, but it is important that people understand what they are doing before doing it, and not blindly follow random instructions on a website. When I read about people doing that, I feel bad for them but am also angry that they allowed themselves to do something while ignorant of what they were doing.



 
No worries, onichikun. You're absolutely right. I did something stupid and learned from it, and my main point in posting was to warn others against making the same mistakes.

I've been trying to get a hold of the administrators of http://www.unixmen.com to ask them to include a warning about enabling hot swapping and potential damage to hardware, but so far no reply.