Question: RAID card no longer working, gives firmware error on POST

Cyber_Akuma

I have an LSI 9260-8i that was reflashed from the IBM ServeRAID M5014 it originally arrived as. It has since been upgraded to the latest firmware from LSI/Broadcom, 12.15.0-0239 (at boot the card identifies as BIOS version "3.30.02.2 (Build June 17, 2014)"). I am simply using it in my Windows 10 system, not as part of a server or NAS.

This card and its drives had been disconnected from my system for about 6-12 months, and I recently reconnected everything in a new system. I have four HDDs in a RAID5, but had recently acquired the license key to enable RAID6 and had installed a fifth HDD in preparation for that. I had a few errors at first, but many things in the system were giving me errors initially, so I didn't think much of it, as I had performed several upgrades and changes at once.

I eventually got everything booting properly, then opened the Windows RAID management software (version 17.05.02.01 at the time), and it seemed to be going OK. It was performing a Patrol Read on all of my drives and recharging the battery (it claimed the battery was bad, but it was new, so I figured it needed a recharge cycle and re-learn to see it as good again). It said the Patrol Read would take only 10 minutes, but I knew it would take hours. Halfway in, at about the 3-4 hour mark, the software completely stopped responding. I restarted it, and now the card was showing that absolutely nothing was connected to it.

I rebooted, and now I keep getting this error message while the card initializes during POST:

LSI MegaRAID SAS-MFI BIOS Version 3.30.02.2 (Build June 17, 2014) Copyright(c) 2014 LSI Corporation Host Adapter Bus 5 Dev 0:

F/W is in Fault State MFI Register State 0xF0010002

Adapter at Baseport is not responding

No MegaRAID Adapter Installed
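
For reference, the register value in that message can be split into fields. Below is a minimal Python sketch, assuming the layout used by the Linux megaraid_sas driver, where the top four bits of the MFI status register hold the firmware state. The state names are taken from that driver's header as best I recall, and `decode_mfi_state` is a hypothetical helper name. The low bits are left uninterpreted because nothing in this thread documents what they mean.

```python
# Firmware state lives in the top 4 bits of the MFI status register.
# Mask and state names follow the Linux megaraid_sas driver (an assumption,
# not something confirmed by LSI documentation in this thread).
MFI_STATE_MASK = 0xF0000000

MFI_STATES = {
    0x00000000: "UNDEFINED",
    0x10000000: "BB_INIT",
    0x40000000: "FW_INIT",
    0x60000000: "WAIT_HANDSHAKE",
    0x70000000: "FW_INIT_2",
    0x80000000: "DEVICE_SCAN",
    0x90000000: "BOOT_MESSAGE_PENDING",
    0xA0000000: "FLUSH_CACHE",
    0xB0000000: "READY",
    0xC0000000: "OPERATIONAL",
    0xF0000000: "FAULT",
}


def decode_mfi_state(register: int) -> tuple:
    """Return (state name, remaining low bits) for a 32-bit MFI status value."""
    state = register & MFI_STATE_MASK
    detail = register & ~MFI_STATE_MASK & 0xFFFFFFFF
    return MFI_STATES.get(state, "UNKNOWN"), detail


# The value shown on the POST screen above.
name, detail = decode_mfi_state(0xF0010002)
print(f"state={name} detail=0x{detail:07X}")
```

Under that assumption, the top nibble 0xF decodes to the fault state, which agrees with the "F/W is in Fault State" line. It only confirms that the firmware itself reported a fault; it does not say why.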

After that, the card could be seen as installed, but the drives were not showing up, and the management software could no longer tell that a card was installed at all. I tried updating to the latest management software (17.05.06.00) in case it might at least see that a card is installed. That was its own can of worms, since it expected me to manually install OpenJDK and set the environment paths myself, and it also just gets stuck loading the application.

I tried MegaCLI, StorCLI, and MegaSCU with -v and -AdpAllInfo -aALL (I admit I am not too familiar with managing this card through a CLI), but they all returned nothing. I tried disconnecting the drives in case one had become so faulty during storage that it was crashing the card, but it made no difference. I tried the only other PCIe slot on my motherboard, and then the system would not even boot; I guess that slot doesn't work with everything else installed in my system.

I have no idea what to do now. The card refuses to work, claiming some kind of firmware fault on POST. None of the software detects the card at all, even though it shows up in HWiNFO and in Device Manager (although with an "An I/O adapter hardware error has occurred." error). I am not aware of any way to attempt a forced reflash of the firmware in case it's a software issue (although I somehow doubt it), or of what else to try.

And yes, I have a backup of my data.
 
Unless there are very specific reasons for using RAID (of any sort), RAID is not necessary for most computer and networking environments.

Without such reasons, RAID tends to be or become problematic overall.

My suggestion is to stop using RAID completely, barring, as stated in the first sentence, specific reasons or requirements that support the need for it.

More information needed. Why RAID?
 
It offers me a larger volume than a single drive while also having redundancy. I just want to get this working again, not overhaul my entire system to undo the RAID setup.
 
As I said, it's a RAID5, but I added another drive and was planning to convert it to a RAID6.

Have you tried disconnecting everything from the RAID card? That firmware error means there is no, or only partial, communication with the controller. There is a small chance that one of the drives is preventing your controller from booting up, but if you disconnect everything from the card and still get no connection to the controller, that would mean your RAID card is e-waste now.
 
Yeah, I tried disconnecting all the drives. The card has two SAS ports and the drives are connected by a SAS-to-4xSATA adapter, so I just unplugged both SAS cables from the card and still got the FW error on POST.
 
I think I might have a lead, but it's incredibly odd and makes little sense. Remember how I said the card was working at first but then crashed during the Patrol Read and stopped working? I noticed that the card was reporting a high BBU temperature while it was doing the Patrol Read. Normally my case has a 200mm side fan that blows across all the PCIe slots, but since I am working on the computer, all the side panels are off. I had noticed previously that the card would run hot with the side panel/fan removed but was fine with it in place, so I decided to just point a desk fan at the card. It crashed shortly after.

The fan was angled towards the PC the whole time I was trying to work on it, and it wasn't something that I was actively thinking about. I also performed many stress tests with the card disconnected and the fan blowing on the system with no issues.

Recently, though, I turned it off to move it out of the way so I would have more room to work, and that's when the card mysteriously started working again. I left it like this, and it not only finished its Patrol Read but kept working fine; I left it overnight and it was still working fine in the morning. I tried turning the fan back on and pointing it at the card again, and within minutes the system bluescreened. When I rebooted, the Windows client software for the RAID card immediately gave me a popup saying that the card had suffered a fatal error and was reset.

What on Earth is going on? The card runs fine when it's barely within/above its standard operating temperatures, but it crashes if I try to cool it? Has anyone ever heard of something like this? Is this a sign that the card is damaged, or would pointing a desk fan at a PC normally cause problems like this? This makes no sense to me.
 
Could it be that the flash IC that stores the card's firmware has become flaky when cold (bit rot?) and now responds to being warmed up? Try targeting this chip with heating and cooling.

https://m.media-amazon.com/images/I/91jF+TExYfL.jpg

I see two 8-pin Atmel ICs near the top right corner of the heatsink. The 24C0x chip is a low capacity EEPROM, possibly for storing RAID metadata, configuration info, etc. I can't recognise the other chip, but it may be a 25xxx series serial flash memory, in which case it would hold the card's firmware.

Another possibility is that warming the card disturbs an intermittent solder joint. In this case you should be able to provoke the fault by tapping or flexing the card.
 
I think those two chips you are talking about are on the back of the card: a (NAND?) chip that stores the firmware and an NVRAM chip that stores the settings/configuration:

https://m.media-amazon.com/images/I/61OHYKIuWJL._AC_SL1000_.jpg

Though obviously the CPU, if not other chips, would be under the heatsink, which I have never removed (and don't even know how to).

Bad solder joint? So then should I just let the card super-overheat in hopes it will melt back? Just kidding, if it's a flaky chip or a broken solder joint on something this intricate it's well beyond my skills to repair. I ordered another on eBay, the seller claimed it's new but I highly doubt any new ones have existed for years anymore, especially on eBay of all places. I just hope it doesn't also have issues.

So then, since the card is working for now, I was able to use the Windows MegaRAID Storage Manager to save its configuration. Can I just load that on the new card, then plug in the drives and continue like nothing happened? I have never had to swap RAID cards before, so I don't want to do it in a way that might result in my data getting wiped.

As I mentioned, I was also planning to migrate the RAID5 over to a RAID6 and had added another, currently unconfigured, drive for this. Again, this is something I have never done before. From what I understand, in the software I would go to the list of logical drives, right-click on the Drive Group that is the RAID5, and choose "Modify Drive Group", where I would switch it to RAID6 and add the fifth drive. That would do it without wiping my data, right?
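
On the capacity side of that conversion, a quick sanity check: RAID5 spends one drive's worth of space on parity and RAID6 spends two, so going from four drives in RAID5 to five drives in RAID6 should leave the usable size unchanged. Here is a small Python sketch of that arithmetic. The 4 TB drive size is a placeholder, since the actual size is never given in this thread, and `usable_capacity` is a made-up helper name.

```python
def usable_capacity(drive_count: int, drive_size_tb: float, parity_drives: int) -> float:
    """Usable capacity of a parity array built from equal-sized drives."""
    if drive_count <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drive_count - parity_drives) * drive_size_tb


DRIVE_SIZE_TB = 4.0  # placeholder value, not the real drive size

raid5 = usable_capacity(4, DRIVE_SIZE_TB, parity_drives=1)  # current 4-drive RAID5
raid6 = usable_capacity(5, DRIVE_SIZE_TB, parity_drives=2)  # planned 5-drive RAID6
print(raid5, raid6)
```

Both calls give the same figure for any drive size, so the fifth drive buys a second layer of redundancy rather than extra space.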

Even though the current card was able to finish the Patrol Read and perform a Consistency Check without issues, I don't trust it not to crash during a RAID6 conversion, so I will wait until the replacement arrives. The replacement should not crash from me pointing a fan at it, right?
 
The TSOP-56 IC does appear to be the main firmware store. I would have thought that the 24C0x chip on the other side would have been the NVRAM. If you can tell me the part markings, I would be able to make a better guess.