I woke up this morning and, what do you know, one of the RAID arrays in my workstation was degraded. Just great. This particular array was made up of 3x 1TB disks in RAID level 5 running on an Intel RST fakeRAID controller. After Ubuntu had serious difficulty communicating with the drive on bootup, I assumed that the drive’s onboard controller board had died. Having some spare parts for this model of drive, I decided to replace the controller board with one from a previous drive that had failed. I didn’t think I had anything to lose – the array was already degraded, so what more could happen? The best-case scenario was that I wouldn’t have to buy another $100 drive. The worst was that the drive would still be detected as failed and nothing would change. Seems simple, right? Wrong.

Upon plugging in the newly matched drive/controller pair, the Intel RST utility immediately indicated that my array had failed. Not just degraded, but FAILED. The volume was no longer accessible in Windows, and Intel RST claimed that all data was lost. Rebooting did not solve the problem, and the BIOS reported a failed array even after I removed the new frankendrive. I knew that my data was not really lost, because I still had 2 healthy drives in the array and even Intel RST indicated as much. After some careful inspection of the array properties, it seemed that Intel RST had duplicated the failed drive’s serial number. I truly can’t explain what happened.

I somehow needed to override the metadata on the drives to indicate that the array was healthy enough to start, or at least to rebuild. Linux to the rescue! I knew that Ubuntu uses dmraid as its fake/software RAID handler. Not having used it to rebuild an array before, I began experimenting. First of all, dmraid did not bring up the array as it should have during the boot sequence. This was a bad sign, and it sank my hopes a bit. I first checked for all the arrays on the system by running

    sudo dmraid -s -s

After spitting out a few errors about my array not having the correct number of drives (2/3), it gave me a status report on the array indicating that it was “inconsistent”.
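
As an aside, dmraid can also list the raw member disks it detects and the metadata format they carry (Intel RST metadata shows up as the “isw” format) – a handy sanity check when you want to confirm which physical disks are still recognized as array members. Treat the commands below as a sketch; the exact output varies with the dmraid version and metadata:

    # list the block devices carrying RAID metadata
    sudo dmraid -r
    # list the metadata formats dmraid supports
    sudo dmraid -l

I then decided to try to start the array by running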

    sudo dmraid -ay

and, lo and behold, it brought up the array, which I could then mount in Nautilus. This was the first ray of hope for my data. At the very least, I could now copy my files to other drives and rebuild the array from scratch.
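
For anyone who prefers the terminal to Nautilus, the activated array shows up as a device-mapper node that can be mounted read-only while you copy data off. This is only a rough sketch – the mapper name and mount point below are made-up examples, and the partition suffix varies between dmraid versions:

    # see which nodes dmraid created (the volume name below is a placeholder)
    ls -l /dev/mapper/
    # mount the volume's first partition read-only and copy files off it
    sudo mkdir -p /mnt/recovery
    sudo mount -o ro /dev/mapper/isw_XXXXXXXXXX_Volume01 /mnt/recovery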

After buying and installing a new drive, I rebooted and checked the drive’s health with palimpsest (the GNOME Disk Utility). The drive appeared healthy and had been assigned the handle /dev/sda. I now wanted to add it to my array (which was not running at this point due to the restart). To do this, I got the array (subset) name from “sudo dmraid -s -s” and used it to identify the array when issuing the rebuild command

    sudo dmraid -R <array name> /dev/sda

It gave me a few errors, but the command appeared to complete successfully for the most part. The big indication that it was doing something was that the HDD activity LED was lit constantly. Supposedly you can monitor the rebuild’s progress with “sudo dmsetup status”, but it did not give me any changing information (see the polling sketch below). I simply waited until the LED turned off, about 4 hours later. Once I was sure the rebuild had finished, I rebooted, and everything mounted normally. The BIOS once again indicated that the array was operating normally, and a quick boot into Windows showed that Intel RST was reporting the correct status.
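
If you want something a little more hands-off than staring at an LED, you could poll dmsetup on an interval. A minimal sketch, with the caveat that it may simply show you the same static output it showed me:

    # re-run "dmsetup status" every 60 seconds to watch for progress
    sudo watch -n 60 dmsetup status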

I have two pieces of advice stemming from this experience. First, don’t try to build drives from spare parts without purging the RAID metadata first. Second, give dmraid a shot if you are having trouble with a Windows-based fake/software RAID. It saved me 2TB of precious data.
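
On that first point, dmraid itself can erase stale metadata from a spare drive before you reuse it. A rough sketch, assuming the spare is /dev/sdX – double-check the device name before running either command, since both are destructive and I haven’t tested this exact sequence on Intel RST members:

    # erase the RAID metadata dmraid finds on the spare drive (destructive!)
    sudo dmraid -r -E /dev/sdX
    # alternatively, wipe all filesystem/RAID signatures known to util-linux
    sudo wipefs -a /dev/sdX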