Tuesday, March 10, 2009

Howto: Fix a 2-drive failure in a RAID 5 mdadm array

Problem:

Somehow two of your drives have been marked "faulty" by mdadm at the same time. This happened to me because the SATA controller card they were both hooked up to got confused and stuttered. Examining one of the member drives should produce output something like this:

$ sudo mdadm --examine /dev/sda1

Update Time : Fri Feb 6 08:28:55 2009
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 2
Spare Devices : 0
Checksum : 50c18fe4 - correct
Events : 0.487370

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1

0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed


As you can see, the server thinks that RAID devices 2 and 3 are faulty and has removed them from the array. Since this is a RAID 5, two faults is one fault too many and your data is in jeopardy.
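
Before forcing anything, it helps to examine every member of the array, not just one, and compare the Events counter and Update Time in each superblock; drives whose counters closely match the surviving members almost certainly still hold current data. A rough sketch, assuming your members are /dev/sdb1 through /dev/sde1 (substitute your own partitions):

$ for d in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
    echo "== $d =="
    sudo mdadm --examine $d | grep -E 'Update Time|State|Events'
  done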

At this point you need to be relatively sure that at least one (preferably both) of these drives still has valid data on it; closely matching Events counters and Update Times in the check above are a good sign. If that is the case, go ahead and issue this command:

$ sudo mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

Note that the drives must be listed in exactly the same order as when you first created the array. You can confirm each drive's position by running the examine command above on each member and looking at the RaidDevice column.
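
Once the forced assembly returns without complaining, check that the array actually came back before you mount or write to it. A quick sanity check, assuming the array is /dev/md0 as above:

$ cat /proc/mdstat
$ sudo mdadm --detail /dev/md0

The --detail output should now show the array as active (possibly degraded, if one drive could not be brought back), with the member drives listed as active sync.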

That's the best-case scenario.