If you've read the rest of this HOWTO, you should already have a pretty good idea about what reconstruction of a degraded RAID involves. To summarize: run

raidhotadd /dev/mdX /dev/sdX

to re-insert the disk in the array.
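For example, assuming the array is /dev/md0 and the replaced disk is /dev/sdc1 (both names purely illustrative), the re-insertion and a quick check on the rebuild might look like this:

raidhotadd /dev/md0 /dev/sdc1
cat /proc/mdstat     # watch the reconstruction progress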
Well, it usually is that simple, unless you're unlucky and your RAID has been rendered unusable because more disks failed than the array has redundancy for. This can actually happen if a number of disks reside on the same bus, and one disk takes the bus down with it as it crashes. The other disks, however fine they may be, will be unreachable to the RAID layer because the bus is down, and they will be marked as faulty. On a RAID-5, where you can spare one disk, losing two or more disks can be fatal.
The following section is the explanation that Martin Bene gave to me, and describes a possible recovery from the scary scenario outlined above. It involves using the failed-disk directive in your /etc/raidtab, so this will only work on kernels 2.2.10 and later.
The scenario is the one outlined above: several disks were marked faulty at once (for instance because a bus went down), the RAID superblocks are now out of sync, and the array can no longer be started. One thing is left: rewrite the RAID superblocks with mkraid --force.
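As a minimal sketch, assuming your array is /dev/md0 (the device name is illustrative), the invocation is simply:

mkraid --force /dev/md0

Note that mkraid rewrites the RAID superblocks of the devices listed in /etc/raidtab, so make absolutely sure the raidtab is correct before running it.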
To get this to work, you'll need an up-to-date /etc/raidtab - if it doesn't EXACTLY match the devices and ordering of the original disks, this won't work.
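To illustrate, here is a sketch of a raidtab for a three-disk RAID-5; all device names and parameter values are assumptions for the example, and must be replaced with the actual layout the array was created with:

raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        nr-spare-disks          0
        persistent-superblock   1
        parity-algorithm        left-symmetric
        chunk-size              32
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1
        device                  /dev/sdc1
        raid-disk               2

The raid-disk numbers encode the ordering that must match the original array.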
Look at the syslog produced by trying to start the array; you'll see the event count for each superblock. Usually it's best to leave out the disk with the lowest event count, i.e. the oldest one.
If you mkraid without failed-disk, the recovery thread will kick in immediately and start rebuilding the parity blocks - not necessarily what you want at that moment.
With failed-disk you can specify exactly which disks you want to be active, and perhaps try different combinations for best results; a sketch of such a raidtab follows below. BTW, only mount the filesystem read-only while trying this out... This has been successfully used by at least two people I've been in contact with.
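Continuing the hypothetical raidtab above, suppose the syslog event counts suggest leaving out /dev/sdc1. You would replace its raid-disk line with a failed-disk line, keeping nr-raid-disks at 3:

        device                  /dev/sdc1
        failed-disk             2

Then rewrite the superblocks and mount read-only to inspect the result:

mkraid --force /dev/md0
mount -o ro /dev/md0 /mnt

If the filesystem looks wrong, mark a different disk as failed-disk and try again; once you're satisfied with a combination, the left-out disk can be re-inserted with raidhotadd so the parity rebuild can run.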