If you plan to use RAID to get fault-tolerance, you may also want to test your setup, to see if it really works. Now, how does one simulate a disk failure ?
The short story is, that you can't, except perhaps for putting a fire axe thru the drive you want to ``simulate'' the fault on. You can never know what will happen if a drive dies. It may electrically take the bus it's attached to with it, rendering all drives on that bus inaccessible. I've never heard of that happening though. The drive may also just report a read/write fault to the SCSI/IDE layer, which in turn makes the RAID layer handle this situation gracefully. This is fortunately the way things often go.
If you want to simulate a drive failure, then plug out the drive. You should do this with the power off. If you are interested in testing whether your data can survive with a disk less than the usual number, there is no point in being a hot-plug cowboy here. Take the system down, unplug the disk, and boot it up again.
Look in the syslog, and look at /proc/mdstat
to see how the RAID is
doing. Did it work ?
Remember, that you must be running RAID-{1,4,5} for your array to be able to survive a disk failure. Linear- or RAID-0 will fail completely when a device is missing.
When you've re-connected the disk again (with the power off, of
course, remember), you can add the ``new'' device to the RAID again,
with the raidhotadd
command.
RAID (be it hardware- or software-), assumes that if a write to a disk doesn't return an error, then the write was successful. Therefore, if your disk corrupts data without returning an error, your data will become corrupted. This is of course very unlikely to happen, but it is possible, and it would result in a corrupt filesystem.
RAID cannot and is not supposed to guard against data corruption on
the media. Therefore, it doesn't make any sense either, to purposely
corrupt data (using dd
for example) on a disk to see how the
RAID system will handle that. It is most likely (unless you corrupt
the RAID superblock) that the RAID layer will never find out about the
corruption, but your filesystem on the RAID device will be corrupted.
This is the way things are supposed to work. RAID is not a guarantee for data integrity, it just allows you to keep your data if a disk dies (that is, with RAID levels above or equal one, of course).