5. Testing

If you plan to use RAID for fault tolerance, you should test your setup to see whether it really works. But how do you simulate a disk failure?

The short answer is that you can't, except perhaps by putting a fire axe through the drive you want to ``simulate'' the fault on. You can never know exactly what will happen when a drive dies. It may electrically take down the bus it is attached to, rendering all drives on that bus inaccessible, though I have never heard of that happening. The drive may also just report a read/write fault to the SCSI/IDE layer, which lets the RAID layer handle the situation gracefully. Fortunately, this is the way things often go.

5.1 Simulating a drive failure

If you want to simulate a drive failure, unplug the drive. You should do this with the power off. If you are interested in testing whether your data can survive with one disk fewer than usual, there is no point in being a hot-plug cowboy here. Take the system down, unplug the disk, and boot it up again.

Look in the syslog, and look at /proc/mdstat to see how the RAID is doing. Did it work?
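
As a rough aid for reading /proc/mdstat, the sketch below scans mdstat-style text for arrays that report fewer working devices than expected. It assumes the common layout in which each array's status line carries a counter such as [3/2] (expected devices, then working devices); the sample text, its device names, and the function name degraded_arrays are made up for illustration, and your kernel's output may differ in detail.

```python
import re

# Sample text in the usual /proc/mdstat layout. The arrays and device
# names are invented for this example.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md0 : active raid1 sdb1[1] sda1[0]
      1048512 blocks [2/2] [UU]
md1 : active raid5 sdc1[2] sda2[0]
      2097024 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, working)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # A line such as "md0 : active raid1 ..." starts a new array entry.
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Assumed format: the status line contains "[expected/working]".
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if status and current is not None:
            expected, working = int(status.group(1)), int(status.group(2))
            if working < expected:
                result[current] = (expected, working)
    return result


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real system you would pass in the contents of /proc/mdstat instead of the sample string, for example open("/proc/mdstat").read().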

Remember that you must be running RAID-{1,4,5} for your array to be able to survive a disk failure. Linear or RAID-0 will fail completely when a device is missing.
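
To see why a parity-based level such as RAID-4 or RAID-5 can lose one device and still return your data, here is a toy illustration of XOR parity. It is only a sketch of the idea, not how the kernel lays out stripes; the block contents and the helper name xor_blocks are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks", each holding one 4-byte block.
data = [b"disk", b"fail", b"test"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Pretend the second disk died: only the other blocks and parity remain.
survivors = [data[0], data[2], parity]

# XOR-ing the survivors yields the missing block again.
rebuilt = xor_blocks(survivors)

if __name__ == "__main__":
    print(rebuilt)
```

Linear and RAID-0 keep no such redundant information, which is why a missing device there cannot be reconstructed.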

When you have reconnected the disk (with the power off, remember), you can add the ``new'' device back to the RAID with the raidhotadd command.

5.2 Simulating data corruption

RAID (be it hardware or software) assumes that if a write to a disk doesn't return an error, then the write was successful. Therefore, if your disk corrupts data without returning an error, your data will become corrupted. This is of course very unlikely to happen, but it is possible, and it would result in a corrupt filesystem.

RAID cannot, and is not supposed to, guard against data corruption on the media. It therefore makes no sense to purposely corrupt data (using dd, for example) on a disk to see how the RAID system will handle it. Most likely (unless you corrupt the RAID superblock), the RAID layer will never find out about the corruption, but the filesystem on your RAID device will be corrupted.
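
The point can be illustrated with a toy model of a two-way mirror. This is a deliberately simplified sketch, not the real RAID driver: the class ToyMirror and its methods are invented for the example, and it assumes a read is served from a single copy without comparing it against the other.

```python
class ToyMirror:
    """Toy two-way mirror: writes go to both copies, reads use one copy."""

    def __init__(self, size):
        self.copies = [bytearray(size), bytearray(size)]

    def write(self, offset, data):
        # A write that raises no error is assumed to have succeeded.
        for copy in self.copies:
            copy[offset:offset + len(data)] = data

    def read(self, offset, length, from_copy=0):
        # No checksum and no comparison between copies: whatever the
        # chosen copy holds is returned as-is.
        return bytes(self.copies[from_copy][offset:offset + length])


mirror = ToyMirror(16)
mirror.write(0, b"important")

# Corrupt one copy behind the mirror's back, as dd on the raw disk would.
mirror.copies[0][0:4] = b"XXXX"

bad = mirror.read(0, 9, from_copy=0)
good = mirror.read(0, 9, from_copy=1)

if __name__ == "__main__":
    print(bad, good)
```

Neither read reports an error, so nothing in this model ever notices that the two copies disagree.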

This is the way things are supposed to work. RAID is not a guarantee of data integrity; it just allows you to keep your data if a disk dies (that is, with RAID level 1 or higher, of course).

