Most RAID devices were designed and built for data reliability. RAID stands for Redundant Array of Independent Disks (or originally, Inexpensive Disks). The technology, which was developed at the University of California, Berkeley, allows enough redundancy of the data that if one hard drive in the system fails, the information it contained can be reconstructed from the remaining hard drives.
RAID-5 devices offer this redundancy and also spread, or stripe, the data across the drives so that it can be read faster. Compared with a single hard drive, they offer greater speed and greater resistance to data loss.
But they have their limits. After one drive fails, the machine can keep limping along in a degraded state, reconstructing the failed drive’s data on the fly from the data and parity stored on the remaining drives. Unless someone notices this and acts, the situation can persist until a second drive fails and brings everything to a halt.
Suddenly, since none of the data is accessible, there’s cause for panic. In the desperate efforts that ensue, a new hard drive may be inserted into the device, a remaining drive could be reformatted, or all the hard drives might be taken out of the device and put back in the wrong order. Meanwhile, a whole organization’s most important data is inaccessible. It’s a horrible situation.
If you have just experienced failure of your RAID-5 device, here is some useful advice, provided you act in a timely manner.
Listen to Douglas Adams: Don’t Panic
First, if your data is not accessible, you should never rebuild the array; this will not repair anything. It will take the current state of affairs and make it permanent. Another common mistake is to force drives back online after noticing that only one of three drives, or two of four, are up. The RAID controller took these drives offline for a reason: they have probably failed. When you force them online, data on the healthy drives is likely to be corrupted. Worse, file system repair utilities will see the resulting mess and start “repairing” it, and recently written data is hit hardest. The effect is that the most critical data on the healthy drives will be gone.
The best thing to do when your RAID-5 fails is to step back, consider the value of the data and what its permanent loss would mean, and call a recovery lab.
At Gillware, engineers and computer scientists have pioneered techniques for RAID recovery cases. Successful RAID-5 recoveries depend on reassembling the logical structure of the file system, which is necessary to get meaningful data back from a failed RAID device.
After addressing any mechanical issues, such as damaged read/write heads, we create full binary copies of all the drives in the system. We look at each copy independently in a hex editor, which displays the raw bytes on the drive, to determine how the data was being divided, or striped, among the drives and in what order. Each RAID controller is different, and it’s a logic puzzle to determine how the data was being handled and what the file structure was before the system failed.
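To make the puzzle concrete, here is a minimal sketch of testing two candidate layouts against a set of drive images. It is an illustration under stated assumptions, not the tooling any particular lab uses: the function names, the toy images and the 2-byte chunk size are hypothetical, and the two layouts shown (often called left-symmetric and left-asymmetric) are only two of the conventions a controller might follow. In each stripe, one chunk holds redundancy information rather than file data, and the layout decides which drive that is and in what order the data chunks are read.

```python
def left_symmetric(stripe, n_disks):
    """Parity rotates from the last disk backwards; data starts on the
    disk after the parity disk and wraps around."""
    parity = (n_disks - 1) - (stripe % n_disks)
    data = [(parity + 1 + i) % n_disks for i in range(n_disks - 1)]
    return parity, data


def left_asymmetric(stripe, n_disks):
    """Same parity rotation, but data always runs in ascending disk
    order, skipping the parity disk."""
    parity = (n_disks - 1) - (stripe % n_disks)
    data = [d for d in range(n_disks) if d != parity]
    return parity, data


def reassemble(images, chunk_size, layout):
    """Concatenate the data chunks of every stripe in the order the
    candidate layout prescribes, skipping the parity chunk."""
    n_disks = len(images)
    n_stripes = min(len(img) for img in images) // chunk_size
    volume = bytearray()
    for stripe in range(n_stripes):
        _, data_disks = layout(stripe, n_disks)
        offset = stripe * chunk_size
        for disk in data_disks:
            volume += images[disk][offset:offset + chunk_size]
    return bytes(volume)


# Hypothetical three-drive array with 2-byte chunks; "??" marks the
# parity chunk, whose content does not matter for this demonstration.
images = [b"AADD??", b"BB??EE", b"??CCFF"]

guess_a = reassemble(images, 2, left_symmetric)
guess_b = reassemble(images, 2, left_asymmetric)
print(guess_a)  # chunks come out in alphabetical order
print(guess_b)  # two chunks come out swapped
```

Only one of the two guesses yields the chunks in a sensible order. On real images the same idea applies at a much larger scale, and the chunk size and drive order are additional unknowns to be tested.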
It is crucial to determine which drive failed first. As mentioned earlier, RAID systems depend on a logic calculation to store their redundant data: the “exclusive or” (XOR) binary operator. You might intuitively expect that four disks full of data would need another four disks to hold a redundant copy. But XOR is a clever way to store the redundancy for four disks’ worth of data in the space of just one more disk. In RAID-5 that redundancy, called parity, is spread across all the drives rather than kept on a dedicated one. The drive that failed first stopped receiving writes while the array carried on in a degraded state, so its contents are out of date and cannot simply be mixed back in with the others. To reconstruct data using this operator, it’s necessary to understand in what order the disks failed and exactly how data was being written to them.
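The XOR idea can be sketched in a few lines. The block contents and names below are hypothetical, and real controllers work on much larger chunks, but the arithmetic is the same: XOR all the data blocks in a stripe to get the parity block, and XOR the survivors with the parity to get back any single missing block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four hypothetical data blocks, one per drive, forming a single stripe.
data = [b"ACCT", b"MAIL", b"DOCS", b"LOGS"]
parity = xor_blocks(data)  # the fifth block in the stripe

# The drive holding b"DOCS" is lost: XOR the three survivors with parity.
rebuilt = xor_blocks([data[0], data[1], data[3], parity])
print(rebuilt == b"DOCS")  # True

# Why failure order matters: suppose the drive holding this stripe's
# parity dropped out first, and the last data block was rewritten later.
data_after = [b"ACCT", b"MAIL", b"DOCS", b"LOG2"]
wrong = xor_blocks([data_after[0], data_after[1], data_after[3], parity])
print(wrong == b"DOCS")  # False: stale parity yields a corrupted block
```

The second half shows the trap: combining an out-of-date block with current ones still produces output, but the output is wrong, and nothing in the arithmetic flags it.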
Only after all this analysis, including a correct diagnosis of the order in which the drives failed, can data recovery experts begin to write the code that will rebuild the array’s data. They then test their hypothesis about the layout by checking the integrity of a large recent file, and proceed to reassemble all the pieces of the puzzle into one contiguous volume.
RAID-5 systems have a lot of appeal: speed, reliability and ease of use. Many organizations trust them to hold the entire network’s data without employing an automatic remote backup system. That puts a great deal of faith in the idea that your RAID-5 device will never fail. If it does, don’t let panic complicate matters. There is good reason to hope for a successful RAID-5 data recovery with the right help.