RAID Recovery Case Study: RAID 5 Failure and Reconstruction

How RAID Reconstruction Works in RAID-5

RAID 5 is one of the most popular forms of RAID, and it’s not hard to see why. When you combine three or more hard drives into a RAID 5 array, you get three benefits. First, the hard drives combine to form a single storage volume, with greater capacity than a single drive all by its lonesome. Second, the multiple drives working in unison deliver faster performance, especially for reads (not SSD level, but better than one solitary hard drive). And third, RAID 5 uses parity data to provide redundancy in case one drive fails.
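To make the capacity benefit concrete, here is a minimal sketch of the usual RAID 5 arithmetic: one drive's worth of space goes to parity, so usable capacity is the number of drives minus one, times the size of the smallest drive. The function name and the 73 GB drive sizes are illustrative assumptions, not figures taken from any particular controller.

```python
def raid5_usable_capacity(drive_sizes_gb):
    """Return usable RAID 5 capacity in GB for a list of member drive sizes.

    One drive's worth of space holds parity, and every member is
    treated as being as large as the smallest drive in the set.
    """
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


# Three hypothetical 73 GB drives: two drives' worth of space is usable.
print(raid5_usable_capacity([73, 73, 73]))  # 146
```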

A RAID 5 array breaks up all of the data you write to it into blocks and “stripes” these blocks across the disks. A single stripe of data contains as many blocks as there are disks in the array. But of these blocks, one in particular is special. The special block holds “parity” data, created by running XOR calculations against the bits in all the other blocks in the stripe. When you have a list of several values, the XOR function “fills in the blanks” if one of those values goes missing. The XOR function is actually easy to learn and quite fun to do. You can learn it as a party trick (or rather, a “parity” trick) to impress your friends. If you go to those kinds of parties, that is.

Every stripe in a RAID 5 array has one XOR parity block. This way, if any one drive in the array goes missing, the parity block in each stripe can step in. By running XOR calculations on all the remaining blocks in a stripe, the array can perfectly recreate all of the data from the missing block. Do this for every stripe and voilà: nothing of value is lost.
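If you'd like to try the parity trick yourself, here is a minimal sketch in Python of a single stripe on a hypothetical three-drive array. The block contents and the helper name xor_blocks are made up for illustration. Real arrays use much larger blocks, but the math is the same.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: two data blocks plus the parity block computed from them.
data_a = b"RAID"
data_b = b"five"
parity = xor_blocks([data_a, data_b])

# Pretend the drive holding data_b went missing.
# XOR the surviving blocks together to fill in the blank.
rebuilt_b = xor_blocks([data_a, parity])
assert rebuilt_b == data_b
```

The same call works no matter which of the three blocks is the missing one, because XOR-ing all of the blocks in a healthy stripe together always gives zeros.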

If a second drive goes missing, however—you’re dead meat.

How to Recover Data from RAID 5 Arrays

Recovering data from RAID arrays involves taking the most up-to-date hard drives (whenever possible) and using custom RAID controller emulation to string them back together. The way RAID 5 arrays use parity data to provide redundancy comes in handy for our RAID recovery technicians. It gives our engineers a chance to “kick out” one hard drive. Now, that might not sound useful at first, but think about what happens when a RAID 5 array fails.

You get your first drive failure, and the array keeps chugging along as if nothing happened (thanks, XOR parity!). You keep writing new data to the array as you use it—but not to the failed drive. Its data becomes more and more out-of-date with each passing second. We call the first drive to fail “stale”. If another drive fails before you can replace the failed drive, the whole array crashes.

The client in this RAID recovery case had three drives. The first drive failed without the client noticing, and the RAID 5 server ran “degraded” for a few weeks before the technician accidentally removed one of the remaining healthy drives and crashed the server. So one drive was stale, and the two remaining drives were up-to-date. A complete enough recovery of the two up-to-date hard drives would let our engineers avoid using the stale drive.

You can look at our video blog post to find out what happens when you try to reconstruct a RAID 5 array with a stale drive. The results aren’t particularly pretty. The stale blocks of data can cause massive amounts of corruption. Our engineers tend to leave the stale drives alone unless there is literally no other option for RAID data recovery—for example, if one of the other hard drives has suffered severe rotational scoring.
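Here is a small sketch of why a stale drive does so much damage, using one stripe on a hypothetical three-drive array. The drive numbering and block contents are assumptions for illustration only. Drive 0 fails first, the array keeps accepting writes while degraded, and then the stripe is rebuilt two different ways.

```python
def xor_blocks(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# The stripe at the moment drive 0 failed: two data blocks and their parity.
drive0 = b"old!"                     # frozen from here on: this is the stale drive
drive1 = b"BBBB"
drive2 = xor_blocks(drive0, drive1)  # parity block for this stripe

# While degraded, the block that lived on drive 0 gets overwritten.
# Drive 0 is gone, so the new contents can only be captured in the parity.
new_block0 = b"new!"
drive2 = xor_blocks(new_block0, drive1)

# Rebuild from the two up-to-date drives: the current data comes back.
good = xor_blocks(drive1, drive2)
assert good == new_block0

# Rebuild drive 1's block from the stale drive plus current parity: garbage.
bad = xor_blocks(drive0, drive2)
assert bad != drive1
```

The stale block no longer matches the parity it is being combined with, so every stripe that changed after the first failure comes back wrong.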

RAID Recovery Case Study: RAID 5 Failure
RAID Level: RAID 5
Drive Model: Maxtor Atlas 10K V 8J073J002295E
Total Capacity: 146 GB
Operating/File System: Windows Server 2003
Data Loss Situation: A 3-drive RAID 5 array running degraded crashed when one healthy hard drive was accidentally pulled out instead of the failed drive.
Type of Data Recovered: Software database
Binary Read: 100%
Gillware Data Recovery Case Rating: 10

The client in this data recovery case came to Gillware Data Recovery for our RAID recovery services after having an unfortunate accident with their server. Their IT technician had noticed that a single Maxtor hard drive in the server had failed, and intended to remove and replace it. However, they accidentally pulled out one of the remaining two healthy drives instead, crashing the server. Putting the removed drive back in wouldn’t bring the server back, but Gillware’s RAID-5 data recovery experts could recover their data.

RAID Recovery Results

All three of the Maxtor Atlas 10K V hard drives from this client’s RAID 5 server went straight to our cleanroom for data recovery work. Our engineers would analyze them, make whatever repairs were necessary, and create as complete as possible forensic disk images for further analysis. Then the disk images would go to our RAID 5 data recovery experts for further recovery work.

One of the three Maxtor hard drives in the array had developed bad sectors, preventing a complete read of that drive. Only 99.8% of it could be read. Further analysis of the metadata placed on the drive by its RAID controller showed that this drive was the stale one. The remaining two hard drives were relatively healthy: our engineers got 100% reads of both hard drives with the help of our fault-tolerant recovery tools.

This was an ideal situation. Our RAID recovery engineer Cody could use the two fully-recovered hard drives to reconstruct the entire RAID 5 array and all of the data on it. This RAID 5 recovery case had a perfect outcome, with 100% of the client’s files fully recovered and fully functional.
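For the curious, here is a rough sketch of what putting an array back together from disk images with one member left out can look like. It is not Gillware's tooling. The layout (rotating parity with data blocks following the parity block, often called left-symmetric), the tiny block size, and the function names build_array and reassemble are all assumptions for illustration. Real controllers differ in layout, block size, and metadata placement, and working those out is a large part of the job.

```python
def xor_blocks(blocks):
    """XOR any number of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def build_array(data, n_disks, block_size):
    """Stripe data across n_disks with rotating parity (assumed layout)."""
    disks = [bytearray() for _ in range(n_disks)]
    stripe_bytes = block_size * (n_disks - 1)
    for s, start in enumerate(range(0, len(data), stripe_bytes)):
        chunk = data[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")
        blocks = [chunk[i:i + block_size]
                  for i in range(0, stripe_bytes, block_size)]
        p = n_disks - 1 - s % n_disks          # disk holding parity for stripe s
        disks[p] += xor_blocks(blocks)
        for k, block in enumerate(blocks):
            disks[(p + 1 + k) % n_disks] += block
    return [bytes(d) for d in disks]


def reassemble(images, block_size):
    """Rebuild the volume from member images; None marks the excluded drive."""
    n = len(images)
    length = len(next(img for img in images if img is not None))
    out = bytearray()
    for s in range(length // block_size):
        off = s * block_size
        row = [img[off:off + block_size] if img is not None else None
               for img in images]
        if None in row:
            # Fill in the excluded member's block from the surviving blocks.
            missing = row.index(None)
            row[missing] = xor_blocks([b for b in row if b is not None])
        p = n - 1 - s % n
        for k in range(n - 1):                 # emit data blocks, skip parity
            out += row[(p + 1 + k) % n]
    return bytes(out)


original = b"The quick brown fox jumps over the lazy dog"
disks = build_array(original, n_disks=3, block_size=4)

# Leave out one member entirely and rebuild from the other two images.
recovered = reassemble([disks[0], None, disks[2]], block_size=4)
assert recovered[:len(original)] == original
```

The recovered volume carries a few padding bytes at the end because the sample data does not fill the last stripe exactly.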

Will Ascenzo

Will is the lead blogger, copywriter, and copy editor for Gillware Data Recovery and Digital Forensics, and a staunch advocate against the abuse of innocent semicolons.
