Risky RAID-5 VHDX Rebuild

In this VHDX recovery case study, the client had several Hyper-V virtual machines stored in the VHDX format on their three-drive RAID-5 server. The array consisted of two 600 GB Western Digital SAS hard drives and one 600 GB HP SAS hard drive. After the HP drive failed, the client replaced it with another HP drive and began re-integrating the new drive into the array, a process known as “rebuilding” the array.

Everything seemed to be going well until a second hard drive failed during the rebuild process. The second drive failure crashed the server and brought the client’s business to a grinding halt. The client sent the drives from their failed RAID-5 server to Gillware to recover their VHDX files and QuickBooks records.

VHDX Recovery Case Study: Risky RAID-5 Rebuild
RAID Level: 5
Drive Model: Western Digital WD6000BKHG-02A29V1 600 GB SAS (x2), HP MBF2600FC 600 GB SAS
Drive Capacity: 1.2 TB
Operating System: Windows Server 2012
Situation: One drive in a 3-disk RAID-5 server failed. When the client added the replacement disk and began the rebuilding process, a second drive failed, crashing the server.
Type of Data Recovered: Hyper-V VHDX virtual machines and QuickBooks files
Binary Read: 99.9%
Gillware Data Recovery Case Rating: 9

RAID-5: The Risks of a Rebuild

When you link multiple hard drives together to create a RAID-5 array, you create redundant data on the array in the form of XOR parity blocks. These parity blocks allow the array to reconstruct missing data if a single hard drive fails. By performing XOR computations on the data from the remaining drives, the array seems to miraculously resurrect the data from the failed drive. And when you insert a new hard drive into the array to replace the failed one, the array uses the same XOR computations to write the missing data onto the new drive.
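To make the XOR idea concrete, here is a minimal Python sketch of a single stripe on a three-drive array. The block contents and the `xor_blocks` helper are made up for illustration; a real controller does the same arithmetic on much larger blocks, in hardware or firmware.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length byte blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Two data blocks from one stripe (hypothetical contents).
data_a = b"QUICKBKS"
data_b = b"HYPERV01"

# The parity block the array would store on the third drive.
parity = xor_blocks(data_a, data_b)

# If the drive holding data_b fails, XOR-ing the survivors brings it back.
rebuilt_b = xor_blocks(data_a, parity)
assert rebuilt_b == data_b
```

The same call works no matter which of the three blocks goes missing, which is why a RAID-5 array can survive the loss of any one drive but not two.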

Unfortunately, there is a risk associated with rebuilding a RAID-5 array. When a RAID-5 array runs minus one hard drive (in a degraded condition), every request for the missing drive’s data forces the array to read the remaining drives and compute it, so those drives endure much more stress than usual. And that stress only increases once the rebuild process begins and the server must work to both integrate the new drive and continue performing its duties. With the additional workload, a second hard drive in the array might fail during the rebuild, leaving gaps in the array’s data and crashing the server.

This can put the IT department responsible for looking after the server in an uncomfortable bind when a hard drive fails. Not only can the rebuild process slow the server’s performance to an unacceptable crawl, but it also carries with it the risk that the rebuild process itself will cause the server to crash. But all the same, you must replace the failed hard drive sooner or later. Do you replace it right away? Or do you wait for a more opportune time to rebuild the array?

Sadly, a RAID-5 crash can happen regardless of which option you choose. Fortunately for you, though, Gillware is here to help in these situations.

VHDX Recovery

Our data recovery engineers took a look at each of the four hard drives the client had sent us for RAID-5 data recovery. In our cleanroom, we created 99.9%-complete forensic images of the two failed hard drives, then sent the images to our RAID engineer Cody for analysis.

Cody used the metadata contained on each of the four drives to determine their order in the array, which one had been newly added, and the times of failure for both failed hard drives. Casting aside the replacement drive and the first failed hard drive (which was filled with “stale” data), Cody could rebuild the array using the two remaining drives. After using RAID controller emulation software to connect the two remaining drives (with XOR parity filling in for the missing third), Cody was able to extract the server’s contents.
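As a rough illustration of what “XOR parity filling in for the missing third” means, the Python sketch below reassembles a three-drive array from images when one member is absent. It is not Gillware's emulation software. The block size, the rotating-parity layout, and the function names (`make_array`, `destripe`) are assumptions chosen for the example, and real controllers vary in all of these details.

```python
def xor_all(blocks):
    """XOR a list of equal-length byte blocks into one block."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)

def parity_disk_for(stripe, n):
    # Assumed layout: parity starts on the last disk and rotates backward.
    return (n - 1) - (stripe % n)

def make_array(data, n, block_size):
    """Split data across n disk images with rotating parity (test helper)."""
    per_stripe = (n - 1) * block_size
    if len(data) % per_stripe:
        raise ValueError("data must fill whole stripes")
    images = [bytearray() for _ in range(n)]
    for stripe, start in enumerate(range(0, len(data), per_stripe)):
        chunks = [data[start + i * block_size: start + (i + 1) * block_size]
                  for i in range(n - 1)]
        remaining = iter(chunks)
        for disk in range(n):
            if disk == parity_disk_for(stripe, n):
                images[disk] += xor_all(chunks)
            else:
                images[disk] += next(remaining)
    return [bytes(image) for image in images]

def destripe(images, block_size):
    """Reassemble the volume; at most one entry in images may be None."""
    n = len(images)
    missing = [i for i, image in enumerate(images) if image is None]
    if len(missing) > 1:
        raise ValueError("RAID-5 tolerates only one missing member")
    length = len(next(image for image in images if image is not None))
    volume = bytearray()
    for stripe, offset in enumerate(range(0, length, block_size)):
        blocks = [None if image is None else image[offset:offset + block_size]
                  for image in images]
        if missing:
            # Stand in for the absent member by XOR-ing the survivors.
            blocks[missing[0]] = xor_all([b for b in blocks if b is not None])
        for disk in range(n):
            if disk != parity_disk_for(stripe, n):
                volume += blocks[disk]
    return bytes(volume)
```

Calling `destripe([image0, None, image2], block_size)` returns the same bytes as reading all three members, provided the two surviving images are complete. Any unreadable sectors on a survivor would show up as damage in the output, which is why the completeness of the forensic images matters.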

Recovering the VHDX files came next. A virtual machine stores an entire virtual hard drive inside a single VHD or VHDX file, which a hypervisor can mount and run as if it belonged to an independent computer. The client in this case had been using Microsoft Hyper-V to create and manage their virtual machines. Cody had to test these VHDX files and examine their contents to make sure none of the gaps created by the failed drive had fatally corrupted any of the client’s critical data, such as their QuickBooks files.
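A very first sanity check on a recovered VHDX file can be sketched in a few lines of Python. This is an illustration, not the testing procedure Cody used. It relies on the published VHDX layout, in which the file begins with the 8-byte signature `vhdxfile` and the two header copies sit at 64 KB and 128 KB, each starting with `head`. The function name `quick_vhdx_check` is made up for this example.

```python
FILE_SIGNATURE = b"vhdxfile"               # first 8 bytes of a VHDX file
HEADER_SIGNATURE = b"head"                 # first 4 bytes of each header copy
HEADER_OFFSETS = (64 * 1024, 128 * 1024)   # header 1 and header 2

def quick_vhdx_check(stream):
    """Report whether the basic VHDX signatures are where they should be.

    This only looks at signatures. It does not verify checksums, replay
    the log, or examine the file system inside the virtual disk.
    """
    stream.seek(0)
    file_identifier_ok = stream.read(len(FILE_SIGNATURE)) == FILE_SIGNATURE

    headers_ok = []
    for offset in HEADER_OFFSETS:
        stream.seek(offset)
        headers_ok.append(stream.read(len(HEADER_SIGNATURE)) == HEADER_SIGNATURE)

    return {"file_identifier": file_identifier_ok, "headers": headers_ok}
```

A file would be opened with `open(path, "rb")` and passed in as `stream`. Passing this check says little on its own; the real test is mounting the virtual disk and opening the files that matter.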

After examining the client’s VHDX files, Cody found no corruption affecting any of the client’s critical data. This VHDX recovery case was a success, with the vast majority of the client’s files in perfect condition. Our engineers rated this case a high 9 on our ten-point case rating scale.

Will Ascenzo

Will is the lead blogger, copywriter, and copy editor for Gillware Data Recovery and Digital Forensics, and a staunch advocate against the abuse of innocent semicolons.
