In this Dell server recovery scenario, the client’s server fell victim to a power outage. When the power came back on, the Dell PERC 6/i RAID controller card reported two errors. One drive had failed. Another drive was reported as having a “foreign configuration”. With two drives unrecognized, the RAID-5 array was unable to function.
Dell Server Recovery Case Study: PowerEdge RAID-5 Array Foreign Configuration
Drive Model: Seagate ST3146356SS
Total Capacity: 292 GB
Operating System: Windows
Situation: After a power outage, one drive failed and one had a foreign configuration
Type of Data Recovered: QuickBooks and TaxWise documents
Binary Read: 99.9%
Gillware Data Recovery Case Rating: 9
The client had four Seagate ST3146356SS SAS (Serial Attached SCSI) hard drives in their Dell PowerEdge server. Three of them were connected in a RAID-5 configuration, and the remaining drive acted as a hot spare. When hard drives are linked in a RAID-5 array, blocks of data are striped across the disks in the array, along with extra parity data. If one drive in the array fails, the RAID controller XORs the surviving data blocks with this parity data to reconstruct whatever went missing. RAID-5 not only allows multiple hard drives to act as one data storage volume; it also offers increased performance and a layer of protection in case one hard drive fails.
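The XOR reconstruction described above can be illustrated with a few lines of Python. This is only a minimal sketch of the principle, not the PERC 6/i firmware or our recovery software; the function name `xor_blocks` and the sample byte strings are made up for the example.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a three-drive RAID-5: two data blocks plus one parity block.
# The contents are invented sample data.
data_a = b"block on drive 0"
data_b = b"block on drive 1"
parity = xor_blocks(data_a, data_b)  # what the controller stores on the third drive

# If the drive holding data_b fails, XORing the two survivors brings it back.
rebuilt_b = xor_blocks(data_a, parity)
print(rebuilt_b == data_b)
```

The same call works for wider arrays: the parity block is the XOR of every data block in the stripe, and any single missing block is the XOR of all the others.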
This RAID-5 array was better protected than most. One of the four hard drives in the client’s server was acting as a hot spare. A hot spare is a drive that sits alongside a RAID array, biding its time until one drive fails. It is “hot” in the sense that it is powered on and running, but not receiving any commands to read or write data. Once a hard drive fails, the hot spare jumps in, and the RAID controller automatically begins to rebuild the array. The failed drive is tossed aside and the controller uses XOR calculations to write all the missing data to the hot spare. There are some risks involved in using a hot spare, but in theory, this provides a second layer of data redundancy for the RAID-5 array.
But in this Dell server recovery situation, things didn’t quite work out as planned. The power failed, the lights went out, and when they came back on, the client’s Dell PowerEdge server didn’t come back with them. Two hard drives had dropped out of the array at once, and this was enough to stop the server in its tracks. The client checked the Dell PERC 6/i controller card for a status report on the four drives. One was listed as failed. Another was now reported as having a “foreign configuration”.
All of the hard drives in a RAID array have a little bit of metadata written to them. This metadata is invisible to the user, but vital to the array. The RAID controller writes and keeps track of this metadata to determine what order the drives go in, what order the blocks go in, the size of the blocks, and so on. If the RAID controller sees metadata on one drive that doesn’t match the rest, it flags that drive as having a foreign configuration. In this case, it was likely that the power failure had corrupted the metadata on this one drive. A sudden loss of power, especially when a hard drive’s read/write heads are in the middle of writing something, can cause data corruption. The read/write heads may fail to properly or completely write data to a sector. This can turn important data into garbled nonsense.
The client sent all four of their hard drives to our RAID recovery technicians for Dell server recovery services. We always recommend that clients with failed RAID arrays send all of the drives associated with their RAID arrays, regardless of whether they are currently used in the array. Some of the metadata might turn out to be redundant, but in the event that some of that data is lost due to hard drive failure, every little bit helps. Our RAID data recovery engineers would rather know too much about how your RAID array was put together than too little.
As is customary, the Dell server recovery process began with a free evaluation. We even sent the client a prepaid UPS shipping label so they wouldn’t have to pay for inbound shipping. Our data recovery technicians inspected the two failed drives and were able to read 99.9% of the sectors on their platters. Our engineers use powerful write-blocked forensic imaging tools of our own design. These tools allow us to read and copy most sectors, even if a hard drive isn’t healthy enough for a computer to read them unassisted.
After our engineers assessed the condition of the hard drives, we presented the client with a price quote and the probability of a successful Dell server recovery. The client approved the quote, and our recovery work continued. The forensic images we made of the four drives went over to our RAID engineer, Charles. Charles analyzed the metadata on the four drives and was able to place the three used drives in the correct order. This RAID array had three of its four drives in a left-synchronous arrangement, with 64-kilobyte blocks striped across the drives. Using custom software to emulate the Dell PERC 6/i RAID controller, we were able to access the data on the array.
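To give a sense of what a left-synchronous arrangement with 64-kilobyte blocks means, here is a rough sketch of how a logical offset on the volume maps to a drive. It is not our emulation software. It assumes the usual textbook definition of the layout (parity starts on the last drive and moves one drive to the left on each row, with data following the parity and wrapping around), zero-based drive numbering, and data starting at offset 0 on every drive, which ignores any space the controller reserves for its own metadata. The function name `locate` is made up for the example.

```python
STRIPE_SIZE = 64 * 1024  # 64-kilobyte blocks, as found on this array

def locate(logical_offset, n_drives=3, stripe_size=STRIPE_SIZE):
    """Map a byte offset on the volume to (data drive, parity drive, drive offset).

    Assumes a left-synchronous layout and zero-based drive numbers.
    """
    block = logical_offset // stripe_size        # which data block overall
    row = block // (n_drives - 1)                # which stripe row it sits in
    position = block % (n_drives - 1)            # its position within that row
    parity_drive = (n_drives - 1) - (row % n_drives)
    data_drive = (parity_drive + 1 + position) % n_drives
    drive_offset = row * stripe_size + logical_offset % stripe_size
    return data_drive, parity_drive, drive_offset

# Show where the first six data blocks and their parity land.
for block in range(6):
    data_drive, parity_drive, _ = locate(block * STRIPE_SIZE)
    print(f"block {block}: data on drive {data_drive}, parity on drive {parity_drive}")
```

One handy property of this layout is that consecutive data blocks simply cycle through the drives in order, so block 0 lands on drive 0, block 1 on drive 1, block 2 on drive 2, and so on. Getting the drive order, the block size, or the rotation wrong scrambles the volume, which is why working these out from the metadata matters.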
Charles assessed the recovered data and tested the client’s critical QuickBooks files and TaxWise documents for any signs of corruption. Since there had been sectors on the two failed drives we couldn’t read, there was a possibility that some of the data was corrupted. The client’s most critical data, though, was intact. Upon request, we sent the client a list of all of the files we had recovered. The client was pleased with the result and paid the bill for our Dell server recovery services. We sent their data back to them on a healthy, new hard drive. Our engineers ranked this Dell server recovery case a 9 on our ten-point scale.
While the client did have a hot spare, it wasn’t able to protect their RAID-5 array from data loss. XOR parity calculations can only reconstruct the contents of one drive if all the rest are healthy. They cannot fill in multiple missing points. When two drives dropped out simultaneously, the RAID controller didn’t even know where to begin.
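A small experiment shows why a single parity block is not enough when two blocks in a stripe go missing. This is an illustrative sketch only, working on one byte with an arbitrary parity value; the helper name `xor_bytes` is made up for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Suppose both data blocks in a stripe are gone and only this parity byte survives.
# The value is arbitrary.
parity = bytes([0x5A])

# Every possible value of the first missing byte pairs with exactly one value
# of the second that produces the same parity.
candidates = [(bytes([a]), xor_bytes(bytes([a]), parity)) for a in range(256)]

print(all(xor_bytes(a, b) == parity for a, b in candidates))
print(len(candidates))  # 256 equally valid answers for a single byte
```

With one block missing there is exactly one answer. With two missing, every guess for the first block has a matching second block, so the parity alone cannot say which pair is the real one.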
If the client had used their hard drives to set up a four-drive RAID-6 array, instead of a three-drive RAID-5 array with one hot spare, they would have had the same total capacity available and been able to withstand the failures of both drives without their server crashing. RAID-6 uses extra parity data so that it can continue to function even if up to two hard drives in the array fail. Perhaps after this Dell server recovery situation, the client will consider switching to RAID-6.
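The capacity comparison is simple arithmetic. The sketch below assumes a nominal 146 GB per drive, which is inferred from the 292 GB total listed for this case, and uses a made-up helper name, `usable_capacity_gb`.

```python
DRIVE_GB = 146  # nominal size per drive, inferred from the 292 GB total above

def usable_capacity_gb(drive_gb, drives_in_array, parity_drives):
    """Usable space is the array size minus the space given over to parity."""
    return drive_gb * (drives_in_array - parity_drives)

# Three-drive RAID-5 (one drive's worth of parity); the hot spare adds no capacity.
raid5_with_spare = usable_capacity_gb(DRIVE_GB, drives_in_array=3, parity_drives=1)

# Four-drive RAID-6 (two drives' worth of parity).
raid6 = usable_capacity_gb(DRIVE_GB, drives_in_array=4, parity_drives=2)

print(raid5_with_spare, raid6)  # 292 292
```

Both setups use the same four drives and give the same usable space. The difference is that RAID-6 keeps running with two drives missing at once, while RAID-5 with a hot spare only survives a second failure if the rebuild onto the spare has already finished.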