
RAID 5 Video Demonstration: Salvaging Data with a Stale Drive

In this RAID 5 video blog, Brian Gill breaks down one of our recent RAID 5 data recovery cases. Our client sent in a three-drive RAID 5 array for data recovery. One hard drive was perfectly healthy, but the other two had failed. One of the failed drives had massive platter damage and was completely unrecoverable: most of the magnetic substrate on its platters containing the client's data had been reduced to a fine powder. The third drive had a failed spindle motor.

Top: Timestamps from the healthy drive. Bottom: Timestamps from the stale drive.

The drive with the failed spindle motor had died eight months earlier, and the RAID array had been running minus one drive ever since. RAID 5 uses XOR parity to reconstruct lost data if a single drive in the array fails. When a RAID 5 array continues to run after one drive fails, we say the RAID has been running in a "degraded" condition: no data has been lost, but the array is in a vulnerable state.
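To make the parity math concrete, here is a minimal Python sketch of how XOR parity rebuilds a missing block. The block contents are invented for illustration; this is not our recovery tooling.

    # A minimal sketch of RAID 5's XOR parity math, using made-up block
    # contents (illustrative only, not actual recovery tooling).
    def xor_blocks(a: bytes, b: bytes) -> bytes:
        """XOR two equal-length blocks byte by byte."""
        return bytes(x ^ y for x, y in zip(a, b))

    # One stripe on a three-drive array: two data blocks plus their parity.
    data0 = b"client file"
    data1 = b"spreadsheet"
    parity = xor_blocks(data0, data1)

    # If the drive holding data1 fails, XORing the survivors rebuilds it.
    rebuilt = xor_blocks(data0, parity)
    assert rebuilt == data1

This is why a degraded RAID 5 array can keep serving data: as long as the surviving blocks and the parity stay in sync, any single missing block can be recomputed on the fly.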

The drive that had failed eight months earlier was the only one of the two failed drives that could be resuscitated. The only option for our data recovery technicians was to put the RAID array back together using the healthy drive and the stale drive.

Severe rotational scoring on one of the client's three hard drives.

Having to recreate a RAID 5 array using a stale hard drive is hardly an ideal circumstance for our data recovery technicians. As Brian demonstrates, any data written to the array after the start of the "stale epoch" comes out corrupted when the array is rebuilt with the stale drive. In this case, the stale epoch began in October and lasted until the second drive failed in June.

In this video, Brian discusses how RAID 5’s XOR parity calculations try to reconstruct the data on a failed RAID array using one healthy drive and one stale drive, and how data can become corrupted as a result. Unfortunately, there’s no way around having to use the stale drive in this particular case. As a result, only a small fraction of the data created during the RAID 5 array’s stale epoch will be usable.
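To illustrate the failure mode, here is a short, hypothetical Python sketch. The stripe contents and block values are invented, and a real array stripes data and rotates parity across many blocks, but the arithmetic is the same.

    # Hypothetical stripe values showing why a stale drive corrupts data
    # written during the stale epoch (a sketch, not the client's data).
    def xor_blocks(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    # Before October: drives A and B hold data, drive C holds the parity.
    block_a = b"invoice rows"
    block_b = b"ledger cells"
    parity_c = xor_blocks(block_a, block_b)

    # October: drive C fails and goes stale. The array runs degraded, and a
    # later write updates A and B, but C's parity is frozen at its old value.
    block_a = b"updated rows"
    block_b = b"updated cell"

    # June: drive B dies outright. Rebuilding B from healthy A and stale C
    # yields garbage, because the old parity no longer matches the new data.
    recovered_b = xor_blocks(block_a, parity_c)
    assert recovered_b != block_b   # corrupted, not the real block

Every stripe rewritten between October and June fails this way, which is why only data left untouched during the stale epoch survives the rebuild intact.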


Will Ascenzo
Will is the lead blogger, copywriter, and copy editor for Gillware Data Recovery and Digital Forensics, and a staunch advocate against the abuse of innocent semicolons.

7 Comments

  1. […] Servers typically connect hard drives together using fault-tolerant methods, such as RAID-5. A RAID-5 array has one drive's worth of fault tolerance, meaning one hard drive can fall offline without jeopardizing the client's data. Servers are meant to run continuously, so they constantly monitor the health of the hard drives in the array. If the RAID controller senses that one drive is behaving oddly or about to fail, it will kick the drive offline and let the RAID array's fault tolerance on the remaining two drives fill in for it. As the drive sits offline for months, the data on it becomes out-of-date and "stale". […]

  2. […] To access this failed RAID array’s VMFS filesystem, our RAID recovery technicians first had to rebuild the array. To do this, we needed the write-blocked images of the two healthy drives and the last drive to fail. While the first failed drive was also healthy, because its data was stale, introducing it into the rebuilt RAID would have been catastrophic. […]

  3. […] Two of the RAID-5 arrays in this server had been running degraded for some time. In January, two drives in two separate RAID arrays had failed, and since then, both arrays had been running in a degraded state. The first drive(s) in a RAID array to fail are known as stale drives. As time passes and the degraded server continues its operations, the data trapped on the failed drives becomes increasingly out-of-date. Forcing stale data back into a RAID array causes massive data corruption. […]
