
RAID 5 Video Demonstration: Salvaging Data with a Stale Drive

In this RAID 5 video blog, Brian Gill breaks down one of our recent RAID 5 data recovery cases. Our client sent in a three-drive RAID 5 array for data recovery. One hard drive was perfectly healthy, but the other two had failed. One of the failed drives had massive platter damage and was completely unrecoverable: most of the magnetic substrate on its platters, which had held the client’s data, had been reduced to a fine powder. The other failed drive had a failed spindle motor.


Top: Timestamps from the healthy drive. Bottom: Timestamps from the stale drive.

The drive with the failed spindle motor had failed eight months earlier, and the RAID array had been running one drive short since then. RAID level 5 uses XOR parity to reconstruct lost data if one drive in the array fails. When a RAID 5 array continues to run after one drive fails, we say the RAID is running in a “degraded” condition. No data has been lost, but the array is in a vulnerable state.
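To make the XOR idea concrete, here is a minimal Python sketch of a single stripe on a three-drive array: two data blocks and one parity block. The helper name `xor_blocks` and the sample block contents are hypothetical, chosen only for illustration. A real RAID controller works on much larger blocks and rotates the parity block from drive to drive.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length byte blocks together."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# One stripe: two data blocks, plus the parity block computed from them.
data_a = b"CLIENT D"
data_b = b"ATA 2016"
parity = xor_blocks(data_a, data_b)

# Degraded operation: the drive holding data_b has dropped out.
# XOR-ing the surviving data block with the parity block gives it back.
rebuilt_b = xor_blocks(data_a, parity)
print(rebuilt_b == data_b)  # True
```

The same operation recovers any one missing block in the stripe, which is why a RAID 5 array can lose a single drive without losing data.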

The hard drive that had failed eight months ago was the only one of the two failed hard drives that could be resuscitated. The only option for our data recovery technicians was to put the RAID array back together using the healthy drive and the stale drive.


Severe rotational scoring on one of the client’s three hard drives.

Having to recreate a RAID 5 array using a stale hard drive is hardly an ideal circumstance for our data recovery technicians. As Brian demonstrates, rebuilding with the stale drive corrupts most of the data written to the array since the beginning of the “stale epoch”. In this case, the stale epoch began in October and lasted until the second drive failed in June.

In this video, Brian discusses how RAID 5’s XOR parity calculations try to reconstruct the data on a failed RAID array using one healthy drive and one stale drive, and how data can become corrupted as a result. Unfortunately, there’s no way around having to use the stale drive in this particular case. As a result, only a small fraction of the data created during the RAID 5 array’s stale epoch will be usable.
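The corruption can be sketched in a few lines of Python. This is a simplified model of one stripe, not the procedure shown in the video. The block contents and variable names are hypothetical, and it assumes that in this stripe the stale drive holds the parity block and the destroyed drive held the second data block.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# State of one stripe when the first drive dropped out (start of the stale epoch).
old_a = b"report-v1"                  # on the drive that stays healthy
old_b = b"invoices!"                  # on the drive that is later destroyed
stale_parity = xor(old_a, old_b)      # frozen on the drive that went offline

# During the stale epoch the array rewrites block A on the healthy drive.
# Block B is never changed, but the offline parity block cannot be updated.
new_a = b"report-v2"

# Recovery using only the healthy drive and the stale drive.
untouched_stripe = xor(old_a, stale_parity)  # stripe not written since the dropout
modified_stripe = xor(new_a, stale_parity)   # stripe written during the stale epoch

print(untouched_stripe == old_b)  # True: the stale parity still matches
print(modified_stripe == old_b)   # False: block B comes back corrupted
```

In this model, a stripe that was never written after the first drive dropped out still reconstructs correctly. A stripe that was written during the stale epoch does not, even though the lost block itself never changed.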



7 Comments

  1. […] whenever possible. But in some situations, in order to recover RAID 5 data, our engineers have to rebuild the RAID 5 array with the stale drive and salvage the usable data from the […]

  2. […] Putting the server back together only required one of the two failed drives, though, due to RAID-5’s parity. This is good, because when two drives fail, they rarely do so at the same time. As a result, the first drive to fail ends up filled with “stale” data, which we want to avoid using to repair the RAID array as long as we have a choice in the matter. […]

  3. […] can look at our video blog post to find out what happens when you try to reconstruct a RAID 5 array with a stale…. The results aren’t particularly pretty. The stale blocks of data can cause massive amounts of […]

  4. […] Servers typically connect hard drives together using fault-tolerant methods, such as RAID-5. A RAID-5 array has one drive’s worth of fault tolerance, meaning one hard drive can fall offline without jeopardizing the client’s data. Servers are meant to run continuously, so they constantly monitor the health of the hard drives in the array. If the RAID controller senses that one drive is behaving oddly or about to fail, it will kick the drive offline and let the RAID array’s fault tolerance on the remaining two drives fill in for it. Because the drive is offline for months, the data on it becomes out-of-date and “stale”. […]

  5. […] failed hard drives. Casting aside the replacement drive and the first failed hard drive (which was filled with “stale” data), Cody could rebuild the array using the two remaining drives. After using RAID controller […]

  6. […] To access this failed RAID array’s VMFS filesystem, our RAID recovery technicians first had to rebuild the array. To do this, we needed the write-blocked images of the two healthy drives and the last drive to fail. While the first failed drive was also healthy, because its data was stale, introducing it into the rebuilt RAID would have been catastrophic. […]

  7. […] Two of the RAID-5 arrays in this server had been running degraded for some time. In January, two drives in two separate RAID arrays had failed, and since then, both arrays had been running in a degraded state. The first drive(s) in a RAID array to fail are known as stale drives. As time passes and the degraded server continues its operations, the data trapped on the failed drives becomes increasingly out-of-date. Forcing stale data back into a RAID array causes massive data corruption. […]
