
In Gillware’s latest blog series, “Data Recovery 101”, our bloggers take a closer look at each of the different components of a hard drive and explain how they work, how they fail and how we recover the data from each failure situation. In this post, our CEO Brian Gill explores the drive’s firmware or hard drive operating system.

What is hard drive/SSD firmware?

Photo credit Scott Schiller, https://flic.kr/p/2tCQdS


Firmware is the storage device’s operating system. Just as you may run Windows on your computer and iOS or Android on your phone, a complicated device that needs to store and organize billions of bits on platters or NAND needs an operating system of its own. I personally prefer the term hard drive operating system, or HDD O/S, to firmware. Either is accurate, but the term firmware tends to carry a connotation that it’s nothing special or unique. One would expect the firmware for multiple electronic devices coming off a manufacturing line to be identical, but this is not the case in the world of storage.

The firmware on a spinning disk will have all the compiled application code for doing everything the drive needs to do. This baseline firmware will vary slightly from O/S revision to O/S revision. Hard drive manufacturers are always making tweaks to this code for increased performance, security and reliability. Some manufacturers will produce hundreds of versions of their base firmware in a calendar year. For any particular drive-line, like the Western Digital Blue desktop series, they may have ten or twenty revisions a year.

HDD manufacturers will sometimes create custom firmware for different computer companies; Apple, for example, likes to have its own firmware. They will also code different drive behavior for drives intended for enterprise data centers, consumer desktops, consumer DVR units, etc. A consumer drive like a WD Green will spin down its platters and park the heads during inactivity, as opposed to a WD Enterprise drive that will keep spinning until a RAID controller tells it to spin down. These behaviors are defined in the firmware.

The firmware zone is also where a lot of the drive’s unique calibrations, defect lists, zone tables, unique translation (addressing) information, performance logs and SMART attributes are stored.

How does firmware fail?

Firmware can become corrupted and require repair. Even though a manufacturer will typically keep at least one backup copy of this special set of data, unrecoverable corruptions can occur.

Ironically, I believe the majority of these corruptions are directly attributed to the very mechanisms that exist to prolong a drive’s lifespan and warn you of imminent failure.

All modern drives implement SMART (Self-Monitoring, Analysis and Reporting Technology). These drives pay attention to their own behavior and performance, and when it starts deviating from the norm, they record that information in log files and SMART tables.

 

Photo Credit Patrick Pellitier, https://flic.kr/p/o7Z6qa


During a drive’s lifetime, sectors go bad. The first time the drive tries to read a sector after it has failed, that sector is put on a list of sectors pending relocation. The drive can’t relocate it right away, because the sector is corrupted and the drive does not know what data used to live there. If that sector happened to be in the middle of a payroll database, and the drive just handed back a bunch of zeros instead of returning a UNC (uncorrectable) error, you might pay an employee $1,000,000 instead of $1,000. But at the next write opportunity, the sector will get remapped to a healthy sector in a reserve area. It’s not a great situation when you try to load that database and the operating system says it cannot be loaded because of sector errors, but it is better than pretending everything is fine.

The information about which sectors are pending reallocation, and which have been successfully reallocated (and where), lives on the platters in the firmware zone. Also in the firmware zone are the performance logs, the event logs and, by extension, the SMART attributes.
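The pending-then-remap behavior described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class and attribute names (`ToyDrive`, `pending`, `remap`, `spares`) are hypothetical, and a real drive keeps these tables in its firmware zone rather than in Python objects.

```python
class UncorrectableReadError(Exception):
    """Stands in for the UNC error a drive returns for an unreadable sector."""


class ToyDrive:
    """Hypothetical, heavily simplified model of sector reallocation."""

    def __init__(self, user_sectors, spare_sectors):
        total = user_sectors + spare_sectors
        # Physical sectors 0..user_sectors-1 hold user data; the rest are the reserve area.
        self.physical = {n: b"" for n in range(total)}
        self.spares = list(range(user_sectors, total))
        self.bad = set()      # physical sectors that have failed
        self.pending = set()  # logical sectors waiting to be reallocated
        self.remap = {}       # logical sector -> physical sector in the reserve area

    def _physical_for(self, lba):
        return self.remap.get(lba, lba)

    def read(self, lba):
        phys = self._physical_for(lba)
        if phys in self.bad:
            # The old contents are unknown, so report an error instead of inventing data.
            self.pending.add(lba)
            raise UncorrectableReadError(f"sector {lba} is unreadable")
        return self.physical[phys]

    def write(self, lba, data):
        if lba in self.pending:
            # New data is arriving, so the sector can now safely move to a spare.
            if not self.spares:
                raise RuntimeError("reserve area exhausted")
            self.remap[lba] = self.spares.pop(0)
            self.pending.discard(lba)
        self.physical[self._physical_for(lba)] = data


drive = ToyDrive(user_sectors=4, spare_sectors=2)
drive.write(1, b"payroll row")
drive.bad.add(1)                      # simulate the sector going bad
try:
    drive.read(1)
except UncorrectableReadError:
    pass                              # sector 1 is now pending reallocation
drive.write(1, b"restored row")       # next write remaps it to the reserve area
```

In this sketch the read fails loudly and the remap only happens on the following write, which mirrors the reasoning in the text: the drive cannot relocate data it can no longer read, but it can relocate the sector once fresh data arrives.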

How do corruptions occur?

So let’s imagine a scenario where a headstack is in the early stage of failure. It’s taking multiple read attempts to successfully read data, those read events are having unacceptable latency, and lots of sectors need to be added to the growth defect list. The drive needs to use those same heads to save this performance and sector information to the platters! So, one can easily understand how failing heads might write a bunch of gibberish to the firmware zone.

Let’s imagine another scenario where a drive is in the middle of doing a bunch of this sector reallocation and subsequently a bunch of performance bookkeeping in the SMART tables. The end user is experiencing I/O lag on the drive and is getting frustrated. The frustrated user decides to do a therapeutic shutdown and cold reboot. The operating system notifies the drive that it wants to perform a shutdown. The drive replies “gimme a minute I’m in the middle of some bookkeeping” so the O/S blocks the event temporarily and is going to wait until the drive tells it “cool I’m done, go ahead and shut down”. The human is now having their blood pressure raised as even the shutdown is taking 30 seconds! And they perform a hard shutdown or just yank the power cord rather than wait, while the drive was right in the middle of altering its operating system. Once again, it isn’t difficult to comprehend how the HDD O/S can be adversely affected.

How does Gillware recover data from cases with firmware corruption?

When these firmware areas are corrupted, they need to be repaired or the drive cannot boot itself, just as when Windows has corrupted O/S files and cannot boot, you need to grab an O/S disk and troubleshoot. There are many approaches for performing this analysis and repair. Here at Gillware we build our firmware library every single day and attempt to back up the firmware on every drive that enters our doors as part of our standard process. There are tools you can buy to perform the basic operations of reading/writing firmware, the most popular being the PC-3000 toolkit.

Typically the drive will not be detected correctly in the BIOS. Instead of this vintage drive detecting as a 6E040L0 model with 40 GB of capacity, a common firmware corruption will cause that drive to detect as N40P with 0 GB of capacity. A more modern example is the Intel 8MB bug. When the firmware has a corruption, instead of this SSD detecting as an Intel 320 series with 160 GB of capacity, it will detect as BAD_CTX with an error code and 8 MB of capacity.
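As a rough illustration of how those symptoms could be spotted, here is a minimal sketch that checks a drive’s reported identity against the two examples above. Everything in it is an assumption for illustration: the `Identity` type, the `firmware_fault_hints` function and the "less than half the labeled capacity" threshold are hypothetical, and the marker table covers only the two cases mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class Identity:
    """What the drive reports about itself when the system detects it."""
    model: str
    capacity_bytes: int


# Markers taken from the two examples in the text; not a complete catalogue.
KNOWN_MARKERS = {
    "N40P": "detects as N40P instead of 6E040L0",
    "BAD_CTX": "Intel 320 series 8MB bug",
}


def firmware_fault_hints(identity, labeled_capacity_bytes):
    """Return a list of reasons to suspect firmware corruption (empty if none)."""
    hints = [
        description
        for marker, description in KNOWN_MARKERS.items()
        if marker in identity.model
    ]
    # Hypothetical threshold: a drive reporting under half of its labeled
    # capacity (0 GB or 8 MB in the examples) is treated as suspicious.
    if identity.capacity_bytes < labeled_capacity_bytes // 2:
        hints.append("reported capacity far below the label")
    return hints


suspect = Identity(model="BAD_CTX", capacity_bytes=8 * 10**6)
print(firmware_fault_hints(suspect, labeled_capacity_bytes=160 * 10**9))
```

A check like this only flags a drive for closer inspection. It says nothing about how to repair the firmware, which is where the real work lies.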

Ten percent of the cases we see here at Gillware have healthy internals and electronics, and the user data area (partition tables, file system metadata, binary user data) is healthy as well. They show up here because their only problem is a firmware bug. Ten percent may not seem like a lot, but there are other uses for firmware manipulation besides repair, and as a company that has spent millions of dollars to increase success rates, I’ll tell you the ability to recover data on an extra ten percent of cases is huge.

The challenge of reverse engineering complicated electronic devices, and applying that knowledge to get people out of jams, is actually fun for a certain type of engineer. It would be a lot more fun if the stakes and anxiety weren’t so high for all our clients. If you’ve got a computer engineering and programming background, and dedicate about three years of your life to it, you’ll get pretty decent at it. A scientific background and ten thousand hours will make you a master of this very odd specialty. I’d estimate there are fewer than 300 humans worldwide who have put in these 10,000+ hours. I’ve met about 12 of them, and none of us want to do it very much anymore, but it’s very difficult to get away from it entirely.

Knowledge of the storage device’s operating system is of paramount importance to any company serious about data recovery. It is when you use it in conjunction with other skills like electrical or physical rework that the applied knowledge is truly useful.

Download our white paper about firmware corruption data recovery
