Dell PERC Data Recovery | Failed Controller & Foreign Config

If a Dell PERC RAID controller has dropped your array offline, refused to import a foreign configuration, or lost virtual disks after a routine reboot, you’ve reached the right team. The PERC line is the most widely deployed RAID controller family in small and mid-market server fleets, and our lab has recovered data from more PERC arrays than from arrays on any other brand of RAID controller. Gillware has operated as a dedicated data recovery laboratory since 2004 from our ISO 5 Class 100 cleanroom in Madison, Wisconsin. PERC cases are scoped at intake by an engineer who has handled the failure mode you’re looking at, not by a generic sales gate. See also our RAID data recovery hub.

Open a Dell PERC recovery case →

How Dell PERC Controllers Work

The Dell PowerEdge RAID Controller (PERC) family has shipped under that name since the early 2000s. Early PERC generations mixed Adaptec-derived and AMI/LSI MegaRAID-derived designs; from PERC 5 onward the line has been built on LSI (now Broadcom) silicon under Dell-customized firmware. The H310 through H755 generations that are most active in the field today are built on Broadcom MegaRAID hardware with Dell firmware overlays and tight integration with iDRAC and the Lifecycle Controller.

Every PERC from the 6/i generation onward writes its array configuration as SNIA Disk Data Format (DDF) metadata to a reserved region at the end of each member drive. That on-disk metadata records the stripe size, member ordering, parity rotation, and spare assignments for the array. The controller card itself is not the only thing that knows the array configuration — everything needed to reconstruct the array is also written on the drives, which is the property our recovery process relies on when the original controller is dead or refusing to import. Controller events are written to the PERC event log accessible through PERCCLI, iDRAC Lifecycle Logs, and OpenManage Server Administrator (OMSA), and the patterns we see most often in those logs are the ones documented below.
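As a rough illustration of what “written on the drives” means in practice, the sketch below scans the tail of a drive image for sector-aligned DDF header signatures. It is a minimal example and not our production tooling. The signature value 0xDE11DE11 is our reading of the SNIA DDF specification and should be checked against the spec revision in use; the function name, the 32 MiB search window, and the 512-byte sector size are assumptions made for illustration. It opens the image read-only and should only ever be pointed at an image, never at an original drive.

```python
import os

# Assumption: 0xDE11DE11 is the DDF header signature per the SNIA DDF spec.
DDF_SIGNATURE_BE = bytes.fromhex("DE11DE11")
# The byte-swapped form is checked too, in case metadata was written little-endian.
DDF_SIGNATURE_LE = DDF_SIGNATURE_BE[::-1]


def find_ddf_signatures(image_path, tail_bytes=32 * 1024 * 1024, sector_size=512):
    """Return byte offsets of sectors near the end of an image that start
    with a DDF header signature (hypothetical helper, illustration only)."""
    size = os.path.getsize(image_path)
    start = max(0, size - tail_bytes)
    start -= start % sector_size  # align the scan to a sector boundary
    hits = []
    with open(image_path, "rb") as image:  # read-only access
        image.seek(start)
        offset = start
        while True:
            sector = image.read(sector_size)
            if len(sector) < 4:
                break  # end of image
            if sector[:4] in (DDF_SIGNATURE_BE, DDF_SIGNATURE_LE):
                hits.append(offset)
            offset += sector_size
    return hits
```

A hit is only a candidate. A real parser would go on to validate the header checksum and compare the configuration records across every member image before trusting any of them.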

PERC Error Conditions That Lead to Data Loss

Dell’s own PERC user guides and PowerEdge server troubleshooting documentation catalog dozens of error conditions. The patterns below are the ones that disproportionately end up at our lab, because they involve data loss in progress, multiple drive failures beyond the array’s redundancy, or a configuration state where the next attempted action commonly destroys the array. We quote the exact error strings from Dell’s documentation where applicable, because matching what you’re seeing on screen to a documented condition is the first step toward not making it worse.

Foreign Configuration errors. This is the single most common state that brings PERC cases to us. The controller has detected an array configuration on the attached drives that doesn’t match its current configuration. Sometimes the cause is benign (a controller swap, a chassis move); sometimes the array drops to a foreign state after an otherwise routine reboot with no apparent trigger. The Dell PERC User Guide describes a clean import path, but in the field we frequently see the import fail with Incomplete foreign configuration from PERCCLI, drives stuck in a UGood (Unconfigured Good) state instead of Foreign, or an import that succeeds but warns: Import Warning: possible outdated physical disk data. Insert missing Physical Disk before importing the configuration. That warning is the controller telling you the metadata is mismatched across the surviving members. Importing in this state often triggers a rebuild that propagates stale data across the array.

Multiple drive failure beyond fault tolerance. Dell’s PowerEdge servers troubleshooting guide includes a dedicated Troubleshooting multiple Drive failure section because the scenario is common enough on aging fleets to warrant its own diagnostic flow. RAID 5 tolerates the loss of one disk; RAID 6 tolerates two; RAID 10 tolerates one disk per mirror pair. When a second or third drive fails before the controller has completed a rebuild, the array drops offline and the controller refuses to mount it. We see this most often on H700/H710-era arrays where the drives were purchased together and are reaching end of life in the same window. The first drive failure isn’t a surprise; the second arriving 48 hours later is what takes the array down. In a substantial fraction of these cases the second drive’s failure is partial — a region of unreadable sectors rather than total drive death — which is why physical drive imaging in the cleanroom often resurrects enough surface to make recovery possible even after a multi-drive event.
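The tolerance rules above can be written down as a small check. This is a simplified sketch: the function name is hypothetical, and the RAID 10 branch assumes mirror pairs are formed from adjacent drive indexes (0 and 1, 2 and 3, and so on), which is an illustrative assumption and not a statement about how any particular PERC lays out its spans.

```python
def array_survives(level, failed, total_drives):
    """Return True if the array still has enough redundancy to stay online.

    level:        "RAID5", "RAID6", or "RAID10"
    failed:       iterable of 0-based indexes of failed drives
    total_drives: number of member drives in the array
    """
    failed = set(failed)
    if level == "RAID5":
        return len(failed) <= 1
    if level == "RAID6":
        return len(failed) <= 2
    if level == "RAID10":
        # Assumption: drives (0, 1), (2, 3), ... are the mirror pairs.
        pairs = [(i, i + 1) for i in range(0, total_drives, 2)]
        # The array survives as long as no pair has lost both members.
        return all(not (a in failed and b in failed) for a, b in pairs)
    raise ValueError(f"unsupported RAID level: {level}")


print(array_survives("RAID5", {2}, 6))       # one failure on RAID 5
print(array_survives("RAID5", {2, 4}, 6))    # second failure before rebuild completes
print(array_survives("RAID10", {0, 2}, 4))   # one drive lost from each pair
print(array_survives("RAID10", {0, 1}, 4))   # both drives of one pair lost
```

The check treats a drive as either failed or healthy. As noted above, real second failures are often partial, which is why a drive the controller has given up on can still yield most of its contents when imaged.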

RAID puncture and double fault conditions. RAID puncture is a Dell-specific PERC feature, documented in Dell’s troubleshooting guide as “rebuild with errors,” and it is one of the few conditions where Dell’s own documentation explicitly acknowledges data loss. The user guide describes two situations: “Double Fault already exists (Data already lost)”, meaning the data was already gone when the second error occurred, and “Double Fault does not exist (Data is lost when second error occurs)”, meaning the data is destroyed in the moment the second error happens. The controller punctures the affected stripe to keep the array serving the rest of the data and to allow the rebuild to continue. After a puncture, Check Consistency and Patrol Read operations against the affected LBAs return Sense code 3/11/00 (Medium Error — Unrecovered Read Error), and over time the Bad Block Management (BBM) table fills up. As that table fills, drives that are physically healthy start getting flagged as predictive failure because they hold the propagated errors from the punctured stripe. Dell’s resolution path is explicit and final: “An array that is RAID punctured will eventually have to be deleted and recreated to eliminate the RAID puncture. This procedure causes all data to be erased.” Before anyone executes that step, the array should be imaged and evaluated. We cover the scenario in detail on our RAID puncture page.
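To see why a double fault is unrecoverable at the stripe level, consider single-parity (RAID 5 style) XOR arithmetic. The sketch below is a generic illustration of parity reconstruction, not PERC firmware behavior and not our recovery software; the function names are hypothetical. One missing strip in a stripe can be rebuilt from the others, while two missing strips leave the parity equation with two unknowns.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(strips):
    """strips: one stripe's strips (data plus parity), with None marking a
    strip that is missing or unreadable. Returns the completed stripe."""
    missing = [i for i, strip in enumerate(strips) if strip is None]
    if len(missing) > 1:
        # Two unknowns, one parity equation: the data cannot be derived.
        raise ValueError("double fault: more than one strip missing in this stripe")
    completed = list(strips)
    if missing:
        survivors = [strip for strip in strips if strip is not None]
        completed[missing[0]] = xor_blocks(survivors)
    return completed


d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# One failed drive: the missing strip is recovered from the survivors.
print(rebuild_stripe([d0, None, d2, parity])[1] == d1)

# A failed drive plus an unreadable sector on another member in the same stripe.
try:
    rebuild_stripe([d0, None, None, parity])
except ValueError as err:
    print(err)
```

This is the situation a puncture marks: the controller cannot compute the stripe, so it flags those blocks as bad and carries on with the rest of the rebuild.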

Preserved (pinned) cache states. When a virtual disk goes offline or its drives go missing while the controller is holding dirty cache that hasn’t been flushed to disk, the controller preserves that cache in a state Dell calls “pinned cache.” The pre-OS message, quoted directly from the Dell PERC User Guide, reads: “There are offline or missing virtual drives with preserved cache. Please check the cables and ensure that all drives are present. Press any key to enter the configuration utility.” From there an operator has two options — import the virtual disk and let the cache flush, or discard the cache. Both options can cause data loss in the wrong scenario. Discarding pinned cache that contains the last writes before failure means those writes are gone. Importing a virtual disk where the pinned cache was generated against a different state flushes stale writes to wrong logical addresses and corrupts the file system on top of an otherwise intact array. The safe path when this error appears is to image the drives first, evaluate the cache contents, and then decide whether the flush is recoverable.

Failed rebuild on H730 / H730P / H740P. These cards shipped between 2014 and 2020 with 13th and 14th generation PowerEdge servers, which puts the installed fleet squarely in the peak failure window now. A drive fails, the controller marks the array degraded, a replacement is inserted, and during the rebuild a second drive develops read errors. The rebuild aborts and the controller refuses to mount the array. Large arrays on high-capacity drives hit this often: the statistical odds of an unrecoverable read error (URE) during a full-array rebuild climb sharply with disk size. Rebuilding one member of an array of eight 8 TB drives means reading roughly 56 TB from the seven surviving drives, well into the range where a URE is likely.
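The 56 TB figure can be turned into a back-of-the-envelope probability. The sketch below assumes read errors are independent and occur at a fixed rate per bit read, which is a simplification, and it uses two commonly quoted datasheet figures (one error per 10^14 bits and one per 10^15 bits) as illustrative assumptions, not measured values for any particular drive model. The function name is hypothetical.

```python
import math


def p_at_least_one_ure(bytes_read, ure_per_bit):
    """Probability of at least one unrecoverable read error while reading
    bytes_read bytes, using a Poisson approximation with independent errors."""
    expected_errors = bytes_read * 8 * ure_per_bit
    return -math.expm1(-expected_errors)  # equals 1 - exp(-expected_errors)


surviving_drives = 7                 # eight-drive array with one member failed
drive_bytes = 8 * 10**12             # 8 TB per drive (decimal terabytes)
rebuild_read = surviving_drives * drive_bytes   # 56 TB read during the rebuild

for rate in (1e-14, 1e-15):          # assumed datasheet URE rates per bit
    p = p_at_least_one_ure(rebuild_read, rate)
    print(f"URE rate {rate:g}/bit: P(at least one URE) = {p:.2f}")
```

Under these assumptions the result is roughly 0.99 at the 10^14 rate and roughly 0.36 at the 10^15 rate. Datasheet rates are worst-case specifications and real drives often do better, but the trend with capacity is the point: the more surface a rebuild has to read, the less safe it is to assume it will complete cleanly.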

Virtual disk missing after reboot. Documented repeatedly in the Dell community forums and reproduced in our own incoming case load: a previously healthy RAID 1 or RAID 5 disappears from the controller view after an otherwise routine reboot, sometimes returning, sometimes not. H730 Mini variants are particularly prone. A typical case profile: a healthy R730 reboots for patching, comes back up with one drive marked Foreign, one marked Ready (as if it had never been part of the array), and the virtual disk gone. The drives are healthy and the data is intact; the array map is what’s missing.

Physical disk predictive failure cascades. PERC controllers use Bad Block Management (BBM) tables to track media errors at the physical drive level. When a drive’s BBM table fills, the controller flags it as “predictive failure” and prompts the operator to replace it. Critically, the drive flagged as predictive failure is often not the source of the problem — errors from a punctured stripe or a propagated bad block on a neighboring drive end up logged against the drive that read them. Replacing the flagged drive triggers a rebuild, the rebuild encounters the propagated bad blocks again, and the replacement drive eventually gets flagged the same way. We see this play out as long-running cycles of drive replacements that never solve the underlying media-error or puncture condition.

BBU failure and ESM log battery messages. The Dell troubleshooting guide names “PERC battery failure message is displayed in ESM log” as a specific scenario. A failing battery forces write-back cache to write-through, slows the array dramatically, and can leave preserved cache from a previous boot that flushes incorrectly on power-up. Battery learn cycles — a routine maintenance operation — are themselves a window of vulnerability because the controller temporarily operates in write-through during the cycle. File-system corruption layered on top of an otherwise intact array is the common outcome of unsupervised BBU degradation.

Cross-vendor migration disasters. Drives moved from a PERC to an Adaptec, HP Smart Array, or non-Dell MegaRAID will not import. The metadata formats are not compatible, and a controller that doesn’t recognize the existing metadata sometimes offers to initialize the drives. Accepting that prompt destroys the array. We see this case regularly after attempted hardware refreshes where someone assumed a controller swap would be transparent.

Lifecycle Controller side effects. Certain Lifecycle operations on H330 and H730 cards have been observed to wipe partition tables on previously-good virtual disks. The array itself stays intact at the RAID layer, but the file-system signatures at the head of the assembled volume don’t. The result is a healthy array with what looks like a blank disk on top of it — recoverable, because the file-system structures are reconstructible from the underlying data, but not trivially so without proper imaging and forensic recovery.

Front-panel and OMSA event codes. When the front-panel LCD on a PowerEdge displays E1810 Hard Drive X fault alongside any of the conditions above, you have a documented physical drive failure on top of the controller-level condition. Likewise, OMSA event entries for “Physical disk in failed state,” “Virtual disk degraded,” or “Virtual disk failed” line up directly with the controller states we’ve described — what matters is identifying which of the patterns above applies, because the recovery path differs significantly between them.

One pattern worth naming separately. The standard OEM support-engineer instruction for several of the conditions above — pinned cache, foreign configurations that won’t import cleanly, RAID punctures — is to delete and recreate the array. That advice gets the production server back into a working state. It also irreversibly destroys whatever data was on the array at the moment it executes. The real decision point in front of a downed PERC isn’t “OEM support versus recovery shop.” It’s “rebuild the server now and lose the data” versus “image the drives first and recover the data.” A short call with our engineering team scopes which path applies.

How We Recover Dell PERC Arrays

We never operate a failed PERC array during recovery. Running a degraded array during diagnostic work risks pushing the next drive over the edge and turning a recoverable case into an unrecoverable one. Each drive is removed from the chassis with its bay position documented, then imaged on isolated, write-blocked hardware in our cleanroom. Physically damaged drives are repaired with donor parts as needed before imaging: head replacements, PCB swaps, firmware recovery, and platter burnishing where the surface has been damaged. We work from drive images for everything that follows; the originals stay shelved and untouched.

Once we have a verified image of every drive, our reconstruction work begins. HOMBRE — Gillware’s in-house RAID and file-system reconstruction software, built and maintained by the engineers who use it — inspects every single sector of every drive image, identifying SNIA DDF metadata blocks at the tail of each disk and file-system forensic artifacts throughout. That sector-by-sector inspection is the key to rebuilding a PERC array without the original controller. We don’t depend on the controller to tell us what the array looked like; HOMBRE reads it directly from the drives.

On PERC arrays specifically, HOMBRE locates the DDF Anchor and Header structures in the reserved region near the end of each disk, cross-validates the configuration records across the disk images, and reconstructs the stripe size, member ordering, parity rotation algorithm, and starting LBA offset that the original PERC firmware was using. Where a puncture is present in the DDF metadata, HOMBRE flags the affected stripes so the recovered file system can be reported with file-level accuracy about which files were impacted by the puncture. Where individual drives have unrecoverable surface regions, the same approach applies — we report which files are affected rather than declaring the entire array a loss. Where preserved cache is involved, the cache contents are inspected directly and decisions about whether to apply or discard particular writes are made with full visibility, not blind.
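To show why those four parameters are enough to assemble a volume, here is a minimal sketch of address translation for a RAID 5 array. It is not HOMBRE and not PERC firmware: the function name is hypothetical, and it assumes a left-symmetric parity rotation purely for illustration. The rotation an actual array uses has to be determined from its metadata and data, not assumed.

```python
def logical_to_physical(logical_byte, strip_size, n_disks, data_start=0):
    """Map a byte offset in the assembled volume to (member disk, byte offset
    on that disk) for RAID 5, assuming a left-symmetric layout.

    strip_size: bytes written to one disk before moving to the next
    n_disks:    number of member drives, in reconstructed member order
    data_start: byte offset on each disk where array data begins
    """
    strip_index, within_strip = divmod(logical_byte, strip_size)
    stripe, k = divmod(strip_index, n_disks - 1)   # k-th data strip in its stripe
    # Left-symmetric: parity starts on the last disk and moves one disk
    # toward the first with each stripe.
    parity_disk = (n_disks - 1) - (stripe % n_disks)
    # Data strips begin on the disk after parity and wrap around.
    data_disk = (parity_disk + 1 + k) % n_disks
    return data_disk, data_start + stripe * strip_size + within_strip


strip = 64 * 1024  # assumed 64 KiB strip size
for offset in (0, 3 * strip, 4 * strip + 10):
    print(offset, logical_to_physical(offset, strip, n_disks=4))
```

With four disks, the first stripe puts its three data strips on disks 0, 1, and 2 with parity on disk 3; the second stripe puts parity on disk 2, so its first data strip lands on disk 3. A wrong stripe size, member order, rotation, or starting offset sends these reads to the wrong place, which is why each parameter is cross-validated before the file system is read.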

The engineers running this work see the failure modes catalogued above on a weekly basis. There is no PERC condition on this page that we are encountering for the first time. HOMBRE assembles the array as a virtual volume from the images, and the file-system layer above it — NTFS, ReFS, VMFS, ext4, XFS, ZFS, whatever the array was hosting — is recovered against the assembled volume. The deliverable is a file list and an outcome you can act on, rather than a controller that’s been talked back into mounting and then expected to keep working.

Related RAID Recovery Pages

By RAID level: RAID 0 · RAID 1 · RAID 5 · RAID 6 · RAID 10 · RAID puncture. By controller brand: HPE Smart Array · LSI MegaRAID (PERC controllers are OEM-rebranded MegaRAID hardware and the recovery process is the same) · Adaptec · IBM ServeRAID · Intel RAID · 3ware. By Dell platform: Dell server data recovery · PowerEdge data recovery · PowerVault data recovery. Return to the RAID data recovery hub for the full overview.

Start Your Dell PERC Recovery

If your PERC array is offline and production data is on it, power the system down before any other action. Do not attempt another rebuild, do not accept any “import foreign configuration” prompt, do not initialize any of the drives, and do not delete and recreate the array even if the OEM support engineer on the phone is suggesting it. Label each drive with its bay position before removing it from the chassis — drive order is critical for reconstruction. Ship the full set of drives together; we don’t need the server or the controller card.

Open a case or call and you’ll reach our engineering team. The initial scoping call covers feasibility, recovery approach, and turnaround — production-critical PERC cases enter the work queue same-day. Recovery is billed on a standard time-and-materials basis.

Open a Dell PERC recovery case →

Or skip the form and call 1-877-624-7206 during business hours (M–F 8 am–7 pm, Sat 10 am–3 pm Central), or schedule a 15-minute consultation with a client advisor.