ServeRAID Data Recovery | IBM M-Series & Lenovo ThinkSystem

If an IBM or Lenovo ServeRAID controller has dropped your array offline, lost a foreign configuration after a controller swap, flagged a VPD mismatch, or stranded drives after a ThinkSystem firmware update, you’ve reached the right team. ServeRAID covers a long history of enterprise RAID hardware that started under IBM, moved to Lenovo when IBM sold its x86 server business in 2014, and continues today across the ThinkSystem RAID 430, 530, 730, 930, 940, 4350, 5350, and 9350 generations. The underlying silicon across that history is LSI / Broadcom MegaRAID, which means ServeRAID arrays share a fundamental architectural property with PERC and direct-MegaRAID deployments: array geometry is recorded in SNIA DDF metadata on the drives themselves, so it can be reconstructed without the original controller. Gillware has operated as a dedicated data recovery laboratory since 2004 from our ISO 5 Class 100 cleanroom in Madison, Wisconsin. ServeRAID cases are scoped at intake by an engineer who has handled the failure mode you’re looking at, not by a generic sales gate. See also our RAID data recovery hub.

Open a ServeRAID recovery case →

How IBM and Lenovo ServeRAID Controllers Work

The ServeRAID brand has changed hands once and gone through several architectural transitions, but the underlying silicon has been LSI / Broadcom MegaRAID throughout the active deployment we see in the lab. IBM sold its x86 server business to Lenovo in October 2014, and Lenovo has been shipping ServeRAID controllers and the ThinkSystem successor line ever since. The deployed fleet today spans three identifiable eras.

IBM-era ServeRAID M-series. The IBM ServeRAID M1015, M1115, M5014, M5015, M5016, M5025, M5110, M5120, M5210, and M5225 form the bulk of the legacy ServeRAID fleet still in production. The M1015 and M1115 are LSI SAS2008-based entry-level cards commonly flashed to IT mode by ZFS and software-RAID communities. The M5014 / M5015 / M5025 are LSI SAS2108-based 6 Gb cards. The M5016 / M5110 / M5120 are LSI SAS2208-based 6 Gb cards with full RAID 5 / 6 / 50 / 60 support, large cache, and battery or supercap protection. The M5210 is the LSI SAS3108-based 12 Gb card, and the M5225 is the external-port variant of the same SAS3108 design. These cards entered service between roughly 2010 and 2016 and represent the active IBM-branded ServeRAID inventory we see most often.

Lenovo ThinkServer transition. After the 2014 acquisition, Lenovo continued shipping the M-series with the Lenovo brand and IBM-compatible firmware, and gradually transitioned the platform to the ThinkServer naming for non-System-x lines. M5110, M5210, and M5225 controllers from this period carry Lenovo firmware revisions but the same underlying hardware as their IBM-era predecessors.

Lenovo ThinkSystem RAID adapters. The current Lenovo lineup spans entry HBAs through high-end RAID-on-Chip cards: the 430-8i / 430-16i (no RAID 5), the 530-8i (basic RAID), the 730-8i 2GB (mid-range with cache), the 930-4i / 930-8i / 930-8e / 930-16i / 930-24i family (the Gen9-era workhorse, now withdrawn from marketing but still dominant in production), the 940-8i / 940-16i / 940-32i / 940-8e family (current high-end with NVMe support), and the Tri-Mode 4350-8i / 4350-16i (entry), 5350-8i (mid), and 9350-8i / 9350-16i (high-end with supercapacitor). The 940 and 9350 series are the controllers we see most often on cases originating from new deployments, while the 930 series — despite its withdrawn-from-marketing status — represents a substantial active deployment with cases entering the lab at increasing frequency as those controllers approach their second decade of service.

Across all three eras, the silicon is MegaRAID and the on-disk metadata is SNIA DDF written to the trailing sectors of each member drive. The administrative surface includes the LSI Storage Authority (LSA) GUI, StorCLI and its predecessor MegaCLI, Lenovo’s OneCLI utility, the XCC (XClarity Controller) web interface, and LXPM (Lenovo XClarity Provisioning Manager) at boot. Cross-tool compatibility with generic Broadcom MegaRAID utilities is generally good on the M-series and 930 generations. Out-of-band capability is more constrained on the 4350 / 5350 / 9350 series: Lenovo documents that these controllers cannot be configured through OneCLI or XCC and require LXPM at boot for configuration changes.
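
To make the trailing-sector point concrete, here is a minimal Python sketch that looks for a DDF header signature near the end of a drive image. It is an illustration under stated assumptions, not a description of any vendor tool or of our own software: it assumes 512-byte logical sectors and an image small enough to hold in memory, the function name find_ddf_anchor is hypothetical, and it checks only the 4-byte header signature (0xDE11DE11 in the SNIA DDF specification), not the remaining header fields or the CRC.

```python
# Sketch only: find a sector near the end of a drive image that starts with the
# DDF header signature. Assumes 512-byte sectors; real drives may use 4096.
DDF_SIG_BE = bytes.fromhex("de11de11")  # signature as stored big-endian
DDF_SIG_LE = DDF_SIG_BE[::-1]           # same value stored little-endian


def find_ddf_anchor(image: bytes, tail_sectors: int = 64, sector_size: int = 512):
    """Return the index of the last sector in the tail that begins with the
    DDF signature, or None if no such sector is found."""
    total = len(image) // sector_size
    first = max(0, total - tail_sectors)
    # Walk backwards from the final sector, where the anchor is expected.
    for lba in range(total - 1, first - 1, -1):
        start = lba * sector_size
        if image[start:start + 4] in (DDF_SIG_BE, DDF_SIG_LE):
            return lba
    return None
```

A signature hit is only a starting point. The sector still has to be parsed and validated against the other member drives before anything in it is trusted.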

ServeRAID Error Conditions That Lead to Data Loss

Lenovo publishes a large Tech Tips library covering ServeRAID and ThinkSystem RAID issues, and the IBM-era ServeRAID documentation remains in their support archive. The patterns below are the ones that disproportionately end up at our lab — either because they imply data loss in progress, multiple drive failure beyond the array’s redundancy, or a configuration state where the next attempted action commonly destroys the array.

Foreign Configuration on a replacement controller. This is the most common ServeRAID condition we see. The replacement card, whether IBM-to-IBM, IBM-to-Lenovo, or generation-to-generation within Lenovo, reads DDF metadata from the attached drives that doesn’t match the controller’s current NVRAM. Import promotes the DDF into NVRAM and activates the virtual disk. On a healthy array the import is generally safe. On a degraded array, the import can commit an incorrect topology and force a rebuild against stale parity. Lenovo also documents (Tech Tip HT504684) a specific issue where the “Import foreign configuration” message displays when clearing foreign configuration through XCC. The dialog wording is easy to misread, and a clear command can be issued accidentally when the operator meant to import.

VPD (Vital Product Data) mismatch after controller swap. IBM and Lenovo controllers carry inventory data — serial numbers, part numbers, machine identification — that the XCC and the host management firmware track. When a replacement controller is installed with different VPD than the original, the XCC may flag a mismatch that prevents firmware updates from running and can cause subsequent boot-time controller initialization issues. The array data on the drives is unaffected by VPD mismatch, but the operational state of the controller subsystem may be, and recovery work needs to factor in any controller-side warnings before drive imaging begins.

Drive uninitialization required on 4350 / 5350 / 9350 series. Lenovo Tech Tip HT512744 documents that physical drives being reused with the ThinkSystem 4350, 5350, and 9350 series adapters need to be uninitialized before they can be added to an array. The uninitialization step is destructive — it removes any pre-existing data and metadata from the drive. When a customer is reusing drives that previously held data they care about (production array migration, drive reuse from a different system), the uninitialization prompt destroys whatever was there. We see this case when a Tri-Mode controller refuses to accept previously-used drives and the operator follows the documented Lenovo procedure without realizing it is destructive.

Slot numbering changes after firmware update on 4350 / 5350 / 9350. Lenovo Tech Tip HT513045 documents that some systems experience incorrect drive slot numbering after updating ThinkSystem 4350, 5350, and 9350 series adapter firmware from version 3.82. Slot numbering is part of how the controller maintains the relationship between physical drives and the virtual disks they belong to; a firmware update that changes the slot map can cause the controller to see drives as foreign or unconfigured. Operators following standard “update firmware to current” guidance can find themselves with an array that no longer appears as expected.

Firmware update breaking VMware ESXi access. Lenovo Tech Tip HT504819 documents that ServeRAID M51xx and LSI9286CV-8E controllers running firmware version 23.34.0.0023 or higher may cause Linux and VMware ESXi install issues, VM start failures, and access-to-datastore problems. This is a documented case where the OEM-recommended firmware update is itself the data-access failure event. The data on the array remains physically intact; the hypervisor can no longer mount it. Rolling firmware back is sometimes effective but not always, and the cases that arrive at our lab are the ones where the rollback failed and the array stayed inaccessible.

Cache I/O policy deprecation on 930 series. Lenovo Tech Tip HT509129 documents that the Cache I/O Policy has been deprecated on ThinkSystem RAID 930 controllers. The deprecation itself doesn’t cause data loss, but it changes default behavior around write-back caching in ways that affect array recovery from unclean shutdown — the controller’s view of what writes were committed versus pending after a power event differs from the array’s actual on-disk state. Recoveries on 930-series arrays that had relied on specific cache policy settings need to account for the firmware-version-dependent behavior.

Multiple drive failure beyond fault tolerance. The same condition that takes down PERC and MegaRAID arrays applies on ServeRAID. The pattern we see most often on the IBM/Lenovo side is the M5210 or 930-8i deployment from 2014-2018 that’s been in service for eight to twelve years with the original drive cohort: same manufacturing lot, same hours on platters, same end-of-life window. The first drive fails, a rebuild starts, a second drive surfaces a media error mid-rebuild that the controller flags as a failure, and the virtual disk drops to Offline. As with PERC and MegaRAID, the drives the controller has marked failed are often only partially failed at this point and most of the surface is still readable, which is why cleanroom imaging can often recover enough of the second drive to make recovery possible.
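
The reason a partially readable drive matters can be shown with a short Python sketch of single-parity (RAID 5 style) arithmetic. A stripe can be completed whenever at most one member's chunk is unreadable in that stripe, even if different drives are unreadable in different stripes. The function names and the use of None to mark an unreadable chunk are assumptions made for this example, which ignores RAID 6 and real-world complications such as stale parity.

```python
# Sketch only: rebuild one missing chunk of a single-parity stripe by XOR.
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_stripe(chunks):
    """chunks holds one entry per member drive for a single stripe (data and
    parity alike); None marks a chunk that could not be read from the image.
    Returns a completed list, or raises if more than one chunk is missing."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("more than one unreadable chunk in this stripe")
    completed = list(chunks)
    if missing:
        survivors = [chunk for chunk in chunks if chunk is not None]
        # In single parity, the XOR of all surviving chunks equals the missing one.
        completed[missing[0]] = xor_blocks(survivors)
    return completed
```

For example, with two data chunks and their parity, passing the first data chunk, None, and the parity chunk returns a list whose second entry is the original second data chunk.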

Pinned cache states. ServeRAID’s pinned cache behavior mirrors MegaRAID directly — when a virtual disk goes offline or is deleted because of missing physical disks, the controller preserves the dirty cache from that virtual disk. Discarding pinned cache that contains the last writes before failure means those writes are gone for good. Flushing pinned cache against a different set of drives than the cache was generated against writes stale data to wrong logical block addresses and corrupts the file system on top of an otherwise intact array.

Battery and supercap module failures. ServeRAID write-back cache is protected by a battery backup unit on older M-series cards and by supercap modules on the 730, 930, 940, and 9350 generations. Battery learn cycles — routine maintenance — are themselves a window of vulnerability because the controller operates in write-through during the cycle. The pattern we see is sudden file-system corruption layered on top of an otherwise intact array, where the protection module failed during a learn cycle and the operator didn’t notice until a subsequent unclean shutdown left writes stranded.

Mixed-generation adapter conflicts. Lenovo documents (in the SR630 setup guide and elsewhere) that the 4350, 5350, and 9350 SAS/SATA adapters cannot be mixed in the same chassis with the 430, 440, 530, 730, 930, or 940 series. Operators who attempt a mixed-generation deployment can find drives that worked on one controller flagged as foreign or unconfigured by the other, with no clean migration path between them.

Drive isolation causing redundancy mismatch. Lenovo Tech Tip HT512502 documents an “Isolation of drive causes redundancy mismatch” condition reported in the MEL (machine event log). The isolation step — intended to take a marginal drive out of service safely — can leave the array in a state where the controller’s accounting of redundancy doesn’t match the physical disk state, leading to refused mounts on subsequent reboots.

Cross-vendor migration. Drives moved from a ServeRAID to a non-MegaRAID-family controller (HPE Smart Array, Adaptec / SmartRAID) will not import — different metadata formats. Drives moved from PERC or direct-LSI MegaRAID to a ServeRAID controller often do import, because the underlying silicon and DDF format are shared, but the OEM-specific firmware customizations can cause subtle differences in how the import is handled.

Predictive failure cascades. ServeRAID inherits MegaRAID’s media-error tracking and Predictive Failure status flagging. As with PERC, Smart Array, and direct MegaRAID, the drive flagged is often not the source of the underlying problem — errors propagated from a marginal stripe on a neighboring drive end up logged against the drive that performed the read. Drive-replacement cycles that don’t address the underlying media-error condition are a recurring pattern across older ServeRAID deployments, particularly on the M5014 / M5015 / M5110 fleet now approaching end-of-life.

One pattern is worth naming separately. The standard support-engineer instruction for several of the conditions above (foreign config import with a degraded array, drive uninitialization on the 4350/5350/9350, firmware update through the version range with documented issues, clearing foreign configurations through XCC) is typically some variant of “follow the documented procedure and continue.” Those procedures work cleanly when the array is healthy underneath. When the underlying condition is multi-drive degradation, an XCC dialog that displayed the wrong message, or a firmware update that lands in HT504819 territory, the same procedures can destroy the data the customer called to save. The real decision in front of a downed ServeRAID array is not “Lenovo support versus recovery shop.” It is “execute the documented procedure now and accept whatever happens” versus “image the drives first and recover before any further controller-side action.” A short call with our engineering team scopes which path applies.

How We Recover IBM and Lenovo ServeRAID Arrays

We never operate a failed ServeRAID array during recovery. Running a degraded array during diagnostic work risks pushing the next drive over the edge, triggering an unwanted Patrol Read or Consistency Check, or letting the controller decide on its own to rebuild against the wrong member. Each drive is removed from the chassis, bay positions documented, and imaged on isolated, write-blocked hardware in our cleanroom. SAS and SATA members are imaged through HBAs in IT mode; NVMe members from 940-series and 9350-series arrays are imaged through PCIe interposers on dedicated workstations. Physically damaged drives are repaired with donor parts as needed before imaging — head replacements, PCB swaps, firmware recovery, and platter burnishing where the surface has been damaged. We work from drive images for everything that follows; the originals stay shelved and untouched.
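
Working from images only makes sense if each working copy can be shown to match the image taken from the drive. One generic way to check that is a cryptographic hash, sketched below in Python. This is an illustration rather than a description of our lab tooling. The function names are hypothetical, and the 1 MiB read size is an arbitrary choice.

```python
# Sketch only: hash a drive image in fixed-size reads so that large images
# never need to fit in memory, then compare against a recorded digest.
import hashlib


def sha256_of_image(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, expected_hex):
    """True if the image at path hashes to the digest recorded at imaging time."""
    return sha256_of_image(path) == expected_hex.lower()
```

Recording the digest when the image is first written, and checking it again before reconstruction starts, catches silent corruption of the copy without touching the original drive.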

Once we have a verified image of every drive, our reconstruction work begins. HOMBRE — Gillware’s in-house RAID and file-system reconstruction software, built and maintained by the engineers who use it — inspects every single sector of every drive image, identifying SNIA DDF metadata blocks at the tail of each disk and file-system forensic artifacts throughout. That sector-by-sector inspection is the key to rebuilding a ServeRAID array without the original controller. We don’t depend on the controller to tell us what the array looked like; HOMBRE reads it directly from the drives.
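
As a simplified picture of what sector-by-sector inspection means, the Python sketch below walks an image one sector at a time and reports sectors that begin with a known file-system signature. It is not HOMBRE and does not represent how HOMBRE works. It assumes 512-byte sectors and an in-memory image, covers only two well-known signatures (the NTFS OEM ID at byte offset 3 of a boot sector, and the XFS superblock magic at offset 0), and uses names invented for the example.

```python
# Sketch only: report sectors whose contents match a known file-system signature.
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# name -> (byte offset within the sector, expected bytes)
SIGNATURES = {
    "NTFS boot sector": (3, b"NTFS    "),  # OEM ID field, padded with spaces
    "XFS superblock": (0, b"XFSB"),        # superblock magic number
}


def scan_signatures(image: bytes, sector_size: int = SECTOR_SIZE):
    """Return a list of (sector_index, signature_name) for every match."""
    hits = []
    for lba in range(len(image) // sector_size):
        sector = image[lba * sector_size:(lba + 1) * sector_size]
        for name, (offset, magic) in SIGNATURES.items():
            if sector[offset:offset + len(magic)] == magic:
                hits.append((lba, name))
    return hits
```

Where such artifacts turn up on each member, and how far apart they sit, gives independent evidence about stripe size and member order that can be compared with what the DDF metadata says.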

Because ServeRAID is MegaRAID silicon throughout the active fleet we see, the reconstruction process is the same proven workflow we run on direct-LSI MegaRAID and Dell PERC cases. HOMBRE locates the DDF Anchor and Header structures in the reserved trailing-sector region of each disk, cross-validates the configuration records across the disk images, and reconstructs the stripe size, member ordering, parity rotation algorithm, and starting LBA offset that the original controller was using. The Lenovo and IBM firmware customizations layered on top of the MegaRAID base affect operational behavior but do not change the on-disk format, which is what reconstruction depends on. Where pinned cache contents matter, those are read out of cache module dumps and evaluated for staleness against the rest of the array state. Where a firmware-update event in the HT504819 or HT513045 window has left the array unreadable to the controller, the on-disk DDF is generally intact and HOMBRE reads it directly.
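
The parameters named above (stripe size, member ordering, parity rotation, starting LBA offset) are exactly what is needed to map a logical address back to a physical location. The Python sketch below shows that mapping for one rotation scheme, left-symmetric RAID 5, chosen purely as an example. The rotation a given array actually uses is one of the things reconstruction has to establish, and the function and parameter names here are hypothetical.

```python
# Sketch only: map a logical chunk number to (member, sector) for a
# left-symmetric RAID 5 layout. Other rotations need a different formula.
def locate_chunk(logical_chunk, n_disks, chunk_sectors, start_lba=0):
    """Return (data_disk, disk_lba, parity_disk) for one logical chunk.

    logical_chunk : index of the chunk within the virtual disk
    n_disks       : number of member drives, in their reconstructed order
    chunk_sectors : chunk (strip) size in sectors
    start_lba     : sector on each member where array data begins
    """
    data_per_stripe = n_disks - 1
    stripe, k = divmod(logical_chunk, data_per_stripe)
    # Parity starts on the last member and moves one member left per stripe.
    parity_disk = (n_disks - 1) - (stripe % n_disks)
    # Data chunks follow the parity chunk, wrapping around the member list.
    data_disk = (parity_disk + 1 + k) % n_disks
    disk_lba = start_lba + stripe * chunk_sectors
    return data_disk, disk_lba, parity_disk
```

With four members and 128-sector chunks, locate_chunk(3, 4, 128) places logical chunk 3 on member 3 at sector 128, with that stripe's parity on member 2. A wrong guess at any one parameter yields a volume whose file system does not line up, which is why candidate layouts are checked against file-system structures.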

The engineers running this work see the failure modes catalogued above on a weekly basis. There is no ServeRAID condition on this page that we are encountering for the first time. HOMBRE assembles the array as a virtual volume from the images, and the file-system layer above it — NTFS, ReFS, VMFS, ext4, XFS, ZFS, whatever the array was hosting — is recovered against the assembled volume. The deliverable is a file list and an outcome you can act on, rather than a controller that’s been talked into picking up an array its firmware-version state is fighting against.

Related RAID Recovery Pages

By RAID level: RAID 0 · RAID 1 · RAID 5 · RAID 6 · RAID 10 · RAID puncture. By controller brand: LSI MegaRAID (ServeRAID controllers are built on MegaRAID silicon and the recovery process is the same) · Dell PERC · HPE Smart Array · Adaptec. Return to the RAID data recovery hub for the full overview.

Start Your ServeRAID Recovery

If your ServeRAID array is offline and production data is on it, power the system down before any other action. Do not initialize or uninitialize any of the drives in arcconf, OneCLI, XCC, or LXPM. Do not clear the foreign configuration. Do not discard pinned cache. Do not update the controller firmware to resolve the issue — particularly if the array is on M51xx hardware running anywhere near the HT504819 firmware-version window, or on 4350/5350/9350 hardware in the version range covered by HT513045. Do not accept any rebuild prompt at POST or in LXPM. Label each drive with its bay position before removing it from the chassis. Ship the full set of drives together; we don’t need the server or the controller card.

Open a case or call and you’ll reach our engineering team. The initial scoping call covers feasibility, recovery approach, and turnaround — production-critical ServeRAID cases enter the work queue same-day. Recovery is billed on a standard time-and-materials basis.

Open a ServeRAID recovery case →

Or skip the form and call 1-877-624-7206 during business hours (M–F 8 am–7 pm, Sat 10 am–3 pm Central), or schedule a 15-minute consultation with a client advisor.