
Replacing an LSI RAID disk with MegaCli

If you have identified a failed or failing disk, you can replace it using the MegaCli utility. The example below covers replacing a failed disk in a RAID 5 array that has three disks total.

The first thing to check is the status of the RAID 5 array.

[root@raid log]# MegaCli64 -ldinfo -lALL -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
Size : 929.458 GB
Parity Size : 464.729 GB
State : Degraded
Strip Size : 64 KB
Number Of Drives : 3
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Is VD Cached: Yes
Cache Cade Type : Read Only
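
For scripted health checks, the State field can be pulled out of this output with a small filter. The following is a minimal sketch rather than part of the original procedure: the function name vd_states is hypothetical, and it assumes the "Virtual Drive:" and "State" lines are laid out as in the sample output above.

```shell
# vd_states: read "MegaCli64 -ldinfo -lALL -aALL" output on stdin and
# print one "<virtual drive number> <state>" line per virtual drive.
# Assumption: field layout matches the sample output in this article.
vd_states() {
    awk -F':' '
        # "Virtual Drive: 0 (Target Id: 0)" -> remember the drive number
        /^Virtual Drive:/ { split($2, f, " "); vd = f[1] }
        # "State : Degraded" -> trim the value and print it
        /^State[ \t]*:/   { gsub(/^[ \t]+|[ \t]+$/, "", $2); print vd, $2 }
    '
}

# Typical use (needs a controller, so it is left commented out):
#   MegaCli64 -ldinfo -lALL -aALL | vd_states
```

For the sample output above, this would print a single line for virtual drive 0 with the state Degraded.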

The -ldinfo output above reports the array as ‘State : Degraded’. This means that at least one disk has failed or is not present in the array. Next, list all of the physical disks:

[root@raid log]# MegaCli64 -pdlist -aALL

The output of that command is quite long. In our example, the relevant fields for each disk are:

Enclosure Device ID: 252
Slot Number: 0
Firmware state: Online, Spun Up

Enclosure Device ID: 252
Slot Number: 1
Firmware state: Online, Spun Up

Enclosure Device ID: 252
Slot Number: 2
Firmware state: Online, Spun Up

Enclosure Device ID: 252
Slot Number: 3
Firmware state: Offline

In our example the failed disk is shown as ‘Enclosure Device ID: 252’ and ‘Slot Number: 3’, so in MegaCli syntax this drive is referenced as [252:3] in the examples below. Now that we know the enclosure device ID and slot number of each drive, we can go ahead and remove the failed drive.
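
Because the full -pdlist output is long, it can help to reduce it to the three fields used here. The sketch below is one way to do that, assuming the field names appear exactly as in the excerpt above; the function name pd_summary is hypothetical.

```shell
# pd_summary: read "MegaCli64 -pdlist -aALL" output on stdin and print
# one "[enclosure:slot] firmware state" line per physical disk.
# Assumption: the three field names match the excerpt in this article.
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID:/ { enc = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Firmware state:/      { printf "[%s:%s] %s\n", enc, slot, $2 }
    '
}

# Typical use (needs a controller, so it is left commented out):
#   MegaCli64 -pdlist -aALL | pd_summary
#   MegaCli64 -pdlist -aALL | pd_summary | grep -v 'Online'
```

The bracketed prefix matches the [enclosure:slot] form that the -physdrv option expects, so a line for the failed disk can be read straight across into the commands that follow.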

1) First, set the original disk offline if an error has not already caused the controller to do so

[root@raid log]# MegaCli64 -pdoffline -physdrv[252:3] -a0

Adapter: 0: EnclId-252 SlotId-3 state changed to OffLine.

Exit Code: 0x00

2) Mark the failed disk as missing

[root@raid log]# MegaCli64 -pdmarkmissing -physdrv[252:3] -aAll

EnclId-252 SlotId-3 is marked Missing.

Exit Code: 0x00

3) Mark the failed disk as prepared for removal

[root@raid log]# MegaCli64 -pdprprmv -physdrv[252:3] -a0

Prepare for removal Success

Exit Code: 0x00

4) Now you can replace the faulty disk. It may help to use the locate command to find the disk in the chassis

[root@raid log]# MegaCli64 -pdlocate -start -physdrv[252:3] -a0

Adapter: 0: Device at EnclId-252 SlotId-3 -- PD Locate Start Command was successfully sent to Firmware

Exit Code: 0x00

*** Step 5 has two options below ***

5) If you use hot spares and the original hot spare has already been pulled into the RAID array, set the new disk as the hot spare to replace the one that just went into service

# MegaCli64 -PDHSP -Set -PhysDrv[<enclosure#>:<disk#>] -a<adapter#>

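5) If you don’t use hot spares, add the disk to the array and start the rebuild manually. The array and row numbers (-Array0 -row0 here) are specific to this example; MegaCli64 -PdGetMissing -a0 lists them for the missing disk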

# MegaCli64 -PdReplaceMissing -PhysDrv[252:3] -Array0 -row0 -a0
# MegaCli64 -PDRbld -Start -PhysDrv[252:3] -a0

6) Optional: watch the rebuild progress. Depending on the size of the array, this may take a considerable amount of time. The RAID array remains usable during the rebuild, but expect reduced performance until it completes.

# MegaCli64 -PDRbld -ShowProg -PhysDrv[252:3] -a0
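
To reduce the chance of mistyping an enclosure or slot number halfway through, the command sequence can be generated for review before anything is run. The sketch below only prints the commands from steps 1 to 3 and the no-hot-spare variant of steps 5 and 6; it does not execute them. The function name replace_cmds is hypothetical, it uses the same adapter number for every step, and it assumes array 0 and row 0 as in the example above, which may not hold on other systems.

```shell
# replace_cmds: print (do not run) the MegaCli commands for replacing
# one disk without a hot spare.
# Arguments: enclosure device ID, slot number, adapter number.
# Assumption: array 0, row 0, as in the example in this article.
replace_cmds() {
    if [ "$#" -ne 3 ]; then
        echo "usage: replace_cmds ENCLOSURE SLOT ADAPTER" >&2
        return 1
    fi
    pd="[$1:$2]"   # e.g. [252:3]
    ad="-a$3"      # e.g. -a0
    printf '%s\n' \
        "MegaCli64 -pdoffline -physdrv$pd $ad" \
        "MegaCli64 -pdmarkmissing -physdrv$pd $ad" \
        "MegaCli64 -pdprprmv -physdrv$pd $ad" \
        "# swap the physical disk, then continue" \
        "MegaCli64 -PdReplaceMissing -PhysDrv$pd -Array0 -row0 $ad" \
        "MegaCli64 -PDRbld -Start -PhysDrv$pd $ad" \
        "MegaCli64 -PDRbld -ShowProg -PhysDrv$pd $ad"
}

# Example: print the sequence for the disk used in this article.
#   replace_cmds 252 3 0
```

Printing rather than executing is a deliberate choice here: each step changes controller state, so it is worth reading the generated lines against the -pdlist output before running them one at a time.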
