Replacing an LSI raid disk with MegaCli

If you have identified a failed or failing disk, you can replace it using the MegaCli utility. The example below covers replacing a failed disk in a raid 5 array that has three disks in total.

The first thing we want to check is the status of our raid 5.

[root@raid log]# MegaCli64 -ldinfo -lALL -aALL
 Adapter 0 -- Virtual Drive Information:
 Virtual Drive: 0 (Target Id: 0)
 Name :
 RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
 Size : 929.458 GB
 Parity Size : 464.729 GB
 State : Degraded
 Strip Size : 64 KB
 Number Of Drives : 3
 Span Depth : 1
 Default Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
 Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
 Default Access Policy: Read/Write
 Current Access Policy: Read/Write
 Disk Cache Policy : Disk's Default
 Encryption Type : None
 Is VD Cached: Yes
 Cache Cade Type : Read Only
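
A report like this can also be checked from a script. The function below is a minimal sketch, not part of MegaCli: it reads -ldinfo output on standard input, prints one line per virtual drive, and returns a non-zero exit status if any drive is in a state other than Optimal. The function name is made up for illustration, and treating Optimal as the only healthy state is an assumption.

```shell
# Sketch: summarize virtual drive states from `MegaCli64 -ldinfo -lALL -aALL`.
# Reads the report on stdin. The function name is illustrative only.
check_vd_state() {
    awk '
        BEGIN { bad = 0 }
        # Remember which virtual drive the following lines belong to.
        /^[[:space:]]*Virtual Drive:/ { vd = $3 }
        # Match the array "State :" line (not "Firmware state:").
        /^[[:space:]]*State[[:space:]]*:/ {
            state = $0
            sub(/^[^:]*:[[:space:]]*/, "", state)   # drop the "State : " label
            sub(/[[:space:]]+$/, "", state)         # drop trailing whitespace
            printf "VD %s: %s\n", vd, state
            # Assumption: "Optimal" is the only healthy state.
            if (state != "Optimal") bad = 1
        }
        END { exit bad }
    '
}
```

It could be used as: MegaCli64 -ldinfo -lALL -aALL | check_vd_state || echo "array needs attention"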

In the -ldinfo output above, the array reports ‘State : Degraded’. This means that at least one disk has failed or is not present in the array. Next we will want to look at all of our disks:

[root@raid log]# MegaCli64 -pdlist -aALL

The output of that command is quite long, but the key fields for each disk in our example are:

Enclosure Device ID: 252
 Slot Number: 0
 ….
 Firmware state: Online, Spun Up

Enclosure Device ID: 252
 Slot Number: 1
 ….
 Firmware state: Online, Spun Up

Enclosure Device ID: 252
 Slot Number: 2
 ….
 Firmware state: Online, Spun Up

Enclosure Device ID: 252
 Slot Number: 3
 ….
 Firmware state: Offline

In our example the failed disk is shown as ‘Enclosure Device ID: 252’ and ‘Slot Number: 3’. In MegaCli syntax this drive is referenced as [252:3], which is how it appears in the examples below. Now that we know the enclosure IDs and slot numbers of each of the drives, we can go ahead and remove the failed drive.
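
Because the -pdlist output is long, it can help to condense it to one line per disk in the same [enclosure:slot] form that MegaCli expects. The following is a rough sketch that assumes the three field labels appear exactly as in the excerpt above; the function name is hypothetical.

```shell
# Sketch: condense `MegaCli64 -pdlist -aALL` output to "[enclosure:slot] state".
# Reads the report on stdin. The function name is illustrative only.
list_pd_states() {
    # Split each line at the first ": " so $2 holds the field value.
    awk -F': *' '
        /^[[:space:]]*Enclosure Device ID:/ { enc = $2 }
        /^[[:space:]]*Slot Number:/         { slot = $2 }
        # The firmware state line completes the record for one disk.
        /^[[:space:]]*Firmware state:/      { printf "[%s:%s] %s\n", enc, slot, $2 }
    '
}
```

For example, MegaCli64 -pdlist -aALL | list_pd_states | grep -v '] Online' would show only the disks that are not online.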

1) First, set the original disk offline, unless an error has already caused the controller to set it offline

[root@raid log]# MegaCli64 -pdoffline -physdrv[252:3] -a0
Adapter: 0: EnclId-252 SlotId-3 state changed to OffLine.
Exit Code: 0x00

2) Mark the failed disk as missing

[root@raid log]# MegaCli64 -pdmarkmissing -physdrv[252:3] -aAll

EnclId-252 SlotId-3 is marked Missing.

Exit Code: 0x00

3) Mark the failed disk as prepared for removal

[root@raid log]# MegaCli64 -pdprprmv -physdrv[252:3] -a0
Prepare for removal Success
Exit Code: 0x00

4) Now you can replace the faulty disk. It may help to use the locate command to find the disk in the chassis; running the same command with -stop in place of -start ends the locate operation

[root@raid log]# MegaCli64 -pdlocate -start -physdrv[252:3] -a0
Adapter: 0: Device at EnclId-252 SlotId-3 -- PD Locate Start Command was successfully sent to Firmware
Exit Code: 0x00

Step 5 has two options, depending on whether you use hot spares

5a) If you use hot spares and the original hot spare has already been pulled into the raid array, set the new disk as the hot spare in place of the one that just went into service

# MegaCli64 -PDHSP -Set -PhysDrv[<enclosure#>:<disk#>] -a<adapter#>

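5b) If you don't use hot spares, you will need to add the disk to the array and start the rebuild manually. The -Array0 and -row0 values below match this example and identify the position of the missing disk; on your own system they can be checked first with MegaCli64 -PdGetMissing -a0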

# MegaCli64 -PdReplaceMissing -PhysDrv[252:3] -Array0 -row0 -a0
# MegaCli64 -PDRbld -Start -PhysDrv[252:3] -a0

6) Optional: watch the rebuild progress. Depending on the size of the array, the rebuild may take a considerable amount of time. The raid array remains usable during the rebuild, but you should expect reduced performance until it completes.

# MegaCli64 -PDRbld -ShowProg -PhysDrv[252:3] -a0
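
To reduce the chance of a typo in the enclosure or slot number, the commands from steps 1 to 3 and 5b can be generated for one disk and reviewed before anything is run. The sketch below only prints the commands; it never calls MegaCli64. The function name, argument order and defaults are assumptions made for illustration, and it uses the single adapter number for every command, including the mark-missing step where the example above used -aAll.

```shell
# Sketch: print the MegaCli64 commands for steps 1-3 and 5b for one disk.
# Nothing is executed; review the output, then run the lines by hand.
# Usage: print_replace_cmds ENCLOSURE SLOT [ADAPTER] [ARRAY] [ROW]
print_replace_cmds() {
    local n enc slot adapter array row pd
    # Require numeric enclosure and slot values.
    for n in "$1" "$2"; do
        case $n in
            ''|*[!0-9]*)
                echo "usage: print_replace_cmds ENCLOSURE SLOT [ADAPTER] [ARRAY] [ROW]" >&2
                return 2
                ;;
        esac
    done
    enc=$1
    slot=$2
    adapter=${3:-0}   # defaults mirror the example in this article
    array=${4:-0}
    row=${5:-0}
    pd="[$enc:$slot]"
    # Steps 1-3: run before the disk is physically removed.
    printf '%s\n' "MegaCli64 -pdoffline -physdrv$pd -a$adapter"
    printf '%s\n' "MegaCli64 -pdmarkmissing -physdrv$pd -a$adapter"
    printf '%s\n' "MegaCli64 -pdprprmv -physdrv$pd -a$adapter"
    # Step 5b: run after the new disk has been inserted.
    printf '%s\n' "MegaCli64 -PdReplaceMissing -PhysDrv$pd -Array$array -row$row -a$adapter"
    printf '%s\n' "MegaCli64 -PDRbld -Start -PhysDrv$pd -a$adapter"
}
```

Calling print_replace_cmds 252 3 prints the five commands used in this article for the disk at [252:3] on adapter 0.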