ACT knowledge base

10 most recent knowledge base articles

Update Initrd

Have you blacklisted a kernel module, but it still shows up at boot? You probably need to update your initrd (initial RAM disk), a compressed filesystem image used to bootstrap the OS. Run “dracut --force” to recreate the initrd so that it picks up any configuration changes made under /etc, then reboot. Your changes are now …
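To confirm that a blacklist has taken effect, you can compare the modules named on “blacklist” lines under /etc/modprobe.d with the modules listed in /proc/modules. Here is a minimal Python sketch built on that approach. The helper names are hypothetical. The script only reports modules that are blacklisted but still loaded. It does not run dracut itself.

```python
from pathlib import Path


def blacklisted_modules(conf_text):
    """Module names from 'blacklist <module>' lines in modprobe.d config text."""
    names = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "blacklist":
            # modprobe treats '-' and '_' alike; /proc/modules uses '_'
            names.add(fields[1].replace("-", "_"))
    return names


def loaded_modules(proc_modules_text):
    """The first field of each /proc/modules line is the module name."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def blacklisted_but_loaded(conf_text, proc_modules_text):
    """Sorted list of modules that are blacklisted yet currently loaded."""
    return sorted(blacklisted_modules(conf_text) & loaded_modules(proc_modules_text))


if __name__ == "__main__":
    conf_dir, proc_modules = Path("/etc/modprobe.d"), Path("/proc/modules")
    conf = "\n".join(p.read_text() for p in sorted(conf_dir.glob("*.conf")))
    loaded = proc_modules.read_text() if proc_modules.exists() else ""
    offenders = blacklisted_but_loaded(conf, loaded)
    if offenders:
        print("Blacklisted but still loaded:", ", ".join(offenders))
        print("Run 'dracut --force' and reboot to rebuild the initrd.")
```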


Troubleshooting OpenMPI Invocation Problems

OpenMPI works with a large number of transport mechanisms, from shared memory on the local machine to IP over Ethernet or even RDMA over InfiniBand. With default settings, when you start your program using mpirun, OpenMPI chooses the best interface available. Unfortunately, the logic isn’t foolproof, and sometimes you will hit snags and your …
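When the automatic choice goes wrong, a common troubleshooting step is to tell mpirun explicitly which transports and interfaces to use, by passing MCA parameters with `--mca`. The Python sketch below only assembles such a command line. The component names (for example self, tcp, and vader or sm for shared memory), the interface name eth0, and the program ./my_mpi_app are assumptions. They depend on your Open MPI version and hardware, so check `ompi_info` on your own system.

```python
import shlex


def mpirun_command(program, nprocs, btl=None, tcp_if_include=None, verbose=False):
    """Build an mpirun argument list that pins Open MPI's transport choice.

    program: the program and its arguments, as a list.
    btl: BTL component names, e.g. ["self", "tcp"] (names vary by version).
    tcp_if_include: interface(s) the TCP BTL may use, e.g. "eth0".
    """
    args = ["mpirun", "-np", str(nprocs)]
    if btl:
        args += ["--mca", "btl", ",".join(btl)]
    if tcp_if_include:
        args += ["--mca", "btl_tcp_if_include", tcp_if_include]
    if verbose:
        # Ask the BTL framework to log which components it selects
        args += ["--mca", "btl_base_verbose", "100"]
    return args + list(program)


if __name__ == "__main__":
    cmd = mpirun_command(["./my_mpi_app"], 4, btl=["self", "tcp"], tcp_if_include="eth0")
    print(shlex.join(cmd))
```

Forcing plain TCP on a known interface is a quick way to tell whether a failure comes from transport selection or from the program itself.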


Standard Cluster – InfiniBand Networking

This is the InfiniBand configuration for most of the clusters we build.


Getting package information

By using the ‘rpm’ command (RPM Package Manager), it is possible to get a lot of information about the packages installed on your system. To start, say we want to see whether a specific package is installed on our system. We can search all the currently installed packages for a package named ‘actutil’ by: …
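`rpm -qa` lists every installed package as name-version-release (with a trailing .arch on most current systems). The minimal Python sketch below picks exact name matches out of that output, so that a search for ‘actutil’ does not also match a package such as actutil-devel. It assumes the default `rpm -qa` output format. It relies on the fact that RPM version and release fields cannot contain hyphens.

```python
import shutil
import subprocess


def package_name(nevra):
    """Strip '-version-release[.arch]' from an 'rpm -qa' line to get the name."""
    # Version and release never contain '-', so the name is everything
    # before the second-to-last hyphen.
    return nevra.rsplit("-", 2)[0]


def matching_packages(qa_output, name):
    """Full 'rpm -qa' entries whose package name is exactly `name`."""
    return [line.strip() for line in qa_output.splitlines()
            if line.strip() and package_name(line.strip()) == name]


if __name__ == "__main__":
    if shutil.which("rpm"):
        out = subprocess.run(["rpm", "-qa"], capture_output=True, text=True).stdout
        print(matching_packages(out, "actutil") or "actutil is not installed")
```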


Changing Contents in a File in Every Node

Occasionally you may want to change a single string inside a file that is on every compute node. If the file were the same on every node, you could change it in one place and then copy it out like so:

$ act_cp -g nodes /path/to/file

Some config files are unique to each …
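For files that differ from node to node, the replacement has to happen in place on each node instead. The Python sketch below covers only the per-node step. It replaces every occurrence of a string in one file and writes the result atomically through a temporary file. The function and script names are hypothetical. How you run the script on every node is left open; your cluster's remote-execution tool or an ssh loop would both work.

```python
import os
import shutil
import tempfile


def replace_in_file(path, old, new):
    """Replace every occurrence of `old` with `new` in `path`; return the count.

    The new contents go to a temporary file in the same directory, which then
    atomically replaces the original. Permission bits are copied over, but
    ownership is not.
    """
    if not old:
        raise ValueError("old must be a non-empty string")
    with open(path, encoding="utf-8") as f:
        text = f.read()
    count = text.count(old)
    if count == 0:
        return 0  # nothing to change; leave the file untouched
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text.replace(old, new))
        shutil.copymode(path, tmp)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed
        raise
    return count


if __name__ == "__main__":
    import sys
    # Hypothetical usage on a node: python3 replace_in_file.py /path/to/file OLD NEW
    if len(sys.argv) == 4:
        print(replace_in_file(*sys.argv[1:]), "replacement(s)")
```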


Checking and Clearing InfiniBand Errors

An easy way to check for errors across your entire cluster’s IB network is to run the command ‘ibcheckerrors’. It prints any errors it finds, which can range from a port being down (even one that was only unplugged temporarily) to transmission errors. After troubleshooting any errors you find, you can clear out the error counters with the command …


Use act_locate to identify a node

Most Advanced Clustering chassis are equipped with a large locator LED on the front that makes a node easy to identify when the LED is lit. If you are working remotely and need to show a technician which compute node needs work, you can simply run the following command from your head node:

$ act_locate …
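If act_locate is not available on a particular system, the standard IPMI Chassis Identify command may be able to light an identify LED. ipmitool exposes it as `ipmitool chassis identify`. Whether that is the same front locator LED on your chassis is an assumption you should verify. The sketch below builds the command for a node's BMC over the network. The BMC hostname and user name are hypothetical placeholders. The `-E` option makes ipmitool read the password from the IPMI_PASSWORD environment variable.

```python
import shlex


def identify_command(bmc_host, user, seconds=15, force=False):
    """Build an ipmitool command that lights a node's chassis identify LED.

    seconds: how long to light it (0 turns it off); force=True keeps it on
    until explicitly turned off (needs IPMI 2.0 support on the BMC).
    """
    if not force and not 0 <= seconds <= 255:
        # The IPMI identify interval is a one-byte field
        raise ValueError("interval must be 0-255 seconds")
    interval = "force" if force else str(seconds)
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "identify", interval]


if __name__ == "__main__":
    # "node01-ipmi" and "admin" are placeholders for your BMC and user
    print(shlex.join(identify_command("node01-ipmi", "admin", 60)))
```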


Pinpoint a failed drive in your array

If you see that your LSI RAID array has a failed disk, but you’re not sure which physical disk in the machine it is, use the MegaCli command-line utility to flash the drive’s LEDs.

Command syntax:

MegaCli64 -PdLocate <-start|-stop> -physdrv[<enclosure#>:<disk#>] -a<adapter#>

In this example, we will locate disk 0 on adapter 0 (the first …
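Following the MegaCli64 -PdLocate syntax above, a small Python helper can assemble the locate command so that the enclosure, disk, and adapter numbers are filled in consistently. This is a sketch. It assumes MegaCli64 is on your PATH. The enclosure number 252 in the usage line is only a placeholder, so take the real value from your own controller.

```python
import shutil
import subprocess


def pd_locate_args(enclosure, disk, adapter=0, start=True):
    """Arguments for: MegaCli64 -PdLocate <-start|-stop> -physdrv[E:S] -aN"""
    action = "-start" if start else "-stop"
    return ["MegaCli64", "-PdLocate", action,
            f"-physdrv[{enclosure}:{disk}]", f"-a{adapter}"]


if __name__ == "__main__":
    args = pd_locate_args(enclosure=252, disk=0, adapter=0)  # 252 is a placeholder
    if shutil.which(args[0]):
        subprocess.run(args, check=True)  # start flashing; use start=False to stop
    else:
        print(" ".join(args))
```

Passing an argument list instead of a shell string also keeps the shell from treating the square brackets in -physdrv[...] as a glob pattern.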


Viewing your system’s event log through IPMI

If your system has IPMI (Intelligent Platform Management Interface), it can be useful to pull its system event log when you encounter odd behavior. If your cluster was installed with our act_utils software tools, you can use the act_ipmi_log command (replace “node01” with the hostname of the machine you wish to query):

$ act_ipmi_log -n …
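On a machine without act_utils, `ipmitool sel list` prints the local BMC's system event log. Each event appears on one line, with fields separated by “|”. The Python sketch below splits such lines into fields and filters them by keyword. The exact field layout varies between ipmitool versions and BMCs, so the field handling here is an assumption.

```python
import shutil
import subprocess


def parse_sel(text):
    """Split each 'ipmitool sel list' line on '|' into a list of trimmed fields."""
    return [[field.strip() for field in line.split("|")]
            for line in text.splitlines() if "|" in line]


def events_matching(text, keyword):
    """Events with any field containing `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [fields for fields in parse_sel(text)
            if any(kw in field.lower() for field in fields)]


if __name__ == "__main__":
    if shutil.which("ipmitool"):
        out = subprocess.run(["ipmitool", "sel", "list"],
                             capture_output=True, text=True).stdout
        for event in events_matching(out, "power"):
            print(" | ".join(event))
```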


Checking InfiniBand

If one of your machines has an InfiniBand device installed and you want to know what state the device is in, you can use the “ibstat” command. The output of “ibstat” shows a lot of information, but the two main lines you should look at are:

State: Active
Physical state: LinkUp

The “State” line can …
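To check several ports or adapters at once, you can parse the ibstat output for the State and Physical state lines. The Python sketch below records both values for every port it finds and flags any port that is not Active/LinkUp. It assumes ibstat's usual layout: each adapter begins with a line such as “CA 'mlx4_0'”, and each port section with a line such as “Port 1:”.

```python
import shutil
import subprocess


def port_states(ibstat_text):
    """Map 'CA/port' -> (State, Physical state) parsed from ibstat output."""
    ports, ca, current = {}, None, None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        if line.startswith("CA '") and line.endswith("'"):
            ca, current = line[4:-1], None          # e.g. "mlx4_0"
        elif line.startswith("Port ") and line.endswith(":"):
            current = f"{ca}/{line[5:-1]}"          # e.g. "mlx4_0/1"
            ports[current] = [None, None]
        elif current and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "State":
                ports[current][0] = value
            elif key == "Physical state":
                ports[current][1] = value
    return {port: tuple(states) for port, states in ports.items()}


def unhealthy_ports(ibstat_text):
    """Ports whose link is not both Active and LinkUp."""
    return [port for port, states in port_states(ibstat_text).items()
            if states != ("Active", "LinkUp")]


if __name__ == "__main__":
    if shutil.which("ibstat"):
        out = subprocess.run(["ibstat"], capture_output=True, text=True).stdout
        states = port_states(out)
        for port in unhealthy_ports(out):
            print("Check", port, states[port])
```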


Advanced Clustering Technologies