Expand your knowledge of hardware, software and supercomputing

How to locate a physical disk in an LSI RAID array

The MegaCli command line utility can be used to locate a physical disk in an LSI RAID array by blinking the disk’s activity LED. The blinking will continue until directed to stop.
Syntax: MegaCli64 -PdLocate <-start|-stop> -physdrv[<enclosure#>:<disk#>] -a<adapter#>
In this example we will locate disk 0 on adapter 0:
[root@localhost MegaCli]# ./MegaCli64 -PdLocate -start -physdrv[252:0] […]
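Because the LED keeps blinking until it is told to stop, every -start needs a matching -stop. The sketch below fills in the syntax above for both actions. The helper name pdlocate_cmd is hypothetical and only prints the command line instead of running MegaCli64; the enclosure, disk and adapter numbers (252, 0, 0) are example values that will differ on your system.

```shell
# Hypothetical helper: prints the MegaCli64 command that starts or stops
# the locate LED. It only builds the command line; it does not run MegaCli64.
pdlocate_cmd() {
    action="$1"     # "start" or "stop"
    enclosure="$2"  # enclosure number, e.g. 252
    disk="$3"       # disk number, e.g. 0
    adapter="$4"    # adapter number, e.g. 0
    case "$action" in
        start|stop) ;;
        *) echo "usage: pdlocate_cmd <start|stop> <enclosure#> <disk#> <adapter#>" >&2
           return 1 ;;
    esac
    printf './MegaCli64 -PdLocate -%s -physdrv[%s:%s] -a%s\n' \
        "$action" "$enclosure" "$disk" "$adapter"
}

# Print the commands that start, then stop, blinking disk 0
# in enclosure 252 on adapter 0.
pdlocate_cmd start 252 0 0
pdlocate_cmd stop 252 0 0
```

Review the printed lines, then run them from the MegaCli directory as root once the enclosure and disk numbers match your hardware.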

RAM – Checking for errors

Run Breakin: It can be difficult to tell if a memory error is related to hardware or software. To help determine this, we suggest running the ACT Breakin utility to rule out software-related errors. Breakin for compute nodes; Breakin for head nodes and CentOS workstations. Run memtest86+: memtest86+ is a free utility […]

Repairing a corrupted SGE database

Note: Understanding the cause of sgemaster failing to start is important.  Before running these steps, there should be some indication of a database corruption issue in the logs.  These logs are located in /act/sge/default/spool/qmaster/messages.  A typical corruption error message may look like this: 03/07/2015 17:34:07| main|head|E|couldn’t open berkeley database “sge”: (22) Invalid argument 03/07/2015 17:34:07| […]
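To check for that indication before attempting a repair, you can search the qmaster messages file for Berkeley database errors. This is a minimal sketch: the function name sge_db_errors is hypothetical, the default path is the one given above, and the search string is taken from the sample error message, so other corruption messages may be worded differently.

```shell
# Hypothetical helper: print log lines that mention the Berkeley database.
# The optional first argument overrides the default messages file.
sge_db_errors() {
    log_file="${1:-/act/sge/default/spool/qmaster/messages}"
    # -i: ignore case, -F: treat the pattern as a fixed string
    grep -i -F "berkeley database" "$log_file"
}

# Usage (commented out so nothing runs on load):
#   sge_db_errors                      # search the default log
#   sge_db_errors /path/to/messages    # search a copy of the log
```

If the function prints nothing, the log holds no matching lines, and the cause of the startup failure is probably something other than database corruption.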

Using the ACT Yum Repo

Advanced Clustering Technologies maintains a software repository called actrepo for our ACT Utilities and other commonly used cluster software. To access the ACT yum repo, install the actrepo RPM with these commands:
CentOS 5
$ rpm -Uvh http://lab.advancedclustering.com/yum/centos5/actrepo-1.0-centos5.noarch.rpm
CentOS 6
$ rpm -Uhv http://lab.advancedclustering.com/yum/centos6/actrepo-1.0-centos6.noarch.rpm
CentOS 7
$ yum -y install http://lab.advancedclustering.com/yum/actel7/actrepo-7.0-el7.noarch.rpm

An Easier Way to Back Up Your HPC Cluster

Last month we reviewed the importance of making backups. Perhaps the simplest form of backup can occur by taking an image of the head node. Today, Advanced Clustering Technologies releases an update to the Cloner utility that makes this a whole lot easier.  The new cloner_usb command will create a bootable USB key which can restore […]

Installing Libraries for Python Outside of System Directories

Python is being used more frequently in HPC applications. Whether a job is being run by the scheduler or pre/post-processing on login nodes, there’s a chance you may run into it. With Python comes the need for libraries. Installing the libraries in system directories normally isn’t possible, but there is a good solution for that. […]

Taking Compute Nodes Down for Maintenance

When taking a compute node down for any reason, it’s good practice to remove it from any job queues in which it may be a member. Otherwise, a node that comes up temporarily may start new jobs, only to be shut down again, killing the user’s job. Here’s how to safely pull a node out of service […]

Pinpoint a failed drive in your array

If you see that your LSI RAID array has a failed disk, but you’re not sure which physical disk in the machine it is, use the MegaCli command line utility to flash the drive’s LEDs: Command syntax: MegaCli64 -PdLocate <-start|-stop> -physdrv[<enclosure#>:<disk#>] -a<adapter#> In this example, we will locate disk 0 on adapter 0 (the first […]

Getting package information

Using the ‘rpm’ command (RPM Package Manager), it is possible to get a lot of information about the packages installed on your system. To start, say we want to see if we have a specific package name installed on our system. We can search all the currently installed packages for a package named ‘actutil’ by: […]
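One common way to run such a search is to filter the output of rpm -qa with grep. The sketch below is an illustration under that assumption and may differ from the command used in the full post; the helper name pkg_search is hypothetical, while rpm -qa, rpm -qi and rpm -ql are standard rpm query options.

```shell
# Hypothetical helper: list installed packages whose name matches a pattern.
# "rpm -qa" prints every installed package; grep -i filters case-insensitively.
pkg_search() {
    rpm -qa | grep -i -- "$1"
}

# Usage on an RPM-based system (commented out so nothing runs on load):
#   pkg_search actutil     # is anything named actutil installed?
#   rpm -qi <package>      # show summary information for one package
#   rpm -ql <package>      # list the files a package installed
```

No output from pkg_search means no installed package name contains the pattern.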

Viewing your system’s event log through IPMI

If your system has IPMI (Intelligent Platform Management Interface), it can be useful to pull its system event log when encountering odd behavior. If you have a cluster installed with our act_utils software tools, you can use the act_ipmi_log command (replace “node01” with the hostname of the machine you wish to query): $ act_ipmi_log -n […]
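On a system without the act_utils tools, the generic ipmitool utility can usually read the same event log. This is a sketch of that alternative, not of act_ipmi_log itself: the function names are hypothetical, bmc_host and user are placeholders, and it assumes ipmitool is installed and that the BMC accepts lanplus connections.

```shell
# Hypothetical helper: print the System Event Log of the local machine.
# Assumes the IPMI kernel drivers are loaded; usually needs root.
show_local_sel() {
    ipmitool sel elist
}

# Hypothetical helper: print the SEL of a remote BMC over the network.
# -a makes ipmitool prompt for the password instead of taking it as an argument.
show_remote_sel() {
    bmc_host="$1"   # hostname or IP address of the BMC (placeholder)
    user="$2"       # IPMI user name (placeholder)
    ipmitool -I lanplus -H "$bmc_host" -U "$user" -a sel elist
}

# Usage (commented out so nothing runs on load):
#   show_local_sel
#   show_remote_sel <bmc-hostname> <ipmi-user>
```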

Use our Breakin stress test and diagnostics tool to pinpoint hardware issues and component failures.
Check out our product catalog and use our Configurator to plan your next system and get a price estimate.

