Expand your knowledge of hardware, software and supercomputing

How to identify and prevent overheating

Symptoms of overheating: turning off on its own, freezing, and frequent memory errors. Most commonly, an overheating computer will turn off unexpectedly and then do so again shortly after being turned back on. This happens because CPU temperatures are constantly monitored and the system will […]
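Because the shutdowns described above are driven by temperature monitoring, it can help to look at the temperatures yourself. The following is a minimal sketch, assuming a Linux machine that exposes sensors under /sys/class/thermal (each thermal_zone*/temp file holds millidegrees Celsius). The 85 C warning threshold is a hypothetical placeholder, not a vendor limit; check your CPU's specifications.

```python
from pathlib import Path

# Hypothetical alert threshold in degrees Celsius; the safe limit depends on your CPU model.
WARN_CELSIUS = 85.0


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs thermal reading (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(root: str = "/sys/class/thermal") -> dict:
    """Return {"thermal_zoneN (type)": temperature_C} for every readable zone."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            celsius = parse_millidegrees((zone / "temp").read_text())
        except (OSError, ValueError):
            # Some zones are unreadable or report no value; skip them.
            continue
        temps[f"{zone.name} ({zone_type})"] = celsius
    return temps


def hot_zones(temps: dict, limit: float = WARN_CELSIUS) -> list:
    """Names of the zones whose temperature is at or above the limit."""
    return [name for name, celsius in temps.items() if celsius >= limit]


if __name__ == "__main__":
    temps = read_thermal_zones()
    for name, celsius in temps.items():
        print(f"{name}: {celsius:.1f} C")
    for name in hot_zones(temps):
        print(f"WARNING: {name} is at or above {WARN_CELSIUS} C")
```

Running it periodically while the machine is under load gives a rough picture of whether temperatures climb toward the point where the system shuts itself off.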

Identifying Issues with Network Connectivity

Network connectivity covers many different areas, and diagnosing which area your problem lies in is the first step to fixing it. Below we cover several steps for identifying a problem. Verify connections and LEDs: verify that the network cable is properly connected at the back of the computer and at the switch. […]
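Alongside checking the cable and LEDs by hand, the operating system can report whether it sees a link. This is a minimal sketch, assuming Linux, where /sys/class/net/IFACE/operstate holds a state such as "up" or "down" and /sys/class/net/IFACE/carrier reads "1" when a link is detected. Interface names such as eth0 are illustrative only.

```python
from pathlib import Path


def read_link_state(iface: str, root: str = "/sys/class/net") -> dict:
    """Report the kernel's view of one interface's link from Linux sysfs."""
    base = Path(root) / iface
    if not base.is_dir():
        return {"operstate": "missing", "carrier": "unknown"}
    operstate = (base / "operstate").read_text().strip()  # e.g. "up", "down"
    try:
        # "1" means the NIC detects a link, "0" means it does not.
        carrier = "yes" if (base / "carrier").read_text().strip() == "1" else "no"
    except OSError:
        # Reading carrier fails while the interface is administratively down.
        carrier = "unknown"
    return {"operstate": operstate, "carrier": carrier}


if __name__ == "__main__":
    net = Path("/sys/class/net")
    if net.is_dir():
        for iface in sorted(p.name for p in net.iterdir()):
            state = read_link_state(iface)
            print(f"{iface}: operstate={state['operstate']} carrier={state['carrier']}")
```

A carrier of "no" on an interface whose cable looks seated points back at the cable, the switch port, or the NIC, which matches the physical checks described above.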

Standard Cluster – InfiniBand Networking

This is the InfiniBand configuration for most of the HPC clusters we build.

How do I rack a .5U blade or a 2U Flex Chassis?

1U blade or 2U Flex Chassis installation and removal. PLEASE NOTE: The illustrations in this FAQ show a 2U Flex chassis; however, the same procedures apply to the 1U blade, except that the 1U chassis is 1U shorter, uses a different-size rear mounting bracket, and has fewer […]

Checking InfiniBand

If one of your machines has an InfiniBand device installed and you want to know what state the device is in, you can use the “ibstat” command. Its output contains a lot of information, but the two main lines to look at are “State: Active” and “Physical state: LinkUp”. The “State” line can […]
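The check above, looking for "State: Active" and "Physical state: LinkUp" on each port, can be automated. This is a sketch under an assumption: that ibstat prints a "CA 'name'" header per adapter followed by indented "Port N:" blocks containing the State and Physical state lines, which is the common layout; verify it against the ibstat version on your system.

```python
import subprocess


def parse_ibstat(text: str) -> list:
    """Collect the State and Physical state lines for each port in ibstat output."""
    ports = []
    ca = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CA '") and line.endswith("'"):
            ca = line[4:-1]  # adapter name from the header line
        elif line.startswith("Port ") and line.endswith(":"):
            current = {"ca": ca, "port": line[5:-1], "state": "", "physical": ""}
            ports.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Physical state:"):
            current["physical"] = line.split(":", 1)[1].strip()
    return ports


def port_is_up(port: dict) -> bool:
    """True when the port reports both State: Active and Physical state: LinkUp."""
    return port["state"] == "Active" and port["physical"] == "LinkUp"


if __name__ == "__main__":
    try:
        result = subprocess.run(["ibstat"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"could not run ibstat: {err}")
    else:
        for port in parse_ibstat(result.stdout):
            status = "OK" if port_is_up(port) else "CHECK"
            print(f"{port['ca']} port {port['port']}: {port['state']} / {port['physical']} [{status}]")
```

Any port flagged CHECK is one whose state differs from the healthy pair described above.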

Installing NVIDIA Drivers on RHEL or CentOS 7

Most users of NVIDIA graphics cards prefer the drivers provided by NVIDIA, which support the card's capabilities more fully than the nouveau driver included with the distribution. These are the steps to install the NVIDIA driver and disable the nouveau driver. Prepare your machine by running yum -y update, then yum […]
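Since the procedure above involves disabling nouveau, it is useful to confirm which driver the kernel actually has loaded. This minimal sketch assumes Linux, where /proc/modules lists one loaded module per line with the module name as the first field; it only reports status and does not change anything.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Module names are the first whitespace-separated field of each /proc/modules line."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def gpu_driver_status(proc_modules_text: str) -> dict:
    """Report whether the nouveau and nvidia kernel modules are currently loaded."""
    modules = loaded_modules(proc_modules_text)
    return {"nouveau": "nouveau" in modules, "nvidia": "nvidia" in modules}


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        status = gpu_driver_status(proc.read_text())
        print(status)
        if status["nouveau"]:
            print("nouveau is still loaded")
```

After the driver installation, the expected result is nouveau reported as False and nvidia as True.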

Checking and Clearing InfiniBand Errors

An easy way to check for errors across your entire cluster's IB network is to run the command ‘ibcheckerrors’. It prints any errors, which can range from a port being down (even if it was only temporarily unplugged) to transmission errors. After troubleshooting any errors you find, you can clear the error counters with the command […]

Replacing an LSI RAID card with a pre-configured RAID array

Newer LSI RAID cards (apparently depending on their firmware version) will automatically import RAID configurations from previous RAID cards. On older cards, however, you have to import the disks' ‘foreign’ configuration. To check whether your RAID array was automatically imported by your new RAID card, run the following command: $ MegaCli64 […]

Create a RAID array with MegaCli64

Note: The following assumes that you have attached new drives to a newly installed LSI RAID controller. The first step is to get a list of all the drives attached to the RAID controller. LSI RAID controllers identify (label) their attached disks by an ‘Enclosure ID’ and the drive […]
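The Enclosure ID labeling described above can be illustrated with a small parser. This is a sketch under assumptions: it expects saved output of "MegaCli64 -PDList -aALL" to contain an "Enclosure Device ID:" line followed by a "Slot Number:" line for each drive, and it joins them into a bracketed Enclosure:Slot list. The field labels, the list format, the script name and the pdlist.txt file name are assumptions to check against your own output and MegaCli documentation.

```python
import re
import sys
from pathlib import Path

# Assumed field labels in MegaCli64 -PDList output; verify against your controller.
ENCLOSURE_RE = re.compile(r"^\s*Enclosure Device ID:\s*(\S+)")
SLOT_RE = re.compile(r"^\s*Slot Number:\s*(\d+)")


def enclosure_slots(pdlist_text: str) -> list:
    """Pair each Enclosure Device ID with the Slot Number that follows it, as 'E:S'."""
    drives = []
    enclosure = None
    for line in pdlist_text.splitlines():
        m = ENCLOSURE_RE.match(line)
        if m:
            enclosure = m.group(1)
            continue
        m = SLOT_RE.match(line)
        if m and enclosure is not None:
            drives.append(f"{enclosure}:{m.group(1)}")
            enclosure = None  # wait for the next drive's enclosure line
    return drives


def drive_list_arg(drives: list) -> str:
    """Format drives as a bracketed, comma-separated list, e.g. [252:0,252:1]."""
    return "[" + ",".join(drives) + "]"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 enclosure_slots.py pdlist.txt
    print(drive_list_arg(enclosure_slots(Path(sys.argv[1]).read_text())))
```

Reviewing this list before building an array makes it easier to confirm that every new drive is accounted for.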

How to expand an existing LSI RAID array using MegaCli

Warning: ALWAYS back up all of the data on the RAID array before performing any of these steps. The exact commands vary with your current configuration and the number of disks in the array. Before adding the disks, you need to get a feel for your current setup by […]

Use our Breakin stress test and diagnostics tool to pinpoint hardware issues and component failures.
Check out our product catalog and use our Configurator to plan your next system and get a price estimate.
