Managing a cluster might seem like a daunting task, but it can be quite easy with Advanced Clustering's management software packages (free with any Apex HPC cluster purchase) and some of the hardware devices described below.
IPMI (Intelligent Platform Management Interface) is an open-standard management system designed for remote monitoring and control of servers. IPMI is available as an option on most of Advanced Clustering's Pinnacle Servers.
IPMI works by embedding a small service processor, the Baseboard Management Controller (BMC), in the system. The BMC is powered and operational as long as the system is plugged into mains power. It keeps working when the system is turned off, when the operating system has crashed, and during most hardware failures.
The BMC can be controlled either in-band, through the operating system running on the server, or out-of-band, over a TCP/IP network connection. Out-of-band management is especially helpful in a cluster because it lets your system administrator control every node from a central point: checking fan, temperature, and power-supply voltage sensors, powering a system on or off, or even connecting to the machine's console.
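As a minimal sketch of central out-of-band control, the Python below builds `ipmitool` command lines for a node's BMC. It assumes the BMCs are reachable over IPMI v2.0 (`-I lanplus`) and that the BMC password is stored in a file read via `-f`. The host names `node01-ipmi`/`node02-ipmi`, the user `admin`, and the path `/root/.ipmi_pass` are hypothetical placeholders, not Advanced Clustering defaults.

```python
import shlex
from typing import List

# Map friendly action names to ipmitool subcommands.
# Which of these a given BMC supports depends on its firmware.
ACTIONS = {
    "status": ["chassis", "power", "status"],   # is the node on or off?
    "on": ["chassis", "power", "on"],
    "off": ["chassis", "power", "off"],         # hard power off
    "cycle": ["chassis", "power", "cycle"],     # off, then on again
    "sensors": ["sdr", "list"],                 # fans, temperatures, voltages
    "console": ["sol", "activate"],             # Serial-over-LAN console
}


def ipmi_command(bmc_host: str, user: str, password_file: str, action: str) -> List[str]:
    """Build the argument list for one out-of-band ipmitool call to a node's BMC."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",        # IPMI v2.0 over the management network
        "-H", bmc_host,         # the BMC's address, not the node's OS address
        "-U", user,
        "-f", password_file,    # read the password from a file so it stays out of argv
        *ACTIONS[action],
    ]


if __name__ == "__main__":
    # Hypothetical BMC host names, user, and password file.
    for bmc in ["node01-ipmi", "node02-ipmi"]:
        cmd = ipmi_command(bmc, "admin", "/root/.ipmi_pass", "status")
        print(shlex.join(cmd))
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)`. Returning an argument list instead of a shell string avoids quoting problems if a host name or path contains unusual characters.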
Serial consoles are another out-of-band management device. They let you connect to the console of each node from anywhere you permit access, which can be a real time-saver whether your cluster sits in a data center down the hall or across the world. Most of Advanced Clustering's compute nodes support serial BIOS redirection, so you can monitor or change any board-level setting without being in front of the machine. When you purchase a serial console as part of your cluster, Advanced Clustering will set up and enable serial redirection of the BIOS, boot loader, and operating system, giving you complete remote control over your entire system.
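On a Linux node, the boot-loader and operating-system parts of serial redirection usually come down to a few GRUB 2 settings. The sketch below renders them for a given port and speed. It assumes GRUB 2 with settings kept in `/etc/default/grub`, the first serial port (`ttyS0`), and 115200 baud 8N1; it is an illustration, not the configuration Advanced Clustering ships. BIOS redirection itself is enabled in firmware setup and is not covered here.

```python
from typing import Dict


def serial_console_settings(unit: int = 0, speed: int = 115200) -> Dict[str, str]:
    """GRUB 2 settings that send the boot menu and Linux console to serial port ttyS<unit>."""
    return {
        # Offer the GRUB menu on both the serial line and the local screen.
        "GRUB_TERMINAL": "serial console",
        # Configure the port for GRUB itself: <speed> baud, 8 data bits, no parity, 1 stop bit.
        "GRUB_SERIAL_COMMAND": (
            f"serial --speed={speed} --unit={unit} --word=8 --parity=no --stop=1"
        ),
        # Kernel messages go to both consoles; the last console= entry
        # (the serial port) becomes /dev/console.
        "GRUB_CMDLINE_LINUX": f"console=tty0 console=ttyS{unit},{speed}n8",
    }


def render(settings: Dict[str, str]) -> str:
    """Format settings as KEY="value" lines, the syntax used by /etc/default/grub."""
    return "\n".join(f'{key}="{value}"' for key, value in settings.items())


if __name__ == "__main__":
    print(render(serial_console_settings()))
```

Merge the `GRUB_CMDLINE_LINUX` value with any arguments already present rather than replacing them, then regenerate the GRUB configuration with your distribution's tool (for example `grub2-mkconfig` or `update-grub`). The speed must match the setting on the serial console server's port.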
A KVM (Keyboard, Video, Mouse) switch is a hardware device that allows a user to control multiple systems from a single keyboard, video monitor and mouse.
Each system is connected to the KVM switch by a dedicated cable. Users switch between systems by pressing buttons on the KVM device or by using keyboard hotkeys (often combinations involving CTRL or SCROLL LOCK). KVM models range from as few as two ports to hundreds, making them suitable for almost any size of cluster.
Although KVMs are useful management devices, they have some limitations. Without KVM-over-IP capability, access is restricted to within a few feet of the cluster, and they typically allow only one console session at a time. Some enterprise models offer multiple simultaneous console sessions, but they can be quite expensive.
Network Controlled PDU
To give administrators complete control over their clusters, we recommend a remote power control device. These stand-alone units combine a power controller with a power distribution unit. Through extensive testing, we've found APC's MasterSwitch line to provide the best feature set, and it is available in multiple configurations to meet the needs of any data center.
Because most clusters larger than a few nodes require more than one power control device, Advanced Clustering includes its Beo Utils package with every cluster purchase to make these devices easier to use. Instead of remembering which outlet on which device a particular node is plugged into, you can turn a node on or off, or hard-reboot it, with a simple command-line tool that needs only the node's hostname.
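The idea behind a hostname-based power tool can be sketched as a lookup table from node to PDU outlet. The Python below is a hypothetical illustration, not Beo Utils' actual interface: the `OUTLETS` table, host names, and `private` community string are invented, and the OID and its values are assumed from APC's PowerNet MIB (`sPDUOutletCtl`, where 1 = on, 2 = off, 3 = reboot). Verify both against your unit's MIB before use. The sketch only builds a Net-SNMP `snmpset` command; it does not execute it.

```python
from typing import Dict, List, Tuple

# Hypothetical wiring table: node hostname -> (PDU address, outlet number).
OUTLETS: Dict[str, Tuple[str, int]] = {
    "node01": ("pdu1.cluster", 1),
    "node02": ("pdu1.cluster", 2),
    "node09": ("pdu2.cluster", 1),
}

# Assumed from APC's PowerNet MIB (sPDUOutletCtl); verify against your unit.
OUTLET_CTL_OID = ".1.3.6.1.4.1.318.1.1.4.4.2.1.3"
ACTION_VALUES = {"on": 1, "off": 2, "reboot": 3}


def power_command(hostname: str, action: str, community: str = "private") -> List[str]:
    """Build a Net-SNMP snmpset call that switches the outlet feeding `hostname`."""
    if hostname not in OUTLETS:
        raise ValueError(f"no outlet recorded for {hostname!r}")
    if action not in ACTION_VALUES:
        raise ValueError(f"unknown action: {action!r}")
    pdu, outlet = OUTLETS[hostname]
    return [
        "snmpset", "-v1",
        "-c", community,                  # write community configured on the PDU (assumed)
        pdu,
        f"{OUTLET_CTL_OID}.{outlet}",     # table column indexed by outlet number
        "i", str(ACTION_VALUES[action]),  # INTEGER value selecting the action
    ]


if __name__ == "__main__":
    print(" ".join(power_command("node02", "reboot")))
```

Keeping the wiring in one table means that re-cabling a node requires changing a single entry, and every later power command picks up the change.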