ACT knowledge base

10 most recent knowledge base articles

Repairing a corrupted SGE database

If your filesystem ever fills up or the system crashes at the wrong time, your SGE database may get corrupted. Here are steps that can usually repair the database so SGE will run properly again:

    cd $SGE_ROOT/default/spool
    cp -a spooldb spooldb.bak
    cd spooldb
    db_verify sge
    db_recover
    db_dump -f sge.out sge
    mv sge sge.old
    db_load -f …
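The first two stages above (backing up the spool and verifying it) can be wrapped in a small script so that the backup is never skipped. This is only a sketch. It assumes the Berkeley DB spool lives in $SGE_ROOT/default/spool/spooldb as in the commands above, and that sge_qmaster has already been stopped. The function name backup_and_verify_spooldb is hypothetical. The sketch deliberately stops after db_verify, because the recovery and reload steps are given in the full article.

```shell
#!/bin/sh
# Sketch: back up the SGE Berkeley DB spool, then verify it.
# Assumes sge_qmaster is stopped and the spool layout matches the article.
# backup_and_verify_spooldb is a hypothetical helper name.

# The body runs in a subshell ( ... ) so the cd calls do not change the
# caller's working directory; "exit" only leaves that subshell.
backup_and_verify_spooldb() (
    spool="${1:-$SGE_ROOT/default/spool}"
    cd "$spool" || exit 1

    # cp -a into an existing directory would nest the copy inside it,
    # so refuse to touch an earlier backup.
    if [ -e spooldb.bak ]; then
        echo "spooldb.bak already exists in $spool; move it aside first" >&2
        exit 1
    fi

    # -a preserves ownership, permissions and timestamps.
    cp -a spooldb spooldb.bak || exit 1

    cd spooldb || exit 1
    # db_verify exits non-zero if the "sge" database file fails its checks;
    # that status becomes the function's return status.
    db_verify sge
)
```

For example, backup_and_verify_spooldb "$SGE_ROOT/default/spool". If db_verify reports errors, continue with the db_recover and db_dump steps from the list above.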

 

Tech tip: An Easier Way to Back Up Your HPC Cluster

Last month we reviewed the importance of making backups. Perhaps the simplest form of backup is an image of the head node. Today, Advanced Clustering Technologies releases an update to the Cloner utility that makes this a whole lot easier. The new cloner_usb command will create a bootable USB key which can restore …

 

Tech tip: Backing Up Your HPC Cluster

With World Backup Day just around the corner on March 31st, now is a great time to talk about backing up your cluster. Backups are sometimes overlooked, but making them is important for any cluster owner. Here are some tips to point you in the right direction for backup planning. Data on RAID 6 or mirrored …

 

Tech Tip: Taking Compute Nodes Down for Maintenance

When taking a compute node down for any reason, it's good practice to remove it from every job queue it belongs to. Otherwise, a node that comes up temporarily may start new jobs, only to be shut down again, killing the users' jobs. Here's how to safely pull a node out of service …
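The article's exact procedure is cut off here, and it may target a different scheduler. As one possibility, on a Grid Engine (SGE) cluster like the one in the first article, qmod -d disables the queue instances on a host. The scheduler then stops dispatching new jobs there while leaving running jobs alone, and qmod -e re-enables the instances. The helper names drain_node and undrain_node are hypothetical.

```shell
#!/bin/sh
# Sketch for Grid Engine (SGE): keep new jobs off a node before maintenance.
# drain_node and undrain_node are hypothetical helpers; the argument is a host name.

drain_node() {
    # '*@host' matches every cluster queue's instance on that host.
    # Disabled queue instances receive no new jobs; running jobs continue.
    qmod -d "*@$1"
}

undrain_node() {
    # Re-enable the node's queue instances once maintenance is finished.
    qmod -e "*@$1"
}
```

A typical sequence is drain_node node01, wait for the running jobs to finish (qhost -j -h node01 lists jobs still on the host), do the maintenance, then undrain_node node01.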

 

Tech Tip: Installing Libraries for Python Outside of System Directories

Python is being used more frequently in HPC applications. Whether a job is being run by the scheduler or as pre/post-processing on login nodes, there's a chance you may run into it. With Python comes the need for libraries. Ordinary users usually can't install libraries in the system directories, but there is a good solution for …
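One common approach, not necessarily the one the full article recommends, combines pip's --user install scheme with the PYTHONUSERBASE environment variable. That variable moves the "user" install prefix anywhere you like, such as a directory on shared storage that the compute nodes can see. The helper user_site_for is hypothetical. It asks Python where packages would land for a given prefix, which is a handy check before installing.

```shell
#!/bin/sh
# Sketch: install Python packages under a prefix you own instead of the
# system site-packages. user_site_for is a hypothetical helper.

user_site_for() {
    # PYTHONUSERBASE overrides the default ~/.local prefix; the site module
    # reports the site-packages directory Python would use under it.
    PYTHONUSERBASE="$1" python3 -c 'import site; print(site.getusersitepackages())'
}

# Typical use (prefix and package name are examples):
#   export PYTHONUSERBASE="$HOME/apps/python"
#   python3 -m pip install --user numpy
#   export PATH="$PYTHONUSERBASE/bin:$PATH"   # console scripts land here
# Keep PYTHONUSERBASE exported in job scripts as well, so Python adds that
# site-packages directory (once it exists) to sys.path at run time.
```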

 

Keep an Eye on Your RAID Status

Our customers frequently order systems with two hard drives to hold a RAID 1 volume mirroring the OS filesystems. This is done with Linux software RAID, and it’s important to periodically check the health of the drives. To do this, run cat /proc/mdstat. If all volume members are working properly, you should see [UU]. For …
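Checking by eye is easy to forget, so the same test can be scripted and run from cron or a monitoring system. The sketch assumes the usual /proc/mdstat layout, where each array's status field shows one U per healthy member and an underscore for a missing or failed member (for example [U_]). The function name mdstat_status is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether any Linux software RAID array is degraded.
# mdstat_status is a hypothetical helper; it reads /proc/mdstat by default.
# Returns 0 and prints OK if healthy, 1 and DEGRADED if not, 2 if unreadable.

mdstat_status() {
    set -- "${1:-/proc/mdstat}"
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    # A status field such as [UU] lists one U per healthy member;
    # an underscore, as in [U_] or [_U], marks a missing or failed member.
    if grep -Eq '\[U*_[U_]*\]' "$1"; then
        echo "DEGRADED"
        return 1
    fi
    echo "OK"
}
```

Because the function returns non-zero for a degraded array, a cron entry or monitoring check can alert on its exit status alone.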

 

Using VNC to Speed Up Slow X-forwarded Sessions

Most of you know that you can use X-forwarding built into SSH to run a graphical application on a remote host:

    laptop$ ssh -X head.mycluster
    head$ firefox &

(The Firefox session displays on your laptop while running on the remote host.) But sometimes these programs run very slowly over the network. Firefox can be slow to render, …
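The full article covers the VNC setup itself, and its exact steps are cut off here. One common way to reach a VNC server running on the head node is to forward its port over SSH. VNC display :N conventionally listens on TCP port 5900+N, and tunnelling that port keeps the session encrypted without opening it in the firewall. The helper name vnc_tunnel and the default display number are assumptions. The original does not name the VNC server or viewer software.

```shell
#!/bin/sh
# Sketch: forward a VNC display on a remote host to the same port locally.
# vnc_tunnel is a hypothetical helper: vnc_tunnel HOST [DISPLAY_NUMBER]

vnc_tunnel() {
    host="$1"
    display="${2:-1}"
    # VNC display :N conventionally listens on TCP port 5900 + N.
    port=$((5900 + display))
    # -N: run no remote command, just keep the forwarding open.
    # -L: local port -> localhost:port as seen from the remote host.
    ssh -N -L "$port:localhost:$port" "$host"
}
```

For example, vnc_tunnel head.mycluster 1 forwards port 5901. You would then point a VNC viewer on the laptop at display :1 (port 5901) on localhost. The exact address syntax depends on the viewer.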

 

Use Screen to Run Long Processes

Tech Tip: Screen is a Linux utility that lets you run multiple terminal sessions within a single terminal window. It can be used for many things and can greatly improve your workflow. For example, Screen lets you run long scripts or processes inside a screen session. If you want to execute a script that generally takes a very …
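A typical Screen workflow looks like the sketch below. The -S, -d -m, -r and -ls options and the Ctrl-a d key binding are standard GNU Screen. The helper run_in_screen, the session name and the script name are only examples.

```shell
#!/bin/sh
# Sketch: start a long-running command inside a detached GNU Screen session.
# run_in_screen is a hypothetical helper: run_in_screen NAME COMMAND [ARGS...]

run_in_screen() {
    name="$1"
    shift
    # -d -m: create the session already detached; -S: give it a name.
    screen -dmS "$name" "$@"
}

# Interactive equivalents:
#   screen -S longjob     start a named session, then run the script inside it
#   Ctrl-a d              detach; the script keeps running after you log out
#   screen -ls            list your sessions
#   screen -r longjob     reattach later, even from a new SSH login
```

A session started with -d -m ends as soon as its command exits. If you will need the command's output afterwards, redirect it to a file.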

 

Keeping the Shell from Hanging Up After You Log Out or Disconnect

Have you ever started running something on a remote machine, only to realize that it won’t complete before you need to close your SSH connection? Running a screen session is nice, but what do you do if you didn’t start one? Have no fear – we can keep the shell from hanging up the job …
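The full article's steps are cut off here. In bash, the usual trick for a job that is already running is to suspend it, resume it in the background, and disown it so that the shell does not pass SIGHUP on to it when the connection drops. For jobs you have not started yet, nohup achieves the same thing up front. The sketch below shows the nohup route as a runnable helper (run_detached is a hypothetical name) and gives the bash disown sequence as comments.

```shell
#!/bin/sh
# Sketch: start a command that survives logout or a dropped SSH connection.
# run_detached is a hypothetical helper: run_detached LOGFILE COMMAND [ARGS...]

run_detached() {
    log="$1"
    shift
    # nohup makes the command ignore SIGHUP; detaching stdin and sending
    # stdout/stderr to a file keeps it off a terminal that may disappear.
    nohup "$@" < /dev/null > "$log" 2>&1 &
    # Remember the PID so the caller can check on the job later.
    DETACHED_PID=$!
}

# For a job already running in the foreground of an interactive bash shell:
#   Ctrl-z          suspend it (bash prints its job number, e.g. [1])
#   bg %1           resume it in the background
#   disown -h %1    tell bash not to send it SIGHUP when the shell exits
```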

 

Update Initrd

Have you blacklisted a kernel module, but it’s still showing up at boot? You probably need to update your initrd, a compressed filesystem used to bootstrap the OS. Simply run “dracut --force”, and the initrd will be recreated, taking into account any configuration changes made in your /etc filesystem. Then reboot. Your changes are now …
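To blacklist a module, you usually add a blacklist line to a file under /etc/modprobe.d and then rebuild the initrd. The sketch below assumes a dracut-based distribution, as the dracut command above implies. The helper blacklist_module and the example module name are hypothetical. The script must run as root, and the change takes effect after a reboot.

```shell
#!/bin/sh
# Sketch: blacklist a kernel module and rebuild the initrd with dracut.
# blacklist_module is a hypothetical helper: blacklist_module MODULE [CONFDIR]

blacklist_module() {
    module="$1"
    confdir="${2:-/etc/modprobe.d}"
    # modprobe reads *.conf files in this directory; "blacklist NAME" stops
    # the module from being loaded automatically through its aliases.
    echo "blacklist $module" > "$confdir/blacklist-$module.conf" || return 1
    # Rebuild the initrd for the running kernel, overwriting the existing
    # image so it picks up the new modprobe configuration.
    dracut --force
}
```

For example, blacklist_module nouveau. Afterwards, lsinitrd (shipped with dracut) can confirm that the file made it into the image: lsinitrd | grep blacklist.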

 

Advanced Clustering Technologies