

Tech Support Advisory: Yum updates fail from slurm package conflicts

When performing a yum update or dnf update on your system, the update may fail with messages about conflicts between Slurm packages. This is caused by the addition of new Slurm packages in upstream repos that collide with custom packages installed by ACT.

The errors may resemble one of the following:

Transaction check error:
file /usr/share/man/man3/Slurm.3pm.gz from install of slurm-perlapi-20.11.2-2.el7.x86_64 conflicts with file from package slurm-18.08.9-1.el7.x86_64
file /usr/share/man/man3/Slurm::Bitstr.3pm.gz from install of slurm-perlapi-20.11.2-2.el7.x86_64 conflicts with file from package slurm-18.08.9-1.el7.x86_64

or

Problem: cannot install both slurm-20.11.2-2.el8.x86_64 and slurm-20.02.3-1.el8.x86_64
package slurm-libpmi-20.02.3-1.el8.x86_64 requires libslurmfull.so()(64bit), but none of the providers can be installed
package slurm-libpmi-20.02.3-1.el8.x86_64 requires slurm(x86-64) = 20.02.3-1.el8, but none of the providers can be installed
cannot install the best update candidate for package slurm-20.02.3-1.el8.x86_64
problem with installed package slurm-libpmi-20.02.3-1.el8.x86_64
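If updates run unattended, it can help to check saved yum/dnf output for the messages above. The following is a minimal sketch, not an ACT-provided tool: the function name has_slurm_conflict and the idea of saving the update output to a log file are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a saved yum/dnf log contains
# one of the Slurm conflict messages shown in this advisory.
# $1 is the path to a file holding the captured update output.
has_slurm_conflict() {
    grep -Eq 'conflicts with file from package slurm-|cannot install both slurm-' "$1"
}

# Example use (the log path is an assumption):
#   yum update -y > /tmp/yum-update.log 2>&1
#   if has_slurm_conflict /tmp/yum-update.log; then
#       echo "Slurm package conflict detected"
#   fi
```

The patterns match only the two message forms quoted above, so other Slurm-related failures would not be detected by this check.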

To work around this problem, we recommend that customers who want to keep ACT’s Slurm integration exclude the Slurm packages from repository updates in the package manager configuration. To do so, run the following command as root:

echo "exclude=slurm*" >> /etc/yum.conf

Once the Slurm packages are excluded, your yum/dnf updates should complete normally again.
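Running the echo command more than once appends a duplicate line each time. The sketch below adds the line only when no Slurm exclusion is present yet. The function name add_slurm_exclude and its optional path argument are hypothetical and serve only to illustrate the idea. If your configuration already has an exclude= line listing other packages, review the file by hand rather than relying on this helper.

```shell
#!/bin/sh
# Hypothetical helper: append "exclude=slurm*" to a yum/dnf config file
# unless an exclude line mentioning slurm* already exists.
# $1 is the config file path; it defaults to /etc/yum.conf.
add_slurm_exclude() {
    conf="${1:-/etc/yum.conf}"
    # In a basic regular expression, \* matches a literal asterisk.
    if grep -q '^exclude=.*slurm\*' "$conf" 2>/dev/null; then
        echo "slurm exclusion already present in $conf"
    else
        echo "exclude=slurm*" >> "$conf"
        echo "added slurm exclusion to $conf"
    fi
}

# Example use, as root:
#   add_slurm_exclude
#   grep '^exclude=' /etc/yum.conf
```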

Background/Cause
In response to a bug report[1], a new package, slurm-20.11.2-2, was added to the EPEL 7[2] and 8[3] repositories on January 23, 2021. The Slurm packages provided by ACT are built on your system from source downloaded from SchedMD, the developers of Slurm. The primary package name, slurm, is the same for both, so the package manager attempts to install the higher version. The two sets of packages are structured differently, however, so the package manager aborts the transaction rather than risk an inconsistent installation.

Keeping the ACT version of Slurm
We haven’t fully evaluated the new package provided by EPEL, so we can’t recommend that anybody switch to it at this time unless they are confident in Slurm administration and fully willing to handle management on their own. We have a few concerns about using the EPEL-provided version and will take a closer look at whether using it is worthwhile.

Some of our Slurm configuration choices are targeted at the majority of our customer base. For instance, the Slurm configuration files reside in /opt/slurm (/act/slurm on EL 7 installations), which is NFS-mounted across the cluster. As a result, a single file modification is instantly visible to all nodes, making management simpler and less error-prone.

Another concern is how the repo-provided Slurm package manages updates. Minor updates within a major Slurm version are relatively safe to perform. Major version updates, however, require more scrutiny. In the past, these updates have required backing up and upgrading the backend database (MariaDB), or have added or deprecated settings in the config. Every Slurm update should begin with a review of the Slurm documentation, so we recommend against a blind update from a repo unless appropriate safeguards are added to the RPMs. See also the “Upgrade” section of Slurm’s Quick Start Administration Guide[4] for more detail.
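For the versions named in this advisory, the first two fields of the version string (for example 20.02 or 20.11) identify the major release, and the third field is the minor update. The sketch below classifies a proposed update on that basis. The function names slurm_release and slurm_update_kind are hypothetical, and the check is only a first filter. It does not replace reading the upgrade notes.

```shell
#!/bin/sh
# Hypothetical helpers for classifying a Slurm update.

# Print the major release of a version string:
# the first two dot-separated fields, e.g. 20.11.2-2.el7 -> 20.11
slurm_release() {
    echo "$1" | cut -d. -f1,2
}

# Print "minor" if both versions share a major release, else "major".
# $1 is the installed version, $2 is the candidate version.
slurm_update_kind() {
    if [ "$(slurm_release "$1")" = "$(slurm_release "$2")" ]; then
        echo minor
    else
        echo major
    fi
}

# Example use:
#   slurm_update_kind 20.02.3 20.11.2   # prints "major"
#   slurm_update_kind 18.08.8 18.08.9   # prints "minor"
```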

Contact your support team at Advanced Clustering Technologies if you would like to discuss Slurm packages and managing updates.

[1] – https://bugzilla.redhat.com/show_bug.cgi?id=1912491
[2] – https://lists.fedoraproject.org/archives/list/epel-package-announce@lists.fedoraproject.org/message/2NOE6TYZKUCRYEY4Z754IGHAORRZG6SC/
[3] – https://lists.fedoraproject.org/archives/list/epel-package-announce@lists.fedoraproject.org/message/KPO7IONLWK627CTAG2JWBEIOE7LCK7BQ/
[4] – https://slurm.schedmd.com/quickstart_admin.html
