Tech Support Advisory: Read Before Updating the Kernel on RHEL/CentOS 7.5

Posted on August 27, 2018
Kernels 3.10.0-862.11.1.el7 through 3.10.0-862.11.6.el7 contain a bug that leaves InfiniBand (IB) and Omni-Path (OPA) cards unable to come back up after the kernel update, i.e., they cannot communicate with the rest of the IB/OPA network. Affected nodes log the message “failed to modify QP to RTR: -22” in dmesg or /var/log/messages.
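As a quick pre-check, the affected range above can be matched against a kernel release string, and the logs can be searched for the error. The following is a minimal sketch; the function names is_affected_kernel and log_has_qp_rtr_error are hypothetical, and it assumes the affected range is exactly 862.11.1 through 862.11.6 as stated above.

```shell
#!/bin/sh
# Minimal sketch: two helper checks for the symptoms described above.

# Succeed if a kernel release string falls in the affected range
# 3.10.0-862.11.1.el7 through 3.10.0-862.11.6.el7.
# With no argument, the running kernel (uname -r) is checked.
is_affected_kernel() {
    release="${1:-$(uname -r)}"
    case "$release" in
        3.10.0-862.11.[1-6].el7*) return 0 ;;  # inside the listed range
        *) return 1 ;;                         # outside the listed range
    esac
}

# Succeed if the given log contains the QP error string.
# Defaults to /var/log/messages (usually readable only by root);
# pass "-" to read from standard input instead.
log_has_qp_rtr_error() {
    grep -q "failed to modify QP to RTR" "${1:-/var/log/messages}"
}
```

For example, is_affected_kernel with no argument checks the running kernel, and piping dmesg into log_has_qp_rtr_error - searches the kernel ring buffer instead of /var/log/messages.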

Diagnostic Steps

Run an RDMA write bandwidth test with ib_write_bw, which is provided by the perftest package. On the target (server) node, run:

$ ib_write_bw

On the client node, run:

$ ib_write_bw <target-IP>

If the test fails with the “failed to modify QP to RTR” message, the node is affected by the kernel bug.
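The client-side check can be wrapped so that a run is classified as affected, failed for some other reason, or OK. This is a sketch under stated assumptions: the server side of ib_write_bw is already running on the target, a non-zero exit status is treated as a failure, and the function name run_rdma_check and the IB_WRITE_BW override variable are hypothetical (the override only exists so the command can be swapped out for testing).

```shell
#!/bin/sh
# Minimal sketch: run the client side of the RDMA write test and classify
# the result. Assumes ib_write_bw is already listening on the target node.
# IB_WRITE_BW is a hypothetical override for the command; it defaults to
# ib_write_bw from the perftest package.
run_rdma_check() {
    target="$1"
    # Capture stdout and stderr so the error string is seen wherever it is printed.
    out=$("${IB_WRITE_BW:-ib_write_bw}" "$target" 2>&1)
    status=$?
    if printf '%s\n' "$out" | grep -q "failed to modify QP to RTR"; then
        echo "AFFECTED: QP could not be moved to RTR (matches this kernel bug)"
        return 2
    fi
    if [ "$status" -ne 0 ]; then
        echo "FAILED: ib_write_bw exited with status $status (cause not identified)"
        return 1
    fi
    echo "OK: RDMA write test completed"
    return 0
}
```

Distinct return codes (0, 1, 2) let a loop over many client nodes separate nodes hit by this bug from nodes failing for unrelated reasons, such as the server side not running.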

Temporary Fixes

The next kernel release is expected to include a long-term fix for this bug. Until then, the issue can be worked around as follows.

If you have not yet updated to an affected kernel, you can prevent yum from installing one by running the following as root:

echo "exclude=kernel*-3.10.0-862.11.*" >> /etc/yum.conf

You can then run yum update; it will skip the excluded kernels and install 3.10.0-862.9.1 instead, which we have found still works with IB/OPA. If you later want to install a 3.10.0-862.11.* kernel (e.g., if the fix ships in 862.11.7), remove the exclude line from /etc/yum.conf first. The pattern matches every 862.11.* kernel, including a fixed one, so yum will not install any of them while the line is present.
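Running the echo command twice appends a duplicate line, so a guarded version may be handier on many nodes. The sketch below assumes a stock yum.conf in which [main] is the last section, as it typically is on CentOS 7, so an appended line lands in [main]; the function names are hypothetical. If your yum.conf already has an exclude= line, it is safer to add the pattern to that line by hand. For a one-off run, yum's --exclude option applies the same pattern without editing the file, e.g. yum update --exclude='kernel*-3.10.0-862.11.*'.

```shell
#!/bin/sh
# Minimal sketch: idempotently add, or later remove, the kernel exclude line.
# Assumes [main] is the last section of yum.conf, so an appended line lands
# in [main]. Defaults to /etc/yum.conf; editing it requires root.
EXCLUDE_LINE='exclude=kernel*-3.10.0-862.11.*'

add_kernel_exclude() {
    conf="${1:-/etc/yum.conf}"
    # Append only if the exact line is not already present, so reruns are harmless.
    grep -qxF "$EXCLUDE_LINE" "$conf" || printf '%s\n' "$EXCLUDE_LINE" >> "$conf"
}

remove_kernel_exclude() {
    conf="${1:-/etc/yum.conf}"
    [ -f "$conf" ] || return 1
    scratch=$(mktemp) || return 1
    # Keep every line except the exact exclude line, then write back in place
    # (cat > keeps the file's owner and permissions).
    grep -vxF "$EXCLUDE_LINE" "$conf" > "$scratch"
    cat "$scratch" > "$conf"
    rm -f "$scratch"
}
```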

If you have already updated to an affected kernel and are experiencing this issue, you can get IB/OPA working again by booting the kernel you were running before the update (i.e., any kernel on your system older than 862.11.1). You can select it interactively from the GRUB menu at boot, or make it the default by editing /etc/default/grub and changing GRUB_DEFAULT from “saved” to the menu index of the previous kernel (typically “1”). To check which index to use, run:

awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg

The output should look similar to:

0 : CentOS Linux (3.10.0-862.11.6.el7.x86_64) 7 (Core)

1 : CentOS Linux (3.10.0-862.9.1.el7.x86_64) 7 (Core)

2 : CentOS Linux (3.10.0-862.el7.x86_64) 7 (Core)
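The awk one-liner above lists every entry. To look up the index of one particular kernel directly, the same field split can be reused in a small function. This is a sketch: the name grub_index_for is hypothetical, and it assumes top-level menuentry lines with single-quoted titles, as in the stock CentOS 7 /etc/grub2.cfg. It reports the first entry whose title contains the given release, so pass a full release such as 3.10.0-862.9.1.el7 rather than a prefix like 3.10.0-862.

```shell
#!/bin/sh
# Minimal sketch: print the GRUB menu index of the first entry whose title
# contains the given kernel release; exit non-zero if none matches.
# Uses the same single-quote field split as the awk one-liner above.
grub_index_for() {
    release="$1"
    cfg="${2:-/etc/grub2.cfg}"
    [ -n "$release" ] || return 1
    awk -F"'" -v rel="$release" '
        BEGIN { i = 0; found = 0 }
        $1 == "menuentry " {
            # $2 is the entry title between the first pair of single quotes.
            if (index($2, rel) > 0) { print i; found = 1; exit }
            i++
        }
        END { if (!found) exit 1 }
    ' "$cfg"
}
```

For the sample output above, grub_index_for 3.10.0-862.9.1.el7 would print 1.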

In the example above, the previous kernel (862.9.1) is at index 1, so the line in /etc/default/grub becomes:

GRUB_DEFAULT=1

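Then rebuild the GRUB configuration as root (on UEFI systems the output file is /boot/efi/EFI/centos/grub.cfg, or /boot/efi/EFI/redhat/grub.cfg on RHEL, instead of /boot/grub2/grub.cfg):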

$ grub2-mkconfig -o /boot/grub2/grub.cfg

Finally, reboot so that the node runs the older kernel. IB/OPA should then work on the node again.
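The edit to /etc/default/grub can be scripted for several nodes. The sketch below only sets GRUB_DEFAULT to a fixed index; the function name set_grub_default is hypothetical, it relies on GNU sed's -i option (standard on CentOS 7), and grub2-mkconfig still has to be run afterwards.

```shell
#!/bin/sh
# Minimal sketch: point GRUB_DEFAULT at a fixed menu index.
# Defaults to /etc/default/grub; editing it requires root. The GRUB
# configuration still has to be regenerated with grub2-mkconfig afterwards.
set_grub_default() {
    idx="$1"
    file="${2:-/etc/default/grub}"
    # Accept only a non-negative integer menu index.
    case "$idx" in
        ''|*[!0-9]*) echo "set_grub_default: index must be a non-negative integer" >&2; return 1 ;;
    esac
    [ -f "$file" ] || return 1
    if grep -q '^GRUB_DEFAULT=' "$file"; then
        # Replace the existing setting, e.g. GRUB_DEFAULT=saved.
        sed -i "s/^GRUB_DEFAULT=.*/GRUB_DEFAULT=$idx/" "$file"
    else
        printf 'GRUB_DEFAULT=%s\n' "$idx" >> "$file"
    fi
}
```

For the example above, set_grub_default 1 && grub2-mkconfig -o /boot/grub2/grub.cfg, followed by a reboot and a uname -r check, would apply the change. Note that a fixed index points at whatever entry occupies that position, so it may need revisiting once another kernel is installed.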

Additional Information

As of August 21st, 2018:

  • Red Hat Bugzilla 1619624 tracks the issue for RHEL 7.5.z: “Bug 1619624 – [Intel] RC QP failure to modify QP to RTR on -862.11.1 kernel [rhel-7.5.z]”. Its status is ASSIGNED: an engineer has been assigned to the bug, but no patch that fixes it has been posted yet.
  • Red Hat Bugzilla 1616346 tracks the issue for RHEL 7: “Bug 1616346 – [Intel] RC QP failure to modify QP to RTR on -862.11.1 kernel”. Its status is POST: a patch has been submitted and is under review for inclusion in the next minor release of RHEL 7.
  • Red Hat Customer Portal Link: https://access.redhat.com/solutions/3568891
  • CentOS Bugzilla Link: https://bugs.centos.org/view.php?id=15193