Run an RDMA write bandwidth test. The ib_write_bw tool is provided by the perftest package. On the target node, run:
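The original command is not reproduced here. In its simplest form, the server side is ib_write_bw with no host argument, which waits for a client to connect. A minimal sketch follows; the wrapper name run_bw_server is hypothetical, and any device or port options your fabric needs are left out.

```shell
# Hypothetical wrapper: start the listening (server) side of the test.
run_bw_server() {
    # ib_write_bw ships in the perftest package; stop early if it is missing.
    if ! command -v ib_write_bw >/dev/null 2>&1; then
        echo "ib_write_bw not found; install the perftest package" >&2
        return 127
    fi
    # With no host argument, ib_write_bw waits for a client to connect.
    ib_write_bw
}
```

Calling run_bw_server on the target node leaves the test waiting for the client.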
On the client side, run:
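The client command is also missing from this copy. The client side is normally ib_write_bw followed by the hostname or IP address of the target node. A minimal sketch, where the wrapper name run_bw_client is hypothetical and the host is whatever node is running the server side:

```shell
# Hypothetical wrapper: run the client side against the target node.
# $1 is the hostname or IP address of the node where the server side is waiting.
run_bw_client() {
    if [ -z "${1:-}" ]; then
        echo "usage: run_bw_client <target-host>" >&2
        return 2
    fi
    ib_write_bw "$1"
}
```

For example, run_bw_client followed by the target node's hostname starts the transfer.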
If the test fails (with the “failed to modify QP to RTR” message), the node is affected by the kernel bug.
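When checking several nodes, the test output can be scanned for that message instead of read by eye. The sketch below assumes the exact wording may vary between perftest versions (for example, by including a QP number), so the match is case-insensitive and tolerates extra text after "QP". The helper name check_qp_rtr is hypothetical.

```shell
# Hypothetical helper: read ib_write_bw output on stdin and report whether
# the RTR failure message appears in it.
check_qp_rtr() {
    if grep -Eqi 'failed to modify QP.* to RTR'; then
        echo "affected: QP could not be moved to RTR"
        return 1
    fi
    echo "no RTR failure message found"
}
```

A typical use is to pipe the client output, including stderr, into check_qp_rtr and act on its exit status.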
The next kernel version should carry the long-term fix, a patch for the underlying bug. Until then, the issue can be worked around temporarily.
If you have not yet updated to 7.5, you can prevent yum from using the kernel with the bug by running:
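The exact command is not shown in this copy. One way to do it, sketched here as an assumption, is to append an exclude rule for the 3.10.0-862.11.* kernel builds named in this article to /etc/yum.conf. The helper name exclude_buggy_kernel is hypothetical, and the sketch assumes the file contains only a [main] section with no existing exclude= line.

```shell
# Hypothetical helper: append an exclude rule for the affected kernel builds.
# Assumes the file has only a [main] section and no existing exclude= line.
exclude_buggy_kernel() {
    conf="${1:-/etc/yum.conf}"
    rule='exclude=kernel-3.10.0-862.11.*'
    # Skip the append if the rule is already present, so reruns are harmless.
    grep -qxF "$rule" "$conf" || echo "$rule" >> "$conf"
}
```

Run as root, exclude_buggy_kernel with no argument edits /etc/yum.conf; passing another path lets you try it on a copy first.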
You can then run yum update, and it will install 3.10.0-862.9.1 instead, which we have found still works with IB/OPA. If you later want to update to a 3.10.0-862.11.* kernel (e.g. if the fix lands in 862.11.7), remove the exclude line from /etc/yum.conf and update the kernel as usual.
If you’ve already updated to 7.5 and are experiencing this issue, you can get IB/OPA working again by booting an older kernel that was installed before the update (i.e. any kernel on your system older than 862.11.1). This can be done either interactively through the GRUB menu at boot, or by editing /etc/default/grub to change GRUB_DEFAULT from “saved” to the index of the previous kernel version (typically “1”). You can verify which index to use by running:
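The listing command itself is missing from this copy. A sketch that produces the "index : title" format is given here, assuming a BIOS-booted CentOS 7 system where /etc/grub2.cfg points at the generated GRUB configuration (UEFI systems usually use /etc/grub2-efi.cfg instead). The helper name list_grub_entries is hypothetical.

```shell
# Hypothetical helper: print each GRUB menu entry with its zero-based index.
# The optional argument is the GRUB configuration file to read.
list_grub_entries() {
    # Split on single quotes: field 1 is "menuentry ", field 2 is the title.
    awk -F\' '$1=="menuentry " { printf "%d : %s\n", i++, $2 }' "${1:-/etc/grub2.cfg}"
}
```

Run as root, list_grub_entries with no argument reads /etc/grub2.cfg.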
This should output something similar to:
0 : CentOS Linux (3.10.0-862.11.6.el7.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-862.9.1.el7.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-862.el7.x86_64) 7 (Core)
From the example above, the line in /etc/default/grub would be changed to:
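In the example listing, the previous kernel (3.10.0-862.9.1) sits at index 1, so the edited line would look like the excerpt below. A different listing on your system means a different index.

```shell
# /etc/default/grub (excerpt): boot menu entry 1 instead of the saved entry
GRUB_DEFAULT=1
```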
Afterwards, rebuild the GRUB configuration file by running:
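The rebuild command is not shown in this copy. On a BIOS-booted CentOS 7 system it is normally grub2-mkconfig -o /boot/grub2/grub.cfg, while UEFI systems keep the file on the EFI partition. The sketch below picks the output path under those assumptions; the helper name grub_cfg_target is hypothetical, and the paths are the CentOS 7 defaults.

```shell
# Hypothetical helper: print the grub.cfg path to regenerate.
# The optional argument is the directory whose presence indicates a UEFI boot.
grub_cfg_target() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo /boot/efi/EFI/centos/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}
```

As root, the rebuild then becomes grub2-mkconfig -o "$(grub_cfg_target)".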
Then reboot so that the older kernel is used. At this point, IB/OPA should be working on the node again.
As of August 21st, 2018:
- The issue is being tracked with Red Hat’s bugzilla 1619624: Bug 1619624 – [Intel] RC QP failure to modify QP to RTR on -862.11.1 kernel [rhel-7.5.z] (RHEL 7.5.z). As of Tue, August 21 2018, the status of 1619624 is ASSIGNED. An engineer has been assigned to the bug but no patch has been posted that fixes the bug.
- The issue is being tracked with Red Hat’s bugzilla 1616346: Bug 1616346 – [Intel] RC QP failure to modify QP to RTR on -862.11.1 kernel (RHEL 7). As of Tue, August 21 2018, the status of 1616346 is POST. A patch has been submitted to resolve this issue and is under review for inclusion in the next minor release of RHEL 7.
- Red Hat Customer Portal Link: https://access.redhat.com/solutions/3568891
- CentOS Bugzilla Link: https://bugs.centos.org/view.php?id=15193