In the Linux kernel, the following vulnerability has been resolved: RDMA/mlx5: Fix the recovery flow of the UMR QP This patch addresses an issue in the recovery flow of the UMR QP, ensuring tasks do not get stuck, as highlighted by the call trace [1]. During recovery, before transitioning the QP to the RESET state, the software must wait for all outstanding WRs to complete. Failing to do so can cause the firmware to skip sending some flushed CQEs with errors and simply discard them upon the RESET, as per the IB specification. This race condition can result in lost CQEs and tasks becoming stuck. To resolve this, the patch sends a final WR which serves only as a barrier before moving the QP state to RESET. Once a CQE is received for that final WR, it guarantees that no outstanding WRs remain, making it safe to transition the QP to RESET and subsequently back to RTS, restoring proper functionality. Note: For the barrier WR, we simply reuse the failed and ready WR. Since the QP is in an error state, it will only receive IB_WC_WR_FLUSH_ERR. However, as it serves only as a barrier we don't care about its status. [1] INFO: task rdma_resource_l:1922 blocked for more than 120 seconds. Tainted: G W 6.12.0-rc7+ #1626 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:rdma_resource_l state:D stack:0 pid:1922 tgid:1922 ppid:1369 flags:0x00004004 Call Trace: <TASK> __schedule+0x420/0xd30 schedule+0x47/0x130 schedule_timeout+0x280/0x300 ? mark_held_locks+0x48/0x80 ? lockdep_hardirqs_on_prepare+0xe5/0x1a0 wait_for_completion+0x75/0x130 mlx5r_umr_post_send_wait+0x3c2/0x5b0 [mlx5_ib] ? __pfx_mlx5r_umr_done+0x10/0x10 [mlx5_ib] mlx5r_umr_revoke_mr+0x93/0xc0 [mlx5_ib] __mlx5_ib_dereg_mr+0x299/0x520 [mlx5_ib] ? _raw_spin_unlock_irq+0x24/0x40 ? wait_for_completion+0xfe/0x130 ? rdma_restrack_put+0x63/0xe0 [ib_core] ib_dereg_mr_user+0x5f/0x120 [ib_core] ? lock_release+0xc6/0x280 destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs]...
4.13.0-16.193.11.0-12.195.3.0-18.195.3.0-24.265.4.0-9.125.13.0-19.194.2.0-16.194.2.0-17.214.2.0-19.234.3.0-1.104.3.0-2.114.3.0-5.164.3.0-6.174.3.0-7.184.4.0-2.166.5.0-9.96.6.0-14.146.8.0-11.116.8.0-20.206.8.0-22.226.8.0-28.286.8.0-31.316.8.0-35.356.8.0-36.366.8.0-38.38+21 more6.11.0-8.86.12.0-12.126.12.0-15.156.12.0-16.166.14.0-7.76.14.0-10.105.19.0-1007.7~22.04.15.19.0-1009.9~22.04.15.19.0-1010.10~22.04.15.19.0-1011.11~22.04.15.19.0-1012.12~22.04.15.19.0-1013.13~22.04.15.19.0-1014.14~22.04.15.19.0-1015.15~22.04.16.11.0-1004.46.12.0-1002.26.12.0-1004.46.14.0-1003.36.14.0-1004.46.5.0-1008.86.6.0-1001.16.8.0-1001.16.8.0-1006.66.8.0-1008.86.8.0-1009.96.8.0-1010.106.8.0-1011.126.8.0-1012.136.8.0-1013.14+17 more