1. 10 Jul, 2019 7 commits
    • Santosh Shilimkar's avatar
      rds: avoid version downgrade to legitimate newer peer connections · dc205a8d
      Santosh Shilimkar authored
      Connections with legitimate tos values can get into usual connection
      race. It can result in consumer reject. We don't want tos value or
      protocol version to be demoted for such connections otherwise
      piers would end up different tos values which can results in
      no connection. Example a peer initiated connection with say
      tos 8 while usual connection racing can get downgraded to tos 0
      which is not desirable.
      
      Patch fixes above issue introduced by commit
      commit d021fabf ("rds: rdma: add consumer reject")
      Reported-by: default avatarYanjun Zhu <yanjun.zhu@oracle.com>
      Tested-by: default avatarYanjun Zhu <yanjun.zhu@oracle.com>
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      dc205a8d
    • Gerd Rausch's avatar
      rds: Return proper "tos" value to user-space · fc640d4c
      Gerd Rausch authored
      The proper "tos" value needs to be returned
      to user-space (sockopt RDS_INFO_CONNECTIONS).
      
      Fixes: 3eb45036 ("rds: add type of service(tos) infrastructure")
      Signed-off-by: default avatarGerd Rausch <gerd.rausch@oracle.com>
      Reviewed-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      fc640d4c
    • Gerd Rausch's avatar
      rds: Accept peer connection reject messages due to incompatible version · 8c6166cf
      Gerd Rausch authored
      Prior to
      commit d021fabf ("rds: rdma: add consumer reject")
      
      function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
      connection attempt by issuing a "rds_conn_drop".
      
      The commit mentioned above added a "break", eliminating
      the "fallthrough" case and made the "rds_conn_drop" rather conditional:
      
      Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
      carries an integer-value of "1" inside "private_data":
      
        if (!conn)
          break;
          err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
          if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
            pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
                    &conn->c_laddr, &conn->c_faddr);
                    conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
                    rds_conn_drop(conn);
          }
          rdsdebug("Connection rejected: %s\n",
                   rdma_reject_msg(cm_id, event->status));
          break;
          /* FALLTHROUGH */
      A number of issues are worth mentioning here:
         #1) Previous versions of the RDS code simply rejected a connection
             by calling "rdma_reject(cm_id, NULL, 0);"
             So the value of the payload in "private_data" will not be "1",
             but "0".
      
         #2) Now the code has become dependent on host byte order and sizing.
             If one peer is big-endian, the other is little-endian,
             or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
             the *err check does not work as intended.
      
         #3) There is no check for "len" to see if the data behind *err is even valid.
             Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
             carry 148 bytes of zeroized payload.
             But that should probably not be relied upon here.
      
         #4) With the added "break;",
             we might as well drop the misleading "/* FALLTHROUGH */" comment.
      
      This commit does _not_ address issue #2, as the sender would have to
      agree on a byte order as well.
      
      Here is the sequence of messages in this observed error-scenario:
         Host-A is pre-QoS changes (excluding the commit mentioned above)
         Host-B is post-QoS changes (including the commit mentioned above)
      
         #1 Host-B
            issues a connection request via function "rds_conn_path_transition"
            connection state transitions to "RDS_CONN_CONNECTING"
      
         #2 Host-A
            rejects the incompatible connection request (from #1)
            It does so by calling "rdma_reject(cm_id, NULL, 0);"
      
         #3 Host-B
            receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
            But since the code is changed in the way described above,
            it won't drop the connection here, simply because "*err == 0".
      
         #4 Host-A
            issues a connection request
      
         #5 Host-B
            receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
            and ends up calling "rds_ib_cm_handle_connect".
            But since the state is already in "RDS_CONN_CONNECTING"
            (as of #1) it will end up issuing a "rdma_reject" without
            dropping the connection:
               if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
                   /* Wait and see - our connect may still be succeeding */
                   rds_ib_stats_inc(s_ib_connect_raced);
               }
               goto out;
      
         #6 Host-A
            receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
            drops the connection and tries again (goto #4) until it gives up.
      Tested-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: default avatarGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      8c6166cf
    • Gerd Rausch's avatar
      Revert "RDS: IB: split the mr registration and invalidation path" · a5520788
      Gerd Rausch authored
      This reverts commit 56012459.
      
      RDS kept spinning inside function "rds_ib_post_reg_frmr", waiting for
      "i_fastreg_wrs" to become incremented:
               while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
                       atomic_inc(&ibmr->ic->i_fastreg_wrs);
                       cpu_relax();
               }
      
      Looking at the original commit:
      
      commit 56012459 ("RDS: IB: split the mr registration and
      invalidation path")
      
      In there, the "rds_ib_mr_cqe_handler" was changed in the following
      way:
      
       void rds_ib_mr_cqe_handler(struct
       rds_ib_connection *ic,
       struct ib_wc *wc)
              if (frmr->fr_inv) {
                        frmr->fr_state = FRMR_IS_FREE;
                        frmr->fr_inv = false;
                      atomic_inc(&ic->i_fastreg_wrs);
              } else {
                      atomic_inc(&ic->i_fastunreg_wrs);
              }
      
      It looks like it's got it exactly backwards:
      
      Function "rds_ib_post_reg_frmr" keeps track of the outstanding
      requests via "i_fastreg_wrs".
      
      Function "rds_ib_post_inv" keeps track of the outstanding requests
      via "i_fastunreg_wrs" (post original commit). It also sets:
               frmr->fr_inv = true;
      
      However the completion handler "rds_ib_mr_cqe_handler" adjusts
      "i_fastreg_wrs" when "fr_inv" had been true, and adjusts
      "i_fastunreg_wrs" otherwise.
      
      The original commit was done in the name of performance:
      to remove the performance bottleneck
      
      No performance benefit could be observed with a fixed-up version
      of the original commit measured between two Oracle X7 servers,
      both equipped with Mellanox Connect-X5 HCAs.
      
      The prudent course of action is to revert this commit.
      Signed-off-by: default avatarGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      a5520788
    • Santosh Shilimkar's avatar
      rds: fix reordering with composite message notification · 616d37a0
      Santosh Shilimkar authored
      RDS composite message(rdma + control) user notification needs to be
      triggered once the full message is delivered and such a fix was
      added as part of commit 941f8d55 ("RDS: RDMA: Fix the composite
      message user notification"). But rds_send_remove_from_sock is missing
      data part notify check and hence at times the user don't get
      notification which isn't desirable.
      
      One way is to fix the rds_send_remove_from_sock to check of that case
      but considering the ordering complexity with completion handler and
      rdma + control messages are always dispatched back to back in same send
      context, just delaying the signaled completion on rmda work request also
      gets the desired behaviour. i.e Notifying application only after
      RDMA + control message send completes. So patch updates the earlier
      fix with this approach. The delay signaling completions of rdma op
      till the control message send completes fix was done by Venkat
      Venkatsubra in downstream kernel.
      Reviewed-and-tested-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Reviewed-by: default avatarGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      616d37a0
    • Nathan Chancellor's avatar
      net/mlx5e: Return in default case statement in tx_post_resync_params · 1ff2f0fa
      Nathan Chancellor authored
      clang warns:
      
      drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:251:2:
      warning: variable 'rec_seq_sz' is used uninitialized whenever switch
      default is taken [-Wsometimes-uninitialized]
              default:
              ^~~~~~~
      drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:255:46: note:
      uninitialized use occurs here
              skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
                                                          ^~~~~~~~~~
      drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:239:16: note:
      initialize the variable 'rec_seq_sz' to silence this warning
              u16 rec_seq_sz;
                            ^
                             = 0
      1 warning generated.
      
      This case statement was clearly designed to be one that should not be
      hit during runtime because of the WARN_ON statement so just return early
      to prevent copying uninitialized memory up into rn_be.
      
      Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
      Link: https://github.com/ClangBuiltLinux/linux/issues/590Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ff2f0fa
    • David S. Miller's avatar
      mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync(). · cacf32e9
      David S. Miller authored
      Return value was changes to 'int' from void but this return statement
      was not updated, or it slipped in via a merge.
      
      Fixes: b5d9a834 ("net/tls: don't clear TX resync flag on error")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cacf32e9
  2. 09 Jul, 2019 33 commits