1. 07 Mar, 2016 17 commits
  2. 06 Mar, 2016 8 commits
    • Nicholas Bellinger's avatar
      target: Fix linux-4.1.y specific compile warning · e94991eb
      Nicholas Bellinger authored
      The linux-4.1.y specific patch to fix a previous v4.1 UNIT_ATTENTION
      read-copy-update conversion regression:
      
        commit 35afa656
        Author: Nicholas Bellinger <nab@linux-iscsi.org>
        Date:   Wed Sep 23 07:49:26 2015 +0000
      
            target: Fix v4.1 UNIT_ATTENTION se_node_acl->device_list[] NULL pointer
      
      introduced the following compile warning:
      
        drivers/target/target_core_pr.c: In function ‘core_scsi3_pr_seq_non_holder’:
        drivers/target/target_core_pr.c:332:3: warning: ‘return’ with no value, in function returning non-void [-Wreturn-type]
      
      Go ahead and fix this up to always returning zero when no ACL
      device list exists within core_scsi3_pr_seq_non_holder().
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e94991eb
    • Nicholas Bellinger's avatar
      target: Fix race with SCF_SEND_DELAYED_TAS handling · 4aa04a25
      Nicholas Bellinger authored
      commit 310d3d31 upstream.
      
      This patch fixes a race between setting of SCF_SEND_DELAYED_TAS
      in transport_send_task_abort(), and check of the same bit in
      transport_check_aborted_status().
      
      It adds a __transport_check_aborted_status() version that is
      used by target_execute_cmd() when se_cmd->t_state_lock is
      held, and a transport_check_aborted_status() wrapper for
      all other existing callers.
      
      Also, it handles the case where the check happens before
      transport_send_task_abort() gets called.  For this, go
      ahead and set SCF_SEND_DELAYED_TAS early when necessary,
      and have transport_send_task_abort() send the abort.
      
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4aa04a25
    • Nicholas Bellinger's avatar
      target: Fix remote-port TMR ABORT + se_cmd fabric stop · 08367c9e
      Nicholas Bellinger authored
      commit 0f4a9431 upstream.
      
      To address the bug where fabric driver level shutdown
      of se_cmd occurs at the same time when TMR CMD_T_ABORTED
      is happening resulting in a -1 ->cmd_kref, this patch
      adds a CMD_T_FABRIC_STOP bit that is used to determine
      when TMR + driver I_T nexus shutdown is happening
      concurrently.
      
      It changes target_sess_cmd_list_set_waiting() to obtain
      se_cmd->cmd_kref + set CMD_T_FABRIC_STOP, and drop local
      reference in target_wait_for_sess_cmds() and invoke extra
      target_put_sess_cmd() during Task Aborted Status (TAS)
      when necessary.
      
      Also, it adds a new target_wait_free_cmd() wrapper around
      transport_wait_for_tasks() for the special case within
      transport_generic_free_cmd() to set CMD_T_FABRIC_STOP,
      and is now aware of CMD_T_ABORTED + CMD_T_TAS status
      bits to know when an extra transport_put_cmd() during
      TAS is required.
      
      Note transport_generic_free_cmd() is expected to block on
      cmd->cmd_wait_comp in order to follow what iscsi-target
      expects during iscsi_conn context se_cmd shutdown.
      
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@daterainc.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      08367c9e
    • Nicholas Bellinger's avatar
      target: Fix TAS handling for multi-session se_node_acls · 29190c77
      Nicholas Bellinger authored
      commit ebde1ca5 upstream.
      
      This patch fixes a bug in TMR task aborted status (TAS)
      handling when multiple sessions are connected to the
      same target WWPN endpoint and se_node_acl descriptor,
      resulting in TASK_ABORTED status to not be generated
      for aborted se_cmds on the remote port.
      
      This is due to core_tmr_handle_tas_abort() incorrectly
      comparing se_node_acl instead of se_session, for which
      the multi-session case is expected to be sharing the
      same se_node_acl.
      
      Instead, go ahead and update core_tmr_handle_tas_abort()
      to compare tmr_sess + cmd->se_sess in order to determine
      if the LUN_RESET was received on a different I_T nexus,
      and TASK_ABORTED status response needs to be generated.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      29190c77
    • Nicholas Bellinger's avatar
      target: Fix LUN_RESET active I/O handling for ACK_KREF · 3286a2fe
      Nicholas Bellinger authored
      commit febe562c upstream.
      
      This patch fixes a NULL pointer se_cmd->cmd_kref < 0
      refcount bug during TMR LUN_RESET with active se_cmd
      I/O, that can be triggered during se_cmd descriptor
      shutdown + release via core_tmr_drain_state_list() code.
      
      To address this bug, add common __target_check_io_state()
      helper for ABORT_TASK + LUN_RESET w/ CMD_T_COMPLETE
      checking, and set CMD_T_ABORTED + obtain ->cmd_kref for
      both cases ahead of last target_put_sess_cmd() after
      TFO->aborted_task() -> transport_cmd_finish_abort()
      callback has completed.
      
      It also introduces SCF_ACK_KREF to determine when
      transport_cmd_finish_abort() needs to drop the second
      extra reference, ahead of calling target_put_sess_cmd()
      for the final kref_put(&se_cmd->cmd_kref).
      
      It also updates transport_cmd_check_stop() to avoid
      holding se_cmd->t_state_lock while dropping se_cmd
      device state via target_remove_from_state_list(), now
      that core_tmr_drain_state_list() is holding the
      se_device lock while checking se_cmd state from
      within TMR logic.
      
      Finally, move transport_put_cmd() release of SGL +
      TMR + extended CDB memory into target_free_cmd_mem()
      in order to avoid potential resource leaks in TMR
      ABORT_TASK + LUN_RESET code-paths.  Also update
      target_release_cmd_kref() accordingly.
      Reviewed-by: default avatarQuinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3286a2fe
    • Jan Engelhardt's avatar
      target: fix COMPARE_AND_WRITE non zero SGL offset data corruption · 7a290324
      Jan Engelhardt authored
      [ Upstream commit d94e5a61 ]
      
      target_core_sbc's compare_and_write functionality suffers from taking
      data at the wrong memory location when writing a CAW request to disk
      when a SGL offset is non-zero.
      
      This can happen with loopback and vhost-scsi fabric drivers when
      SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC is used to map existing user-space
      SGL memory into COMPARE_AND_WRITE READ/WRITE payload buffers.
      
      Given the following sample LIO subtopology,
      
      % targetcli ls /loopback/
      o- loopback ................................. [1 Target]
        o- naa.6001405ebb8df14a ....... [naa.60014059143ed2b3]
          o- luns ................................... [2 LUNs]
            o- lun0 ................ [iblock/ram0 (/dev/ram0)]
            o- lun1 ................ [iblock/ram1 (/dev/ram1)]
      % lsscsi -g
      [3:0:1:0]    disk    LIO-ORG  IBLOCK           4.0   /dev/sdc   /dev/sg3
      [3:0:1:1]    disk    LIO-ORG  IBLOCK           4.0   /dev/sdd   /dev/sg4
      
      the following bug can be observed in Linux 4.3 and 4.4~rc1:
      
      % perl -e 'print chr$_ for 0..255,reverse 0..255' >rand
      % perl -e 'print "\0" x 512' >zero
      % cat rand >/dev/sdd
      % sg_compare_and_write -i rand -D zero --lba 0 /dev/sdd
      % sg_compare_and_write -i zero -D rand --lba 0 /dev/sdd
      Miscompare reported
      % hexdump -Cn 512 /dev/sdd
      00000000  0f 0e 0d 0c 0b 0a 09 08  07 06 05 04 03 02 01 00
      00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00000200
      
      Rather than writing all-zeroes as instructed with the -D file, it
      corrupts the data in the sector by splicing some of the original
      bytes in. The page of the first entry of cmd->t_data_sg includes the
      CDB, and sg->offset is set to a position past the CDB. I presume that
      sg->offset is also the right choice to use for subsequent sglist
      members.
      Signed-off-by: default avatarJan Engelhardt <jengelh@netitwork.de>
      Tested-by: default avatarDouglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7a290324
    • Nicholas Bellinger's avatar
      target: Fix race for SCF_COMPARE_AND_WRITE_POST checking · 5dd73bb3
      Nicholas Bellinger authored
      [ Upstream commit 057085e5 ]
      
      This patch addresses a race + use after free where the first
      stage of COMPARE_AND_WRITE in compare_and_write_callback()
      is rescheduled after the backend sends the secondary WRITE,
      resulting in second stage compare_and_write_post() callback
      completing in target_complete_ok_work() before the first
      can return.
      
      Because current code depends on checking se_cmd->se_cmd_flags
      after return from se_cmd->transport_complete_callback(),
      this results in first stage having SCF_COMPARE_AND_WRITE_POST
      set, which incorrectly falls through into second stage CAW
      processing code, eventually triggering a NULL pointer
      dereference due to use after free.
      
      To address this bug, pass in a new *post_ret parameter into
      se_cmd->transport_complete_callback(), and depend upon this
      value instead of ->se_cmd_flags to determine when to return
      or fall through into ->queue_status() code for CAW.
      
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5dd73bb3
    • Nicholas Bellinger's avatar
      iscsi-target: Fix rx_login_comp hang after login failure · 4b4bc57a
      Nicholas Bellinger authored
      [ Upstream commit ca82c2bd ]
      
      This patch addresses a case where iscsi_target_do_tx_login_io()
      fails sending the last login response PDU, after the RX/TX
      threads have already been started.
      
      The case centers around iscsi_target_rx_thread() not invoking
      allow_signal(SIGINT) before the send_sig(SIGINT, ...) occurs
      from the failure path, resulting in RX thread hanging
      indefinately on iscsi_conn->rx_login_comp.
      
      Note this bug is a regression introduced by:
      
        commit e5419865
        Author: Nicholas Bellinger <nab@linux-iscsi.org>
        Date:   Wed Jul 22 23:14:19 2015 -0700
      
            iscsi-target: Fix iscsit_start_kthreads failure OOPs
      
      To address this bug, complete ->rx_login_complete for good
      measure in the failure path, and immediately return from
      RX thread context if connection state did not actually reach
      full feature phase (TARG_CONN_STATE_LOGGED_IN).
      
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: <stable@vger.kernel.org> # v3.10+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4b4bc57a
  3. 04 Mar, 2016 15 commits
    • Sasha Levin's avatar
      Linux 4.1.19 · b9a9cfdb
      Sasha Levin authored
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b9a9cfdb
    • Neil Horman's avatar
      sctp: Fix port hash table size computation · 00731a62
      Neil Horman authored
      [ Upstream commit d9749fb5 ]
      
      Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
      its size computation, observing that the current method never guaranteed
      that the hashsize (measured in number of entries) would be a power of two,
      which the input hash function for that table requires.  The root cause of
      the problem is that two values need to be computed (one, the allocation
      order of the storage requries, as passed to __get_free_pages, and two the
      number of entries for the hash table).  Both need to be ^2, but for
      different reasons, and the existing code is simply computing one order
      value, and using it as the basis for both, which is wrong (i.e. it assumes
      that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still ^2 when its not).
      
      To fix this, we change the logic slightly.  We start by computing a goal
      allocation order (which is limited by the maximum size hash table we want
      to support.  Then we attempt to allocate that size table, decreasing the
      order until a successful allocation is made.  Then, with the resultant
      successful order we compute the number of buckets that hash table supports,
      which we then round down to the nearest power of two, giving us the number
      of entries the table actually supports.
      
      I've tested this locally here, using non-debug and spinlock-debug kernels,
      and the number of entries in the hashtable consistently work out to be
      powers of two in all cases.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      CC: Dmitry Vyukov <dvyukov@google.com>
      CC: Vladislav Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      00731a62
    • Dmitry V. Levin's avatar
      unix_diag: fix incorrect sign extension in unix_lookup_by_ino · fdbc49cb
      Dmitry V. Levin authored
      [ Upstream commit b5f05492 ]
      
      The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
      __u32, but unix_lookup_by_ino's argument ino has type int, which is not
      a problem yet.
      However, when ino is compared with sock_i_ino return value of type
      unsigned long, ino is sign extended to signed long, and this results
      to incorrect comparison on 64-bit architectures for inode numbers
      greater than INT_MAX.
      
      This bug was found by strace test suite.
      
      Fixes: 5d3cae8b ("unix_diag: Dumping exact socket core")
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      fdbc49cb
    • Anton Protopopov's avatar
      rtnl: RTM_GETNETCONF: fix wrong return value · 3aa450dc
      Anton Protopopov authored
      [ Upstream commit a97eb33f ]
      
      An error response from a RTM_GETNETCONF request can return the positive
      error value EINVAL in the struct nlmsgerr that can mislead userspace.
      Signed-off-by: default avatarAnton Protopopov <a.s.protopopov@gmail.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3aa450dc
    • Xin Long's avatar
      route: check and remove route cache when we get route · 1610a64d
      Xin Long authored
      [ Upstream commit deed49df ]
      
      Since the gc of ipv4 route was removed, the route cached would has
      no chance to be removed, and even it has been timeout, it still could
      be used, cause no code to check it's expires.
      
      Fix this issue by checking  and removing route cache when we get route.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1610a64d
    • Guillaume Nault's avatar
      pppoe: fix reference counting in PPPoE proxy · dda12d1b
      Guillaume Nault authored
      [ Upstream commit 29e73269 ]
      
      Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
      This is already handled correctly in the error path.
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      dda12d1b
    • Mark Tomlinson's avatar
      l2tp: Fix error creating L2TP tunnels · 89afe370
      Mark Tomlinson authored
      [ Upstream commit 853effc5 ]
      
      A previous commit (33f72e6f) added notification via netlink for tunnels
      when created/modified/deleted. If the notification returned an error,
      this error was returned from the tunnel function. If there were no
      listeners, the error code ESRCH was returned, even though having no
      listeners is not an error. Other calls to this and other similar
      notification functions either ignore the error code, or filter ESRCH.
      This patch checks for ESRCH and does not flag this as an error.
      Reviewed-by: default avatarHamish Martin <hamish.martin@alliedtelesis.co.nz>
      Signed-off-by: default avatarMark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      89afe370
    • Eugenia Emantayev's avatar
      net/mlx4_en: Avoid changing dev->features directly in run-time · 29d526ac
      Eugenia Emantayev authored
      [ Upstream commit 925ab1aa ]
      
      It's forbidden to manually change dev->features in run-time. Currently, this is
      done in the driver to make sure that GSO_UDP_TUNNEL is advertized only when
      VXLAN tunnel is set. However, since the stack actually does features intersection
      with hw_enc_features, we can safely revert to advertizing features early when
      registering the netdevice.
      
      Fixes: f4a1edd5 ('net/mlx4_en: Advertize encapsulation offloads [...]')
      Signed-off-by: default avatarEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      29d526ac
    • Eugenia Emantayev's avatar
      net/mlx4_en: Choose time-stamping shift value according to HW frequency · 5c72a2dd
      Eugenia Emantayev authored
      [ Upstream commit 31c128b6 ]
      
      Previously, the shift value used for time-stamping was constant and didn't
      depend on the HW chip frequency. Change that to take the frequency into account
      and calculate the maximal value in cycles per wraparound of ten seconds. This
      time slot was chosen since it gives a good accuracy in time synchronization.
      
      Algorithm for shift value calculation:
       * Round up the maximal value in cycles to nearest power of two
      
       * Calculate maximal multiplier by division of all 64 bits set
         to above result
      
       * Then, invert the function clocksource_khz2mult() to get the shift from
         maximal mult value
      
      Fixes: ec693d47 ('net/mlx4_en: Add HW timestamping (TS) support')
      Signed-off-by: default avatarEugenia Emantayev <eugenia@mellanox.com>
      Reviewed-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5c72a2dd
    • Amir Vadai's avatar
      net/mlx4_en: Count HW buffer overrun only once · 7a9a967f
      Amir Vadai authored
      [ Upstream commit 281e8b2f ]
      
      RdropOvflw counts overrun of HW buffer, therefore should
      be used for rx_fifo_errors only.
      
      Currently RdropOvflw counter is mistakenly also set into
      rx_missed_errors and rx_over_errors too, which makes the
      device total dropped packets accounting to show wrong results.
      
      Fix that. Use it for rx_fifo_errors only.
      
      Fixes: c27a02cd ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
      Signed-off-by: default avatarAmir Vadai <amir@vadai.me>
      Signed-off-by: default avatarEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7a9a967f
    • Bjørn Mork's avatar
      qmi_wwan: add "4G LTE usb-modem U901" · ce643bc3
      Bjørn Mork authored
      [ Upstream commit aac8d3c2 ]
      
      Thomas reports:
      
      T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=05c6 ProdID=6001 Rev=00.00
      S:  Manufacturer=USB Modem
      S:  Product=USB Modem
      S:  SerialNumber=1234567890ABCDEF
      C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
      Reported-by: default avatarThomas Schäfer <tschaefer@t-online.de>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ce643bc3
    • Rainer Weikusat's avatar
      af_unix: Guard against other == sk in unix_dgram_sendmsg · 73fd505d
      Rainer Weikusat authored
      [ Upstream commit a5527dda ]
      
      The unix_dgram_sendmsg routine use the following test
      
      if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
      
      to determine if sk and other are in an n:1 association (either
      established via connect or by using sendto to send messages to an
      unrelated socket identified by address). This isn't correct as the
      specified address could have been bound to the sending socket itself or
      because this socket could have been connected to itself by the time of
      the unix_peer_get but disconnected before the unix_state_lock(other). In
      both cases, the if-block would be entered despite other == sk which
      might either block the sender unintentionally or lead to trying to unlock
      the same spin lock twice for a non-blocking send. Add a other != sk
      check to guard against this.
      
      Fixes: 7d267278 ("unix: avoid use-after-free in ep_remove_wait_queue")
      Reported-By: default avatarPhilipp Hahn <pmhahn@pmhahn.de>
      Signed-off-by: default avatarRainer Weikusat <rweikusat@mobileactivedefense.com>
      Tested-by: default avatarPhilipp Hahn <pmhahn@pmhahn.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      73fd505d
    • Eric Dumazet's avatar
      ipv4: fix memory leaks in ip_cmsg_send() callers · ffe8615c
      Eric Dumazet authored
      [ Upstream commit 91948309 ]
      
      Dmitry reported memory leaks of IP options allocated in
      ip_cmsg_send() when/if this function returns an error.
      
      Callers are responsible for the freeing.
      
      Many thanks to Dmitry for the report and diagnostic.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ffe8615c
    • Jay Vosburgh's avatar
      bonding: Fix ARP monitor validation · 54d0901d
      Jay Vosburgh authored
      [ Upstream commit 21a75f09 ]
      
      The current logic in bond_arp_rcv will accept an incoming ARP for
      validation if (a) the receiving slave is either "active" (which includes
      the currently active slave, or the current ARP slave) or, (b) there is a
      currently active slave, and it has received an ARP since it became active.
      For case (b), the receiving slave isn't the currently active slave, and is
      receiving the original broadcast ARP request, not an ARP reply from the
      target.
      
      	This logic can fail if there is no currently active slave.  In
      this situation, the ARP probe logic cycles through all slaves, assigning
      each in turn as the "current_arp_slave" for one arp_interval, then setting
      that one as "active," and sending an ARP probe from that slave.  The
      current logic expects the ARP reply to arrive on the sending
      current_arp_slave, however, due to switch FDB updating delays, the reply
      may be directed to another slave.
      
      	This can arise if the bonding slaves and switch are working, but
      the ARP target is not responding.  When the ARP target recovers, a
      condition may result wherein the ARP target host replies faster than the
      switch can update its forwarding table, causing each ARP reply to be sent
      to the previous current_arp_slave.  This will never pass the logic in
      bond_arp_rcv, as neither of the above conditions (a) or (b) are met.
      
      	Some experimentation on a LAN shows ARP reply round trips in the
      200 usec range, but my available switches never update their FDB in less
      than 4000 usec.
      
      	This patch changes the logic in bond_arp_rcv to additionally
      accept an ARP reply for validation on any slave if there is a current ARP
      slave and it sent an ARP probe during the previous arp_interval.
      
      Fixes: aeea64ac ("bonding: don't trust arp requests unless active slave really works")
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      54d0901d
    • Daniel Borkmann's avatar
      bpf: fix branch offset adjustment on backjumps after patching ctx expansion · 0f912f67
      Daniel Borkmann authored
      [ Upstream commit a1b14d27 ]
      
      When ctx access is used, the kernel often needs to expand/rewrite
      instructions, so after that patching, branch offsets have to be
      adjusted for both forward and backward jumps in the new eBPF program,
      but for backward jumps it fails to account the delta. Meaning, for
      example, if the expansion happens exactly on the insn that sits at
      the jump target, it doesn't fix up the back jump offset.
      
      Analysis on what the check in adjust_branches() is currently doing:
      
        /* adjust offset of jmps if necessary */
        if (i < pos && i + insn->off + 1 > pos)
          insn->off += delta;
        else if (i > pos && i + insn->off + 1 < pos)
          insn->off -= delta;
      
      First condition (forward jumps):
      
        Before:                         After:
      
        insns[0]                        insns[0]
        insns[1] <--- i/insn            insns[1] <--- i/insn
        insns[2] <--- pos               insns[P] <--- pos
        insns[3]                        insns[P]  `------| delta
        insns[4] <--- target_X          insns[P]   `-----|
        insns[5]                        insns[3]
                                        insns[4] <--- target_X
                                        insns[5]
      
      First case is if we cross pos-boundary and the jump instruction was
      before pos. This is handeled correctly. I.e. if i == pos, then this
      would mean our jump that we currently check was the patchlet itself
      that we just injected. Since such patchlets are self-contained and
      have no awareness of any insns before or after the patched one, the
      delta is correctly not adjusted. Also, for the second condition in
      case of i + insn->off + 1 == pos, means we jump to that newly patched
      instruction, so no offset adjustment are needed. That part is correct.
      
      Second condition (backward jumps):
      
        Before:                         After:
      
        insns[0]                        insns[0]
        insns[1] <--- target_X          insns[1] <--- target_X
        insns[2] <--- pos <-- target_Y  insns[P] <--- pos <-- target_Y
        insns[3]                        insns[P]  `------| delta
        insns[4] <--- i/insn            insns[P]   `-----|
        insns[5]                        insns[3]
                                        insns[4] <--- i/insn
                                        insns[5]
      
      Second interesting case is where we cross pos-boundary and the jump
      instruction was after pos. Backward jump with i == pos would be
      impossible and pose a bug somewhere in the patchlet, so the first
      condition checking i > pos is okay only by itself. However, i +
      insn->off + 1 < pos does not always work as intended to trigger the
      adjustment. It works when jump targets would be far off where the
      delta wouldn't matter. But, for example, where the fixed insn->off
      before pointed to pos (target_Y), it now points to pos + delta, so
      that additional room needs to be taken into account for the check.
      This means that i) both tests here need to be adjusted into pos + delta,
      and ii) for the second condition, the test needs to be <= as pos
      itself can be a target in the backjump, too.
      
      Fixes: 9bac3d6d ("bpf: allow extended BPF programs access skb fields")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0f912f67