1. 16 Apr, 2018 1 commit
  2. 14 Apr, 2018 4 commits
    • Merge branch 'sfc-ARFS-fixes' · d6606bcc
      David S. Miller authored
      Edward Cree says:
      
      ====================
      sfc: ARFS fixes
      
      Three issues introduced by my recent asynchronous filter handling changes:
      1. The old filter_rfs_insert would replace a matching filter of equal
         priority; we need to pass the appropriate argument to filter_insert to
         make it do the same.
      2. We're lying to the kernel with our return value from ndo_rx_flow_steer,
         so we need to lie consistently when calling rps_may_expire_flow.  This
         is only a partial fix, as the lie still prevents us from steering
         multiple flows with the same ID to different queues; a proper fix that
         stops us lying at all will hopefully follow later.
      3. It's possible to cause the kernel to hammer ndo_rx_flow_steer very
         hard, so make sure we don't build up too huge a backlog of workitems.
      
      Possibly it would be better to fix #3 on the kernel side; I have a patch
       which I think does that but it's not a regression in 4.17 so isn't 'net'
       material.
      There's also the issue that we come up in the bad configuration that
       triggers #3 by default, but that too is a problem for another time.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6606bcc
    • sfc: limit ARFS workitems in flight per channel · f993740e
      Edward Cree authored
      A misconfigured system (e.g. with all interrupts affinitised to all CPUs)
       may produce a storm of ARFS steering events.  With the existing sfc ARFS
       implementation, that could create a backlog of workitems that grinds the
       system to a halt.  To prevent this, limit the number of workitems that
       may be in flight for a given SFC device to 8 (EFX_RPS_MAX_IN_FLIGHT), and
       return EBUSY from our ndo_rx_flow_steer method if the limit is reached.
      Given this limit, also store the workitems in an array of slots within the
       struct efx_nic, rather than dynamically allocating for each request.
      The limit should not negatively impact performance, because it is only
       likely to be hit in cases where ARFS will be ineffective anyway.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f993740e
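The slot-array idea in the commit above can be sketched in user-space C as follows. This is only an illustration, not the driver's code: `struct rps_request`, `struct rps_state`, `rps_slot_claim()` and `rps_slot_release()` are hypothetical names, and only `EFX_RPS_MAX_IN_FLIGHT` and the `EBUSY` return come from the commit message. The sketch is single-threaded; a real driver would need an atomic test-and-set on the slot.

```c
#include <errno.h>
#include <stdbool.h>

#define EFX_RPS_MAX_IN_FLIGHT 8 /* limit named in the commit message */

/* Hypothetical request record; the driver's real workitem differs. */
struct rps_request {
	bool in_use;
	unsigned int flow_id;
	unsigned int rxq_index;
};

/* Hypothetical stand-in for the per-device state holding the slots. */
struct rps_state {
	struct rps_request slots[EFX_RPS_MAX_IN_FLIGHT];
};

/* Claim a free slot. Returns the slot index, or -EBUSY if all are in flight. */
static int rps_slot_claim(struct rps_state *st, unsigned int flow_id,
			  unsigned int rxq_index)
{
	for (int i = 0; i < EFX_RPS_MAX_IN_FLIGHT; i++) {
		struct rps_request *req = &st->slots[i];

		if (!req->in_use) {
			req->in_use = true;
			req->flow_id = flow_id;
			req->rxq_index = rxq_index;
			return i;
		}
	}
	return -EBUSY;
}

/* Release a slot once its work has completed. */
static void rps_slot_release(struct rps_state *st, int idx)
{
	st->slots[idx].in_use = false;
}
```

Because the slots live inside the state structure, no allocation happens per request, and a storm of requests can occupy at most eight slots before callers start seeing `-EBUSY`.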
    • sfc: pass the correctly bogus filter_id to rps_may_expire_flow() · a7f80189
      Edward Cree authored
      When we inserted an ARFS filter for ndo_rx_flow_steer(), we didn't know
       what the filter ID would be, so we just returned 0.  Thus, we must also
       pass 0 as the filter ID when calling rps_may_expire_flow() for it, and
       rely on the flow_id to identify what we're talking about.
      
      Fixes: 3af0f342 ("sfc: replace asynchronous filter operations")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7f80189
    • sfc: insert ARFS filters with replace_equal=true · 494bef4c
      Edward Cree authored
      Necessary to allow redirecting a flow when the application moves.
      
      Fixes: 3af0f342 ("sfc: replace asynchronous filter operations")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      494bef4c
  3. 13 Apr, 2018 26 commits
    • Merge branch 'l2tp-remove-unsafe-calls-to-l2tp_tunnel_find_nth' · c5042dac
      David S. Miller authored
      Guillaume Nault says:
      
      ====================
      l2tp: remove unsafe calls to l2tp_tunnel_find_nth()
      
      Using l2tp_tunnel_find_nth() is racy, because the returned tunnel can
      go away as soon as this function returns. This series introduces
      l2tp_tunnel_get_nth() as a safe replacement to fix these races.
      
      With this series, all unsafe tunnel/session lookups are finally gone.
      ====================
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5042dac
    • l2tp: hold reference on tunnels printed in l2tp/tunnels debugfs file · f726214d
      Guillaume Nault authored
      Use l2tp_tunnel_get_nth() instead of l2tp_tunnel_find_nth(), to be safe
      against concurrent tunnel deletion.
      
      Use the same mechanism as in l2tp_ppp.c for dropping the reference
      taken by l2tp_tunnel_get_nth(). That is, drop the reference just
      before looking up the next tunnel. In case of error, drop the last
      accessed tunnel in l2tp_dfs_seq_stop().
      
      That was the last use of l2tp_tunnel_find_nth().
      
      Fixes: 0ad66140 ("l2tp: Add debugfs files for dumping l2tp debug info")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f726214d
    • l2tp: hold reference on tunnels printed in pppol2tp proc file · 0e0c3fee
      Guillaume Nault authored
      Use l2tp_tunnel_get_nth() instead of l2tp_tunnel_find_nth(), to be safe
      against concurrent tunnel deletion.
      
      Unlike sessions, we can't drop the reference held on tunnels in
      pppol2tp_seq_show(). Tunnels are reused across several calls to
      pppol2tp_seq_start() when iterating over sessions. These iterations
      need the tunnel for accessing the next session. Therefore the only safe
      moment for dropping the reference is just before searching for the next
      tunnel.
      
      Normally, the last invocation of pppol2tp_next_tunnel() doesn't find
      any new tunnel, so it drops the last tunnel without taking any new
      reference. However, in case of error, pppol2tp_seq_stop() is called
      directly, so we have to drop the reference there.
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e0c3fee
    • l2tp: hold reference on tunnels in netlink dumps · 5846c131
      Guillaume Nault authored
      l2tp_tunnel_find_nth() is unsafe: no reference is held on the returned
      tunnel, therefore it can be freed whenever the caller uses it.
      This patch defines l2tp_tunnel_get_nth() which works similarly, but
      also takes a reference on the returned tunnel. The caller then has to
      drop it after it stops using the tunnel.
      
      Convert netlink dumps to make them safe against concurrent tunnel
      deletion.
      
      Fixes: 309795f4 ("l2tp: Add netlink control API for L2TP")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5846c131
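The difference between a "find" lookup and a "get" lookup described in the commit above can be illustrated with a small user-space sketch. Everything here is hypothetical (`struct tunnel`, `tunnel_find_nth()`, `tunnel_get_nth()`, `tunnel_put()`); it uses a plain integer counter and no locking, whereas kernel code would rely on its own reference-counting and list-protection primitives.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature tunnel; the real struct l2tp_tunnel is much larger. */
struct tunnel {
	int refcnt;
	int id;
	struct tunnel *next;
};

/* Unsafe style: returns a bare pointer, no reference is taken. */
static struct tunnel *tunnel_find_nth(struct tunnel *head, int nth)
{
	for (struct tunnel *t = head; t; t = t->next)
		if (nth-- == 0)
			return t;
	return NULL;
}

/* Safe style: take a reference before handing the tunnel to the caller. */
static struct tunnel *tunnel_get_nth(struct tunnel *head, int nth)
{
	struct tunnel *t = tunnel_find_nth(head, nth);

	if (t)
		t->refcnt++;
	return t;
}

/* Drop a reference; returns true when the last reference was dropped. */
static bool tunnel_put(struct tunnel *t)
{
	return --t->refcnt == 0;
}
```

A caller iterating a dump would pair every successful `tunnel_get_nth()` with one `tunnel_put()` once it has finished with the tunnel, which is the contract the commit message describes.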
    • virtio-net: add missing virtqueue kick when flushing packets · 9267c430
      Jason Wang authored
      We tend to batch packet submission during XDP_TX. This requires
      kicking the virtqueue after a batch. We tried to do that through
      xdp_do_flush_map(), which only makes sense for devmap, not XDP_TX. So
      explicitly kick the virtqueue in this case.
      Reported-by: Kimitoshi Takahashi <ktaka@nii.ac.jp>
      Tested-by: Kimitoshi Takahashi <ktaka@nii.ac.jp>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9267c430
    • net: dsa: mv88e6xxx: Fix receive time stamp race condition. · 22904823
      Richard Cochran authored
      The DSA stack passes received PTP frames to this driver via
      mv88e6xxx_port_rxtstamp() for deferred delivery.  The driver then
      queues the frame and kicks the worker thread.  The work callback reads
      out the latched receive time stamp and then works through the queue,
      delivering any non-matching frames without a time stamp.
      
      If a new frame arrives after the worker thread has read out the time
      stamp register but enters the queue before the worker finishes
      processing the queue, that frame will be delivered without a time
      stamp.
      
      This patch fixes the race by moving the queue onto a list on the stack
      before reading out the latched time stamp value.
      
      Fixes: c6fe0ad2 ("net: dsa: mv88e6xxx: add rx/tx timestamping support")
      Signed-off-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22904823
    • net: fix deadlock while clearing neighbor proxy table · 53b76cdf
      Wolfgang Bumiller authored
      When coming from ndisc_netdev_event() in net/ipv6/ndisc.c,
      neigh_ifdown() is called with &nd_tbl, locking this while
      clearing the proxy neighbor entries when, e.g., deleting an
      interface. Calling the table's pndisc_destructor() with the
      lock still held, however, can cause a deadlock: When a
      multicast listener is available an IGMP packet of type
      ICMPV6_MGM_REDUCTION may be sent out. When reaching
      ip6_finish_output2(), if no neighbor entry for the target
      address is found, __neigh_create() is called with &nd_tbl,
      which it'll want to lock.
      
      Move the elements into their own list, then unlock the table
      and perform the destruction.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199289
      Fixes: 6fd6ce20 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
      Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53b76cdf
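The "move the elements into their own list, then unlock and destroy" approach from the commit above can be sketched in user-space C with a pthread mutex. All names (`struct pentry`, `struct ptable`, `ptable_clear()`, `relocking_destructor()`) are hypothetical, and the sketch detaches every entry rather than only those belonging to one device. The sample destructor deliberately takes the table lock itself, standing in for the path that ends up wanting the table lock again.

```c
#include <pthread.h>
#include <stdlib.h>

struct pentry {
	struct pentry *next;
	int key;
};

struct ptable {
	pthread_mutex_t lock;
	struct pentry *head;
	int destroyed;
};

/* Example destructor that needs the table lock itself. It would
 * self-deadlock if it were called with the lock still held. */
static void relocking_destructor(struct ptable *tbl, struct pentry *e)
{
	(void)e;
	pthread_mutex_lock(&tbl->lock);
	tbl->destroyed++;
	pthread_mutex_unlock(&tbl->lock);
}

static void ptable_clear(struct ptable *tbl,
			 void (*destructor)(struct ptable *, struct pentry *))
{
	struct pentry *freelist;

	/* Detach the entries while holding the lock ... */
	pthread_mutex_lock(&tbl->lock);
	freelist = tbl->head;
	tbl->head = NULL;
	pthread_mutex_unlock(&tbl->lock);

	/* ... and run the destructors only after it has been released. */
	while (freelist) {
		struct pentry *e = freelist;

		freelist = e->next;
		e->next = NULL;
		destructor(tbl, e);
		free(e);
	}
}
```

Once the entries sit on the private list, nothing else can reach them, so the destructors can safely run without the table lock.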
    • sctp: do not check port in sctp_inet6_cmp_addr · 1071ec9d
      Xin Long authored
      pf->cmp_addr() is called before binding a v6 address to the sock. It
      should not check ports, like in sctp_inet_cmp_addr.
      
      But sctp_inet6_cmp_addr checks the addr by invoking af(6)->cmp_addr,
      sctp_v6_cmp_addr where it also compares the ports.
      
      This allowed setsockopt(SCTP_SOCKOPT_BINDX_ADD) to bind multiple
      duplicate IPv6 addresses after commit 40b4f0fd ("sctp:
      lack the check for ports in sctp_v6_cmp_addr").
      
      This patch removes the af->cmp_addr call from sctp_inet6_cmp_addr,
      and instead does the proper check for both v6 addrs and v4mapped addrs.
      
      v1->v2:
        - define __sctp_v6_cmp_addr to do the common address comparison
          used for both pf and af v6 cmp_addr.
      
      Fixes: 40b4f0fd ("sctp: lack the check for ports in sctp_v6_cmp_addr")
      Reported-by: Jianwen Ji <jiji@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1071ec9d
    • Merge branch 'nfp-improve-signal-handing-on-FW-waits-and-flower-control-message-processing' · 837708a8
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: improve signal handling on FW waits and flower control message processing
      
      The first part of this set aims to improve handling of interrupted
      waits.  Patch 1 makes waiting for management FW responses
      uninterruptible while patch 2 adds a message when a signal arrives
      while waiting for an NFP mutex.  We can't interrupt execution of
      FW commands so uninterruptible sleep seems reasonable there.
      Exiting a wait for a mutex should be clean and have no side effects,
      so we allow it to be aborted.  Note that both waits have rather
      large timeouts (tens of seconds).
      
      Patches 3 and 4 improve flower offload operation under heavy load.
      Currently there is no cap on the number of queued FW notifications.
      Some of the notifications have to be processed from a workqueue,
      which may lead to a very large number of messages getting queued
      if the workqueue never gets a chance to run.  Pieter puts a limit
      on the number of queued messages, tries to drop some messages we ignore
      without queuing, and processes more important messages first.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      837708a8
    • nfp: flower: split and limit cmsg skb lists · cf2cbadc
      Pieter Jansen van Vuuren authored
      Introduce a second skb list for handling control messages and limit the
      number of allowed messages. Some control messages are considered more
      crucial than others, resulting in the need for a second skb list. By
      splitting the list into separate high and low priority lists we can
      ensure that messages on the high list get added to the head of the list
      that gets processed; this, however, has no functional impact. Previously
      there was no limit on the number of messages allowed on the queue, which
      could result in the queue growing without bound and eventually the host
      running out of memory.
      
      Fixes: b985f870 ("nfp: process control messages in workqueue in flower app")
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf2cbadc
    • nfp: flower: move route ack control messages out of the workqueue · 0b1a989e
      Pieter Jansen van Vuuren authored
      Previously we processed the route ack control messages in the workqueue,
      which unnecessarily loads the workqueue. We can deal with these messages
      sooner as we know we are going to drop them.
      
      Fixes: 8e6a9046 ("nfp: flower vxlan neighbour offload")
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b1a989e
    • nfp: print a message when mutex wait is interrupted · bc05f9bc
      Jakub Kicinski authored
      When waiting for an NFP mutex is interrupted print a message
      to make root causing later error messages easier.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc05f9bc
    • nfp: ignore signals when communicating with management FW · 5496295a
      Jakub Kicinski authored
      We currently allow signals to interrupt the wait for management FW
      commands.  Exiting the wait should not cause trouble, the FW will
      just finish executing the command in the background and new commands
      will wait for the old one to finish.
      
      However, this may not be what users expect (Ctrl-C not actually stopping
      the command).  Moreover some systems routinely request link information
      with signals pending (Ubuntu 14.04 runs a landscape-sysinfo python tool
      from MOTD) worrying users with errors like these:
      
      nfp 0000:04:00.0: nfp_nsp: Error -512 waiting for code 0x0007 to start
      nfp 0000:04:00.0: nfp: reading port table failed -512
      
      Make the wait for management FW responses non-interruptible.
      
      Fixes: 1a64821c ("nfp: add support for service processor access")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5496295a
    • tipc: fix missing initializer in tipc_sendmsg() · 335b929b
      Jon Maloy authored
      The stack variable 'dnode' in __tipc_sendmsg() may theoretically
      end up in tipc_node_get_mtu() as an uninitialized variable.
      
      We fix this by initializing the variable at declaration. We also add
      a default else clause to the two conditional ones already there, so
      that we never end up in the named function if the given address
      type is illegal.
      
      Reported-by: syzbot+b0975ce9355b347c1546@syzkaller.appspotmail.com
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      335b929b
    • strparser: Fix incorrect strp->need_bytes value. · 9d0c75bf
      Doron Roberts-Kedes authored
      strp_data_ready resets strp->need_bytes to 0 if strp_peek_len indicates
      that the remainder of the message has been received. However,
      do_strp_work does not reset strp->need_bytes to 0. If do_strp_work
      completes a partial message, the value of strp->need_bytes will continue
      to reflect the needed bytes of the previous message, causing
      future invocations of strp_data_ready to return early if
      strp->need_bytes is less than strp_peek_len. Resetting strp->need_bytes
      to 0 in __strp_recv on handing a full message to the upper layer solves
      this problem.
      
      __strp_recv also calculates strp->need_bytes using stm->accum_len before
      stm->accum_len has been incremented by cand_len. This can cause
      strp->need_bytes to be equal to the full length of the message instead
      of the full length minus the accumulated length. This, in turn, causes
      strp_data_ready to return early, even when there is sufficient data to
      complete the partial message. Incrementing stm->accum_len before using
      it to calculate strp->need_bytes solves this problem.
      
      Found while testing net/tls_sw recv path.
      
      Fixes: 43a0c675 ("strparser: Stream parser for messages")
      Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d0c75bf
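The ordering issue described in the commit above (update the accumulated length first, then derive the bytes still needed, and reset once a full message is delivered) can be sketched as follows. `struct msg_state` and `msg_consume()` are hypothetical reductions for illustration; the real strparser state and control flow are more involved.

```c
#include <stddef.h>

/* Hypothetical reduction of the parser state to the fields discussed. */
struct msg_state {
	size_t full_len;   /* total length of the message being assembled */
	size_t accum_len;  /* bytes accumulated so far */
	size_t need_bytes; /* bytes still missing; 0 when nothing is pending */
};

/* Account for cand_len newly received bytes of the current message. */
static void msg_consume(struct msg_state *st, size_t cand_len)
{
	/* Increment first, so need_bytes reflects what was just received. */
	st->accum_len += cand_len;

	if (st->accum_len < st->full_len) {
		st->need_bytes = st->full_len - st->accum_len;
	} else {
		/* Full message handed up: reset for the next one. */
		st->need_bytes = 0;
		st->accum_len = 0;
	}
}
```

With a 100-byte message, computing `need_bytes` before adding a 30-byte fragment would yield 100 rather than 70, which is the kind of overestimate the commit message describes.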
    • selftests: net: add in_netns.sh to TEST_PROGS · 5ff9c1a3
      Anders Roxell authored
      Script in_netns.sh isn't installed.
      --------------------
      running psock_fanout test
      --------------------
      ./run_afpackettests: line 12: ./in_netns.sh: No such file or directory
      [FAIL]
      --------------------
      running psock_tpacket test
      --------------------
      ./run_afpackettests: line 22: ./in_netns.sh: No such file or directory
      [FAIL]
      
      Add in_netns.sh to TEST_PROGS so that it gets installed.
      
      Fixes: cc30c93f ("selftests/net: ignore background traffic in psock_fanout")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5ff9c1a3
    • Merge branch 'ibmvnic-Fix-parameter-change-request-handling' · 095d3701
      David S. Miller authored
      Nathan Fontenot says:
      
      ====================
      ibmvnic: Fix parameter change request handling
      
      When updating parameters for the ibmvnic driver there is a possibility
      of entering an infinite loop if a return value other than a partial
      success is received from sending the login CRQ.
      
      Also, a deadlock can occur on the rtnl lock if netdev_notify_peers()
      is called during driver reset for a parameter change reset.
      
      This patch set corrects both of these issues by updating the return
      code handling in ibmvnic_login() and guarding against calling
      netdev_notify_peers() for parameter change requests.
      
      Updates for V2: Correct spelling mistakes in commit messages.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      095d3701
    • ibmvnic: Do not notify peers on parameter change resets · ebc701b7
      Nathan Fontenot authored
      When attempting to change the driver parameters, such as the MTU
      value or number of queues, do not call netdev_notify_peers().
      Doing so will deadlock on the rtnl_lock.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ebc701b7
    • ibmvnic: Handle all login error conditions · 64d92aa2
      Nathan Fontenot authored
      There is a bug in handling the possible return codes from sending the
      login CRQ. The current code treats any non-success return value,
      except a failure to send the CRQ or a timeout waiting for a login
      response, as a need to re-send the login CRQ. This can put the driver in
      an infinite loop of trying to log in when getting return values other
      than a partial success, such as a return code of aborted. For these
      scenarios the login will not ever succeed at this point and the
      driver would need to be reset again.
      
      To resolve this, the loop trying to log in is updated to only retry the
      login if the driver gets a return code of a partial success. Other
      return codes are treated as an error and the driver returns an error
      from ibmvnic_login().
      
      To avoid infinite looping in the partial success return cases, the
      number of retries is capped at the maximum number of supported
      queues. This value was chosen because the driver does a renegotiation
      of capabilities which sets the number of queues possible and allows
      the driver to attempt a login for each possible value for the number
      of queues supported.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      64d92aa2
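The retry policy described in the commit above (retry only on partial success, treat every other failure as final, and cap the number of retries) might look roughly like this in isolation. The return-code enum, the `login_fn` callback, `login_with_retries()` and the scripted stub are all hypothetical, and `-EIO` is an assumed error value; the driver's real codes, renegotiation step and error handling are not shown.

```c
#include <errno.h>

/* Hypothetical return codes; the real driver's codes and names differ. */
enum login_rc { LOGIN_SUCCESS, LOGIN_PARTIAL_SUCCESS, LOGIN_ABORTED };

/* Hypothetical callback standing in for "send the login CRQ and wait". */
typedef enum login_rc (*login_fn)(void *ctx);

/* Retry only on partial success, at most max_retries times. */
static int login_with_retries(login_fn try_login, void *ctx, int max_retries)
{
	int retries = 0;

	for (;;) {
		switch (try_login(ctx)) {
		case LOGIN_SUCCESS:
			return 0;
		case LOGIN_PARTIAL_SUCCESS:
			if (retries++ >= max_retries)
				return -EIO; /* assumed error code */
			break; /* renegotiation omitted; try again */
		default:
			return -EIO; /* any other code is final */
		}
	}
}

/* Scripted stub for demonstration: replays a fixed list of return codes,
 * repeating the last one once the list is exhausted. */
struct script {
	const enum login_rc *rcs;
	int len;
	int calls;
};

static enum login_rc scripted_login(void *ctx)
{
	struct script *s = ctx;
	int i = s->calls < s->len ? s->calls : s->len - 1;

	s->calls++;
	return s->rcs[i];
}
```

In the commit's terms, `max_retries` would correspond to the maximum number of supported queues, so that each possible queue count can be attempted once.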
    • net: validate attribute sizes in neigh_dump_table() · 7dd07c14
      Eric Dumazet authored
      Since neigh_dump_table() calls nlmsg_parse() without giving policy
      constraints, attributes can have arbitrary size that we must validate.
      
      Reported by syzbot/KMSAN :
      
      BUG: KMSAN: uninit-value in neigh_master_filtered net/core/neighbour.c:2292 [inline]
      BUG: KMSAN: uninit-value in neigh_dump_table net/core/neighbour.c:2348 [inline]
      BUG: KMSAN: uninit-value in neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438
      CPU: 1 PID: 3575 Comm: syzkaller268891 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       neigh_master_filtered net/core/neighbour.c:2292 [inline]
       neigh_dump_table net/core/neighbour.c:2348 [inline]
       neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438
       netlink_dump+0x9ad/0x1540 net/netlink/af_netlink.c:2225
       __netlink_dump_start+0x1167/0x12a0 net/netlink/af_netlink.c:2322
       netlink_dump_start include/linux/netlink.h:214 [inline]
       rtnetlink_rcv_msg+0x1435/0x1560 net/core/rtnetlink.c:4598
       netlink_rcv_skb+0x355/0x5f0 net/netlink/af_netlink.c:2447
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4653
       netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline]
       netlink_unicast+0x1672/0x1750 net/netlink/af_netlink.c:1337
       netlink_sendmsg+0x1048/0x1310 net/netlink/af_netlink.c:1900
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
       __sys_sendmsg net/socket.c:2080 [inline]
       SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
       SyS_sendmsg+0x54/0x80 net/socket.c:2087
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x43fed9
      RSP: 002b:00007ffddbee2798 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fed9
      RDX: 0000000000000000 RSI: 0000000020005000 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401800
      R13: 0000000000401890 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
       slab_post_alloc_hook mm/slab.h:445 [inline]
       slab_alloc_node mm/slub.c:2737 [inline]
       __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:984 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1183 [inline]
       netlink_sendmsg+0x9a6/0x1310 net/netlink/af_netlink.c:1875
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
       __sys_sendmsg net/socket.c:2080 [inline]
       SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
       SyS_sendmsg+0x54/0x80 net/socket.c:2087
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 21fdd092 ("net: Add support for filtering neigh dump by master device")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7dd07c14
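The kind of size check the commit above calls for can be illustrated with a generic length-prefixed attribute. `struct attr` and `attr_get_u32()` are hypothetical and are not the netlink API; the point is only that a fixed-size value is read after confirming the payload is large enough.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened attribute: a payload length and a payload pointer. */
struct attr {
	size_t len;
	const void *data;
};

/* Read a 32-bit value only when the payload is large enough to hold one. */
static bool attr_get_u32(const struct attr *a, uint32_t *out)
{
	if (!a || a->len < sizeof(*out))
		return false;
	memcpy(out, a->data, sizeof(*out));
	return true;
}
```

Without the length test, a two-byte attribute would make the reader consume two bytes that the sender never initialised, which matches the uninit-value report in the trace.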
    • tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets · 72123032
      Eric Dumazet authored
      syzbot/KMSAN reported an uninit-value in tcp_parse_options() [1]
      
      I believe this was caused by a TCP_MD5SIG being set on a live
      flow.
      
      This is highly unexpected, since TCP option space is limited.
      
      For instance, presence of TCP MD5 option automatically disables
      TCP TimeStamp option at SYN/SYNACK time, which we can not do
      once flow has been established.
      
      Really, adding/deleting an MD5 key only makes sense on sockets
      in CLOSE or LISTEN state.
      
      [1]
      BUG: KMSAN: uninit-value in tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
      CPU: 1 PID: 6177 Comm: syzkaller192004 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
       tcp_fast_parse_options net/ipv4/tcp_input.c:3858 [inline]
       tcp_validate_incoming+0x4f1/0x2790 net/ipv4/tcp_input.c:5184
       tcp_rcv_established+0xf60/0x2bb0 net/ipv4/tcp_input.c:5453
       tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x448fe9
      RSP: 002b:00007fd472c64d38 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000006e5a30 RCX: 0000000000448fe9
      RDX: 000000000000029f RSI: 0000000020a88f88 RDI: 0000000000000004
      RBP: 00000000006e5a34 R08: 0000000020e68000 R09: 0000000000000010
      R10: 00000000200007fd R11: 0000000000000216 R12: 0000000000000000
      R13: 00007fff074899ef R14: 00007fd472c659c0 R15: 0000000000000009
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
       slab_post_alloc_hook mm/slab.h:445 [inline]
       slab_alloc_node mm/slub.c:2737 [inline]
       __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:984 [inline]
       tcp_send_ack+0x18c/0x910 net/ipv4/tcp_output.c:3624
       __tcp_ack_snd_check net/ipv4/tcp_input.c:5040 [inline]
       tcp_ack_snd_check net/ipv4/tcp_input.c:5053 [inline]
       tcp_rcv_established+0x2103/0x2bb0 net/ipv4/tcp_input.c:5469
       tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72123032
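The rule stated in the commit above, that adding or deleting an MD5 key only makes sense in CLOSE or LISTEN state, amounts to a state gate in front of the option handler. The sketch below is purely illustrative: the enum, both function names and the `-EINVAL` error code are assumptions, not the kernel's definitions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of socket states; values are illustrative only. */
enum sock_state { ST_ESTABLISHED, ST_CLOSE, ST_LISTEN };

/* Key changes are only meaningful before a flow exists. */
static bool md5_key_change_allowed(enum sock_state state)
{
	return state == ST_CLOSE || state == ST_LISTEN;
}

/* Hypothetical setsockopt-style wrapper; the error code is an assumption. */
static int set_md5_key(enum sock_state state)
{
	if (!md5_key_change_allowed(state))
		return -EINVAL;
	/* ... install or delete the key here ... */
	return 0;
}
```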
    • tipc: fix unbalanced reference counter · c3317f4d
      Jon Maloy authored
      When a topology subscription is created, we may encounter (or KASAN
      may provoke) a failure to create a corresponding service instance in
      the binding table. Instead of letting the tipc_nametbl_subscribe()
      report the failure back to the caller, the function just makes a warning
      printout and returns, without incrementing the subscription reference
      counter as expected by the caller.
      
      This makes the caller believe that the subscription was successful, so
      it will at a later moment try to unsubscribe the item. This involves
      a sub_put() call. Since the reference counter never was incremented
      in the first place, we get a premature delete of the subscription item,
      followed by a "use-after-free" warning.
      
      We fix this by adding a return value to tipc_nametbl_subscribe() and
      make the caller aware of the failure to subscribe.
      
      This bug seems to always have been around, but this fix only applies
      back to the commit shown below. Given the low risk of this happening
      we believe this to be sufficient.
      
      Fixes: commit 218527fe ("tipc: replace name table service range
      array with rb tree")
      Reported-by: syzbot+aa245f26d42b8305d157@syzkaller.appspotmail.com
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3317f4d
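The pattern described in the tipc fix above can be sketched in plain userspace C. This is a minimal illustration, not the kernel code: `struct sub`, `sub_get()`, `nametbl_subscribe()`, `subscribe()` and the `service_ok` flag are hypothetical stand-ins, and only the name `sub_put()` comes from the commit message. The point is that the subscribe function reports failure, so the caller never holds a subscription whose reference was not taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a subscription with a reference counter. */
struct sub {
    int refcnt;
};

static void sub_get(struct sub *s)
{
    s->refcnt++;
}

/* Drop one reference. Returns true if this was the last reference
 * and the object has been freed. */
static bool sub_put(struct sub *s)
{
    assert(s->refcnt > 0);
    if (--s->refcnt == 0) {
        free(s);
        return true;
    }
    return false;
}

/* Sketch of a subscribe function that reports failure to the caller.
 * 'service_ok' simulates whether the service instance could be created. */
static bool nametbl_subscribe(struct sub *s, bool service_ok)
{
    if (!service_ok) {
        fprintf(stderr, "failed to create subscription\n");
        return false;       /* no reference was taken */
    }
    sub_get(s);             /* reference now held by the name table */
    return true;
}

/* Caller: keeps the subscription only if subscribing succeeded. */
static struct sub *subscribe(bool service_ok)
{
    struct sub *s = calloc(1, sizeof(*s));

    if (!s)
        return NULL;
    s->refcnt = 1;          /* creator's reference */
    if (!nametbl_subscribe(s, service_ok)) {
        sub_put(s);         /* drops the only reference and frees s */
        return NULL;        /* nothing left to unsubscribe later */
    }
    return s;
}
```

With a `void` subscribe function, the failure path would hand back an object that looks subscribed, and the later unsubscribe would drop a reference that was never taken.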
    • lan78xx: PHY DSP registers initialization to address EEE link drop issues with long cables · 1c2734b3
      Raghuram Chary J authored
      Configure the DSP registers of the PHY device to handle
      GbE-EEE failures with cables longer than 40 m.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c2734b3
    • mISDN: Remove VLAs · 9a438161
      Laura Abbott authored
      There's an ongoing effort to remove VLAs[1] from the kernel to eventually
      turn on -Wvla. Remove the VLAs from the mISDN code by switching to using
      kstrdup in one place and using an upper bound in another.
      Signed-off-by: Laura Abbott <labbott@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a438161
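The two VLA replacements named in the mISDN commit above (a heap copy of a string, and a fixed upper bound) can be sketched in userspace C. This is an illustration under assumptions, not the mISDN code: `copy_name()`, `sum_bytes()` and `MAX_CHUNK` are hypothetical, and `malloc()` plus `memcpy()` stands in for the kernel's `kstrdup`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHUNK 64    /* hypothetical upper bound replacing a runtime size */

/* Before: char tmp[strlen(name) + 1];   (a VLA on the stack)
 * After:  a heap copy, the userspace analogue of kstrdup(). */
static char *copy_name(const char *name)
{
    size_t len;
    char *copy;

    assert(name != NULL);
    len = strlen(name) + 1;
    copy = malloc(len);
    if (copy)
        memcpy(copy, name, len);
    return copy;            /* caller frees */
}

/* Before: unsigned char buf[n];         (a VLA on the stack)
 * After:  a fixed-size buffer plus an explicit bound check. */
static int sum_bytes(const unsigned char *src, size_t n)
{
    unsigned char buf[MAX_CHUNK];
    int sum = 0;
    size_t i;

    if (n > MAX_CHUNK)
        return -1;          /* reject sizes the buffer cannot hold */
    memcpy(buf, src, n);
    for (i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

Either form gives the compiler a stack frame of known size, which is what building with -Wvla requires.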
    • net/tls: Remove VLA usage · b16520f7
      Kees Cook authored
      In the quest to remove VLAs from the kernel[1], this replaces the VLA
      size with the only possible size used in the code, and adds a mechanism
      to double-check future IV sizes.
      
      [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b16520f7
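The approach described in the net/tls commit above, a constant buffer size plus a check that guards future IV sizes, might look roughly like this in userspace C. All names and the sizes 8 and 4 are assumptions made for illustration, and the check shown here (a compile-time assertion plus a runtime bound) is one possible mechanism, not necessarily the one the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: a single cipher is assumed to be in use. */
#define GCM_128_IV_SIZE   8
#define GCM_128_SALT_SIZE 4
#define MAX_IV_SIZE       GCM_128_IV_SIZE

/* Compile-time double check: a larger IV added later must also
 * raise MAX_IV_SIZE, or the build fails. */
_Static_assert(GCM_128_IV_SIZE <= MAX_IV_SIZE,
               "IV size exceeds MAX_IV_SIZE");

/* Concatenate salt and IV into 'out'. Returns the number of bytes
 * written, or -1 if the sizes do not fit. */
static int build_nonce(unsigned char *out, size_t out_len,
                       const unsigned char *salt,
                       const unsigned char *iv, size_t iv_size)
{
    /* Was a VLA sized by iv_size; now sized by the constant bound. */
    unsigned char buf[GCM_128_SALT_SIZE + MAX_IV_SIZE];

    assert(out && salt && iv);
    if (iv_size > MAX_IV_SIZE || out_len < GCM_128_SALT_SIZE + iv_size)
        return -1;          /* runtime guard for stack allocation */
    memcpy(buf, salt, GCM_128_SALT_SIZE);
    memcpy(buf + GCM_128_SALT_SIZE, iv, iv_size);
    memcpy(out, buf, GCM_128_SALT_SIZE + iv_size);
    return (int)(GCM_128_SALT_SIZE + iv_size);
}
```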
    • ibmvnic: Define vnic_login_client_data name field as unsized array · 08ea556e
      Kees Cook authored
      The "name" field of struct vnic_login_client_data is a char array of
      undefined length. This should be written as "char name[]" so the compiler
      can make better decisions about the field (for example, not assuming
      it's a single character). This was noticed while trying to tighten the
      CONFIG_FORTIFY_SOURCE checking.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08ea556e
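The "char name[]" form mentioned in the ibmvnic commit above is a C99 flexible array member. The sketch below shows how such a trailing field is declared and allocated. `struct client_data` and `make_client_data()` are hypothetical, and the other fields are placeholders rather than the real layout of struct vnic_login_client_data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record with a variable-length trailing name. */
struct client_data {
    unsigned char type;
    unsigned short len;
    char name[];            /* flexible array member, not "char name;" */
};

/* Allocate a record with room for the NUL-terminated name. */
static struct client_data *make_client_data(unsigned char type,
                                            const char *name)
{
    size_t n;
    struct client_data *d;

    assert(name != NULL);
    n = strlen(name) + 1;
    d = malloc(sizeof(*d) + n);     /* sizeof excludes name[] */
    if (!d)
        return NULL;
    d->type = type;
    d->len = (unsigned short)n;
    memcpy(d->name, name, n);
    return d;                       /* caller frees */
}
```

Declared as a single `char`, the field would tell the compiler and fortified string helpers that exactly one byte is valid there, which is the assumption the commit removes.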
  4. 12 Apr, 2018 9 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5d136594
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) In ip_gre tunnel, handle the conflict between TUNNEL_{SEQ,CSUM} and
          GSO/LLTX properly. From Sabrina Dubroca.
      
       2) Stop properly on error in lan78xx_read_otp(), from Phil Elwell.
      
       3) Don't uncompress in slip before rstate is initialized, from Tejaswi
          Tanikella.
      
       4) When using 1.x firmware on aquantia, issue a deinit before we
          hardware reset the chip, otherwise we break dirty wake WOL. From
          Igor Russkikh.
      
       5) Correct log check in vhost_vq_access_ok(), from Stefan Hajnoczi.
      
       6) Fix ethtool -x crashes in bnxt_en, from Michael Chan.
      
       7) Fix races in l2tp tunnel creation and duplicate tunnel detection,
          from Guillaume Nault.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
        l2tp: fix race in duplicate tunnel detection
        l2tp: fix races in tunnel creation
        tun: send netlink notification when the device is modified
        tun: set the flags before registering the netdevice
        lan78xx: Don't reset the interface on open
        bnxt_en: Fix NULL pointer dereference at bnxt_free_irq().
        bnxt_en: Need to include RDMA rings in bnxt_check_rings().
        bnxt_en: Support max-mtu with VF-reps
        bnxt_en: Ignore src port field in decap filter nodes
        bnxt_en: do not allow wildcard matches for L2 flows
        bnxt_en: Fix ethtool -x crash when device is down.
        vhost: return bool from *_access_ok() functions
        vhost: fix vhost_vq_access_ok() log check
        vhost: Fix vhost_copy_to_user()
        net: aquantia: oops when shutdown on already stopped device
        net: aquantia: Regression on reset with 1.x firmware
        cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
        slip: Check if rstate is initialized before uncompressing
        lan78xx: Avoid spurious kevent 4 "error"
        lan78xx: Correctly indicate invalid OTP
        ...
      5d136594
    • Merge tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 67a7a8ff
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "A few fixes of Xen related core code and drivers"
      
      * tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen
        xen/acpi: off by one in read_acpi_id()
        xen/acpi: upload _PSD info for non Dom0 CPUs too
        x86/xen: Delay get_cpu_cap until stack canary is established
        xen: xenbus_dev_frontend: Verify body of XS_TRANSACTION_END
        xen: xenbus: Catch closing of non existent transactions
        xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling
      67a7a8ff
    • Merge tag 'dma-mapping-4.17-2' of git://git.infradead.org/users/hch/dma-mapping · c5c177c5
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Fix for one swiotlb regression in 2.16 from Takashi"
      
      * tag 'dma-mapping-4.17-2' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: fix unexpected swiotlb_alloc_coherent failures
      c5c177c5
    • Merge tag 'mmc-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · d1cb7718
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
         - Prevent bus reference leak in mmc_blk_init()
      
        MMC host:
         - tmio: Fix error handling when issuing CMD23
         - jz4740: Fix race condition in IRQ mask update"
      
      * tag 'mmc-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: tmio: Fix error handling when issuing CMD23
        mmc: core: Prevent bus reference leak in mmc_blk_init()
        mmc: jz4740: Fix race condition in IRQ mask update
      d1cb7718
    • Merge tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb · cb098d50
      Linus Torvalds authored
      Pull kdb updates from Jason Wessel:
      
       - fix 2032 time access issues and new compiler warnings
      
       - minor regression test cleanup
      
       - formatting fixes for end user use of kdb
      
      * tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
        kdb: use memmove instead of overlapping memcpy
        kdb: use ktime_get_mono_fast_ns() instead of ktime_get_ts()
        kdb: bl: don't use tab character in output
        kdb: drop newline in unknown command output
        kdb: make "mdr" command repeat
        kdb: use __ktime_get_real_seconds instead of __current_kernel_time
        misc: kgdbts: Display progress of asynchronous tests
      cb098d50
    • Merge tag 'microblaze-4.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze · 07820c3b
      Linus Torvalds authored
      Pull microblaze updates from Michal Simek:
       "Use generic pci_mmap_resource_range()"
      
      * tag 'microblaze-4.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: Use generic pci_mmap_resource_range()
        microblaze: Provide pgprot_device/writecombine macros for nommu
      07820c3b
    • Merge tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic · c17b0aad
      Linus Torvalds authored
      Pull asm-generic fixes from Arnd Bergmann:
       "I have one regression fix for a minor build problem after the
        architecture removal series, plus a rework of the barriers in the
        readl/writel functions, thanks to work by Sinan Kaya:
      
        This started from a discussion on the linuxppc and rdma mailing
        lists[1]. To summarize, we decided that architectures are responsible
        to serialize readl() and writel() accesses on a device MMIO space
        relative to DMA performed by that device.
      
        This series provides a pessimistic implementation of that behavior for
        asm-generic/io.h, which is in turn used by a number of architectures
        (h8300, microblaze, nios2, openrisc, s390, sparc, um, unicore32, and
        xtensa). Some of those presumably need no extra barriers, or something
        weaker than rmb()/wmb(), and they are advised to override the new
        default for better performance.
      
        For inb()/outb(), the same barriers are used, but architectures might
        want to add another barrier to outb() here if that can guarantee
        non-posted behavior (some architectures can, others cannot do that).
      
        The readl_relaxed()/writel_relaxed() family of functions retains the
        existing behavior with no extra barriers"
      
      [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-March/170481.html
      
      * tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
        io: change writeX_relaxed() to remove barriers
        io: change readX_relaxed() to remove barriers
        dts: remove cris & metag dts hard link file
        io: change inX() to have their own IO barrier overrides
        io: change outX() to have their own IO barrier overrides
        io: define stronger ordering for the default writeX() implementation
        io: define stronger ordering for the default readX() implementation
        io: define several IO & PIO barrier types for the asm-generic version
      c17b0aad
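The ordering rule summarized in the asm-generic pull above (ordered accessors carry barriers, the _relaxed variants do not) can be modelled in userspace C. This is only a model under assumptions: the function names are hypothetical, the "register" is ordinary memory, and C11 fences stand in for the kernel's rmb()/wmb(), which they do not equal for real device MMIO.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Raw accessors: a plain volatile load/store with no ordering. */
static inline uint32_t raw_read32(const volatile uint32_t *addr)
{
    return *addr;
}

static inline void raw_write32(volatile uint32_t *addr, uint32_t v)
{
    *addr = v;
}

/* Ordered read: the fence after the load stands in for rmb(), so
 * later reads of DMA'd data cannot be hoisted above the register read. */
static inline uint32_t read32(const volatile uint32_t *addr)
{
    uint32_t v;

    assert(addr);
    v = raw_read32(addr);
    atomic_thread_fence(memory_order_acquire);
    return v;
}

/* Ordered write: the fence before the store stands in for wmb(), so
 * earlier writes to DMA buffers are visible before the register write. */
static inline void write32(volatile uint32_t *addr, uint32_t v)
{
    assert(addr);
    atomic_thread_fence(memory_order_release);
    raw_write32(addr, v);
}

/* Relaxed variants: same access, no extra barriers. */
static inline uint32_t read32_relaxed(const volatile uint32_t *addr)
{
    return raw_read32(addr);
}

static inline void write32_relaxed(volatile uint32_t *addr, uint32_t v)
{
    raw_write32(addr, v);
}
```

An architecture that already guarantees this ordering in hardware would override the ordered variants with something cheaper, which is the override the pull message recommends.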
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · e241e3f2
      Linus Torvalds authored
      Pull virtio update from Michael Tsirkin:
       "This adds reporting hugepage stats to virtio-balloon"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_balloon: export hugetlb page allocation counts
      e241e3f2
    • Merge tag 'iommu-updates-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · e5c37228
      Linus Torvalds authored
      Pull IOMMU updates from Joerg Roedel:
      
       - OF_IOMMU support for the Rockchip iommu driver so that it can use
         generic DT bindings
      
       - rework of locking in the AMD IOMMU interrupt remapping code to make
         it work better in RT kernels
      
       - support for improved iotlb flushing in the AMD IOMMU driver
      
       - support for 52-bit physical and virtual addressing in the ARM-SMMU
      
       - various other small fixes and cleanups
      
      * tag 'iommu-updates-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits)
        iommu/io-pgtable-arm: Avoid warning with 32-bit phys_addr_t
        iommu/rockchip: Support sharing IOMMU between masters
        iommu/rockchip: Add runtime PM support
        iommu/rockchip: Fix error handling in init
        iommu/rockchip: Use OF_IOMMU to attach devices automatically
        iommu/rockchip: Use IOMMU device for dma mapping operations
        dt-bindings: iommu/rockchip: Add clock property
        iommu/rockchip: Control clocks needed to access the IOMMU
        iommu/rockchip: Fix TLB flush of secondary IOMMUs
        iommu/rockchip: Use iopoll helpers to wait for hardware
        iommu/rockchip: Fix error handling in attach
        iommu/rockchip: Request irqs in rk_iommu_probe()
        iommu/rockchip: Fix error handling in probe
        iommu/rockchip: Prohibit unbind and remove
        iommu/amd: Return proper error code in irq_remapping_alloc()
        iommu/amd: Make amd_iommu_devtable_lock a spin_lock
        iommu/amd: Drop the lock while allocating new irq remap table
        iommu/amd: Factor out setting the remap table for a devid
        iommu/amd: Use `table' instead `irt' as variable name in amd_iommu_update_ga()
        iommu/amd: Remove the special case from alloc_irq_table()
        ...
      e5c37228