1. 05 Dec, 2017 2 commits
    • Hendrik Brueckner's avatar
      s390/bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type · 466698e6
      Hendrik Brueckner authored
      To mitigate and correct the broken uapi for the BPF_PROG_TYPE_PERF_EVENT
      program type, introduce a user_pt_regs structure (similar to arm64) that
      exports parts from the beginnig of the pt_regs structure.
      
      The export must start with the beginning of the pt_regs structure because
      to correctly calculate BPF prologues for perf (regs_query_register_offset()).
      
      For BPF_PROG_TYPE_PERF_EVENT program types, the BPF program is then passed
      a user_pt_regs structure.
      
      Note: Depending on future changes to the s390 pt_regs structure, consider
      the user_pt_regs structure to be stable for a particular kernel version
      only. (Of course, s390 tries to ensure keep it stable as much as possible.)
      Signed-off-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-and-tested-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      466698e6
    • Hendrik Brueckner's avatar
      bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type · c895f6f7
      Hendrik Brueckner authored
      Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
      program type") introduced the bpf_perf_event_data structure which
      exports the pt_regs structure.  This is OK for multiple architectures
      but fail for s390 and arm64 which do not export pt_regs.  Programs
      using them, for example, the bpf selftest fail to compile on these
      architectures.
      
      For s390, exporting the pt_regs is not an option because s390 wants
      to allow changes to it.  For arm64, there is a user_pt_regs structure
      that covers parts of the pt_regs structure for use by user space.
      
      To solve the broken uapi for s390 and arm64, introduce an abstract
      type for pt_regs and add an asm/bpf_perf_event.h file that concretes
      the type.  An asm-generic header file covers the architectures that
      export pt_regs today.
      
      The arch-specific enablement for s390 and arm64 follows in separate
      commits.
      Reported-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Fixes: 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
      Signed-off-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-and-tested-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      c895f6f7
  2. 04 Dec, 2017 3 commits
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2391f0b4
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "virtio and qemu bugfixes
      
        A couple of bugfixes that just became ready"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_balloon: fix increment of vb->num_pfns in fill_balloon()
        virtio: release virtio index when fail to device_register
        fw_cfg: fix driver remove
      2391f0b4
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 236fa078
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Various TCP control block fixes, including one that crashes with
          SELinux, from David Ahern and Eric Dumazet.
      
       2) Fix ACK generation in rxrpc, from David Howells.
      
       3) ipvlan doesn't set the mark properly in the ipv4 route lookup key,
          from Gao Feng.
      
       4) SIT configuration doesn't take on the frag_off ipv4 field
          configuration properly, fix from Hangbin Liu.
      
       5) TSO can fail after device down/up on stmmac, fix from Lars Persson.
      
       6) Various bpftool fixes (mostly in JSON handling) from Quentin Monnet.
      
       7) Various SKB leak fixes in vhost/tun/tap (mostly observed as
          performance problems). From Wei Xu.
      
       8) mvpps's TX descriptors were not zero initialized, from Yan Markman.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
        tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
        tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
        rxrpc: Fix the MAINTAINERS record
        rxrpc: Use correct netns source in rxrpc_release_sock()
        liquidio: fix incorrect indentation of assignment statement
        stmmac: reset last TSO segment size after device open
        ipvlan: Add the skb->mark as flow4's member to lookup route
        s390/qeth: build max size GSO skbs on L2 devices
        s390/qeth: fix GSO throughput regression
        s390/qeth: fix thinko in IPv4 multicast address tracking
        tap: free skb if flags error
        tun: free skb in early errors
        vhost: fix skb leak in handle_rx()
        bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
        bnxt_en: fix dst/src fid for vxlan encap/decap actions
        bnxt_en: wildcard smac while creating tunnel decap filter
        bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
        phylink: ensure we take the link down when phylink_stop() is called
        sfp: warn about modules requiring address change sequence
        sfp: improve RX_LOS handling
        ...
      236fa078
    • Chris Metcalf's avatar
      arch/tile: mark as orphaned · 8ee5ad1d
      Chris Metcalf authored
      The chip family of TILEPro and TILE-Gx was developed by Tilera, which
      was eventually acquired by Mellanox.  The tile architecture was added to
      the kernel in 2010 and first appeared in 2.6.36.
      
      Now at Mellanox we are developing new chips based on the ARM64
      architecture; our last TILE-Gx chip (the Gx72) was released in 2013, and
      our customers using tile architecture products are not, as far as we
      know, looking to upgrade to newer kernel releases.  In the absence of
      someone in the community stepping up to take over maintainership, this
      commit marks the architecture as orphaned.
      
      Cc: Chris Metcalf <metcalf@alum.mit.edu>
      Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8ee5ad1d
  3. 03 Dec, 2017 28 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · c2eb6d07
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2017-12-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a compilation warning in xdp redirect tracepoint due to
         missing bpf.h include that pulls in struct bpf_map, from Xie.
      
      2) Limit the maximum number of attachable BPF progs for a given
         perf event as long as uabi is not frozen yet. The hard upper
         limit is now 64 and therefore the same as with BPF multi-prog
         for cgroups. Also add related error checking for the sample
         BPF loader when enabling and attaching to the perf event, from
         Yonghong.
      
      3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
         case, so that the test case can always pass and not fail in
         some environments due to too low default limit, also from
         Yonghong.
      
      4) Fix up a missing license header comment for kernel/bpf/offload.c,
         from Jakub.
      
      5) Several fixes for bpftool, among others a crash on incorrect
         arguments when json output is used, error message handling
         fixes on unknown options and proper destruction of json writer
         for some exit cases, all from Quentin.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2eb6d07
    • David S. Miller's avatar
      Merge branch 'tcp-cb-selinux-corruption' · e4485c74
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
      
      James Morris reported kernel stack corruption bug that
      we tracked back to commit 971f10ec ("tcp: better TCP_SKB_CB
      layout to reduce cache line misses")
      
      First patch needs to be backported to kernels >= 3.18,
      while second patch needs to be backported to kernels >= 4.9, since
      this was the time when inet_exact_dif_match appeared.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4485c74
    • David Ahern's avatar
      tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match() · b4d1605a
      David Ahern authored
      After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
      socket lookups happen while skb->cb[] has not been mangled yet by TCP.
      
      Fixes: a04a480d ("net: Require exact match for TCP socket lookups if dif is l3mdev")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4d1605a
    • Eric Dumazet's avatar
      tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb() · eeea10b8
      Eric Dumazet authored
      James Morris reported kernel stack corruption bug [1] while
      running the SELinux testsuite, and bisected to a recent
      commit bffa72cf ("net: sk_buff rbnode reorg")
      
      We believe this commit is fine, but exposes an older bug.
      
      SELinux code runs from tcp_filter() and might send an ICMP,
      expecting IP options to be found in skb->cb[] using regular IPCB placement.
      
      We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.
      
      This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
      similar way we added them for IPv6.
      
      [1]
      [  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
      [  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
      [  339.822505]
      [  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
      [  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
      [  339.885060] Call Trace:
      [  339.896875]  <IRQ>
      [  339.908103]  dump_stack+0x63/0x87
      [  339.920645]  panic+0xe8/0x248
      [  339.932668]  ? ip_push_pending_frames+0x33/0x40
      [  339.946328]  ? icmp_send+0x525/0x530
      [  339.958861]  ? kfree_skbmem+0x60/0x70
      [  339.971431]  __stack_chk_fail+0x1b/0x20
      [  339.984049]  icmp_send+0x525/0x530
      [  339.996205]  ? netlbl_skbuff_err+0x36/0x40
      [  340.008997]  ? selinux_netlbl_err+0x11/0x20
      [  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
      [  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
      [  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
      [  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
      [  340.074562]  ? tcp_filter+0x2c/0x40
      [  340.086400]  ? tcp_v4_rcv+0x820/0xa20
      [  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
      [  340.111279]  ? ip_local_deliver+0x6f/0xe0
      [  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
      [  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
      [  340.147442]  ? ip_rcv+0x27c/0x3c0
      [  340.158668]  ? inet_del_offload+0x40/0x40
      [  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
      [  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
      [  340.195282]  ? __netif_receive_skb+0x18/0x60
      [  340.207288]  ? process_backlog+0x95/0x140
      [  340.218948]  ? net_rx_action+0x26c/0x3b0
      [  340.230416]  ? __do_softirq+0xc9/0x26a
      [  340.241625]  ? do_softirq_own_stack+0x2a/0x40
      [  340.253368]  </IRQ>
      [  340.262673]  ? do_softirq+0x50/0x60
      [  340.273450]  ? __local_bh_enable_ip+0x57/0x60
      [  340.285045]  ? ip_finish_output2+0x175/0x350
      [  340.296403]  ? ip_finish_output+0x127/0x1d0
      [  340.307665]  ? nf_hook_slow+0x3c/0xb0
      [  340.318230]  ? ip_output+0x72/0xe0
      [  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
      [  340.340070]  ? ip_local_out+0x35/0x40
      [  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
      [  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
      [  340.372484]  ? __skb_clone+0x2e/0x130
      [  340.382633]  ? tcp_transmit_skb+0x558/0xa10
      [  340.393262]  ? tcp_connect+0x938/0xad0
      [  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
      [  340.414206]  ? tcp_v4_connect+0x457/0x4e0
      [  340.424471]  ? __inet_stream_connect+0xb3/0x300
      [  340.435195]  ? inet_stream_connect+0x3b/0x60
      [  340.445607]  ? SYSC_connect+0xd9/0x110
      [  340.455455]  ? __audit_syscall_entry+0xaf/0x100
      [  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
      [  340.476636]  ? __audit_syscall_exit+0x209/0x290
      [  340.487151]  ? SyS_connect+0xe/0x10
      [  340.496453]  ? do_syscall_64+0x67/0x1b0
      [  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarJames Morris <james.l.morris@oracle.com>
      Tested-by: default avatarJames Morris <james.l.morris@oracle.com>
      Tested-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eeea10b8
    • Linus Torvalds's avatar
      Linux 4.15-rc2 · ae64f9bd
      Linus Torvalds authored
      ae64f9bd
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · 87fc5c68
      Linus Torvalds authored
      Pull ARM fix from Russell King:
       "Just one fix this time around, for the late commit in the merge window
        that triggered a problem with qemu. Qemu is apparently also going to
        receive a fix for the discovered issue"
      
      * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: avoid faulting on qemu
      87fc5c68
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · ae4806a3
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Here are two bugfixes for I2C, fixing a memleak in the core and irq
        allocation for i801.
      
        Also three bugfixes for the at24 eeprom driver which Bartosz collected
        while taking over maintainership for this driver"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        eeprom: at24: check at24_read/write arguments
        eeprom: at24: fix reading from 24MAC402/24MAC602
        eeprom: at24: correctly set the size for at24mac402
        i2c: i2c-boardinfo: fix memory leaks on devinfo
        i2c: i801: Fix Failed to allocate irq -2147483648 error
      ae4806a3
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus-v4.15-rc2' of... · 49a418d7
      Linus Torvalds authored
      Merge tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
       "Fixes:
      
         - Drop reference to obsolete maintainer tree
      
         - Fix overflow bug in pmbus driver
      
         - Fix SMBUS timeout problem in jc42 driver
      
        For the SMBUS timeout handling, we had a brief discussion if this
        should be considered a bug fix or a feature. Peter says "it fixes real
        problems where the application misbehave due to faulty content when
        reading from an eeprom", and he needs the patch in his company's v4.14
        images. This is good enough for me and warrants backport to stable
        kernels"
      
      * tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (jc42) optionally try to disable the SMBUS timeout
        hwmon: (pmbus) Use 64bit math for DIRECT format values
        hwmon: Drop reference to Jean's tree
      49a418d7
    • David Howells's avatar
      rxrpc: Fix the MAINTAINERS record · bcd1d601
      David Howells authored
      Fix the MAINTAINERS record so that it's more obvious who the maintainer for
      AF_RXRPC is.
      Reported-by: default avatarJoe Perches <joe@perches.com>
      Reported-by: default avatarDavid Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bcd1d601
    • David Howells's avatar
      rxrpc: Use correct netns source in rxrpc_release_sock() · c5012564
      David Howells authored
      In rxrpc_release_sock() there may be no rx->local value to access, so we
      can't unconditionally follow it to the rxrpc network namespace information
      to poke the connection reapers.
      
      Instead, use the socket's namespace pointer to find the namespace.
      
      This unfixed code causes the following static checker warning:
      
      	net/rxrpc/af_rxrpc.c:898 rxrpc_release_sock()
      	error: we previously assumed 'rx->local' could be null (see line 887)
      
      Fixes: 3d18cbb7 ("rxrpc: Fix conn expiry timers")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5012564
    • Colin Ian King's avatar
      liquidio: fix incorrect indentation of assignment statement · 886afc1d
      Colin Ian King authored
      Remove one extraneous level of indentation on assignment statement.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      886afc1d
    • David S. Miller's avatar
      Merge tag 'linux-can-fixes-for-4.15-20171201' of... · ed75e1ac
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-4.15-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2017-12-01
      
      this is a pull for net consisting of nine patches.
      
      The first three patches are by Jimmy Assarsson for the kvaser_usb driver
      and add the missing free()s in some error path, a signed/unsigned
      comparison and ratelimit the error messages in case of incomplete
      messages. Oliver Stäbler's patch for the ti_hecc driver fix the napi
      poll function's return value. The return values of the probe function of
      the peak_canfd and peak_pci PCI drivers are fixed by Stephane Grosjean's
      patch. Two patches by me for the flexcan driver update the
      bugs/features/quirks overview table and fix the error state transition
      for the VF610 SoC. The two patches by Martin Kelly for the mcba_usb
      driver fix a typo and a device disconnect bug.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ed75e1ac
    • Wolfram Sang's avatar
      Merge tag 'at24-4.15-fixes-for-wolfram' of... · edef3098
      Wolfram Sang authored
      Merge tag 'at24-4.15-fixes-for-wolfram' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current
      
      Please consider pulling the following fixes for v4.15. While it doesn't
      fix any regression introduced in the v4.15 merge window, we have a
      feature in at24 since linux v4.8 - reading the mac address block from
      at24mac series - which turned out to be not working.
      
      This pull request contains changes that fix it together with a patch
      that hardens the read and write argument sanitization with
      out-of-bounds checks that were missing.
      edef3098
    • Lars Persson's avatar
      stmmac: reset last TSO segment size after device open · 45ab4b13
      Lars Persson authored
      The mss variable tracks the last max segment size sent to the TSO
      engine. We do not update the hardware as long as we receive skb:s with
      the same value in gso_size.
      
      During a network device down/up cycle (mapped to stmmac_release() and
      stmmac_open() callbacks) we issue a reset to the hardware and it
      forgets the setting for mss. However we did not zero out our mss
      variable so the next transmission of a gso packet happens with an
      undefined hardware setting.
      
      This triggers a hang in the TSO engine and eventuelly the netdev
      watchdog will bark.
      
      Fixes: f748be53 ("stmmac: support new GMAC4")
      Signed-off-by: default avatarLars Persson <larper@axis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45ab4b13
    • Gao Feng's avatar
      ipvlan: Add the skb->mark as flow4's member to lookup route · a98a4ebc
      Gao Feng authored
      Current codes don't use skb->mark to assign flowi4_mark, it would
      make the policy route rule with fwmark doesn't work as expected.
      Signed-off-by: default avatarGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a98a4ebc
    • David S. Miller's avatar
      Merge branch 's390-qeth-fixes' · af57b7ff
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2017-12-01
      
      please apply the following three fixes for 4.15. These should also go
      back to stable.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af57b7ff
    • Julian Wiedmann's avatar
      s390/qeth: build max size GSO skbs on L2 devices · 0cbff6d4
      Julian Wiedmann authored
      The current GSO skb size limit was copy&pasted over from the L3 path,
      where it is needed due to a TSO limitation.
      As L2 devices don't offer TSO support (and thus all GSO skbs are
      segmented before they reach the driver), there's no reason to restrict
      the stack in how large it may build the GSO skbs.
      
      Fixes: d52aec97 ("qeth: enable scatter/gather in layer 2 mode")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0cbff6d4
    • Julian Wiedmann's avatar
      s390/qeth: fix GSO throughput regression · 6d69b1f1
      Julian Wiedmann authored
      Using GSO with small MTUs currently results in a substantial throughput
      regression - which is caused by how qeth needs to map non-linear skbs
      into its IO buffer elements:
      compared to a linear skb, each GSO-segmented skb effectively consumes
      twice as many buffer elements (ie two instead of one) due to the
      additional header-only part. This causes the Output Queue to be
      congested with low-utilized IO buffers.
      
      Fix this as follows:
      If the MSS is low enough so that a non-SG GSO segmentation produces
      order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is
      where we anticipate the biggest savings, since an SG-enabled
      GSO segmentation produces skbs that always consume at least two
      buffer elements.
      
      Larger MSS values continue to get a SG-enabled GSO segmentation, since
      1) the relative overhead of the additional header-only buffer element
      becomes less noticeable, and
      2) the linearization overhead increases.
      
      With the throughput regression fixed, re-enable NETIF_F_SG by default to
      reap the significant CPU savings of GSO.
      
      Fixes: 5722963a ("qeth: do not turn on SG per default")
      Reported-by: default avatarNils Hoppmann <niho@de.ibm.com>
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d69b1f1
    • Julian Wiedmann's avatar
      s390/qeth: fix thinko in IPv4 multicast address tracking · bc3ab705
      Julian Wiedmann authored
      Commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      reworked how secondary addresses are managed for qeth devices.
      Instead of dropping & subsequently re-adding all addresses on every
      ndo_set_rx_mode() call, qeth now keeps track of the addresses that are
      currently registered with the HW.
      On a ndo_set_rx_mode(), we thus only need to do (de-)registration
      requests for the addresses that have actually changed.
      
      On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong
      hashtable - and thus never finds a match. As a result, we first delete
      *all* such addresses, and then re-add them again. So each set_rx_mode()
      causes a short period where the IPv4 Multicast addresses are not
      registered, and the card stops forwarding inbound traffic for them.
      
      Fix this by setting the ->is_multicast flag on the lookup object, thus
      enabling qeth_l3_ip_from_hash() to search the correct hashtable and
      find a match there.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc3ab705
    • David S. Miller's avatar
      Merge branch 'vhost-skb-leaks' · 7344ba03
      David S. Miller authored
      Wei Xu says:
      
      ====================
      vhost: fix a few skb leaks
      
      Matthew found a roughly 40% tcp throughput regression with commit
      c67df11f(vhost_net: try batch dequing from skb array) as discussed
      in the following thread:
      https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html
      
      v4:
      - fix zero iov iterator count in tap/tap_do_read()(Jason)
      - don't put tun in case of EBADFD(Jason)
      - Replace msg->msg_control with new 'skb' when calling tun/tap_do_read()
      
      v3:
      - move freeing skb from vhost to tun/tap recvmsg() to not
        confuse the callers.
      
      v2:
      - add Matthew as the reporter, thanks matthew.
      - moving zero headcount check ahead instead of defer consuming skb
        due to jason and mst's comment.
      - add freeing skb in favor of recvmsg() fails.
      ====================
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Tested-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7344ba03
    • Wei Xu's avatar
      tap: free skb if flags error · 61d78537
      Wei Xu authored
      tap_recvmsg() supports accepting skb by msg_control after
      commit 3b4ba04a ("tap: support receiving skb from msg_control"),
      the skb if presented should be freed within the function, otherwise
      it would be leaked.
      Signed-off-by: default avatarWei Xu <wexu@redhat.com>
      Reported-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61d78537
    • Wei Xu's avatar
      tun: free skb in early errors · c33ee15b
      Wei Xu authored
      tun_recvmsg() supports accepting skb by msg_control after
      commit ac77cfd4 ("tun: support receiving skb through msg_control"),
      the skb if presented should be freed no matter how far it can go
      along, otherwise it would be leaked.
      
      This patch fixes several missed cases.
      Signed-off-by: default avatarWei Xu <wexu@redhat.com>
      Reported-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c33ee15b
    • Wei Xu's avatar
      vhost: fix skb leak in handle_rx() · 6e474083
      Wei Xu authored
      Matthew found a roughly 40% tcp throughput regression with commit
      c67df11f(vhost_net: try batch dequing from skb array) as discussed
      in the following thread:
      https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html
      
      Eventually we figured out that it was a skb leak in handle_rx()
      when sending packets to the VM. This usually happens when a guest
      can not drain out vq as fast as vhost fills in, afterwards it sets
      off the traffic jam and leaks skb(s) which occurs as no headcount
      to send on the vq from vhost side.
      
      This can be avoided by making sure we have got enough headcount
      before actually consuming a skb from the batched rx array while
      transmitting, which is simply done by moving checking the zero
      headcount a bit ahead.
      Signed-off-by: default avatarWei Xu <wexu@redhat.com>
      Reported-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e474083
    • David S. Miller's avatar
      Merge branch 'bnxt_en-fixes' · fa935ca2
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Fixes.
      
      A shutdown fix for SMARTNIC, 2 fixes related to TC Flower vxlan
      filters, and the last one fixes an out-of-scope variable when sending
      short firmware messages.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fa935ca2
    • Vasundhara Volam's avatar
      bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg() · ebd5818c
      Vasundhara Volam authored
      short_input variable is assigned to another data pointer which is
      referred out of its scope. Fix it by moving short_input definition
      to the beginning of bnxt_hwrm_do_send_msg() function.
      
      No failure has been reported so far due to this issue.
      
      Fixes: e605db80 ("bnxt_en: Support for Short Firmware Message")
      Signed-off-by: default avatarVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ebd5818c
    • Sathya Perla's avatar
      bnxt_en: fix dst/src fid for vxlan encap/decap actions · e9ecc731
      Sathya Perla authored
      For flows that involve a vxlan encap action, the vxlan sock
      interface may be specified as the outgoing interface. The driver
      must resolve the outgoing PF interface used by this socket and
      use the dst_fid of the PF in the hwrm_cfa_encap_record_alloc cmd.
      
      Similarily for flows that have a vxlan decap action, the
      fid of the incoming PF interface must be used as the src_fid in
      the hwrm_cfa_decap_filter_alloc cmd.
      
      Fixes: 8c95f773 ("bnxt_en: add support for Flower based vxlan encap/decap offload")
      Signed-off-by: default avatarSathya Perla <sathya.perla@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e9ecc731
    • Sunil Challa's avatar
      bnxt_en: wildcard smac while creating tunnel decap filter · c8fb7b82
      Sunil Challa authored
      While creating a decap filter the tunnel smac need not (and must not) be
      specified as we cannot ascertain the neighbor in the recv path. 'ttl'
      match is also not needed for the decap filter and must be wild-carded.
      
      Fixes: f484f678 ("bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter")
      Signed-off-by: default avatarSunil Challa <sunilkumar.challa@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8fb7b82
    • Ray Jui's avatar
      bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown · a7f3f939
      Ray Jui authored
      The current 'bnxt_shutdown' implementation only invokes
      'bnxt_ulp_shutdown' to shut down RoCE in the case when the system is in
      the path of power off (SYSTEM_POWER_OFF). While this may work in most
      cases, it does not work in the smart NIC case, when Linux 'reboot'
      command is initiated from the Linux that runs on the ARM cores of the
      NIC card. In this particular case, Linux 'reboot' results in a system
      'L3' level reset where the entire ARM and associated subsystems are
      being reset, but at the same time, Nitro core is being kept in sane state
      (to allow external PCIe connected servers to continue to work). Without
      properly shutting down RoCE and freeing all associated resources, it
      results in the ARM core to hang immediately after the 'reboot'
      
      By always invoking 'bnxt_ulp_shutdown' in 'bnxt_shutdown', it fixes the
      above issue
      
      Fixes: 0efd2fc6 ("bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.")
      Signed-off-by: default avatarRay Jui <ray.jui@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7f3f939
  4. 02 Dec, 2017 4 commits
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · 2db767d9
      Linus Torvalds authored
      Pull NFS client fixes from Anna Schumaker:
       "These patches fix a problem with compiling using an old version of
        gcc, and also fix up error handling in the SUNRPC layer.
      
         - NFSv4: Ensure gcc 4.4.4 can compile initialiser for
           "invalid_stateid"
      
         - SUNRPC: Allow connect to return EHOSTUNREACH
      
         - SUNRPC: Handle ENETDOWN errors"
      
      * tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        SUNRPC: Handle ENETDOWN errors
        SUNRPC: Allow connect to return EHOSTUNREACH
        NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
      2db767d9
    • Linus Torvalds's avatar
      Merge tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 788c1da0
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Here are some bug fixes for 4.15-rc2.
      
         - fix memory leaks that appeared after removing ifork inline data
           buffer
      
         - recover deferred rmap update log items in correct order
      
         - fix memory leaks when buffer construction fails
      
         - fix memory leaks when bmbt is corrupt
      
         - fix some uninitialized variables and math problems in the quota
           scrubber
      
         - add some omitted attribution tags on the log replay commit
      
         - fix some UBSAN complaints about integer overflows with large sparse
           files
      
         - implement an effective inode mode check in online fsck
      
         - fix log's inability to retry quota item writeout due to transient
           errors"
      
      * tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Properly retry failed dquot items in case of error during buffer writeback
        xfs: scrub inode mode properly
        xfs: remove unused parameter from xfs_writepage_map
        xfs: ubsan fixes
        xfs: calculate correct offset in xfs_scrub_quota_item
        xfs: fix uninitialized variable in xfs_scrub_quota
        xfs: fix leaks on corruption errors in xfs_bmap.c
        xfs: fortify xfs_alloc_buftarg error handling
        xfs: log recovery should replay deferred ops in order
        xfs: always free inline data before resetting inode fork during ifree
      788c1da0
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of... · e1ba1c99
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
      
      Pull RISC-V cleanups and ABI fixes from Palmer Dabbelt:
       "This contains a handful of small cleanups that are a result of
        feedback that didn't make it into our original patch set, either
        because the feedback hadn't been given yet, I missed the original
        emails, or we weren't ready to submit the changes yet.
      
        I've been maintaining the various cleanup patch sets I have as their
        own branches, which I then merged together and signed. Each merge
        commit has a short summary of the changes, and each branch is based on
        your latest tag (4.15-rc1, in this case). If this isn't the right way
        to do this then feel free to suggest something else, but it seems sane
        to me.
      
        Here's a short summary of the changes, roughly in order of how
        interesting they are.
      
         - libgcc.h has been moved from include/lib, where it's the only
           member, to include/linux. This is meant to avoid tab completion
           conflicts.
      
         - VDSO entries for clock_get/gettimeofday/getcpu have been added.
           These are simple syscalls now, but we want to let glibc use them
           from the start so we can make them faster later.
      
         - A VDSO entry for instruction cache flushing has been added so
           userspace can flush the instruction cache.
      
         - The VDSO symbol versions for __vdso_cmpxchg{32,64} have been
           removed, as those VDSO entries don't actually exist.
      
         - __io_writes has been corrected to respect the given type.
      
         - A new READ_ONCE in arch_spin_is_locked().
      
         - __test_and_op_bit_ord() is now actually ordered.
      
         - Various small fixes throughout the tree to enable allmodconfig to
           build cleanly.
      
         - Removal of some dead code in our atomic support headers.
      
         - Improvements to various comments in our atomic support headers"
      
      * tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: (23 commits)
        RISC-V: __io_writes should respect the length argument
        move libgcc.h to include/linux
        RISC-V: Clean up an unused include
        RISC-V: Allow userspace to flush the instruction cache
        RISC-V: Flush I$ when making a dirty page executable
        RISC-V: Add missing include
        RISC-V: Use define for get_cycles like other architectures
        RISC-V: Provide stub of setup_profiling_timer()
        RISC-V: Export some expected symbols for modules
        RISC-V: move empty_zero_page definition to C and export it
        RISC-V: io.h: type fixes for warnings
        RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
        RISC-V: use generic serial.h
        RISC-V: remove spin_unlock_wait()
        RISC-V: `sfence.vma` orderes the instruction cache
        RISC-V: Add READ_ONCE in arch_spin_is_locked()
        RISC-V: __test_and_op_bit_ord should be strongly ordered
        RISC-V: Remove smb_mb__{before,after}_spinlock()
        RISC-V: Remove __smp_bp__{before,after}_atomic
        RISC-V: Comment on why {,cmp}xchg is ordered how it is
        ...
      e1ba1c99
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 4b1967c9
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The critical one here is a fix for fpsimd register corruption across
        signals which was introduced by the SVE support code (the register
        files overlap), but the others are worth having as well.
      
        Summary:
      
         - Fix FP register corruption when SVE is not available or in use
      
         - Fix out-of-tree module build failure when CONFIG_ARM64_MODULE_PLTS=y
      
         - Missing 'const' generating errors with LTO builds
      
         - Remove unsupported events from Cortex-A73 PMU description
      
         - Removal of stale and incorrect comments"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: context: Fix comments and remove pointless smp_wmb()
        arm64: cpu_ops: Add missing 'const' qualifiers
        arm64: perf: remove unsupported events for Cortex-A73
        arm64: fpsimd: Fix failure to restore FPSIMD state after signals
        arm64: pgd: Mark pgd_cache as __ro_after_init
        arm64: ftrace: emit ftrace-mod.o contents through code
        arm64: module-plts: factor out PLT generation code for ftrace
        arm64: mm: cleanup stale AIVIVT references
      4b1967c9
  5. 01 Dec, 2017 3 commits