1. 26 Jun, 2018 3 commits
    • netfilter: nf_conncount: fix garbage collection confirm race · b36e4523
      Florian Westphal authored
      Yi-Hung Wei and Justin Pettit found a race in the garbage collection scheme
      used by nf_conncount.
      
      When walking the list, we look up the tuple in the conntrack table.
      If the lookup fails, we remove this tuple from our list because
      the conntrack entry is gone.
      
      This is the common case, but it turns out it's not the only one.
      The list entry could have been created just before by another cpu, i.e. the
      conntrack entry might not yet have been inserted into the global hash.
      
      To avoid this, we introduce a timestamp and the owning cpu.
      If the entry appears to be stale, evict only if:
       1. The current cpu is the one that added the entry, or
       2. The timestamp is older than two jiffies.
      
      The second constraint allows GC to be taken over by another
      cpu too (e.g. because a cpu was offlined or napi got moved to another
      cpu).
      
      We can't pretend the 'doubtful' entry wasn't in our list.
      Instead, when we don't find an entry, indicate via IS_ERR whether
      the entry was removed ('did not exist') or withheld
      ('might be unconfirmed').
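      The eviction rule above can be sketched as a pure decision function. This
      is a minimal model, not the kernel's code: struct conn_entry,
      stale_entry_verdict() and the GC_* verdicts are hypothetical names, the
      caller is assumed to have already failed the conntrack lookup, and "older
      than two jiffies" is read literally as an age greater than 2.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical, simplified list entry (not struct nf_conncount_tuple). */
      struct conn_entry {
          uint32_t jiffies32; /* low 32 bits of jiffies when the entry was added */
          int cpu;            /* cpu that added the entry */
      };

      enum gc_verdict {
          GC_EVICT,    /* removed: the conntrack entry did not exist */
          GC_WITHHOLD, /* kept: the conntrack entry might be unconfirmed */
      };

      /* Only called after the conntrack lookup for this entry failed. */
      static enum gc_verdict stale_entry_verdict(const struct conn_entry *e,
                                                 int this_cpu, uint32_t now32)
      {
          /* Unsigned subtraction stays correct across jiffies wraparound. */
          uint32_t age = now32 - e->jiffies32;

          if (e->cpu == this_cpu || age > 2)
              return GC_EVICT;
          return GC_WITHHOLD;
      }
      ```

      A withheld entry stays on the list and is reported to the caller as an
      error value distinct from an eviction, so the caller neither counts it
      as freed nor as a live duplicate.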
      
      This most likely also fixes an xt_connlimit imbalance reported earlier by
      Dmitry Andrianov.
      
      Cc: Dmitry Andrianov <dmitry.andrianov@alertme.com>
      Reported-by: Justin Pettit <jpettit@vmware.com>
      Reported-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_log: don't hold nf_log_mutex during user access · ce00bf07
      Jann Horn authored
      The old code would indefinitely block other users of nf_log_mutex if
      a userspace access in proc_dostring() blocked, e.g. due to a userfaultfd
      region. Fix it by moving proc_dostring() out of the locked region.
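      The pattern behind this fix (copy the shared value while holding the lock,
      then do the possibly blocking user access after dropping it) can be
      sketched in userspace C. The toy lock flag, slow_user_copy() and
      read_logger_name() are hypothetical stand-ins for nf_log_mutex and
      proc_dostring(), and the sketch only covers the read side.

      ```c
      #include <assert.h>
      #include <string.h>

      #define NAME_LEN 32

      /* Toy lock that only records whether it is held, so the sketch can
       * check that the slow step never runs under it. */
      static int lock_held;
      static void toy_lock(void)   { assert(!lock_held); lock_held = 1; }
      static void toy_unlock(void) { assert(lock_held);  lock_held = 0; }

      /* Shared state protected by the toy lock (illustrative value). */
      static char current_name[NAME_LEN] = "nf_log_ipv4";

      /* Stand-in for a user copy that may block (e.g. on a userfaultfd
       * region). Fails if it is ever reached with the lock held. */
      static int slow_user_copy(char *dst, const char *src, size_t len)
      {
          if (lock_held)
              return -1;
          strncpy(dst, src, len - 1);
          dst[len - 1] = '\0';
          return 0;
      }

      /* Snapshot under the lock, then do the user access unlocked. */
      static int read_logger_name(char *user_buf, size_t len)
      {
          char snapshot[NAME_LEN];

          toy_lock();
          memcpy(snapshot, current_name, sizeof(snapshot));
          toy_unlock();

          return slow_user_copy(user_buf, snapshot, len);
      }
      ```

      The write side mirrors this: parse the user input into a local buffer
      first, and take the lock only to apply the result.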
      
      This is a followup to commit 266d07cb ("netfilter: nf_log: fix
      sleeping function called from invalid context"), which changed this code
      from using rcu_read_lock() to taking nf_log_mutex.
      
      Fixes: 266d07cb ("netfilter: nf_log: fix sleeping function calle[...]")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_log: fix uninit read in nf_log_proc_dostring · dffd22ae
      Jann Horn authored
      When proc_dostring() is called with a non-zero offset in strict mode, it
      doesn't just write to the ->data buffer, it also reads. Make sure it
      doesn't read uninitialized data.
      
      Fixes: c6ac37d8 ("netfilter: nf_log: fix error on write NONE to [...]")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 18 Jun, 2018 3 commits
    • netfilter: nf_ct_helper: Fix possible panic after nf_conntrack_helper_unregister · ad9852af
      Gao Feng authored
      The helper module may be unloaded right after nf_conntrack_helper_unregister(),
      so a race can cause a panic.
      
      nf_ct_iterate_destroy(unhelp, me) resets the conntrack helper to NULL,
      but another CPU may already have fetched the helper pointer during this
      window. It then panics when it accesses the helper after the module has
      been unloaded.
      
      For example:
      CPU0                                                   CPU1
      ctnetlink_dump_helpinfo
      helper = rcu_dereference(help->helper);
                                                             unhelp
                                                             set helper as NULL
                                                             unload helper module
      helper->to_nlattr(skb, ct);
      
      As shown above, CPU0 accesses the helper after its module has been
      unloaded, and the panic happens.
      Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ipv6: nf_defrag: reduce struct net memory waste · 9ce7bc03
      Eric Dumazet authored
      It is a waste of memory to use a full "struct netns_sysctl_ipv6"
      while only one pointer is really used, considering netns_sysctl_ipv6
      keeps growing.
      
      Also, since "struct netns_frags" has cache line alignment,
      it is better to move the frags_hdr pointer outside, otherwise
      we spend a full cache line for this pointer.
      
      This saves 192 bytes of memory per netns.
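      The cache-line effect can be reproduced with toy structs. This sketch
      assumes a 64-byte cache line and uses hypothetical toy_* types, not the
      real struct netns_frags or struct net; it models only the alignment half
      of the change, not the shrinking of the embedded sysctl struct.

      ```c
      #include <assert.h>

      #define CACHE_LINE 64 /* assumed cache line size */

      /* Stand-in for a cache-line-aligned struct such as netns_frags. */
      struct toy_frags {
          _Alignas(CACHE_LINE) long counters[4];
      };

      /* Before: a pointer sits next to the aligned member, so the wrapper
       * is padded out to a second full cache line. */
      struct toy_nf_frag_old {
          struct toy_frags frags;
          void *frags_hdr;
      };

      /* After: the wrapper holds only the aligned member... */
      struct toy_nf_frag_new {
          struct toy_frags frags;
      };

      /* ...and the pointer moves out, next to small fields of the parent
       * that already occupy a partly used cache line. */
      struct toy_net_old {
          int other_field;
          struct toy_nf_frag_old nf_frag;
      };

      struct toy_net_new {
          int other_field;
          void *frags_hdr;
          struct toy_nf_frag_new nf_frag;
      };
      ```

      With 64-byte alignment, toy_net_old is 192 bytes and toy_net_new is
      128: the single pointer no longer costs a whole cache line.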
      
      Fixes: c038a767 ("ipv6: add a new namespace for nf_conntrack_reasm")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_queue: augment nfqa_cfg_policy · ba062ebb
      Eric Dumazet authored
      Three attributes are currently not verified and can therefore trigger KMSAN
      warnings such as:
      
      BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
      BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
      BUG: KMSAN: uninit-value in nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
      CPU: 1 PID: 4521 Comm: syz-executor120 Not tainted 4.17.0+ #5
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117
       __msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620
       __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
       __fswab32 include/uapi/linux/swab.h:59 [inline]
       nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
       nfnetlink_rcv_msg+0xb2e/0xc80 net/netfilter/nfnetlink.c:212
       netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448
       nfnetlink_rcv+0x2fe/0x680 net/netfilter/nfnetlink.c:513
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x43fd59
      RSP: 002b:00007ffde0e30d28 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd59
      RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401680
      R13: 0000000000401710 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
       kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2753 [inline]
       __kmalloc_node_track_caller+0xb35/0x11b0 mm/slub.c:4395
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:988 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
       netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
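      The missing check amounts to validating an attribute's length before
      decoding it. In the kernel such checks are declared as entries (for
      example `.type = NLA_U32`) in a struct nla_policy array such as
      nfqa_cfg_policy. The userspace sketch below models the same idea with a
      hypothetical struct toy_attr; it is not the netlink API.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      #define TOY_ATTR_HDRLEN 4 /* len + type, as in a netlink attribute */

      /* Hypothetical attribute: len counts the header plus the payload. */
      struct toy_attr {
          uint16_t len;
          uint16_t type;
          const uint8_t *payload;
      };

      /* Policy check analogous to a u32 policy entry: reject the attribute
       * unless it carries at least four payload bytes. */
      static int toy_validate_u32(const struct toy_attr *a)
      {
          return a->len >= TOY_ATTR_HDRLEN + sizeof(uint32_t) ? 0 : -1;
      }

      /* Only safe after toy_validate_u32() succeeded: decodes a big-endian
       * (network order) 32-bit value, as ntohl() would. */
      static uint32_t toy_get_be32(const struct toy_attr *a)
      {
          const uint8_t *p = a->payload;

          return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
      }
      ```

      Without the length check, a short attribute lets the decoder read bytes
      past the payload, which is exactly the uninitialized read KMSAN flags
      in the byte swap above.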
      
      Fixes: fdb694a0 ("netfilter: Add fail-open support")
      Fixes: 829e17a1 ("[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 16 Jun, 2018 7 commits
    • net_sched: blackhole: tell upper qdisc about dropped packets · 7e85dc8c
      Konstantin Khlebnikov authored
      When blackhole is used on top of a classful qdisc like hfsc, it breaks the
      qlen and backlog counters because packets disappear without notice.
      
      In HFSC, a non-zero qlen while all classes are inactive triggers a warning:
      WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
      and schedules the watchdog work endlessly.
      
      This patch returns __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS;
      this flag tells the upper layer that the packet is gone and isn't queued.
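      How the extra flag keeps the parent's counters honest can be modelled
      with a toy parent that counts a packet only on a plain success return.
      The TOY_XMIT_* values and the toy_parent struct are stand-ins chosen for
      this sketch; the kernel's real constants live in
      include/linux/netdevice.h.

      ```c
      #include <assert.h>

      /* Stand-in values for this sketch only. */
      #define TOY_XMIT_SUCCESS 0x00
      #define TOY_XMIT_BYPASS  0x00020000

      /* Hypothetical bookkeeping of a classful parent such as hfsc. */
      struct toy_parent {
          unsigned int qlen;
      };

      /* Child qdisc that silently drops every packet. */
      static int blackhole_enqueue_old(void)
      {
          return TOY_XMIT_SUCCESS;                   /* claims "queued" */
      }

      static int blackhole_enqueue_new(void)
      {
          return TOY_XMIT_SUCCESS | TOY_XMIT_BYPASS; /* "gone, not queued" */
      }

      /* The parent only accounts the packet on a plain success return. */
      static void parent_enqueue(struct toy_parent *q, int (*child)(void))
      {
          if (child() != TOY_XMIT_SUCCESS)
              return;
          q->qlen++;
      }
      ```

      With the old return value the parent's qlen grows for packets that no
      longer exist, which is the non-zero qlen with inactive classes that
      HFSC warns about.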
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bluetooth: hci_nokia: Don't include linux/unaligned/le_struct.h directly. · a9122886
      David S. Miller authored
      This breaks the build as this header is not meant to be used in this
      way.
      
      ./include/linux/unaligned/access_ok.h:8:28: error: redefinition of ‘get_unaligned_le16’
       static __always_inline u16 get_unaligned_le16(const void *p)
                                  ^~~~~~~~~~~~~~~~~~
      In file included from drivers/bluetooth/hci_nokia.c:32:
      ./include/linux/unaligned/le_struct.h:7:19: note: previous definition of ‘get_unaligned_le16’ was here
       static inline u16 get_unaligned_le16(const void *p)
      
      Use asm/unaligned.h instead.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: Preserve value of skb->truesize when accounting to vcc · 9bbe60a6
      David Woodhouse authored
      ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
      which they are to be sent. But it doesn't take ownership of those
      packets from the sock (if any) which originally owned them. They should
      remain owned by their actual sender until they've left the box.
      
      There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
      for certain skbs, precisely to avoid messing up sk_wmem_alloc
      accounting. Ideally that hack would cover the ATM use case too, but it
      doesn't — skbs which aren't owned by any sock, for example PPP control
      frames, still get their truesize adjusted when the low-level ATM driver
      adds headroom.
      
      This has always been an issue, it seems. The truesize of a packet
      increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
      for normal traffic, only for control frames. So I think we just got away
      with it, and we probably needed to send 2GiB of LCP echo frames before
      the misaccounting would ever have caused a problem and caused
      atm_may_send() to start refusing packets.
      
      Commit 14afee4b ("net: convert sock.sk_wmem_alloc from atomic_t to
      refcount_t") did exactly what it was intended to do, and turned this
      mostly-theoretical problem into a real one, causing PPPoATM to fail
      immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
      starts refusing to allow new packets.
      
      The least intrusive solution to this problem is to stash the value of
      skb->truesize that was accounted to the VCC, in a new member of the
      ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
      value instead of the then-current value of skb->truesize.
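      The accounting fix can be modelled with toy structs: charge the VCC with
      the current truesize, stash that exact value, and subtract the stashed
      value on release even if truesize grew in between. toy_vcc, toy_skb and
      the helper functions are hypothetical; they are not struct atm_vcc,
      struct sk_buff or the ATM_SKB() control block.

      ```c
      #include <assert.h>

      /* Hypothetical, simplified models. */
      struct toy_vcc {
          long wmem_alloc;    /* bytes currently charged to the VCC */
      };

      struct toy_skb {
          long truesize;
          long acct_truesize; /* truesize as it was when charged */
      };

      static void vcc_charge(struct toy_vcc *vcc, struct toy_skb *skb)
      {
          skb->acct_truesize = skb->truesize; /* stash the charged value */
          vcc->wmem_alloc += skb->acct_truesize;
      }

      /* The low-level driver adds headroom after accounting. */
      static void driver_expand_head(struct toy_skb *skb, long extra)
      {
          skb->truesize += extra;
      }

      /* Old behaviour: subtract whatever truesize is now. */
      static void vcc_pop_old(struct toy_vcc *vcc, const struct toy_skb *skb)
      {
          vcc->wmem_alloc -= skb->truesize;
      }

      /* Fixed behaviour: subtract exactly what was charged. */
      static void vcc_pop_new(struct toy_vcc *vcc, const struct toy_skb *skb)
      {
          vcc->wmem_alloc -= skb->acct_truesize;
      }
      ```

      In the old model the counter ends below zero after a single packet;
      with a refcount_t counter that underflow is detected instead of being
      silently tolerated.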
      
      Fixes: 158f323b ("net: adjust skb->truesize in pskb_expand_head()")
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 0841d986
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-06-16
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a panic in devmap handling in generic XDP where return type
         of __devmap_lookup_elem() got changed recently but generic XDP
         code missed the related update, from Toshiaki.
      
      2) Fix a freeze when BPF progs are loaded that include BPF to BPF
         calls when JIT is enabled where we would later bail out via error
         path w/o dropping kallsyms, and another one to silence syzkaller
         splats from locking prog read-only, from Daniel.
      
      3) Fix a bug in test_offloads.py BPF selftest which must not assume
         that the underlying system has no BPF progs loaded prior to the test,
         and one in bpftool to fix accuracy of program load time, from Jakub.
      
      4) Fix a bug in bpftool's probe for availability of the bpf(2)
         BPF_TASK_FD_QUERY subcommand, from Yonghong.
      
      5) Fix a regression in AF_XDP's XDP_SKB receive path where queue
         id check got erroneously removed, from Björn.
      
      6) Fix missing state cleanup in BPF's xfrm tunnel test, from William.
      
      7) Check tunnel type more accurately in BPF's tunnel collect metadata
         kselftest, from Jian.
      
      8) Fix missing Kconfig fragments for BPF kselftests, from Anders.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 35773c93
      Linus Torvalds authored
      Pull AFS updates from Al Viro:
       "Assorted AFS stuff - ended up in vfs.git since most of that consists
        of David's AFS-related followups to Christoph's procfs series"
      
      * 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        afs: Optimise callback breaking by not repeating volume lookup
        afs: Display manually added cells in dynamic root mount
        afs: Enable IPv6 DNS lookups
        afs: Show all of a server's addresses in /proc/fs/afs/servers
        afs: Handle CONFIG_PROC_FS=n
        proc: Make inline name size calculation automatic
        afs: Implement network namespacing
        afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
        afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
        proc: Add a way to make network proc files writable
        afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
        afs: Rearrange fs/afs/proc.c to move the show routines up
        afs: Rearrange fs/afs/proc.c by moving fops and open functions down
        afs: Move /proc management functions to the end of the file
    • Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 29d6849d
      Linus Torvalds authored
      Pull compat updates from Al Viro:
       "Some biarch patches - getting rid of assorted (mis)uses of
        compat_alloc_user_space().
      
        Not much in that area this cycle..."
      
      * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        orangefs: simplify compat ioctl handling
        signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
        vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
    • Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a5b729ea
      Linus Torvalds authored
      Pull aio fixes from Al Viro:
       "Assorted AIO followups and fixes"
      
      * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        eventpoll: switch to ->poll_mask
        aio: only return events requested in poll_mask() for IOCB_CMD_POLL
        eventfd: only return events requested in poll_mask()
        aio: mark __aio_sigset::sigmask const
  4. 15 Jun, 2018 27 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9215310c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Various netfilter fixlets from Pablo and the netfilter team.
      
       2) Fix regression in IPVS caused by lack of PMTU exceptions on local
          routes in ipv6, from Julian Anastasov.
      
       3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.
      
       4) Don't crash on poll in TLS, from Daniel Borkmann.
      
       5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
          including Avahi mDNS. From Bart Van Assche.
      
       6) Missing of_node_put in qcom/emac driver, from Yue Haibing.
      
       7) We lacked verification of the TCP checksum in one special case during SYN
          receive, from Frank van der Linden.
      
       8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.
      
       9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.
      
      10) Must grab HW caps before doing quirk checks in stmmac driver, from
          Jose Abreu.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
        net: stmmac: Run HWIF Quirks after getting HW caps
        neighbour: skip NTF_EXT_LEARNED entries during forced gc
        net: cxgb3: add error handling for sysfs_create_group
        tls: fix waitall behavior in tls_sw_recvmsg
        tls: fix use-after-free in tls_push_record
        l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
        l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
        mlxsw: spectrum_switchdev: Fix port_vlan refcounting
        mlxsw: spectrum_router: Align with new route replace logic
        mlxsw: spectrum_router: Allow appending to dev-only routes
        ipv6: Only emit append events for appended routes
        stmmac: added support for 802.1ad vlan stripping
        cfg80211: fix rcu in cfg80211_unregister_wdev
        mac80211: Move up init of TXQs
        mac80211_hwsim: fix module init error paths
        cfg80211: initialize sinfo in cfg80211_get_station
        nl80211: fix some kernel doc tag mistakes
        hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
        rds: avoid unenecessary cong_update in loop transport
        l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
        ...
    • Merge tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · de7f01c2
      Linus Torvalds authored
      Pull module updates from Jessica Yu:
       "Minor code cleanup and also allow sig_enforce param to be shown in
        sysfs with CONFIG_MODULE_SIG_FORCE"
      
      * tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        module: Allow to always show the status of modsign
        module: Do not access sig_enforce directly
    • Merge branch 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 8d1e5133
      Linus Torvalds authored
      Pull uml updates from Richard Weinberger:
       "Minor updates for UML:
      
         - fixes for our new vector network driver by Anton
      
         - initcall cleanup by Alexander
      
         - We have a new mailinglist, sourceforge.net sucks"
      
      * 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
        um: Fix raw interface options
        um: Fix initialization of vector queues
        um: remove uml initcalls
        um: Update mailing list address
    • xdp: Fix handling of devmap in generic XDP · 6d5fc195
      Toshiaki Makita authored
      Commit 67f29e07 ("bpf: devmap introduce dev_map_enqueue") changed
      the return value type of __devmap_lookup_elem() from struct net_device *
      to struct bpf_dtab_netdev * but forgot to modify generic XDP code
      accordingly.
      
      Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
      net_device is expected, so skb->dev was set to an invalid value.
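      The shape of the bug can be illustrated with pared-down stand-ins (the
      toy_* names are hypothetical, not the kernel's types): once the lookup
      returns a map entry rather than a device, every caller has to unwrap the
      entry's device pointer before treating the result as a device.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical stand-ins for struct net_device and
       * struct bpf_dtab_netdev. */
      struct toy_net_device {
          int ifindex;
      };

      struct toy_dtab_entry {
          struct toy_net_device *dev; /* device this slot redirects to */
          unsigned int bit;           /* placeholder for other entry state */
      };

      /* After the devmap change, lookup returns the entry, not the device. */
      static struct toy_dtab_entry *toy_devmap_lookup(struct toy_dtab_entry *map,
                                                      size_t n, unsigned int key)
      {
          return key < n ? &map[key] : NULL;
      }

      /* Generic XDP redirect target: unwrap the entry before use. Treating
       * the entry pointer itself as a device is the bug described above. */
      static struct toy_net_device *toy_redirect_target(struct toy_dtab_entry *map,
                                                        size_t n, unsigned int key)
      {
          struct toy_dtab_entry *e = toy_devmap_lookup(map, n, key);

          return e ? e->dev : NULL;
      }
      ```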
      
      v2:
      - Fix compiler warning without CONFIG_BPF_SYSCALL.
      
      Fixes: 67f29e07 ("bpf: devmap introduce dev_map_enqueue")
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge tag 'riscv-for-linus-4.18-merge_window' of... · 6a4d4b32
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V updates from Palmer Dabbelt:
       "This contains some small RISC-V updates I'd like to target for 4.18.
      
        They are all fairly small this time. Here's a short summary, there's
        more info in the commits/merges:
      
         - a fix to __clear_user to respect the passed arguments.
      
         - enough support for the perf subsystem to work with RISC-V's ISA
           defined performance counters.
      
         - support for sparse and cleanups suggested by it.
      
         - support for R_RISCV_32 (a relocation, not the 32-bit ISA).
      
         - some MAINTAINERS cleanups.
      
         - the addition of CONFIG_HVC_RISCV_SBI to our defconfig, as it's
           always present.
      
        I've given these a simple build+boot test"
      
      * tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        RISC-V: Add CONFIG_HVC_RISCV_SBI=y to defconfig
        RISC-V: Handle R_RISCV_32 in modules
        riscv/ftrace: Export _mcount when DYNAMIC_FTRACE isn't set
        riscv: add riscv-specific predefines to CHECKFLAGS
        riscv: split the declaration of __copy_user
        riscv: no __user for probe_kernel_address()
        riscv: use NULL instead of a plain 0
        perf: riscv: Add Document for Future Porting Guide
        perf: riscv: preliminary RISC-V support
        MAINTAINERS: Update Albert's email, he's back at Berkeley
        MAINTAINERS: Add myself as a maintainer for SiFive's drivers
        riscv: Fix the bug in memory access fixup code
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 8949170c
      Linus Torvalds authored
      Pull more kvm updates from Paolo Bonzini:
       "Mostly the PPC part of the release, but also switching to Arnd's fix
        for the hyperv config issue and a typo fix.
      
        Main PPC changes:
      
         - reimplement the MMIO instruction emulation
      
         - transactional memory support for PR KVM
      
         - improve radix page table handling"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (63 commits)
        KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV
        KVM: x86: fix typo at kvm_arch_hardware_setup comment
        KVM: PPC: Book3S PR: Fix failure status setting in tabort. emulation
        KVM: PPC: Book3S PR: Enable use on POWER9 bare-metal hosts in HPT mode
        KVM: PPC: Book3S PR: Don't let PAPR guest set MSR hypervisor bit
        KVM: PPC: Book3S PR: Fix failure status setting in treclaim. emulation
        KVM: PPC: Book3S PR: Fix MSR setting when delivering interrupts
        KVM: PPC: Book3S PR: Handle additional interrupt types
        KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registers
        KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGS
        KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl
        KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl
        KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl
        KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
        KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 and Transactional state
        KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state
        KVM: PPC: Book3S PR: Add emulation for trechkpt.
        KVM: PPC: Book3S PR: Add emulation for treclaim.
        KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRs
        KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
        ...
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2f3f0566
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "virtio, vhost: features, fixes
      
         - PCI virtual function support for virtio
      
         - DMA barriers for virtio strong barriers
      
         - bugfixes"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio: update the comments for transport features
        virtio_pci: support enabling VFs
        vhost: fix info leak due to uninitialized memory
        virtio_ring: switch to dma_XX barriers for rpmsg
    • Merge branch 'bpf-fixes' · b5518c70
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      First one is a panic I ran into while testing the second
      one where we got several syzkaller reports. Series here
      fixes both.
      
      Thanks!
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: reject any prog that failed read-only lock · 9facc336
      Daniel Borkmann authored
      We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
      as well as the BPF image as read-only through bpf_prog_lock_ro(). If
      either of these fails, we throw a WARN_ON_ONCE() in order to
      yell loudly to the log. To some extent, this is comparable
      to an allocation where __GFP_NOWARN is explicitly not set.
      
      Added via 65869a47 ("bpf: improve read-only handling"), this behavior
      is slightly different compared to any of the other in-kernel set_memory_ro()
      users who do not check the return code of set_memory_ro() and friends /at
      all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given
      that in BPF this is a mandatory hardening step, we want to know whether there
      are any issues that would leave the BPF image or the JITed image writable.
      It so happens that syzkaller enabled fault injection and triggered a memory
      allocation failure deep inside x86's change_page_attr_set_clr(), which was
      called from set_memory_ro().
      
      Now, there are two options: i) leaving everything as is, and ii) reworking
      the image locking code in order to have a final checkpoint out of the
      central bpf_prog_select_runtime() which probes whether any of the calls
      during prog setup weren't successful, and then bailing out with an error.
      Option ii) is the better approach since this additional paranoia avoids
      altogether leaving any potential W+X pages from the BPF side in the system.
      Therefore, let's be strict about it, and reject programs on such an unlikely
      occasion. While testing I also noticed that one bpf_prog_lock_ro()
      call was missing on the outer dummy prog in case of calls, e.g. in the
      destructor we call bpf_prog_free_deferred() on the main prog where we
      try to bpf_prog_unlock_free() the program, and since we go via
      bpf_prog_select_runtime() do that as well.
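      Option ii) can be sketched as: record the outcome of each locking step
      instead of only warning, then reject the program at one final checkpoint
      if any image (main program or JITed subprogram) was left writable. The
      toy_prog struct, both helpers and ERR_NOT_LOCKED are hypothetical; the
      commit message does not say which error code the kernel returns.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      #define ERR_NOT_LOCKED (-1) /* hypothetical error code */

      struct toy_prog {
          bool locked;                /* result of the read-only locking step */
          int nr_subprogs;
          struct toy_prog **subprogs; /* JITed BPF-to-BPF callees, if any */
      };

      /* Record whether locking succeeded. `fail` simulates a
       * set_memory_ro() failure such as an injected allocation error. */
      static void toy_prog_lock_ro(struct toy_prog *p, bool fail)
      {
          p->locked = !fail;
      }

      /* Final checkpoint: reject the program if the main image or any
       * subprogram image was left writable. */
      static int toy_prog_check_locked(const struct toy_prog *p)
      {
          if (!p->locked)
              return ERR_NOT_LOCKED;
          for (int i = 0; i < p->nr_subprogs; i++)
              if (!p->subprogs[i]->locked)
                  return ERR_NOT_LOCKED;
          return 0;
      }
      ```

      Centralising the check keeps the individual locking calls simple and
      guarantees that no program with W+X pages is ever handed out.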
      
      Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com
      Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix panic in prog load calls cleanup · 7d1982b4
      Daniel Borkmann authored
      While testing I found that when we hit the error path in bpf_prog_load()
      where we jump to free_used_maps, and the prog contained BPF to BPF calls
      that were JITed earlier, we never clean up the bpf_prog_kallsyms_add()
      done under jit_subprogs(). Add a proper API to make BPF kallsyms deletion
      clearer and fix that.
      
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • net: stmmac: Run HWIF Quirks after getting HW caps · 7cfde0af
      Jose Abreu authored
      Currently we run the HWIF quirks before getting the HW capabilities.
      This is not right because some HWIF callbacks depend on HW caps.
      
      Let's save the quirks callback and run it at a later stage.
      
      This fixes Altera socfpga.
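      The reordering can be sketched as "save the callback now, call it once the
      capabilities are known". All names here (toy_priv, example_quirk and the
      init helpers) are hypothetical stand-ins, not the stmmac driver's code.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical, simplified driver private data. */
      struct toy_priv {
          int has_hw_caps;                        /* set once caps are read */
          int quirk_saw_caps;                     /* what the quirk observed */
          void (*hwif_quirks)(struct toy_priv *); /* saved for later */
      };

      /* A quirk whose behaviour depends on the HW capabilities. */
      static void example_quirk(struct toy_priv *priv)
      {
          priv->quirk_saw_caps = priv->has_hw_caps;
      }

      /* Interface selection: remember the quirk instead of running it. */
      static void toy_hwif_init(struct toy_priv *priv)
      {
          priv->hwif_quirks = example_quirk;
      }

      static void toy_get_hw_features(struct toy_priv *priv)
      {
          priv->has_hw_caps = 1;
      }

      static void toy_hw_init(struct toy_priv *priv)
      {
          toy_hwif_init(priv);
          toy_get_hw_features(priv);
          /* Run the quirks only now that the caps are known. */
          if (priv->hwif_quirks)
              priv->hwif_quirks(priv);
      }
      ```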
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Fixes: 5f0456b4 ("net: stmmac: Implement logic to automatically select HW Interface")
      Reported-by: Dinh Nguyen <dinh.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Vitor Soares <soares@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Dinh Nguyen <dinh.linux@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbour: skip NTF_EXT_LEARNED entries during forced gc · f6a6f203
      Roopa Prabhu authored
      Commit 9ce33e46 ("neighbour: support for NTF_EXT_LEARNED flag")
      added support for NTF_EXT_LEARNED for neighbour entries.
      NTF_EXT_LEARNED entries are neigh entries managed by the control
      plane (e.g. the Ethernet VPN implementation in the FRR routing suite).
      Periodic gc already excludes these entries. This patch extends the
      exclusion to forced gc, which the earlier patch missed.
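      
      A minimal sketch of the eviction test described above, written as
      plain C so it can run outside the kernel. The struct, the function
      names and the SKETCH_* flag values are hypothetical stand-ins, not the
      neighbour subsystem's real definitions; the point is only that forced
      gc checks the externally learned flag in the same place where it
      already skips referenced and permanent entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values chosen for this sketch only; the kernel defines the real
 * NUD_PERMANENT and NTF_EXT_LEARNED in its uapi neighbour header. */
#define SKETCH_NUD_PERMANENT   0x80u
#define SKETCH_NTF_EXT_LEARNED 0x10u

/* Hypothetical, minimal neighbour entry. */
struct sketch_neigh {
	unsigned int nud_state;
	unsigned int flags;
	int refcnt;      /* 1 means only the table holds a reference */
	bool evicted;
};

/* Forced gc may only evict unreferenced, non-permanent entries that are
 * not owned by the control plane. */
static bool sketch_forced_gc_may_evict(const struct sketch_neigh *n)
{
	if (n->refcnt != 1)
		return false;
	if (n->nud_state & SKETCH_NUD_PERMANENT)
		return false;
	if (n->flags & SKETCH_NTF_EXT_LEARNED)
		return false;   /* the kind of check this patch adds */
	return true;
}

/* Walks the table, marks evictable entries and returns how many it evicted. */
static size_t sketch_forced_gc(struct sketch_neigh *tbl, size_t n)
{
	size_t evicted = 0;

	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].evicted && sketch_forced_gc_may_evict(&tbl[i])) {
			tbl[i].evicted = true;
			evicted++;
		}
	}
	return evicted;
}
```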
      
      Fixes: 9ce33e46 ("neighbour: support for NTF_EXT_LEARNED flag")
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6a6f203
    • Zhouyang Jia's avatar
      net: cxgb3: add error handling for sysfs_create_group · 7c099773
      Zhouyang Jia authored
      When sysfs_create_group fails, the lack of error-handling code may
      cause unexpected results.
      
      This patch adds error-handling code after calling sysfs_create_group.
      Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c099773
    • David S. Miller's avatar
      Merge branch 'tls-fixes' · c14a0246
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Two tls fixes
      
      The first one is a syzkaller-triggered use-after-free, and the second
      one was noticed while writing test code with the tls ULP. For details,
      please see the individual patches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c14a0246
    • Daniel Borkmann's avatar
      tls: fix waitall behavior in tls_sw_recvmsg · 06030dba
      Daniel Borkmann authored
      Current behavior in tls_sw_recvmsg() is to wait for incoming tls
      messages and copy up to exactly len bytes of data that the user
      provided. This is problematic: if no packet is currently queued in
      strparser, we keep waiting until one has been processed and pushed
      into the tls receive layer for tls_wait_data() to wake up and push
      the decrypted bits to user space. Given that after tls decryption we
      are back at streaming data, use the sock_rcvlowat() hint from the tcp
      socket instead. Retain the current behavior with the MSG_WAITALL flag,
      and otherwise use the hint target for breaking the loop and returning
      to the application. This is done only if no ctx->recv_pkt is currently
      ready; otherwise we continue to process it from our strparser backlog.
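      
      A user-space sketch of the loop decision described above.
      sketch_rcvlowat() is assumed to follow the same rule as the kernel's
      sock_rcvlowat() (the full length with MSG_WAITALL, otherwise
      SO_RCVLOWAT capped at len, with a minimum of 1); the action enum and
      sketch_next_action() are hypothetical simplifications, not code from
      the tls implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* MSG_WAITALL asks for the full length; otherwise use SO_RCVLOWAT capped
 * at len, and never return a target below 1. */
static int sketch_rcvlowat(bool waitall, int sk_rcvlowat, int len)
{
	int v;

	assert(len >= 0);
	v = waitall ? len : (sk_rcvlowat < len ? sk_rcvlowat : len);
	return v ? v : 1;
}

enum sketch_rx_action { SKETCH_RX_PROCESS, SKETCH_RX_WAIT, SKETCH_RX_RETURN };

/* Hypothetical per-iteration decision of a tls_sw_recvmsg()-style loop. */
static enum sketch_rx_action sketch_next_action(int copied, int len, int target,
                                                bool record_ready)
{
	if (copied >= len)
		return SKETCH_RX_RETURN;   /* user buffer is full */
	if (record_ready)
		return SKETCH_RX_PROCESS;  /* keep draining the strparser backlog */
	if (copied >= target)
		return SKETCH_RX_RETURN;   /* hint met and nothing is queued */
	return SKETCH_RX_WAIT;         /* below target: block for more records */
}
```

      With MSG_WAITALL the target equals len, so the loop only returns once
      the buffer is full, which is the retained behavior.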
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06030dba
    • Daniel Borkmann's avatar
      tls: fix use-after-free in tls_push_record · a447da7d
      Daniel Borkmann authored
      syzkaller managed to trigger a use-after-free in tls like the
      following:
      
        BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
        Write of size 1 at addr ffff88037aa08000 by task a.out/2317
      
        CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        Call Trace:
         dump_stack+0x71/0xab
         print_address_description+0x6a/0x280
         kasan_report+0x258/0x380
         ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_sw_push_pending_record+0x2e/0x40 [tls]
         tls_sk_proto_close+0x3fe/0x710 [tls]
         ? tcp_check_oom+0x4c0/0x4c0
         ? tls_write_space+0x260/0x260 [tls]
         ? kmem_cache_free+0x88/0x1f0
         inet_release+0xd6/0x1b0
         __sock_release+0xc0/0x240
         sock_close+0x11/0x20
         __fput+0x22d/0x660
         task_work_run+0x114/0x1a0
         do_exit+0x71a/0x2780
         ? mm_update_next_owner+0x650/0x650
         ? handle_mm_fault+0x2f5/0x5f0
         ? __do_page_fault+0x44f/0xa50
         ? mm_fault_error+0x2d0/0x2d0
         do_group_exit+0xde/0x300
         __x64_sys_exit_group+0x3a/0x50
         do_syscall_64+0x9a/0x300
         ? page_fault+0x8/0x30
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happened through fault injection, where the aead_req allocation in
      tls_do_encryption() eventually failed and we returned -ENOMEM from the
      function. It turns out that the use-after-free is triggered from
      tls_sw_sendmsg() in the second tls_push_record(). The error then
      triggers a jump to waiting for memory in sk_stream_wait_memory(), or
      an immediate return in case of MSG_DONTWAIT. What follows is
      trim_both_sgl(sk, orig_size), which drops elements from the sg list
      added via tls_sw_sendmsg(). The use-after-free itself is triggered
      later, when the socket is closed and the tls_sk_proto_close() callback
      is invoked. tls_complete_pending_work() sees that there is a pending
      closed tls record to be flushed and thus calls
      tls_push_pending_closed_record(). The latter calls
      ctx->push_pending_record(), which is tls_sw_push_pending_record() in
      the sw path. This again calls into tls_push_record(), where
      tls_fill_prepend() panics because its buffer has already been freed
      via trim_both_sgl(). One way to fix this is to move the aead request
      allocation out of tls_do_encryption() and early into
      tls_push_record(). This means we no longer prepare the tls header and
      advance the state to TLS_PENDING_CLOSED_RECORD before an allocation
      that could potentially fail has happened. That fixes the issue on my
      side.
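      
      A minimal sketch of the ordering principle behind the fix: perform the
      allocation that can fail before preparing the header or advancing the
      record state, so an -ENOMEM leaves nothing for a later close path to
      flush. The context struct and all sketch_* functions are hypothetical;
      fail_alloc only simulates the fault injection mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified context; not the kernel's
 * struct tls_context. fail_alloc injects an allocation failure. */
struct sketch_tls_ctx {
	bool header_prepared;
	bool pending_closed;
	bool fail_alloc;
};

static void *sketch_alloc_aead_req(const struct sketch_tls_ctx *ctx)
{
	return ctx->fail_alloc ? NULL : malloc(64);
}

static int sketch_push_record(struct sketch_tls_ctx *ctx)
{
	void *req;

	assert(ctx != NULL);
	/* Allocate before touching any state, so failure leaves nothing behind. */
	req = sketch_alloc_aead_req(ctx);
	if (!req)
		return -ENOMEM;

	ctx->header_prepared = true;   /* stands in for tls_fill_prepend() */
	ctx->pending_closed = true;    /* stands in for the state advance */

	/* ... encrypt and send; on success the pending state is cleared ... */
	ctx->pending_closed = false;
	free(req);
	return 0;
}
```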
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
      Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a447da7d
    • David S. Miller's avatar
      Merge branch 'l2tp-l2tp_ppp-must-ignore-non-PPP-sessions' · 695ad876
      David S. Miller authored
      Guillaume Nault says:
      
      ====================
      l2tp: l2tp_ppp must ignore non-PPP sessions
      
      The original L2TP code was written for version 2 of the protocol, which
      could only carry PPP sessions. Then L2TPv3 generalised the protocol so that
      it could transport different kinds of pseudo-wires. But parts of the
      l2tp_ppp module still break in the presence of non-PPP sessions.
      
      Assuming L2TPv2 tunnels can only transport PPP sessions is correct, but
      l2tp_netlink failed to ensure that (fixed in patch 1).
      When retrieving a session from an arbitrary tunnel, l2tp_ppp needs to
      filter out non-PPP sessions (last occurrence fixed in patch 2).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      695ad876
    • Guillaume Nault's avatar
      l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl() · ecd012e4
      Guillaume Nault authored
      pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
      'session' may be an Ethernet pseudo-wire.
      
      However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
      assumes l2tp_session_priv() points to a pppol2tp_session structure. For
      an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
      structure instead, making pppol2tp_session_ioctl() access invalid
      memory.
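      
      A sketch of the guard this implies, assuming a hypothetical session
      type whose priv pointer must be interpreted according to its
      pseudo-wire type. The names and the -EINVAL return value are
      illustrative only and are not taken from the l2tp code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical types for the sketch. */
enum sketch_pwtype { SKETCH_PWTYPE_ETH, SKETCH_PWTYPE_PPP };

struct sketch_ppp_priv { int sock_fd; };
struct sketch_eth_priv { char ifname[16]; };

struct sketch_session {
	enum sketch_pwtype pwtype;
	void *priv;   /* layout depends on pwtype */
};

/* Only treat priv as PPP data after checking the pseudo-wire type. */
static int sketch_tunnel_ioctl(const struct sketch_session *s, int *fd_out)
{
	const struct sketch_ppp_priv *ps;

	assert(fd_out != NULL);
	if (!s || s->pwtype != SKETCH_PWTYPE_PPP)
		return -EINVAL;   /* e.g. an Ethernet pseudo-wire: do not cast */
	ps = s->priv;
	*fd_out = ps->sock_fd;
	return 0;
}
```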
      
      Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ecd012e4
    • Guillaume Nault's avatar
      l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels · de9bada5
      Guillaume Nault authored
      The /proc/net/pppol2tp handlers (pppol2tp_seq_*()) iterate over all
      L2TPv2 tunnels, and rightfully expect that only PPP sessions can be
      found there. However, l2tp_netlink accepts creating Ethernet sessions
      regardless of the underlying tunnel version.
      
      This confuses pppol2tp_seq_session_show(), which expects that
      l2tp_session_priv() returns a pppol2tp_session structure. When the
      session is an Ethernet pseudo-wire, a struct l2tp_eth_sess is returned
      instead. This leads to invalid memory access when
      pppol2tp_session_get_sock() later tries to dereference ps->sk.
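      
      A sketch of the creation-time validation, assuming a hypothetical
      helper that runs before a session is allocated. The enum values and
      the choice of -EPROTONOSUPPORT are assumptions made for the example,
      not necessarily what l2tp_netlink returns.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical pseudo-wire types; the real values live in the l2tp uapi. */
enum sketch_pwtype { SKETCH_PWTYPE_ETH, SKETCH_PWTYPE_PPP };

/* Validate a session request against the tunnel version before creating it. */
static int sketch_session_create_check(int tunnel_version, enum sketch_pwtype pw)
{
	assert(tunnel_version == 2 || tunnel_version == 3);
	if (tunnel_version == 2 && pw != SKETCH_PWTYPE_PPP)
		return -EPROTONOSUPPORT;   /* L2TPv2 can only carry PPP */
	return 0;
}
```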
      
      Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de9bada5
    • David S. Miller's avatar
      Merge branch 'mlxsw-IPv6-and-reference-counting-fixes' · eab9a2d5
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: IPv6 and reference counting fixes
      
      The first three patches fix a mismatch between the new IPv6 behavior
      introduced in commit f34436a4 ("net/ipv6: Simplify route replace and
      appending into multipath route") and mlxsw. The patches allow the driver
      to support multipathing in IPv6 overlays with GRE tunnel devices. A
      selftest will be submitted when net-next opens.
      
      The last patch fixes a reference count problem with the port_vlan struct.
      I plan to simplify the code in net-next, so that reference counting is
      not necessary anymore.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eab9a2d5
    • Petr Machata's avatar
      mlxsw: spectrum_switchdev: Fix port_vlan refcounting · 9e25826f
      Petr Machata authored
      Switchdev notifications for addition of SWITCHDEV_OBJ_ID_PORT_VLAN are
      distributed not only on clean addition, but also when flags on an
      existing VLAN are changed. mlxsw_sp_bridge_port_vlan_add() calls
      mlxsw_sp_port_vlan_get() to get at the port_vlan in question, which
      implicitly references the object. This then leads to discrepancies in
      reference counting when the VLAN is removed. spectrum.c warns about the
      problem when the module is removed:
      
      [13578.493090] WARNING: CPU: 0 PID: 2454 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2973 mlxsw_sp_port_remove+0xfd/0x110 [mlxsw_spectrum]
      [...]
      [13578.627106] Call Trace:
      [13578.629617]  mlxsw_sp_fini+0x2a/0xe0 [mlxsw_spectrum]
      [13578.634748]  mlxsw_core_bus_device_unregister+0x3e/0x130 [mlxsw_core]
      [13578.641290]  mlxsw_pci_remove+0x13/0x40 [mlxsw_pci]
      [13578.646238]  pci_device_remove+0x31/0xb0
      [13578.650244]  device_release_driver_internal+0x14f/0x220
      [13578.655562]  driver_detach+0x32/0x70
      [13578.659183]  bus_remove_driver+0x47/0xa0
      [13578.663134]  pci_unregister_driver+0x1e/0x80
      [13578.667486]  mlxsw_sp_module_exit+0xc/0x3fa [mlxsw_spectrum]
      [13578.673207]  __x64_sys_delete_module+0x13b/0x1e0
      [13578.677888]  ? exit_to_usermode_loop+0x78/0x80
      [13578.682374]  do_syscall_64+0x39/0xe0
      [13578.685976]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix by putting the port_vlan when mlxsw_sp_port_vlan_bridge_join()
      determines it's a flag-only change.
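      
      A reference-counting sketch of the imbalance and its fix. The
      sketch_port_vlan type and functions are hypothetical models of the
      mlxsw get/put helpers; they show that a flag-only update must drop
      the extra reference taken by the get, so that a later removal brings
      the count back to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port_vlan object with a plain reference count. */
struct sketch_port_vlan {
	int refcount;
	bool bridged;
};

static void sketch_port_vlan_get(struct sketch_port_vlan *pv)
{
	pv->refcount++;   /* looking the object up implicitly references it */
}

static void sketch_port_vlan_put(struct sketch_port_vlan *pv)
{
	assert(pv->refcount > 0);
	pv->refcount--;
}

/* Called for every PORT_VLAN add notification, including flag-only updates. */
static void sketch_bridge_port_vlan_add(struct sketch_port_vlan *pv)
{
	sketch_port_vlan_get(pv);
	if (pv->bridged) {
		/* Flag-only change: the VLAN already holds its reference,
		 * so drop the one just taken to keep get/put balanced. */
		sketch_port_vlan_put(pv);
		return;
	}
	pv->bridged = true;
}

static void sketch_bridge_port_vlan_del(struct sketch_port_vlan *pv)
{
	pv->bridged = false;
	sketch_port_vlan_put(pv);
}
```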
      
      Fixes: b3529af6 ("spectrum: Reference count VLAN entries")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e25826f
    • Ido Schimmel's avatar
      mlxsw: spectrum_router: Align with new route replace logic · ce45bded
      Ido Schimmel authored
      Commit f34436a4 ("net/ipv6: Simplify route replace and appending
      into multipath route") changed the IPv6 route replace logic so that the
      first matching route (i.e., same metric) is replaced.
      
      Have mlxsw replace the first matching route as well.
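      
      A sketch of "replace the first matching route", assuming a
      hypothetical array of routes kept in list order where a match means an
      equal metric. It is not the mlxsw or IPv6 FIB code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route entry; routes for a prefix are kept in list order. */
struct sketch_rt {
	int metric;
	int id;
};

/* Replace the first route with the same metric; return its index, or -1
 * if nothing matched (a caller would then insert instead). */
static int sketch_replace_first(struct sketch_rt *rts, size_t n,
                                struct sketch_rt new_rt)
{
	assert(rts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (rts[i].metric == new_rt.metric) {
			rts[i] = new_rt;
			return (int)i;
		}
	}
	return -1;
}
```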
      
      Fixes: f34436a4 ("net/ipv6: Simplify route replace and appending into multipath route")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce45bded
    • Ido Schimmel's avatar
      mlxsw: spectrum_router: Allow appending to dev-only routes · 53b562df
      Ido Schimmel authored
      Commit f34436a4 ("net/ipv6: Simplify route replace and appending
      into multipath route") changed the IPv6 route append logic so that
      dev-only routes can be appended and not only gatewayed routes.
      
      Align mlxsw with the new behaviour.
      
      Fixes: f34436a4 ("net/ipv6: Simplify route replace and appending into multipath route")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53b562df
    • Ido Schimmel's avatar
      ipv6: Only emit append events for appended routes · 6eba08c3
      Ido Schimmel authored
      Current code will emit an append event in the FIB notification chain for
      any route added with NLM_F_APPEND set, even if the route was not
      appended to any existing route.
      
      This is inconsistent with IPv4 where such an event is only emitted when
      the new route is appended after an existing one.
      
      Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
      handle these events.
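      
      A sketch of the event selection for a request carrying NLM_F_APPEND,
      assuming a hypothetical count of matching routes already present. The
      SKETCH_* names mirror the kernel's FIB_EVENT_ENTRY_ADD and
      FIB_EVENT_ENTRY_APPEND events but are defined locally for the example.

```c
#include <stddef.h>

enum sketch_fib_event {
	SKETCH_FIB_EVENT_ENTRY_ADD,
	SKETCH_FIB_EVENT_ENTRY_APPEND,
};

/* Event to emit for an NLM_F_APPEND request, given how many existing
 * routes the new one would be appended after. */
static enum sketch_fib_event sketch_append_request_event(size_t existing_routes)
{
	if (existing_routes > 0)
		return SKETCH_FIB_EVENT_ENTRY_APPEND;   /* really appended */
	return SKETCH_FIB_EVENT_ENTRY_ADD;          /* first route: a plain add */
}
```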
      
      Fixes: f34436a4 ("net/ipv6: Simplify route replace and appending into multipath route")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6eba08c3
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2018-06-15' of... · 41f9ba67
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2018-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      A handful of fixes:
       * missing RCU grace period enforcement led to drivers freeing
         data structures before; fix from Dedy Lansky.
       * hwsim module init error paths were messed up; fixed it myself
         after a report from Colin King (who had sent a partial patch)
       * kernel-doc tag errors; fix from Luca Coelho
       * initialize the on-stack sinfo data structure when getting
         station information; fix from Sven Eckelmann
       * TXQ state dumping is now done from init, and if the TXQs aren't
         initialized yet at that point, bad things happen, so move the
         initialization; fix from Toke Høiland-Jørgensen.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      41f9ba67
    • Elad Nachman's avatar
      stmmac: added support for 802.1ad vlan stripping · ab188e8f
      Elad Nachman authored
      The stmmac receive handler calls stmmac_rx_vlan() to strip the VLAN tag
      before calling napi_gro_receive().
      
      The function assumes VLAN-tagged frames are always tagged with the
      802.1Q protocol, and assigns ETH_P_8021Q to the skb by hard-coding
      the parameter in the call to __vlan_hwaccel_put_tag().
      
      As a result, packets are not passed to the VLAN slave if it was created
      with the 802.1ad protocol
      (ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).
      
      This fix passes the protocol from the VLAN header into
      __vlan_hwaccel_put_tag() instead of using the hard-coded value of
      ETH_P_8021Q.
      
      A NETIF_F_HW_VLAN_STAG_RX check was added, so the strip action now
      depends on the correct combination of features and the detected VLAN tag.
      
      The NETIF_F_HW_VLAN_STAG_RX feature was added to bring the driver in
      line with its actual abilities.
      Signed-off-by: Elad Nachman <eladn@gilat.com>
      Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab188e8f
    • David Howells's avatar
      afs: Optimise callback breaking by not repeating volume lookup · 47ea0f2e
      David Howells authored
      At the moment, afs_break_callbacks() calls afs_break_one_callback() for
      each separate FID it was given, and the latter looks up the volume
      individually for each one.
      
      However, this is inefficient if two or more FIDs have the same vid as we
      could reuse the volume.  This is complicated by cell aliasing whereby we
      may have multiple cells sharing a volume and can therefore have multiple
      callback interests for any particular volume ID.
      
      At the moment afs_break_one_callback() scans the entire list of volumes
      we're getting from a server and breaks the appropriate callback in every
      matching volume, regardless of cell.  This scan is done for every FID.
      
      Optimise callback breaking by the following means:
      
       (1) Sort the FID list by vid so that all FIDs belonging to the same volume
           are clumped together.
      
           This is done through an indirection table: we cannot do an
           insertion sort on the afs_callback_break array as we decode FIDs
           into it, because we subsequently also have to decode callback
           info into it that corresponds to the FIDs by array index only.
      
           We also don't really want to bubblesort afterwards if we can avoid it.
      
       (2) Sort the server->cb_interests array by vid so that all the matching
           volumes are grouped together.  This permits the scan to stop after
           finding a record that has a higher vid.
      
       (3) When breaking FIDs, we try to hold server->cb_break_lock for as
           long as possible, and likewise cache the start point in the array
           for that volume group for as long as possible.
      
           It might make sense to add another layer in that list and have a
           refcounted volume ID anchor that has the matching interests attached
           to it rather than being in the list.  This would allow the lock to be
           dropped without losing the cursor.
      Signed-off-by: David Howells <dhowells@redhat.com>
      47ea0f2e