1. 23 Oct, 2020 1 commit
    • zhuoliang zhang's avatar
      net: xfrm: fix a race condition during allocing spi · a779d913
      zhuoliang zhang authored
      we found that the following race condition exists in
      xfrm_alloc_userspi flow:
      
      user thread                                    state_hash_work thread
      ----                                           ----
      xfrm_alloc_userspi()
       __find_acq_core()
         /*alloc new xfrm_state:x*/
         xfrm_state_alloc()
         /*schedule state_hash_work thread*/
         xfrm_hash_grow_check()   	               xfrm_hash_resize()
       xfrm_alloc_spi                                  /*hold lock*/
            x->id.spi = htonl(spi)                     spin_lock_bh(&net->xfrm.xfrm_state_lock)
            /*waiting lock release*/                     xfrm_hash_transfer()
            spin_lock_bh(&net->xfrm.xfrm_state_lock)      /*add x into hlist:net->xfrm.state_byspi*/
      	                                                hlist_add_head_rcu(&x->byspi)
                                                       spin_unlock_bh(&net->xfrm.xfrm_state_lock)
      
          /*add x into hlist:net->xfrm.state_byspi 2 times*/
          hlist_add_head_rcu(&x->byspi)
      
      1. a new state x is alloced in xfrm_state_alloc() and added into the bydst hlist
      in  __find_acq_core() on the LHS;
      2. on the RHS, state_hash_work thread travels the old bydst and tranfers every xfrm_state
      (include x) into the new bydst hlist and new byspi hlist;
      3. user thread on the LHS gets the lock and adds x into the new byspi hlist again.
      
      So the same xfrm_state (x) is added into the same list_hash
      (net->xfrm.state_byspi) 2 times that makes the list_hash become
      an inifite loop.
      
      To fix the race, x->id.spi = htonl(spi) in the xfrm_alloc_spi() is moved
      to the back of spin_lock_bh, sothat state_hash_work thread no longer add x
      which id.spi is zero into the hash_list.
      
      Fixes: f034b5d4 ("[XFRM]: Dynamic xfrm_state hash table sizing.")
      Signed-off-by: default avatarzhuoliang zhang <zhuoliang.zhang@mediatek.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      a779d913
  2. 09 Oct, 2020 1 commit
    • Xin Long's avatar
      xfrm: interface: fix the priorities for ipip and ipv6 tunnels · 7fe94612
      Xin Long authored
      As Nicolas noticed in his case, when xfrm_interface module is installed
      the standard IP tunnels will break in receiving packets.
      
      This is caused by the IP tunnel handlers with a higher priority in xfrm
      interface processing incoming packets by xfrm_input(), which would drop
      the packets and return 0 instead when anything wrong happens.
      
      Rather than changing xfrm_input(), this patch is to adjust the priority
      for the IP tunnel handlers in xfrm interface, so that the packets would
      go to xfrmi's later than the others', as the others' would not drop the
      packets when the handlers couldn't process them.
      
      Note that IPCOMP also defines its own IPIP tunnel handler and it calls
      xfrm_input() as well, so we must make its priority lower than xfrmi's,
      which means having xfrmi loaded would still break IPCOMP. We may seek
      another way to fix it in xfrm_input() in the future.
      Reported-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Tested-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Fixes: da9bbf05 ("xfrm: interface: support IPIP and IPIP6 tunnels processing with .cb_handler")
      FIxes: d7b360c2 ("xfrm: interface: support IP6IP6 and IP6IP tunnels processing with .cb_handler")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      7fe94612
  3. 08 Oct, 2020 19 commits
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 3fdd47c3
      Linus Torvalds authored
      Pull vhost fixes from Michael Tsirkin:
       "Some last minute vhost,vdpa fixes.
      
        The last two of them haven't been in next but they do seem kind of
        obvious, very small and safe, fix bugs reported in the field, and they
        are both in a new mlx5 vdpa driver, so it's not like we can introduce
        regressions"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vdpa/mlx5: Fix dependency on MLX5_CORE
        vdpa/mlx5: should keep avail_index despite device status
        vhost-vdpa: fix page pinning leakage in error path
        vhost-vdpa: fix vhost_vdpa_map() on error condition
        vhost: Don't call log_access_ok() when using IOTLB
        vhost: Use vhost_get_used_size() in vhost_vring_set_addr()
        vhost: Don't call access_ok() when using IOTLB
        vhost vdpa: fix vhost_vdpa_open error handling
      3fdd47c3
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 6288c1d8
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "One more set of fixes from the networking tree:
      
         - add missing input validation in nl80211_del_key(), preventing
           out-of-bounds access
      
         - last minute fix / improvement of a MRP netlink (uAPI) interface
           introduced in 5.9 (current) release
      
         - fix "unresolved symbol" build error under CONFIG_NET w/o
           CONFIG_INET due to missing tcp_timewait_sock and inet_timewait_sock
           BTF.
      
         - fix 32 bit sub-register bounds tracking in the bpf verifier for OR
           case
      
         - tcp: fix receive window update in tcp_add_backlog()
      
         - openvswitch: handle DNAT tuple collision in conntrack-related code
      
         - r8169: wait for potential PHY reset to finish after applying a FW
           file, avoiding unexpected PHY behaviour and failures later on
      
         - mscc: fix tail dropping watermarks for Ocelot switches
      
         - avoid use-after-free in macsec code after a call to the GRO layer
      
         - avoid use-after-free in sctp error paths
      
         - add a device id for Cellient MPL200 WWAN card
      
         - rxrpc fixes:
            - fix the xdr encoding of the contents read from an rxrpc key
            - fix a BUG() for a unsupported encoding type.
            - fix missing _bh lock annotations.
            - fix acceptance handling for an incoming call where the incoming
              call is encrypted.
            - the server token keyring isn't network namespaced - it belongs
              to the server, so there's no need. Namespacing it means that
              request_key() fails to find it.
            - fix a leak of the server keyring"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (21 commits)
        net: usb: qmi_wwan: add Cellient MPL200 card
        macsec: avoid use-after-free in macsec_handle_frame()
        r8169: consider that PHY reset may still be in progress after applying firmware
        openvswitch: handle DNAT tuple collision
        sctp: fix sctp_auth_init_hmacs() error path
        bridge: Netlink interface fix.
        net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
        bpf: Fix scalar32_min_max_or bounds tracking
        tcp: fix receive window update in tcp_add_backlog()
        net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails
        mptcp: more DATA FIN fixes
        net: mscc: ocelot: warn when encoding an out-of-bounds watermark value
        net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP
        net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()
        rxrpc: Fix server keyring leak
        rxrpc: The server keyring isn't network-namespaced
        rxrpc: Fix accept on a connection that need securing
        rxrpc: Fix some missing _bh annotations on locking conn->state_lock
        rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
        rxrpc: Fix rxkad token xdr encoding
        ...
      6288c1d8
    • Eli Cohen's avatar
      vdpa/mlx5: Fix dependency on MLX5_CORE · aff90770
      Eli Cohen authored
      Remove propmt for selecting MLX5_VDPA by the user and modify
      MLX5_VDPA_NET to select MLX5_VDPA. Also modify MLX5_VDPA_NET to depend
      on mlx5_core.
      
      This fixes an issue where configuration sets 'y' for MLX5_VDPA_NET while
      MLX5_CORE is compiled as a module causing link errors.
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 device")s
      Signed-off-by: default avatarEli Cohen <elic@nvidia.com>
      Link: https://lore.kernel.org/r/20201007064011.GA50074@mtl-vdi-166.wap.labs.mlnxSigned-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      aff90770
    • Si-Wei Liu's avatar
      vdpa/mlx5: should keep avail_index despite device status · 3176e974
      Si-Wei Liu authored
      A VM with mlx5 vDPA has below warnings while being reset:
      
      vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
      vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
      
      We should allow userspace emulating the virtio device be
      able to get to vq's avail_index, regardless of vDPA device
      status. Save the index that was last seen when virtq was
      stopped, so that userspace doesn't complain.
      Signed-off-by: default avatarSi-Wei Liu <si-wei.liu@oracle.com>
      Link: https://lore.kernel.org/r/1601583511-15138-1-git-send-email-si-wei.liu@oracle.comSigned-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarEli Cohen <elic@nvidia.com>
      3176e974
    • Wilken Gottwalt's avatar
      net: usb: qmi_wwan: add Cellient MPL200 card · 28802e7c
      Wilken Gottwalt authored
      Add usb ids of the Cellient MPL200 card.
      Signed-off-by: default avatarWilken Gottwalt <wilken.gottwalt@mailbox.org>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      28802e7c
    • Eric Dumazet's avatar
      macsec: avoid use-after-free in macsec_handle_frame() · c7cc9200
      Eric Dumazet authored
      De-referencing skb after call to gro_cells_receive() is not allowed.
      We need to fetch skb->len earlier.
      
      Fixes: 5491e7c6 ("macsec: enable GRO and RPS on macsec devices")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c7cc9200
    • Heiner Kallweit's avatar
      r8169: consider that PHY reset may still be in progress after applying firmware · 47dda786
      Heiner Kallweit authored
      Some firmware files trigger a PHY soft reset and don't wait for it to
      be finished. PHY register writes directly after applying the firmware
      may fail or provide unexpected results therefore. Fix this by waiting
      for bit BMCR_RESET to be cleared after applying firmware.
      
      There's nothing wrong with the referenced change, it's just that the
      fix will apply cleanly only after this change.
      
      Fixes: 89fbd26c ("r8169: fix firmware not resetting tp->ocp_base")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      47dda786
    • Dumitru Ceara's avatar
      openvswitch: handle DNAT tuple collision · 8aa7b526
      Dumitru Ceara authored
      With multiple DNAT rules it's possible that after destination
      translation the resulting tuples collide.
      
      For example, two openvswitch flows:
      nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
      nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
      
      Assuming two TCP clients initiating the following connections:
      10.0.0.10:5000->10.0.0.10:10
      10.0.0.10:5000->10.0.0.20:10
      
      Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
      nf_conntrack_confirm() to fail because of tuple collision.
      
      Netfilter handles this case by allocating a null binding for SNAT at
      egress by default.  Perform the same operation in openvswitch for DNAT
      if no explicit SNAT is requested by the user and allocate a null binding
      for SNAT for packets in the "original" direction.
      
      Reported-at: https://bugzilla.redhat.com/1877128Suggested-by: default avatarFlorian Westphal <fw@strlen.de>
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: default avatarDumitru Ceara <dceara@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8aa7b526
    • Eric Dumazet's avatar
      sctp: fix sctp_auth_init_hmacs() error path · d42ee76e
      Eric Dumazet authored
      After freeing ep->auth_hmacs we have to clear the pointer
      or risk use-after-free as reported by syzbot:
      
      BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
      BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
      BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
      Read of size 8 at addr ffff8880a8ff52c0 by task syz-executor941/6874
      
      CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x198/0x1fd lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
       sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
       sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
       sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203
       sctp_endpoint_put net/sctp/endpointola.c:236 [inline]
       sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183
       sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981
       sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415
       sk_common_release+0x64/0x390 net/core/sock.c:3254
       sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475
       __sock_release+0xcd/0x280 net/socket.c:596
       sock_close+0x18/0x20 net/socket.c:1277
       __fput+0x285/0x920 fs/file_table.c:281
       task_work_run+0xdd/0x190 kernel/task_work.c:141
       exit_task_work include/linux/task_work.h:25 [inline]
       do_exit+0xb7d/0x29f0 kernel/exit.c:806
       do_group_exit+0x125/0x310 kernel/exit.c:903
       __do_sys_exit_group kernel/exit.c:914 [inline]
       __se_sys_exit_group kernel/exit.c:912 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x43f278
      Code: Bad RIP value.
      RSP: 002b:00007fffe0995c38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f278
      RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
      RBP: 00000000004bf068 R08: 00000000000000e7 R09: ffffffffffffffd0
      R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000001
      R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 6874:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
       kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554
       kmalloc include/linux/slab.h:554 [inline]
       kmalloc_array include/linux/slab.h:593 [inline]
       kcalloc include/linux/slab.h:605 [inline]
       sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464
       sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
       sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
       sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
       __sys_setsockopt+0x2db/0x610 net/socket.c:2132
       __do_sys_setsockopt net/socket.c:2143 [inline]
       __se_sys_setsockopt net/socket.c:2140 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 6874:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
       kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
       __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
       __cache_free mm/slab.c:3422 [inline]
       kfree+0x10e/0x2b0 mm/slab.c:3760
       sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline]
       sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
       sctp_auth_init_hmacs net/sctp/auth.c:496 [inline]
       sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454
       sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
       sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
       sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
       __sys_setsockopt+0x2db/0x610 net/socket.c:2132
       __do_sys_setsockopt net/socket.c:2143 [inline]
       __se_sys_setsockopt net/socket.c:2140 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1f485649 ("[SCTP]: Implement SCTP-AUTH internals")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d42ee76e
    • Jakub Kicinski's avatar
      Merge tag 'mac80211-for-net-2020-10-08' of... · a9e54cb3
      Jakub Kicinski authored
      Merge tag 'mac80211-for-net-2020-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      pull-request: mac80211 2020-10-08
      
      A single fix for missing input validation in nl80211.
      ====================
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a9e54cb3
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · cfe90f49
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2020-10-08
      
      The main changes are:
      
      1) Fix "unresolved symbol" build error under CONFIG_NET w/o CONFIG_INET due
         to missing tcp_timewait_sock and inet_timewait_sock BTF, from Yonghong Song.
      
      2) Fix 32 bit sub-register bounds tracking for OR case, from Daniel Borkmann.
      ====================
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      cfe90f49
    • Henrik Bjoernlund's avatar
      bridge: Netlink interface fix. · b6c02ef5
      Henrik Bjoernlund authored
      This commit is correcting NETLINK br_fill_ifinfo() to be able to
      handle 'filter_mask' with multiple flags asserted.
      
      Fixes: 36a8e8e2 ("bridge: Extend br_fill_ifinfo to return MPR status")
      Signed-off-by: default avatarHenrik Bjoernlund <henrik.bjoernlund@microchip.com>
      Reviewed-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Suggested-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Tested-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b6c02ef5
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2020-10-08' of git://anongit.freedesktop.org/drm/drm · 3d006ee4
      Linus Torvalds authored
      Pull drm nouveau fixes from Dave Airlie:
       "Karol found two last minute nouveau fixes, they both fix crashes, the
        TTM one follows what other drivers do already, and the other is for
        bailing on load on unrecognised chipsets.
      
         - fix crash in TTM alloc fail path
      
         - return error earlier for unknown chipsets"
      
      * tag 'drm-fixes-2020-10-08' of git://anongit.freedesktop.org/drm/drm:
        drm/nouveau/mem: guard against NULL pointer access in mem_del
        drm/nouveau/device: return error for unknown chipsets
      3d006ee4
    • Linus Torvalds's avatar
      Merge tag 'exfat-for-5.9-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat · b9e3aa2a
      Linus Torvalds authored
      Pull exfat fixes from Namjae Jeon:
      
       - Fix use of uninitialized spinlock on error path
      
       - Fix missing err assignment in exfat_build_inode()
      
      * tag 'exfat-for-5.9-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
        exfat: fix use of uninitialized spinlock on error path
        exfat: fix pointer error checking
      b9e3aa2a
    • Linus Torvalds's avatar
      Merge tag 'for-linus-5.9b-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 86f0a5fb
      Linus Torvalds authored
      Pull xen fix from Juergen Gross:
       "One fix for a regression when booting as a Xen guest on ARM64
        introduced probably during the 5.9 cycle. It is very low risk as it is
        modifying Xen specific code only.
      
        The exact commit introducing the bug hasn't been identified yet, but
        everything was fine in 5.8 and only in 5.9 some configurations started
        to fail"
      
      * tag 'for-linus-5.9b-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        arm/arm64: xen: Fix to convert percpu address to gfn correctly
      86f0a5fb
    • David Howells's avatar
      afs: Fix deadlock between writeback and truncate · ec0fa0b6
      David Howells authored
      The afs filesystem has a lock[*] that it uses to serialise I/O operations
      going to the server (vnode->io_lock), as the server will only perform one
      modification operation at a time on any given file or directory.  This
      prevents the the filesystem from filling up all the call slots to a server
      with calls that aren't going to be executed in parallel anyway, thereby
      allowing operations on other files to obtain slots.
      
        [*] Note that is probably redundant for directories at least since
            i_rwsem is used to serialise directory modifications and
            lookup/reading vs modification.  The server does allow parallel
            non-modification ops, however.
      
      When a file truncation op completes, we truncate the in-memory copy of the
      file to match - but we do it whilst still holding the io_lock, the idea
      being to prevent races with other operations.
      
      However, if writeback starts in a worker thread simultaneously with
      truncation (whilst notify_change() is called with i_rwsem locked, writeback
      pays it no heed), it may manage to set PG_writeback bits on the pages that
      will get truncated before afs_setattr_success() manages to call
      truncate_pagecache().  Truncate will then wait for those pages - whilst
      still inside io_lock:
      
          # cat /proc/8837/stack
          [<0>] wait_on_page_bit_common+0x184/0x1e7
          [<0>] truncate_inode_pages_range+0x37f/0x3eb
          [<0>] truncate_pagecache+0x3c/0x53
          [<0>] afs_setattr_success+0x4d/0x6e
          [<0>] afs_wait_for_operation+0xd8/0x169
          [<0>] afs_do_sync_operation+0x16/0x1f
          [<0>] afs_setattr+0x1fb/0x25d
          [<0>] notify_change+0x2cf/0x3c4
          [<0>] do_truncate+0x7f/0xb2
          [<0>] do_sys_ftruncate+0xd1/0x104
          [<0>] do_syscall_64+0x2d/0x3a
          [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The writeback operation, however, stalls indefinitely because it needs to
      get the io_lock to proceed:
      
          # cat /proc/5940/stack
          [<0>] afs_get_io_locks+0x58/0x1ae
          [<0>] afs_begin_vnode_operation+0xc7/0xd1
          [<0>] afs_store_data+0x1b2/0x2a3
          [<0>] afs_write_back_from_locked_page+0x418/0x57c
          [<0>] afs_writepages_region+0x196/0x224
          [<0>] afs_writepages+0x74/0x156
          [<0>] do_writepages+0x2d/0x56
          [<0>] __writeback_single_inode+0x84/0x207
          [<0>] writeback_sb_inodes+0x238/0x3cf
          [<0>] __writeback_inodes_wb+0x68/0x9f
          [<0>] wb_writeback+0x145/0x26c
          [<0>] wb_do_writeback+0x16a/0x194
          [<0>] wb_workfn+0x74/0x177
          [<0>] process_one_work+0x174/0x264
          [<0>] worker_thread+0x117/0x1b9
          [<0>] kthread+0xec/0xf1
          [<0>] ret_from_fork+0x1f/0x30
      
      and thus deadlock has occurred.
      
      Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
      that the caller is holding i_rwsem doesn't preclude more pages being
      dirtied through an mmap'd region.
      
      Fix this by:
      
       (1) Use the vnode validate_lock to mediate access between afs_setattr()
           and afs_writepages():
      
           (a) Exclusively lock validate_lock in afs_setattr() around the whole
           	 RPC operation.
      
           (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), trying to
           	 shared-lock validate_lock and returning immediately if we couldn't
           	 get it.
      
           (c) If WB_SYNC_ALL is set, wait for the lock.
      
           The validate_lock is also used to validate a file and to zap its cache
           if the file was altered by a third party, so it's probably a good fit
           for this.
      
       (2) Move the truncation outside of the io_lock in setattr, using the same
           hook as is used for local directory editing.
      
           This requires the old i_size to be retained in the operation record as
           we commit the revised status to the inode members inside the io_lock
           still, but we still need to know if we reduced the file size.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ec0fa0b6
    • Linus Torvalds's avatar
      mm: avoid early COW write protect games during fork() · f3c64eda
      Linus Torvalds authored
      In commit 70e806e4 ("mm: Do early cow for pinned pages during fork()
      for ptes") we write-protected the PTE before doing the page pinning
      check, in order to avoid a race with concurrent fast-GUP pinning (which
      doesn't take the mm semaphore or the page table lock).
      
      That trick doesn't actually work - it doesn't handle memory ordering
      properly, and doing so would be prohibitively expensive.
      
      It also isn't really needed.  While we're moving in the direction of
      allowing and supporting page pinning without marking the pinned area
      with MADV_DONTFORK, the fact is that we've never really supported this
      kind of odd "concurrent fork() and page pinning", and doing the
      serialization on a pte level is just wrong.
      
      We can add serialization with a per-mm sequence counter, so we know how
      to solve that race properly, but we'll do that at a more appropriate
      time.  Right now this just removes the write protect games.
      
      It also turns out that the write protect games actually break on Power,
      as reported by Aneesh Kumar:
      
       "Architecture like ppc64 expects set_pte_at to be not used for updating
        a valid pte. This is further explained in commit 56eecdb9 ("mm:
        Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")"
      
      and the code triggered a warning there:
      
        WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
        Call Trace:
          copy_present_page mm/memory.c:857 [inline]
          copy_present_pte mm/memory.c:899 [inline]
          copy_pte_range mm/memory.c:1014 [inline]
          copy_pmd_range mm/memory.c:1092 [inline]
          copy_pud_range mm/memory.c:1127 [inline]
          copy_p4d_range mm/memory.c:1150 [inline]
          copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
          dup_mmap kernel/fork.c:592 [inline]
          dup_mm+0x77c/0xab0 kernel/fork.c:1355
          copy_mm kernel/fork.c:1411 [inline]
          copy_process+0x1f00/0x2740 kernel/fork.c:2070
          _do_fork+0xc4/0x10b0 kernel/fork.c:2429
      
      Link: https://lore.kernel.org/lkml/CAHk-=wiWr+gO0Ro4LvnJBMs90OiePNyrE3E+pJvc9PzdBShdmw@mail.gmail.com/
      Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.kumar@linux.ibm.com/Reported-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Tested-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kirill Shutemov <kirill@shutemov.name>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f3c64eda
    • Anant Thazhemadam's avatar
      net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() · 3dc289f8
      Anant Thazhemadam authored
      In nl80211_parse_key(), key.idx is first initialized as -1.
      If this value of key.idx remains unmodified and gets returned, and
      nl80211_key_allowed() also returns 0, then rdev_del_key() gets called
      with key.idx = -1.
      This causes an out-of-bounds array access.
      
      Handle this issue by checking if the value of key.idx after
      nl80211_parse_key() is called and return -EINVAL if key.idx < 0.
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+b1bb342d1d097516cbda@syzkaller.appspotmail.com
      Tested-by: syzbot+b1bb342d1d097516cbda@syzkaller.appspotmail.com
      Signed-off-by: default avatarAnant Thazhemadam <anant.thazhemadam@gmail.com>
      Link: https://lore.kernel.org/r/20201007035401.9522-1-anant.thazhemadam@gmail.comSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      3dc289f8
    • Daniel Borkmann's avatar
      bpf: Fix scalar32_min_max_or bounds tracking · 5b9fbeb7
      Daniel Borkmann authored
      Simon reported an issue with the current scalar32_min_max_or() implementation.
      That is, compared to the other 32 bit subreg tracking functions, the code in
      scalar32_min_max_or() stands out that it's using the 64 bit registers instead
      of 32 bit ones. This leads to bounds tracking issues, for example:
      
        [...]
        8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
        8: (79) r1 = *(u64 *)(r0 +0)
         R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
        9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
        9: (b7) r0 = 1
        10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
        10: (18) r2 = 0x600000002
        12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        12: (ad) if r1 < r2 goto pc+1
         R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        13: (95) exit
        14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        14: (25) if r1 > 0x0 goto pc+1
         R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        15: (95) exit
        16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        16: (47) r1 |= 0
        17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        [...]
      
      The bound tests on the map value force the upper unsigned bound to be 25769803777
      in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By
      using OR they are truncated and thus result in the range [1,1] for the 32 bit reg
      tracker. This is incorrect given the only thing we know is that the value must be
      positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the
      subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes
      sense, for example, for the case where we update dst_reg->s32_{min,max}_value in
      the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as
      we know that these are positive. Previously, in the else branch the 64 bit values
      of umin_value=1 and umax_value=32212254719 were used and latter got truncated to
      be 1 as upper bound there. After the fix the subreg range is now correct:
      
        [...]
        8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
        8: (79) r1 = *(u64 *)(r0 +0)
         R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
        9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
        9: (b7) r0 = 1
        10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
        10: (18) r2 = 0x600000002
        12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        12: (ad) if r1 < r2 goto pc+1
         R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        13: (95) exit
        14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        14: (25) if r1 > 0x0 goto pc+1
         R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        15: (95) exit
        16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        16: (47) r1 |= 0
        17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        [...]
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Reported-by: default avatarSimon Scannell <scannell.smn@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5b9fbeb7
  4. 07 Oct, 2020 5 commits
    • Karol Herbst's avatar
      drm/nouveau/mem: guard against NULL pointer access in mem_del · d10285a2
      Karol Herbst authored
      other drivers seems to do something similar
      Signed-off-by: default avatarKarol Herbst <kherbst@redhat.com>
      Cc: dri-devel <dri-devel@lists.freedesktop.org>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-2-kherbst@redhat.com
      d10285a2
    • Karol Herbst's avatar
      drm/nouveau/device: return error for unknown chipsets · c3e0276c
      Karol Herbst authored
      Previously the code relied on device->pri to be NULL and to fail probing
      later. We really should just return an error inside nvkm_device_ctor for
      unsupported GPUs.
      
      Fixes: 24d5ff40 ("drm/nouveau/device: rework mmio mapping code to get rid of second map")
      Signed-off-by: default avatarKarol Herbst <kherbst@redhat.com>
      Cc: dann frazier <dann.frazier@canonical.com>
      Cc: dri-devel <dri-devel@lists.freedesktop.org>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJeremy Cline <jcline@redhat.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-1-kherbst@redhat.com
      c3e0276c
    • Namjae Jeon's avatar
      exfat: fix use of uninitialized spinlock on error path · 8ff006e5
      Namjae Jeon authored
      syzbot reported warning message:
      
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1d6/0x29e lib/dump_stack.c:118
       register_lock_class+0xf06/0x1520 kernel/locking/lockdep.c:893
       __lock_acquire+0xfd/0x2ae0 kernel/locking/lockdep.c:4320
       lock_acquire+0x148/0x720 kernel/locking/lockdep.c:5029
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:354 [inline]
       exfat_cache_inval_inode+0x30/0x280 fs/exfat/cache.c:226
       exfat_evict_inode+0x124/0x270 fs/exfat/inode.c:660
       evict+0x2bb/0x6d0 fs/inode.c:576
       exfat_fill_super+0x1e07/0x27d0 fs/exfat/super.c:681
       get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
       vfs_get_tree+0x88/0x270 fs/super.c:1547
       do_new_mount fs/namespace.c:2875 [inline]
       path_mount+0x179d/0x29e0 fs/namespace.c:3192
       do_mount fs/namespace.c:3205 [inline]
       __do_sys_mount fs/namespace.c:3413 [inline]
       __se_sys_mount+0x126/0x180 fs/namespace.c:3390
       do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      If exfat_read_root() returns an error, spinlock is used in
      exfat_evict_inode() without initialization. This patch combines
      exfat_cache_init_inode() with exfat_inode_init_once() to initialize
      spinlock by slab constructor.
      
      Fixes: c35b6810 ("exfat: add exfat cache")
      Cc: stable@vger.kernel.org # v5.7+
      Reported-by: default avatarsyzbot <syzbot+b91107320911a26c9a95@syzkaller.appspotmail.com>
      Signed-off-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      8ff006e5
    • Tetsuhiro Kohada's avatar
      exfat: fix pointer error checking · d6c9efd9
      Tetsuhiro Kohada authored
      Fix missing result check of exfat_build_inode().
      And use PTR_ERR_OR_ZERO instead of PTR_ERR.
      Signed-off-by: default avatarTetsuhiro Kohada <kohada.t2@gmail.com>
      Signed-off-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      d6c9efd9
    • Masami Hiramatsu's avatar
      arm/arm64: xen: Fix to convert percpu address to gfn correctly · 5a067711
      Masami Hiramatsu authored
      Use per_cpu_ptr_to_phys() instead of virt_to_phys() for per-cpu
      address conversion.
      
      In xen_starting_cpu(), per-cpu xen_vcpu_info address is converted
      to gfn by virt_to_gfn() macro. However, since the virt_to_gfn(v)
      assumes the given virtual address is in linear mapped kernel memory
      area, it can not convert the per-cpu memory if it is allocated on
      vmalloc area.
      
      This depends on CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK.
      If it is enabled, the first chunk of percpu memory is linear mapped.
      In the other case, that is allocated from vmalloc area. Moreover,
      if the first chunk of percpu has run out until allocating
      xen_vcpu_info, it will be allocated on the 2nd chunk, which is
      based on kernel memory or vmalloc memory (depends on
      CONFIG_NEED_PER_CPU_KM).
      
      Without this fix and kernel configured to use vmalloc area for
      the percpu memory, the Dom0 kernel will fail to boot with following
      errors.
      
      [    0.466172] Xen: initializing cpu0
      [    0.469601] ------------[ cut here ]------------
      [    0.474295] WARNING: CPU: 0 PID: 1 at arch/arm64/xen/../../arm/xen/enlighten.c:153 xen_starting_cpu+0x160/0x180
      [    0.484435] Modules linked in:
      [    0.487565] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #4
      [    0.493895] Hardware name: Socionext Developer Box (DT)
      [    0.499194] pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
      [    0.504836] pc : xen_starting_cpu+0x160/0x180
      [    0.509263] lr : xen_starting_cpu+0xb0/0x180
      [    0.513599] sp : ffff8000116cbb60
      [    0.516984] x29: ffff8000116cbb60 x28: ffff80000abec000
      [    0.522366] x27: 0000000000000000 x26: 0000000000000000
      [    0.527754] x25: ffff80001156c000 x24: fffffdffbfcdb600
      [    0.533129] x23: 0000000000000000 x22: 0000000000000000
      [    0.538511] x21: ffff8000113a99c8 x20: ffff800010fe4f68
      [    0.543892] x19: ffff8000113a9988 x18: 0000000000000010
      [    0.549274] x17: 0000000094fe0f81 x16: 00000000deadbeef
      [    0.554655] x15: ffffffffffffffff x14: 0720072007200720
      [    0.560037] x13: 0720072007200720 x12: 0720072007200720
      [    0.565418] x11: 0720072007200720 x10: 0720072007200720
      [    0.570801] x9 : ffff8000100fbdc0 x8 : ffff800010715208
      [    0.576182] x7 : 0000000000000054 x6 : ffff00001b790f00
      [    0.581564] x5 : ffff800010bbf880 x4 : 0000000000000000
      [    0.586945] x3 : 0000000000000000 x2 : ffff80000abec000
      [    0.592327] x1 : 000000000000002f x0 : 0000800000000000
      [    0.597716] Call trace:
      [    0.600232]  xen_starting_cpu+0x160/0x180
      [    0.604309]  cpuhp_invoke_callback+0xac/0x640
      [    0.608736]  cpuhp_issue_call+0xf4/0x150
      [    0.612728]  __cpuhp_setup_state_cpuslocked+0x128/0x2c8
      [    0.618030]  __cpuhp_setup_state+0x84/0xf8
      [    0.622192]  xen_guest_init+0x324/0x364
      [    0.626097]  do_one_initcall+0x54/0x250
      [    0.630003]  kernel_init_freeable+0x12c/0x2c8
      [    0.634428]  kernel_init+0x1c/0x128
      [    0.637988]  ret_from_fork+0x10/0x18
      [    0.641635] ---[ end trace d95b5309a33f8b27 ]---
      [    0.646337] ------------[ cut here ]------------
      [    0.651005] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:158!
      [    0.657697] Internal error: Oops - BUG: 0 [#1] SMP
      [    0.662548] Modules linked in:
      [    0.665676] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.9.0-rc4+ #4
      [    0.673398] Hardware name: Socionext Developer Box (DT)
      [    0.678695] pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
      [    0.684338] pc : xen_starting_cpu+0x178/0x180
      [    0.688765] lr : xen_starting_cpu+0x144/0x180
      [    0.693188] sp : ffff8000116cbb60
      [    0.696573] x29: ffff8000116cbb60 x28: ffff80000abec000
      [    0.701955] x27: 0000000000000000 x26: 0000000000000000
      [    0.707344] x25: ffff80001156c000 x24: fffffdffbfcdb600
      [    0.712718] x23: 0000000000000000 x22: 0000000000000000
      [    0.718107] x21: ffff8000113a99c8 x20: ffff800010fe4f68
      [    0.723481] x19: ffff8000113a9988 x18: 0000000000000010
      [    0.728863] x17: 0000000094fe0f81 x16: 00000000deadbeef
      [    0.734245] x15: ffffffffffffffff x14: 0720072007200720
      [    0.739626] x13: 0720072007200720 x12: 0720072007200720
      [    0.745008] x11: 0720072007200720 x10: 0720072007200720
      [    0.750390] x9 : ffff8000100fbdc0 x8 : ffff800010715208
      [    0.755771] x7 : 0000000000000054 x6 : ffff00001b790f00
      [    0.761153] x5 : ffff800010bbf880 x4 : 0000000000000000
      [    0.766534] x3 : 0000000000000000 x2 : 00000000deadbeef
      [    0.771916] x1 : 00000000deadbeef x0 : ffffffffffffffea
      [    0.777304] Call trace:
      [    0.779819]  xen_starting_cpu+0x178/0x180
      [    0.783898]  cpuhp_invoke_callback+0xac/0x640
      [    0.788325]  cpuhp_issue_call+0xf4/0x150
      [    0.792317]  __cpuhp_setup_state_cpuslocked+0x128/0x2c8
      [    0.797619]  __cpuhp_setup_state+0x84/0xf8
      [    0.801779]  xen_guest_init+0x324/0x364
      [    0.805683]  do_one_initcall+0x54/0x250
      [    0.809590]  kernel_init_freeable+0x12c/0x2c8
      [    0.814016]  kernel_init+0x1c/0x128
      [    0.817583]  ret_from_fork+0x10/0x18
      [    0.821226] Code: d0006980 f9427c00 cb000300 17ffffea (d4210000)
      [    0.827415] ---[ end trace d95b5309a33f8b28 ]---
      [    0.832076] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      [    0.839815] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: default avatarStefano Stabellini <sstabellini@kernel.org>
      Link: https://lore.kernel.org/r/160196697165.60224.17470743378683334995.stgit@devnote2Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      5a067711
  5. 06 Oct, 2020 14 commits
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · c85fb28b
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "Fix a kernel panic in the AES crypto code caused by a BR tail call not
        matching the target BTI instruction (when branch target identification
        is enabled)"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        crypto: arm64: Use x16 with indirect branch to bti_c
      c85fb28b
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v5.9-3' of... · 6ec37e6b
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull another x86 platform driver fix from Hans de Goede:
       "One final pdx86 fix for Tablet Mode reporting regressions (which make
        the keyboard and touchpad unusable) on various Asus notebooks.
      
        These regressions were caused by the asus-nb-wmi and the intel-vbtn
        drivers both receiving recent patches to start reporting Tablet Mode /
        to report it on more models.
      
        Due to a miscommunication between Andy and me, Andy's earlier pull-req
        only contained the fix for the intel-vbtn driver and not the fix for
        the asus-nb-wmi code.
      
        This fix has been tested as a downstream patch in Fedora kernels for
        approx two weeks with no problems being reported"
      
      * tag 'platform-drivers-x86-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86: asus-wmi: Fix SW_TABLET_MODE always reporting 1 on many different models
      6ec37e6b
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm · f1e141e9
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Daniel queued these up last week and I took a long weekend so didn't
        get them out, but fixing the OOB access on get font seems like
        something we should land and it's cc'ed stable as well.
      
        The other big change is a partial revert for a regression on android
        on the clcd fbdev driver, and one other docs fix.
      
        fbdev:
         - Re-add FB_ARMCLCD for android
         - Fix global-out-of-bounds read in fbcon_get_font()
      
        core:
         - Small doc fix"
      
      * tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm:
        drm: drm_dsc.h: fix a kernel-doc markup
        Partially revert "video: fbdev: amba-clcd: Retire elder CLCD driver"
        fbcon: Fix global-out-of-bounds read in fbcon_get_font()
        Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts
        fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
      f1e141e9
    • Linus Torvalds's avatar
      usermodehelper: reset umask to default before executing user process · 4013c149
      Linus Torvalds authored
      Kernel threads intentionally do CLONE_FS in order to follow any changes
      that 'init' does to set up the root directory (or cwd).
      
      It is admittedly a bit odd, but it avoids the situation where 'init'
      does some extensive setup to initialize the system environment, and then
      we execute a usermode helper program, and it uses the original FS setup
      from boot time that may be very limited and incomplete.
      
      [ Both Al Viro and Eric Biederman point out that 'pivot_root()' will
        follow the root regardless, since it fixes up other users of root (see
        chroot_fs_refs() for details), but overmounting root and doing a
        chroot() would not. ]
      
      However, Vegard Nossum noticed that the CLONE_FS not only means that we
      follow the root and current working directories, it also means we share
      umask with whatever init changed it to. That wasn't intentional.
      
      Just reset umask to the original default (0022) before actually starting
      the usermode helper program.
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4013c149
    • Linus Torvalds's avatar
      splice: teach splice pipe reading about empty pipe buffers · d1a819a2
      Linus Torvalds authored
      Tetsuo Handa reports that splice() can return 0 before the real EOF, if
      the data in the splice source pipe is an empty pipe buffer.  That empty
      pipe buffer case doesn't happen in any normal situation, but you can
      trigger it by doing a write to a pipe that fails due to a page fault.
      
      Tetsuo has a test-case to show the behavior:
      
        #define _GNU_SOURCE
        #include <sys/types.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <unistd.h>
      
        int main(int argc, char *argv[])
        {
      	const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
      	int pipe_fd[2] = { -1, -1 };
      	pipe(pipe_fd);
      	write(pipe_fd[1], NULL, 4096);
      	/* This splice() should wait unless interrupted. */
      	return !splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
        }
      
      which results in
      
          write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
          splice(4, NULL, 3, NULL, 65536, 0)      = 0
      
      and this can confuse splice() users into believing they have hit EOF
      prematurely.
      
      The issue was introduced when the pipe write code started pre-allocating
      the pipe buffers before copying data from user space.
      
      This is modified verion of Tetsuo's original patch.
      
      Fixes: a194dfe6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot")
      Link:https://lore.kernel.org/linux-fsdevel/20201005121339.4063-1-penguin-kernel@I-love.SAKURA.ne.jp/Reported-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Acked-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d1a819a2
    • Jeremy Linton's avatar
      crypto: arm64: Use x16 with indirect branch to bti_c · 39e4716c
      Jeremy Linton authored
      The AES code uses a 'br x7' as part of a function called by
      a macro. That branch needs a bti_j as a target. This results
      in a panic as seen below. Using x16 (or x17) with an indirect
      branch keeps the target bti_c.
      
        Bad mode in Synchronous Abort handler detected on CPU1, code 0x34000003 -- BTI
        CPU: 1 PID: 265 Comm: cryptomgr_test Not tainted 5.8.11-300.fc33.aarch64 #1
        pstate: 20400c05 (nzCv daif +PAN -UAO BTYPE=j-)
        pc : aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
        lr : aesbs_xts_encrypt+0x48/0xe0 [aes_neon_bs]
        sp : ffff80001052b730
      
        aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
         __xts_crypt+0xb0/0x2dc [aes_neon_bs]
         xts_encrypt+0x28/0x3c [aes_neon_bs]
        crypto_skcipher_encrypt+0x50/0x84
        simd_skcipher_encrypt+0xc8/0xe0
        crypto_skcipher_encrypt+0x50/0x84
        test_skcipher_vec_cfg+0x224/0x5f0
        test_skcipher+0xbc/0x120
        alg_test_skcipher+0xa0/0x1b0
        alg_test+0x3dc/0x47c
        cryptomgr_test+0x38/0x60
      
      Fixes: 0e89640b ("crypto: arm64 - Use modern annotations for assembly functions")
      Cc: <stable@vger.kernel.org> # 5.6.x-
      Signed-off-by: default avatarJeremy Linton <jeremy.linton@arm.com>
      Suggested-by: default avatarDave P Martin <Dave.Martin@arm.com>
      Reviewed-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Reviewed-by: default avatarMark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20201006163326.2780619-1-jeremy.linton@arm.comSigned-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      39e4716c
    • David S. Miller's avatar
      Merge tag 'rxrpc-fixes-20201005' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · d91dc434
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc: Miscellaneous fixes
      
      Here are some miscellaneous rxrpc fixes:
      
       (1) Fix the xdr encoding of the contents read from an rxrpc key.
      
       (2) Fix a BUG() for a unsupported encoding type.
      
       (3) Fix missing _bh lock annotations.
      
       (4) Fix acceptance handling for an incoming call where the incoming call
           is encrypted.
      
       (5) The server token keyring isn't network namespaced - it belongs to the
           server, so there's no need.  Namespacing it means that request_key()
           fails to find it.
      
       (6) Fix a leak of the server keyring.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d91dc434
    • Eric Dumazet's avatar
      tcp: fix receive window update in tcp_add_backlog() · 86bccd03
      Eric Dumazet authored
      We got reports from GKE customers flows being reset by netfilter
      conntrack unless nf_conntrack_tcp_be_liberal is set to 1.
      
      Traces seemed to suggest ACK packet being dropped by the
      packet capture, or more likely that ACK were received in the
      wrong order.
      
       wscale=7, SYN and SYNACK not shown here.
      
       This ACK allows the sender to send 1871*128 bytes from seq 51359321 :
       New right edge of the window -> 51359321+1871*128=51598809
      
       09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0
      
       09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408
       09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0
       09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768
       09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0
       09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
      
       Receiver now allows to send 606*128=77568 from seq 51521241 :
       New right edge of the window -> 51521241+606*128=51598809
      
       09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0
      
       09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
      
       It seems the sender exceeds RWIN allowance, since 51611353 > 51598809
      
       09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728
       09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040
      
       09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0
      
       netfilter conntrack is not happy and sends RST
      
       09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0
       09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0
      
       Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet.
       New right edge of the window -> 51521241+859*128=51631193
      
      Normally TCP stack handles OOO packets just fine, but it
      turns out tcp_add_backlog() does not. It can update the window
      field of the aggregated packet even if the ACK sequence
      of the last received packet is too old.
      
      Many thanks to Alexandre Ferrieux for independently reporting the issue
      and suggesting a fix.
      
      Fixes: 4f693b55 ("tcp: implement coalescing on backlog queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAlexandre Ferrieux <alexandre.ferrieux@orange.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86bccd03
    • Anant Thazhemadam's avatar
      net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails · f45a4248
      Anant Thazhemadam authored
      When get_registers() fails in set_ethernet_addr(),the uninitialized
      value of node_id gets copied over as the address.
      So, check the return value of get_registers().
      
      If get_registers() executed successfully (i.e., it returns
      sizeof(node_id)), copy over the MAC address using ether_addr_copy()
      (instead of using memcpy()).
      
      Else, if get_registers() failed instead, a randomly generated MAC
      address is set as the MAC address instead.
      
      Reported-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com
      Tested-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com
      Acked-by: default avatarPetko Manolov <petkan@nucleusys.com>
      Signed-off-by: default avatarAnant Thazhemadam <anant.thazhemadam@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f45a4248
    • Paolo Abeni's avatar
      mptcp: more DATA FIN fixes · 017512a0
      Paolo Abeni authored
      Currently data fin on data packet are not handled properly:
      the 'rcv_data_fin_seq' field is interpreted as the last
      sequence number carrying a valid data, but for data fin
      packet with valid maps we currently store map_seq + map_len,
      that is, the next value.
      
      The 'write_seq' fields carries instead the value subseguent
      to the last valid byte, so in mptcp_write_data_fin() we
      never detect correctly the last DSS map.
      
      Fixes: 7279da61 ("mptcp: Use MPTCP-level flag for sending DATA_FIN")
      Fixes: 1a49b2c2 ("mptcp: Handle incoming 32-bit DATA_FIN values")
      Reviewed-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      017512a0
    • David S. Miller's avatar
      Merge branch 'Fix-tail-dropping-watermarks-for-Ocelot-switches' · c88c5ed7
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix tail dropping watermarks for Ocelot switches
      
      This series adds a missing division by 60, and a warning to prevent that
      in the future.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c88c5ed7
    • Vladimir Oltean's avatar
      net: mscc: ocelot: warn when encoding an out-of-bounds watermark value · 01326493
      Vladimir Oltean authored
      There is an upper bound to the value that a watermark may hold. That
      upper bound is not immediately obvious during configuration, and it
      might be possible to have accidental truncation.
      
      Actually this has happened already, add a warning to prevent it from
      happening again.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      01326493
    • Vladimir Oltean's avatar
      net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP · 601e984f
      Vladimir Oltean authored
      Tail dropping is enabled for a port when:
      
      1. A source port consumes more packet buffers than the watermark encoded
         in SYS:PORT:ATOP_CFG.ATOP.
      
      AND
      
      2. Total memory use exceeds the consumption watermark encoded in
         SYS:PAUSE_CFG:ATOP_TOT_CFG.
      
      The unit of these watermarks is a 60 byte memory cell. That unit is
      programmed properly into ATOP_TOT_CFG, but not into ATOP. Actually when
      written into ATOP, it would get truncated and wrap around.
      
      Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      601e984f
    • Manivannan Sadhasivam's avatar
      net: qrtr: ns: Fix the incorrect usage of rcu_read_lock() · 082bb94f
      Manivannan Sadhasivam authored
      The rcu_read_lock() is not supposed to lock the kernel_sendmsg() API
      since it has the lock_sock() in qrtr_sendmsg() which will sleep. Hence,
      fix it by excluding the locking for kernel_sendmsg().
      
      While at it, let's also use radix_tree_deref_retry() to confirm the
      validity of the pointer returned by radix_tree_deref_slot() and use
      radix_tree_iter_resume() to resume iterating the tree properly before
      releasing the lock as suggested by Doug.
      
      Fixes: a7809ff9 ("net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks")
      Reported-by: default avatarDouglas Anderson <dianders@chromium.org>
      Reviewed-by: default avatarDouglas Anderson <dianders@chromium.org>
      Tested-by: default avatarDouglas Anderson <dianders@chromium.org>
      Tested-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarManivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      082bb94f