1. 01 Oct, 2020 1 commit
  2. 29 Sep, 2020 5 commits
    • fs: dlm: rework receive handling · 4798cbbf
      Alexander Aring authored
      This patch reworks the current receive handling of dlm. While trying to
      change the send handling to fix reordering issues, I took a look at the
      receive handling and simplified it. It works as follows:
      
      Each connection has a preallocated receive buffer with a minimum length of
      4096. On receive, the upper layer protocol processes all dlm messages
      until there is not enough data left. If "leftover" data remains at the end
      of the receive buffer because a dlm message wasn't fully received, it is
      copied to the beginning of the preallocated receive buffer. On the next
      receive, more data is appended to the previous "leftover" data and
      processing begins again.
      
      This removes a lot of code from the current mechanism. Inside the
      processing functionality, a memmove() ensures that the dlm message is
      memory aligned. Always starting a dlm message at the beginning of the
      buffer reduces the number of memmove() calls, because the src and dest
      pointers are then the same.
      
      The cluster attribute "buffer_size" takes on a new meaning: it is now the
      size of the application layer receive buffer. If it is changed at
      runtime, the receive buffer is reallocated. It is important that the
      receive buffer is at least as large as the maximum possible dlm message
      size; otherwise a received message cannot be placed inside the receive
      buffer.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
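      The leftover handling described in this commit message can be sketched
      in userspace C as below. This is not the dlm lowcomms code: struct
      rx_conn, rx_feed(), and the 2-byte little-endian length prefix are
      assumptions made for illustration, and the real dlm header layout
      differs.

```c
#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE 4096 /* minimum buffer length named in the commit message */
#define HDR_LEN 2        /* hypothetical header: 2-byte little-endian total length */

/* Hypothetical per-connection receive state. */
struct rx_conn {
    unsigned char buf[RX_BUF_SIZE];
    size_t len;       /* bytes currently buffered, leftover included */
    size_t processed; /* number of complete messages handled so far */
};

/*
 * Append newly received bytes behind any leftover data, process every
 * complete message, then move the remaining partial message to the start
 * of the buffer. Returns 0 on success, -1 on overflow or a malformed length.
 */
int rx_feed(struct rx_conn *c, const unsigned char *data, size_t n)
{
    size_t off = 0;

    if (n > RX_BUF_SIZE - c->len)
        return -1;
    memcpy(c->buf + c->len, data, n);
    c->len += n;

    while (c->len - off >= HDR_LEN) {
        size_t msglen = (size_t)c->buf[off] | ((size_t)c->buf[off + 1] << 8);

        if (msglen < HDR_LEN || msglen > RX_BUF_SIZE)
            return -1; /* can never fit in the buffer */
        if (c->len - off < msglen)
            break;     /* message not fully received yet */
        c->processed++; /* a real handler would deliver the message here */
        off += msglen;
    }

    /* Copy the leftover to the beginning so the next message starts at buf[0]. */
    if (off > 0) {
        memmove(c->buf, c->buf + off, c->len - off);
        c->len -= off;
    }
    return 0;
}
```

      Feeding one complete message followed by half of a second one leaves
      the half message at the start of the buffer; the next call completes it.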
    • fs: dlm: disallow buffer size below default · 4e192ee6
      Alexander Aring authored
      I observed that the upper layer will not send messages above this value.
      Consequently, the application receive buffer should not be below that
      value; otherwise we cannot deliver the dlm message to the upper layer.
      This patch forbids setting the receive buffer below the maximum possible
      dlm message size.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: handle range check as callback · e1a0ec30
      Alexander Aring authored
      This patch adds a callback to the CLUSTER_ATTR macro to allow individual
      callbacks for attributes which might need a more complex range check than
      just non-zero.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
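      The idea behind the two commits above (a per-attribute range check, and
      a buffer size that may not drop below the default) might look roughly
      like this in plain C. All names here (struct cluster_attr,
      cluster_attr_store(), reject_zero(), reject_small_buffer()) are
      hypothetical and not the kernel's CLUSTER_ATTR macro; the value 4096 for
      DEFAULT_BUFFER_SIZE is assumed from the minimum length mentioned earlier.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFAULT_BUFFER_SIZE 4096u /* assumed default, see lead-in */

/* A check callback returns true when the proposed value must be rejected. */
typedef bool (*attr_reject_fn)(unsigned int value);

/* Hypothetical stand-in for one cluster attribute. */
struct cluster_attr {
    const char *name;
    unsigned int value;
    attr_reject_fn reject;
};

/* The simple "must be non-zero" rule. */
bool reject_zero(unsigned int v)
{
    return v == 0;
}

/* The stricter rule for the receive buffer size. */
bool reject_small_buffer(unsigned int v)
{
    return v < DEFAULT_BUFFER_SIZE;
}

/* Store a new value only if the attribute's own callback accepts it. */
int cluster_attr_store(struct cluster_attr *attr, unsigned int v)
{
    if (attr->reject != NULL && attr->reject(v))
        return -EINVAL;
    attr->value = v;
    return 0;
}
```

      With this shape, each attribute carries its own rule, so adding a new
      constraint does not touch the shared store path.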
    • fs: dlm: fix mark per nodeid setting · 3f78cd7d
      Alexander Aring authored
      This patch applies the per-nodeid mark configuration to accepted sockets
      as well. Before this patch, only the listen socket mark value was used
      for all accepted connections. This patch ensures that the cluster mark
      attribute value is always used for all sockets; if a per-nodeid mark
      value is specified, dlm uses that value for the specific node.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: remove lock dependency warning · 0461e0db
      Alexander Aring authored
      During my experiments to make dlm robust against the tcpkill application,
      I sometimes ran into a circular lock dependency warning between
      clusters_root.subsys.su_mutex and con->sock_mutex. We don't need to hold
      the sock_mutex when getting the mark value, which takes
      clusters_root.subsys.su_mutex. This patch moves that handling to just
      before the sock_mutex is taken.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
  3. 27 Aug, 2020 7 commits
    • fs: dlm: use free_con to free connection · 7ae0451e
      Alexander Aring authored
      This patch uses free_con() to free the listen connection if listen
      fails. It also fixes an issue where a freed resource is still part of
      the connection_hash, because hlist_del() is not called in this case. The
      only difference is that free_con() handles othercon as well, but that is
      never set for the listen connection.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: handle possible othercon writequeues · 948c47e9
      Alexander Aring authored
      This patch frees any remaining writequeue entries in the othercon
      member of struct connection.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: move free writequeue into con free · 0de98432
      Alexander Aring authored
      This patch just moves the freeing of the struct connection member
      writequeue into the function that frees struct connection, instead of
      doing two iterations.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: fix configfs memory leak · 3d2825c8
      Alexander Aring authored
      This patch fixes the following memory leak, detected by kmemleak when
      unmounting a gfs2 filesystem, which removed the last lockspace:
      
      unreferenced object 0xffff9264f482f600 (size 192):
        comm "dlm_controld", pid 325, jiffies 4294690276 (age 48.136s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 6e 6f 64 65 73 00 00 00  ........nodes...
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000060481d7>] make_space+0x41/0x130
          [<000000008d905d46>] configfs_mkdir+0x1a2/0x5f0
          [<00000000729502cf>] vfs_mkdir+0x155/0x210
          [<000000000369bcf1>] do_mkdirat+0x6d/0x110
          [<00000000cc478a33>] do_syscall_64+0x33/0x40
          [<00000000ce9ccf01>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The patch just remembers the "nodes" entry pointer in space, as I think
      it's created as a subdirectory when the parent "spaces" is created. In
      drop_space() we lose the pointer reference to nds because of
      configfs_remove_default_groups(). However, as this subdirectory is always
      available when "spaces" exists, it will just be freed when "spaces" is
      freed.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: fix dlm_local_addr memory leak · 043697f0
      Alexander Aring authored
      This patch fixes the following memory leak, detected by kmemleak when
      unmounting a gfs2 filesystem, which removed the last lockspace:
      
      unreferenced object 0xffff9264f4f48f00 (size 128):
        comm "mount", pid 425, jiffies 4294690253 (age 48.159s)
        hex dump (first 32 bytes):
          02 00 52 48 c0 a8 7a fb 00 00 00 00 00 00 00 00  ..RH..z.........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000067a34940>] kmemdup+0x18/0x40
          [<00000000c935f9ab>] init_local+0x4c/0xa0
          [<00000000bbd286ef>] dlm_lowcomms_start+0x28/0x160
          [<00000000a86625cb>] dlm_new_lockspace+0x7e/0xb80
          [<000000008df6cd63>] gdlm_mount+0x1cc/0x5de
          [<00000000b67df8c7>] gfs2_lm_mount.constprop.0+0x1a3/0x1d3
          [<000000006642ac5e>] gfs2_fill_super+0x717/0xba9
          [<00000000d3ab7118>] get_tree_bdev+0x17f/0x280
          [<000000001975926e>] gfs2_get_tree+0x21/0x90
          [<00000000561ce1c4>] vfs_get_tree+0x28/0xc0
          [<000000007fecaf63>] path_mount+0x434/0xc00
          [<00000000636b9594>] __x64_sys_mount+0xe3/0x120
          [<00000000cc478a33>] do_syscall_64+0x33/0x40
          [<00000000ce9ccf01>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: make connection hash lockless · a47666eb
      Alexander Aring authored
      There are some problems with the connections_lock. During my
      experiments I sometimes saw circular dependencies with sock_lock.
      The reason might be code paths which run nodeid2con() before or
      after sock_lock is acquired.
      
      Another issue is missing locking in the for_conn() iteration. Maybe this
      works fine because for_conn() runs in a context where connection_hash
      can no longer be manipulated by others.
      
      However, this patch changes the connection_hash to be protected by
      sleepable RCU. The hotpath function __find_con() is implemented
      lockless, as it is only a reader of connection_hash, and this hopefully
      fixes the circular locking dependencies. The iteration for_conn() will
      still call some sleepable functionality, which is why we use sleepable
      RCU in this case.
      
      This patch removes the kmemcache functionality, as I think I need to
      do some of the freeing via call_rcu(). However, allocation time
      isn't an issue here. The dlm_allow_conn will no longer be protected by a
      lock, as I think it's enough to just set it and flush the workqueues
      afterwards.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: synchronize dlm before shutdown · aa7ab1e2
      Alexander Aring authored
      This patch moves the dlm workqueue synchronization before the shutdown
      handling. The patch just flushes all pending work before starting to
      shut down the connection. At least for the send_workqueue we should flush
      the workqueue to make sure there is no new connection handling going on,
      as the dlm_allow_conn switch is set to false beforehand.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
  4. 23 Aug, 2020 9 commits
    • Linux 5.9-rc2 · d012a719
      Linus Torvalds authored
    • Merge tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · cb957121
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Add perf support for emitting extended registers for power10.
      
       - A fix for CPU hotplug on pseries, where on large/loaded systems we
         may not wait long enough for the CPU to be offlined, leading to
         crashes.
      
       - Addition of a raw cputable entry for Power10, which is not required
         to boot, but is required to make our PMU setup work correctly in
         guests.
      
       - Three fixes for the recent changes on 32-bit Book3S to move modules
         into their own segment for strict RWX.
      
       - A fix for a recent change in our powernv PCI code that could lead to
         crashes.
      
       - A change to our perf interrupt accounting to avoid soft lockups when
         using some events, found by syzkaller.
      
       - A change in the way we handle power loss events from the hypervisor
         on pseries. We no longer immediately shut down if we're told we're
         running on a UPS.
      
       - A few other minor fixes.
      
      Thanks to Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T
      Sudhakar, Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz,
      Kajol Jain, Madhavan Srinivasan, Michael Neuling, Michael Roth,
      Nageswara R Sastry, Oliver O'Halloran, Thiago Jung Bauermann,
      Vaidyanathan Srinivasan, Vasant Hegde.
      
      * tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
        powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
        powerpc/pseries: Do not initiate shutdown when system is running on UPS
        powerpc/perf: Fix soft lockups due to missed interrupt accounting
        powerpc/powernv/pci: Fix possible crash when releasing DMA resources
        powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
        powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
        powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
        powerpc/fixmap: Fix the size of the early debug area
        powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled
        powerpc/kernel: Cleanup machine check function declarations
        powerpc: Add POWER10 raw mode cputable entry
        powerpc/perf: Add extended regs support for power10 platform
        powerpc/perf: Add support for outputting extended regs in perf intr_regs
        powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
    • Merge tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 550c2129
      Linus Torvalds authored
      Pull x86 fix from Thomas Gleixner:
       "A single fix for x86 which removes the RDPID usage from the paranoid
        entry path and unconditionally uses LSL to retrieve the CPU number.
      
        RDPID depends on MSR_TSC_AUX. KVM has an optimization to avoid
        expensive MSR read/writes on VMENTER/EXIT. It caches the MSR values
        and restores them either when leaving the run loop, on preemption or
        when going out to user space. MSR_TSC_AUX is part of that lazy MSR
        set, so after writing the guest value and before the lazy restore any
        exception using the paranoid entry will read the guest value and use
        it as CPU number to retrieve the GSBASE value for the current CPU when
        FSGSBASE is enabled. As RDPID is only used in that particular entry
        path, there is no reason to burden VMENTER/EXIT with two extra MSR
        writes. Remove the RDPID optimization, which is not even backed by
        numbers, from the paranoid entry path instead"
      
      * tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM
    • Merge tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cea05c19
      Linus Torvalds authored
      Pull x86 perf fix from Thomas Gleixner:
       "A single update for perf on x86 which has support for the broken down
        bandwidth counters"
      
      * tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown
    • Merge tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 10c091b6
      Linus Torvalds authored
      Pull EFI fixes from Thomas Gleixner:
      
       - Enforce NX on RO data in mixed EFI mode
      
       - Destroy workqueue in an error handling path to prevent UAF
      
       - Stop argument parser at '--' which is the delimiter for init
      
       - Treat a NULL command line pointer as empty instead of dereferencing it
         unconditionally.
      
       - Handle an unterminated command line correctly
      
       - Cleanup the 32bit code leftovers and remove obsolete documentation
      
      * tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Documentation: efi: remove description of efi=old_map
        efi/x86: Move 32-bit code into efi_32.c
        efi/libstub: Handle unterminated cmdline
        efi/libstub: Handle NULL cmdline
        efi/libstub: Stop parsing arguments at "--"
        efi: add missed destroy_workqueue when efisubsys_init fails
        efi/x86: Mark kernel rodata non-executable for mixed mode
    • Merge tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e99b2507
      Linus Torvalds authored
      Pull entry fix from Thomas Gleixner:
       "A single bug fix for the common entry code.
      
        The transcription of the x86 version messed up the reload of the
        syscall number from pt_regs after ptrace and seccomp which breaks
        syscall number rewriting"
      
      * tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        core/entry: Respect syscall number rewrites
    • Merge tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · d9232cb7
      Linus Torvalds authored
      Pull EDAC fix from Borislav Petkov:
       "A single fix correcting a reversed error severity determination check
        which led to a recoverable error getting marked as fatal, by Tony
        Luck"
      
      * tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/{i7core,sb,pnd2,skx}: Fix error event severity
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9d045ed1
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Nothing earth shattering here, lots of small fixes (f.e. missing RCU
        protection, bad ref counting, missing memset(), etc.) all over the
        place:
      
         1) Use get_file_rcu() in task_file iterator, from Yonghong Song.
      
         2) There are two ways to set remote source MAC addresses in macvlan
            driver, but only one of which validates things properly. Fix this.
            From Alvin Šipraga.
      
         3) Missing of_node_put() in gianfar probing, from Sumera
            Priyadarsini.
      
         4) Preserve device wanted feature bits across multiple netlink
            ethtool requests, from Maxim Mikityanskiy.
      
         5) Fix rcu_sched stall in task and task_file bpf iterators, from
            Yonghong Song.
      
         6) Avoid reset after device destroy in ena driver, from Shay
            Agroskin.
      
         7) Missing memset() in netlink policy export reallocation path, from
            Johannes Berg.
      
         8) Fix info leak in __smc_diag_dump(), from Peilin Ye.
      
         9) Decapsulate ECN properly for ipv6 in ipv4 tunnels, from Mark
            Tomlinson.
      
        10) Fix number of data stream negotiation in SCTP, from David Laight.
      
        11) Fix double free in connection tracker action module, from Alaa
            Hleihel.
      
        12) Don't allow empty NHA_GROUP attributes, from Nikolay Aleksandrov"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
        net: nexthop: don't allow empty NHA_GROUP
        bpf: Fix two typos in uapi/linux/bpf.h
        net: dsa: b53: check for timeout
        tipc: call rcu_read_lock() in tipc_aead_encrypt_done()
        net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flow
        net: sctp: Fix negotiation of the number of data streams.
        dt-bindings: net: renesas, ether: Improve schema validation
        gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY
        hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
        hv_netvsc: Remove "unlikely" from netvsc_select_queue
        bpf: selftests: global_funcs: Check err_str before strstr
        bpf: xdp: Fix XDP mode when no mode flags specified
        selftests/bpf: Remove test_align leftovers
        tools/resolve_btfids: Fix sections with wrong alignment
        net/smc: Prevent kernel-infoleak in __smc_diag_dump()
        sfc: fix build warnings on 32-bit
        net: phy: mscc: Fix a couple of spelling mistakes "spcified" -> "specified"
        libbpf: Fix map index used in error message
        net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
        net: atlantic: Use readx_poll_timeout() for large timeout
        ...
    • Merge branch 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f320ac6e
      Linus Torvalds authored
      Pull epoll fixes from Al Viro:
       "Fix reference counting and clean up exit paths"
      
      * 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        do_epoll_ctl(): clean the failure exits up a bit
        epoll: Keep a reference on files added to the check list
  5. 22 Aug, 2020 9 commits
    • Al Viro's avatar
      52c47969
    • epoll: Keep a reference on files added to the check list · a9ed4a65
      Marc Zyngier authored
      When adding a new fd to an epoll, and this new fd is an
      epoll fd itself, we recursively scan the fds attached to it
      to detect cycles, and add non-epoll files to a "check list"
      that gets subsequently parsed.
      
      However, this check list isn't completely safe when deletions
      can happen concurrently. To sidestep the issue, make sure that
      a struct file placed on the check list sees its f_count increased,
      ensuring that a concurrent deletion won't result in the file
      disappearing from under our feet.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
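      The reference-holding idea in this commit message can be illustrated
      with a small single-threaded C model. It is only a sketch: struct
      file_obj, file_get(), file_put(), and the check_list helpers are
      hypothetical stand-ins for struct file and the kernel's reference
      counting, and the "concurrent" deletion is simulated by an ordinary
      call.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct file with its f_count. */
struct file_obj {
    int f_count;                 /* number of live references */
    struct file_obj *next_check; /* link used while on the check list */
};

struct check_list {
    struct file_obj *head;
};

/* Take one reference. */
void file_get(struct file_obj *f)
{
    f->f_count++;
}

/* Drop one reference; returns 1 when this was the last one. */
int file_put(struct file_obj *f)
{
    return --f->f_count == 0;
}

/* Pin the object before linking it, so it outlives any other user's put. */
void check_list_add(struct check_list *l, struct file_obj *f)
{
    file_get(f);
    f->next_check = l->head;
    l->head = f;
}

/*
 * Unlink every entry and drop the reference the list held.
 * Returns how many objects lost their last reference here.
 */
int check_list_clear(struct check_list *l)
{
    int released = 0;

    while (l->head != NULL) {
        struct file_obj *f = l->head;

        l->head = f->next_check;
        f->next_check = NULL;
        released += file_put(f);
    }
    return released;
}
```

      In this model, a put issued by another user while the object sits on
      the list cannot bring the count to zero, because the list holds its own
      reference until check_list_clear() runs.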
    • net: nexthop: don't allow empty NHA_GROUP · eeaac363
      Nikolay Aleksandrov authored
      Currently the nexthop code will use an empty NHA_GROUP attribute, but it
      requires at least 1 entry in order to function properly. Otherwise we
      end up dereferencing null or random pointers all over the place due to not
      having any nh_grp_entry members allocated; the nexthop code relies on having
      at least the first member present. An empty NHA_GROUP doesn't make any sense,
      so just disallow it.
      Also add a WARN_ON for any future users of nexthop_create_group().
      
       BUG: kernel NULL pointer dereference, address: 0000000000000080
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       RIP: 0010:fib_check_nexthop+0x4a/0xaa
       Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
       RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
       RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
       RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
       RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
       R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
       R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
       FS:  00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
       Call Trace:
        fib_create_info+0x64d/0xaf7
        fib_table_insert+0xf6/0x581
        ? __vma_adjust+0x3b6/0x4d4
        inet_rtm_newroute+0x56/0x70
        rtnetlink_rcv_msg+0x1e3/0x20d
        ? rtnl_calcit.isra.0+0xb8/0xb8
        netlink_rcv_skb+0x5b/0xac
        netlink_unicast+0xfa/0x17b
        netlink_sendmsg+0x334/0x353
        sock_sendmsg_nosec+0xf/0x3f
        ____sys_sendmsg+0x1a0/0x1fc
        ? copy_msghdr_from_user+0x4c/0x61
        ___sys_sendmsg+0x63/0x84
        ? handle_mm_fault+0xa39/0x11b5
        ? sockfd_lookup_light+0x72/0x9a
        __sys_sendmsg+0x50/0x6e
        do_syscall_64+0x54/0xbe
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f10dacc0bb7
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
       RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
       RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
       RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
       R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
       Modules linked in:
       CR2: 0000000000000080
      
      CC: David Ahern <dsahern@gmail.com>
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
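      The kind of check this commit describes, rejecting a group attribute
      that carries no entries, might be sketched as follows. This is not the
      kernel's nexthop code: struct grp_entry and validate_group_payload() are
      hypothetical, and the entry layout is an assumption chosen only to give
      the example a fixed entry size.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size group entry (layout assumed for illustration). */
struct grp_entry {
    uint32_t id;
    uint8_t weight;
    uint8_t reserved1;
    uint16_t reserved2;
};

/*
 * Validate the payload length of a group attribute before any entry is
 * dereferenced. Returns 0 if the payload holds one or more whole entries.
 */
int validate_group_payload(size_t payload_len)
{
    if (payload_len == 0)
        return -EINVAL; /* empty group: no first member to rely on */
    if (payload_len % sizeof(struct grp_entry) != 0)
        return -EINVAL; /* trailing partial entry */
    return 0;
}
```

      Rejecting the empty case up front means later code can assume the first
      group member exists.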
    • Merge tag 'kbuild-fixes-v5.9' of... · c3d8f220
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - move -Wsign-compare warning from W=2 to W=3
      
       - fix the keyword _restrict to __restrict in genksyms
      
       - fix more bugs in qconf
      
      * tag 'kbuild-fixes-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: qconf: replace deprecated QString::sprintf() with QTextStream
        kconfig: qconf: remove redundant help in the info view
        kconfig: qconf: remove qInfo() to get back Qt4 support
        kconfig: qconf: remove unused colNr
        kconfig: qconf: fix the popup menu in the ConfigInfoView window
        kconfig: qconf: fix signal connection to invalid slots
        genksyms: keywords: Use __restrict not _restrict
        kbuild: remove redundant patterns in filter/filter-out
        extract-cert: add static to local data
        Makefile.extrawarn: Move sign-compare from W=2 to W=3
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · dd105d64
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Allow booting of late secondary CPUs affected by erratum 1418040
         (currently they are parked if none of the early CPUs are affected by
         this erratum).
      
       - Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
         vdso_install' installs the 32-bit compat vdso when it is compiled.
      
       - Print a warning that untrusted guests without a CPU erratum
         workaround (Cortex-A57 832075) may deadlock the affected system.
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        ARM64: vdso32: Install vdso32 from vdso_install
        KVM: arm64: Print warning when cpu erratum can cause guests to deadlock
        arm64: Allow booting of late CPUs affected by erratum 1418040
        arm64: Move handling of erratum 1418040 into C code
    • Merge tag 's390-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · d57ce840
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - a couple of fixes for storage key handling relevant for debugging
      
       - add cond_resched into potentially slow subchannels scanning loop
      
       - fixes for PF/VF linking and to ignore stale PCI configuration request
         events
      
      * tag 's390-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/pci: fix PF/VF linking on hot plug
        s390/pci: re-introduce zpci_remove_device()
        s390/pci: fix zpci_bus_link_virtfn()
        s390/ptrace: fix storage key handling
        s390/runtime_instrumentation: fix storage key handling
        s390/pci: ignore stale configuration request event
        s390/cio: add cond_resched() in the slow_eval_known_fn() loop
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b2d9e996
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
      
       - PAE and PKU bugfixes for x86
      
       - selftests fix for new binutils
      
       - MMU notifier fix for arm64
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
        KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
        kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
        kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
        KVM: x86: fix access code passed to gva_to_gpa
        selftests: kvm: Use a shorter encoding to clear RAX
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 9e574b74
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "23 fixes in 5 drivers (qla2xxx, ufs, scsi_debug, fcoe, zfcp). The bulk
        of the changes are in qla2xxx and ufs and all are mostly small and
        definitely don't impact the core"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
        Revert "scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe"
        Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command"
        scsi: qla2xxx: Fix null pointer access during disconnect from subsystem
        scsi: qla2xxx: Check if FW supports MQ before enabling
        scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba
        scsi: qla2xxx: Allow ql2xextended_error_logging special value 1 to be set anytime
        scsi: qla2xxx: Reduce noisy debug message
        scsi: qla2xxx: Fix login timeout
        scsi: qla2xxx: Indicate correct supported speeds for Mezz card
        scsi: qla2xxx: Flush I/O on zone disable
        scsi: qla2xxx: Flush all sessions on zone disable
        scsi: qla2xxx: Use MBX_TOV_SECONDS for mailbox command timeout values
        scsi: scsi_debug: Fix scp is NULL errors
        scsi: zfcp: Fix use-after-free in request timeout handlers
        scsi: ufs: No need to send Abort Task if the task in DB was cleared
        scsi: ufs: Clean up completed request without interrupt notification
        scsi: ufs: Improve interrupt handling for shared interrupts
        scsi: ufs: Fix interrupt error message for shared interrupts
        scsi: ufs-pci: Add quirk for broken auto-hibernate for Intel EHL
        scsi: ufs-mediatek: Fix incorrect time to wait link status
        ...
      9e574b74
    • Linus Torvalds
      Merge tag 'devicetree-fixes-for-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · d6af6330
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
       "Another set of DT fixes:
      
         - restore range parsing error check
      
         - workaround PCI range parsing with missing 'device_type' now
           required
      
         - correct description of 'phy-connection-type'
      
         - fix erroneous matching on 'snps,dw-pcie' by 'intel,lgm-pcie' schema
      
         - a couple of grammar and whitespace fixes
      
         - update Shawn Guo's email"
      
      * tag 'devicetree-fixes-for-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: vendor-prefixes: Remove trailing whitespace
        dt-bindings: net: correct description of phy-connection-type
        dt-bindings: PCI: intel,lgm-pcie: Fix matching on all snps,dw-pcie instances
        of: address: Work around missing device_type property in pcie nodes
        dt: writing-schema: Miscellaneous grammar fixes
        dt-bindings: Use Shawn Guo's preferred e-mail for i.MX bindings
        of/address: check for invalid range.cpu_addr
      d6af6330
  6. 21 Aug, 2020 9 commits
    • Geert Uytterhoeven
      5cd841d2
    • Will Deacon
      KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set · b5331379
      Will Deacon authored
      When an MMU notifier call results in unmapping a range that spans multiple
      PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
      since this avoids running into RCU stalls during VM teardown. Unfortunately,
      if the VM is destroyed as a result of OOM, then blocking is not permitted
      and the call to the scheduler triggers the following BUG():
      
       | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
       | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
       | INFO: lockdep is turned off.
       | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
       | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
       | Call trace:
       |  dump_backtrace+0x0/0x284
       |  show_stack+0x1c/0x28
       |  dump_stack+0xf0/0x1a4
       |  ___might_sleep+0x2bc/0x2cc
       |  unmap_stage2_range+0x160/0x1ac
       |  kvm_unmap_hva_range+0x1a0/0x1c8
       |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
       |  __mmu_notifier_invalidate_range_start+0x218/0x31c
       |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
       |  __oom_reap_task_mm+0x128/0x268
       |  oom_reap_task+0xac/0x298
       |  oom_reaper+0x178/0x17c
       |  kthread+0x1e4/0x1fc
       |  ret_from_fork+0x10/0x30
      
      Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
      only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
      flags.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 8b3405e3 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-3-will@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b5331379
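      The idea behind this fix can be sketched in plain userspace C. This is
      only an illustration, not the arm64 code: CHUNK_SIZE, fake_cond_resched(),
      next_boundary() and unmap_range_sketch() are hypothetical names, the flag
      value is a stand-in, and the rescheduling call is replaced by a counter so
      the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel flag of the same name; the value is illustrative. */
#define MMU_NOTIFIER_RANGE_BLOCKABLE (1u << 0)

/* Hypothetical chunk size standing in for the span covered by one PGD entry. */
#define CHUNK_SIZE 0x1000ul

/* Counts how often the simulated reschedule point was reached. */
static unsigned int resched_calls;

/* Stand-in for cond_resched_lock(): it only records that it was called. */
static void fake_cond_resched(void)
{
    resched_calls++;
}

/* Round addr up to the next chunk boundary, clamped to end. */
static unsigned long next_boundary(unsigned long addr, unsigned long end)
{
    unsigned long next = (addr + CHUNK_SIZE) & ~(CHUNK_SIZE - 1);

    return next < end ? next : end;
}

/* Walk [start, end) chunk by chunk, yielding only if the caller may block. */
static void unmap_range_sketch(unsigned long start, unsigned long end,
                               unsigned int flags)
{
    bool may_block = (flags & MMU_NOTIFIER_RANGE_BLOCKABLE) != 0;
    unsigned long addr = start;

    assert(start <= end);

    while (addr < end) {
        unsigned long next = next_boundary(addr, end);

        /* ... the real code would unmap [addr, next) here ... */

        /* Yield at a chunk boundary only when blocking is permitted. */
        if (may_block && next != end)
            fake_cond_resched();

        addr = next;
    }
}
```

      In this sketch, a caller such as the OOM reaper path would pass flags
      without MMU_NOTIFIER_RANGE_BLOCKABLE, so the walk never reaches the
      reschedule point, while a blockable caller still yields at each interior
      chunk boundary.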
    • Will Deacon
      KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd
      Will Deacon authored
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures are aware as to whether or not they are permitted to block.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fdfe7cbd
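      A minimal sketch of the plumbing this commit describes, assuming a
      reduced stand-in for 'struct mmu_notifier_range'. The names range_sketch,
      arch_unmap_hva_range_sketch(), invalidate_range_start_sketch() and
      backend_may_block() are hypothetical and do not reproduce the kernel
      signatures; the point is only that the generic hook forwards the flags so
      the backend can see them.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel flag of the same name; the value is illustrative. */
#define MMU_NOTIFIER_RANGE_BLOCKABLE (1u << 0)

/* Reduced stand-in for struct mmu_notifier_range: only the fields used here. */
struct range_sketch {
    unsigned long start;
    unsigned long end;
    unsigned int flags;
};

/* Records the flags the backend last received, for inspection. */
static unsigned int last_backend_flags;

/* Stand-in for an architecture backend that now takes a flags parameter. */
static int arch_unmap_hva_range_sketch(unsigned long start, unsigned long end,
                                       unsigned int flags)
{
    assert(start <= end);
    last_backend_flags = flags;
    return 0;
}

/* Stand-in for the generic hook: it forwards range->flags unchanged. */
static int invalidate_range_start_sketch(const struct range_sketch *range)
{
    return arch_unmap_hva_range_sketch(range->start, range->end,
                                       range->flags);
}

/* What the backend can now decide from the forwarded flags. */
static bool backend_may_block(void)
{
    return (last_backend_flags & MMU_NOTIFIER_RANGE_BLOCKABLE) != 0;
}
```

      Before the change, the equivalent of arch_unmap_hva_range_sketch() would
      have received only start and end, leaving the backend with no way to
      tell a blockable invalidation from a non-blockable one.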
    • Madalin Bucur
      dt-bindings: net: correct description of phy-connection-type · 5f53584c
      Madalin Bucur authored
      The phy-connection-type parameter is described in ePAPR 1.1:
      
      Specifies interface type between the Ethernet device and a physical
      layer (PHY) device. The value of this property is specific to the
      implementation.
      Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
      Link: https://lore.kernel.org/r/1597917724-11127-1-git-send-email-madalin.bucur@oss.nxp.com
      Signed-off-by: Rob Herring <robh@kernel.org>
      5f53584c
    • Linus Torvalds
      Merge tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block · f873db9a
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Make sure the head link cancelation includes async work
      
       - Get rid of kiocb_wait_page_queue_init(), makes no sense to have it as
         a separate function since you moved it into io_uring itself
      
       - io_import_iovec cleanups (Pavel, me)
      
       - Use system_unbound_wq for ring exit work, to avoid spawning tons of
         these if we have tons of rings exiting at the same time
      
       - Fix req->flags overflow flag manipulation (Pavel)
      
      * tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block:
        io_uring: kill extra iovec=NULL in import_iovec()
        io_uring: comment on kfree(iovec) checks
        io_uring: fix racy req->flags modification
        io_uring: use system_unbound_wq for ring exit work
        io_uring: cleanup io_import_iovec() of pre-mapped request
        io_uring: get rid of kiocb_wait_page_queue_init()
        io_uring: find and cancel head link async work on files exit
      f873db9a
    • Rob Herring
      dt-bindings: PCI: intel,lgm-pcie: Fix matching on all snps,dw-pcie instances · a326462c
      Rob Herring authored
      The intel,lgm-pcie binding is matching on all snps,dw-pcie instances
      which is wrong. Add a custom 'select' entry to fix this.
      
      Fixes: e54ea45a ("dt-bindings: PCI: intel: Add YAML schemas for the PCIe RC controller")
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-pci@vger.kernel.org
      Reviewed-by: Dilip Kota <eswara.kota@linux.intel.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
      a326462c
    • Linus Torvalds
      Merge branch 'akpm' (patches from Andrew) · 349111f0
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "11 patches.
      
        Subsystems affected by this: misc, mm/hugetlb, mm/vmalloc, mm/misc,
        romfs, relay, uprobes, squashfs, mm/cma, mm/pagealloc"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, page_alloc: fix core hung in free_pcppages_bulk()
        mm: include CMA pages in lowmem_reserve at boot
        squashfs: avoid bio_alloc() failure with 1Mbyte blocks
        uprobes: __replace_page() avoid BUG in munlock_vma_page()
        kernel/relay.c: fix memleak on destroy relay channel
        romfs: fix uninitialized memory leak in romfs_dev_read()
        mm/rodata_test.c: fix missing function declaration
        mm/vunmap: add cond_resched() in vunmap_pmd_range
        khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
        hugetlb_cgroup: convert comma to semicolon
        mailmap: add Andi Kleen
      349111f0
    • David S. Miller
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 4af7b32f
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-08-21
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 11 non-merge commits during the last 5 day(s) which contain
      a total of 12 files changed, 78 insertions(+), 24 deletions(-).
      
      The main changes are:
      
      1) three fixes in BPF task iterator logic, from Yonghong.
      
      2) fix for compressed dwarf sections in vmlinux, from Jiri.
      
      3) fix xdp attach regression, from Andrii.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4af7b32f
    • Linus Torvalds
      Merge tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · f22c5579
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - The CLINT driver has been split in two: one to handle the M-mode
         CLINT (memory mapped and used on NOMMU systems) and one to handle the
         S-mode CLINT (via SBI).
      
       - The addition of SiFive's drivers to rv32_defconfig
      
      * tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Add SiFive drivers to rv32_defconfig
        dt-bindings: timer: Add CLINT bindings
        RISC-V: Remove CLINT related code from timer and arch
        clocksource/drivers: Add CLINT timer driver
        RISC-V: Add mechanism to provide custom IPI operations
      f22c5579