1. 24 Apr, 2020 40 commits
    • ext4: fix incorrect group count in ext4_fill_super error message · 536d20c0
      Josh Triplett authored
      commit df41460a upstream.
      
      ext4_fill_super double-checks the number of groups before mounting; if
      that check fails, the resulting error message prints the group count
      from the ext4_sb_info sbi, which hasn't been set yet. Print the freshly
      computed group count instead (which at that point has just been computed
      in "blocks_count").
      
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Fixes: 4ec11028 ("ext4: Add sanity checks for the superblock before mounting the filesystem")
      Link: https://lore.kernel.org/r/8b957cd1513fcc4550fe675c10bcce2175c33a49.1585431964.git.josh@joshtriplett.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • jbd2: improve comments about freeing data buffers whose page mapping is NULL · 0064b0f1
      zhangyi (F) authored
      commit 780f66e5 upstream.
      
      Improve comments in jbd2_journal_commit_transaction() to describe why
      we don't need to clear the buffer_mapped bit for freeing file mapping
      buffers whose page mapping is NULL.
      
      Link: https://lore.kernel.org/r/20200217112706.20085-1-yi.zhang@huawei.com
      Fixes: c96dceea ("jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer")
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: ufs: Fix ufshcd_hold() caused scheduling while atomic · db1b2190
      Can Guo authored
      commit c63d6099 upstream.
      
      The async version of ufshcd_hold() (async == true), which is currently only
      called in the queuecommand path, is expected to work in atomic context, so
      it must not sleep or schedule out. When it runs into the condition where
      clocks are ON but the link is still in hibern8 state, it should bail out
      without flushing the clock ungate work.
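
The bail-out rule described above can be sketched in plain C. This is only an illustration under stated assumptions: hold_sketch(), the hold_result values, and the choice of -EAGAIN as the error code are hypothetical and are not taken from the driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes for this sketch; not taken from the driver. */
enum hold_result {
    HOLD_OK = 0,            /* nothing to wait for */
    HOLD_FLUSHED_WORK = 1,  /* slept until the ungate work finished */
};

/*
 * Hypothetical model of the decision: when clocks are on but the link
 * is still in hibern8, a sleeping caller may wait for the ungate work,
 * while an async (atomic context) caller must return immediately.
 */
static int hold_sketch(bool async, bool clocks_on, bool link_in_hibern8)
{
    if (clocks_on && link_in_hibern8) {
        if (async)
            return -EAGAIN;       /* assumed error code: bail out, never sleep */
        return HOLD_FLUSHED_WORK; /* stands in for flushing the ungate work */
    }
    return HOLD_OK;
}
```

In this model an async caller would simply retry later, which is the usual way for an atomic path to cope with a resource that is not ready yet.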
      
      Fixes: f2a785ac ("scsi: ufshcd: Fix race between clk scaling and ungate work")
      Link: https://lore.kernel.org/r/1581392451-28743-6-git-send-email-cang@codeaurora.org
      Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
      Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
      Reviewed-by: Bean Huo <beanhuo@micron.com>
      Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
      Signed-off-by: Can Guo <cang@codeaurora.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ipv6: do not consider routes via gateways for anycast address check · a6047aaf
      Tim Stallard authored
      [ Upstream commit 03e2a984 ]
      
      The behaviour for what is considered an anycast address changed in
      commit 45e4fd26 ("ipv6: Only create RTF_CACHE routes after
      encountering pmtu exception"). This now considers the first
      address in a subnet where there is a route via a gateway
      to be an anycast address.
      
      This breaks path MTU discovery and traceroutes when a host in a
      remote network uses the address at the start of a prefix
      (e.g. 2600:: advertised as 2600::/48 in the DFZ), as ICMP errors
      will not be sent to anycast addresses.
      
      This patch excludes any routes with a gateway, or via point-to-point
      links, matching the previous behaviour of
      rt6_is_gw_or_nonexthop in net/ipv6/route.c.
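
As a rough illustration of the rule described above, the check can be modelled as a small predicate. This is a sketch under assumptions: the SK_RTF_* bit values and is_anycast_destination_sketch() are hypothetical stand-ins, not the kernel's definitions, and SK_RTF_NONEXTHOP stands in for the point-to-point case.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only (not the kernel's values). */
#define SK_RTF_ANYCAST   0x1u
#define SK_RTF_GATEWAY   0x2u
#define SK_RTF_NONEXTHOP 0x4u

/*
 * Hypothetical model: the first address of a prefix counts as anycast
 * only when the matching route is neither via a gateway nor a
 * no-next-hop (point-to-point style) route.
 */
static bool is_anycast_destination_sketch(unsigned int route_flags,
                                          bool daddr_is_prefix_base)
{
    if (route_flags & SK_RTF_ANYCAST)
        return true;    /* explicitly marked anycast */
    if (route_flags & (SK_RTF_GATEWAY | SK_RTF_NONEXTHOP))
        return false;   /* remote prefix: never treat as anycast */
    return daddr_is_prefix_base;
}
```

With this rule, an address such as 2001:db8:100:: reached through a gateway is no longer classified as anycast, so ICMP errors can be sent to it.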
      
      This can be tested with:
      ip link add v1 type veth peer name v2
      ip netns add test
      ip netns exec test ip link set lo up
      ip link set v2 netns test
      ip link set v1 up
      ip netns exec test ip link set v2 up
      ip addr add 2001:db8::1/64 dev v1 nodad
      ip addr add 2001:db8:100:: dev lo nodad
      ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad
      ip netns exec test ip route add unreachable 2001:db8:1::1
      ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1
      ip netns exec test sysctl net.ipv6.conf.all.forwarding=1
      ip route add 2001:db8:1::1 via 2001:db8::2
      ping -I 2001:db8::1 2001:db8:1::1 -c1
      ping -I 2001:db8:100:: 2001:db8:1::1 -c1
      ip addr delete 2001:db8:100:: dev lo
      ip netns delete test
      
      Currently the first ping will get back a destination unreachable ICMP
      error, but the second will never get a response, with "icmp6_send:
      acast source" logged. After this patch, both get destination
      unreachable ICMP replies.
      
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Signed-off-by: Tim Stallard <code@timstallard.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: qrtr: send msgs from local of same id as broadcast · 0f838a40
      Wang Wenhu authored
      [ Upstream commit 6dbf02ac ]
      
      If the local node id (qrtr_local_nid) is not modified after its
      initialization, it is equal to the broadcast node id (QRTR_NODE_BCAST).
      In that case, messages from the local node should not be treated as
      broadcast; keep processing them and send them out anyway.
      
      The definition is as follows:
      static unsigned int qrtr_local_nid = NUMA_NO_NODE;
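
The coincidence between the two ids can be checked with a few lines of C. The macro values below are assumptions based on my understanding of the kernel headers (NUMA_NO_NODE as -1, QRTR_NODE_BCAST as 0xffffffffu) and should be verified before relying on them; local_nid_equals_bcast() is a hypothetical helper.

```c
#include <stdbool.h>

/* Assumed values; verify against the kernel headers before relying on them. */
#define NUMA_NO_NODE    (-1)
#define QRTR_NODE_BCAST 0xffffffffu

/* Same initializer as in the commit message: -1 converts to UINT_MAX. */
static unsigned int qrtr_local_nid = NUMA_NO_NODE;

/* Hypothetical helper: true while the local id still has its initial value. */
static bool local_nid_equals_bcast(void)
{
    return qrtr_local_nid == QRTR_NODE_BCAST;
}
```

On platforms where unsigned int is 32 bits wide, the conversion of -1 yields 0xffffffff, so an unconfigured local node is indistinguishable from the broadcast id by value alone.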
      
      Fixes: fdf5fd39 ("net: qrtr: Broadcast messages only from control port")
      Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin · 1cd63ccd
      Taras Chornyi authored
      [ Upstream commit 690cc863 ]
      
      When CONFIG_IP_MULTICAST is not set and a multicast IP is added to the device
      with the autojoin flag, or when a multicast IP is deleted, the kernel will crash.
      
      Steps to reproduce:
      
      ip addr add 224.0.0.0/32 dev eth0
      ip addr del 224.0.0.0/32 dev eth0
      
      or
      
      ip addr add 224.0.0.0/32 dev eth0 autojoin
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
       pc : _raw_write_lock_irqsave+0x1e0/0x2ac
       lr : lock_sock_nested+0x1c/0x60
       Call trace:
        _raw_write_lock_irqsave+0x1e0/0x2ac
        lock_sock_nested+0x1c/0x60
        ip_mc_config.isra.28+0x50/0xe0
        inet_rtm_deladdr+0x1a8/0x1f0
        rtnetlink_rcv_msg+0x120/0x350
        netlink_rcv_skb+0x58/0x120
        rtnetlink_rcv+0x14/0x20
        netlink_unicast+0x1b8/0x270
        netlink_sendmsg+0x1a0/0x3b0
        ____sys_sendmsg+0x248/0x290
        ___sys_sendmsg+0x80/0xc0
        __sys_sendmsg+0x68/0xc0
        __arm64_sys_sendmsg+0x20/0x30
        el0_svc_common.constprop.2+0x88/0x150
        do_el0_svc+0x20/0x80
        el0_sync_handler+0x118/0x190
        el0_sync+0x140/0x180
      
      Fixes: 93a714d6 ("multicast: Extend ip address command to enable multicast group join/leave on")
      Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
      Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hsr: check protocol version in hsr_newlink() · c72ef9ff
      Taehee Yoo authored
      [ Upstream commit 4faab8c4 ]
      
      In the current hsr code, only protocol versions 0 and 1 are valid.
      But the current hsr code doesn't check the version, which is received
      from userspace.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 version 4
      
      In the test commands, version 4 is invalid,
      so the command should fail.
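
A minimal sketch of such a range check is shown here. The helper name hsr_check_version_sketch(), the SK_HSR_MAX_VERSION constant, and the use of -EINVAL are assumptions for illustration; only the error text is taken from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical constant: highest protocol version accepted by the sketch. */
#define SK_HSR_MAX_VERSION 1u

/*
 * Hypothetical validation helper: rejects any version above 1 and
 * optionally reports the message quoted in the commit text.
 */
static int hsr_check_version_sketch(unsigned int version, const char **errmsg)
{
    if (version > SK_HSR_MAX_VERSION) {
        if (errmsg)
            *errmsg = "Only versions 0..1 are supported.";
        return -EINVAL; /* assumed error code */
    }
    return 0;
}
```

Validating the value at link-creation time keeps an unsupported version from ever being stored in the device state.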
      
      After this patch, the following error will occur:
      "Error: hsr: Only versions 0..1 are supported."
      
      Fixes: ee1c2797 ("net/hsr: Added support for HSR v1")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mfd: dln2: Fix sanity checking for endpoints · ac703de1
      Andy Shevchenko authored
      [ Upstream commit fb945c95 ]
      
      While commit 2b8bd606 ("mfd: dln2: More sanity checking for endpoints")
      tried to harden the sanity checks, it also introduced a regression,
      i.e. it mixed up the in and out endpoints. Evidently it should have been tested on
      real hardware at that time, but unluckily that didn't happen.
      
      So, fix the above-mentioned typo so that the device is enumerated again.
      
      While here, introduce an enumerator for magic values to prevent similar issues
      from happening in the future.
      
      Fixes: 2b8bd606 ("mfd: dln2: More sanity checking for endpoints")
      Cc: Oliver Neukum <oneukum@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • misc: echo: Remove unnecessary parentheses and simplify check for zero · 5d304d98
      Nathan Chancellor authored
      [ Upstream commit 85dc2c65 ]
      
      Clang warns when multiple pairs of parentheses are used for a single
      conditional statement.
      
      drivers/misc/echo/echo.c:384:27: warning: equality comparison with
      extraneous parentheses [-Wparentheses-equality]
              if ((ec->nonupdate_dwell == 0)) {
                   ~~~~~~~~~~~~~~~~~~~~^~~~
      drivers/misc/echo/echo.c:384:27: note: remove extraneous parentheses
      around the comparison to silence this warning
              if ((ec->nonupdate_dwell == 0)) {
                  ~                    ^   ~
      drivers/misc/echo/echo.c:384:27: note: use '=' to turn this equality
      comparison into an assignment
              if ((ec->nonupdate_dwell == 0)) {
                                       ^~
                                       =
      1 warning generated.
      
      Remove them and while we're at it, simplify the zero check as '!var' is
      used more than 'var == 0'.
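
The change can be illustrated with a reduced example. The struct and function names below are hypothetical stand-ins, not the driver's definitions; only the field name and the two forms of the condition come from the warning text above.

```c
/* Hypothetical, reduced stand-in for the driver's state structure. */
struct echo_state_sketch {
    int nonupdate_dwell;
};

/*
 * Before: if ((ec->nonupdate_dwell == 0))   triggers -Wparentheses-equality
 * After:  if (!ec->nonupdate_dwell)         same meaning, no warning
 */
static int dwell_is_zero(const struct echo_state_sketch *ec)
{
    if (!ec->nonupdate_dwell)
        return 1;
    return 0;
}

/* Small wrapper so the check can be exercised with a plain integer. */
static int check_dwell(int value)
{
    struct echo_state_sketch ec = { value };

    return dwell_is_zero(&ec);
}
```

Both spellings evaluate identically; the doubled parentheses only matter because compilers use them as the conventional way to mark an intentional assignment inside a condition.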
      
      Reported-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/fsl_booke: Avoid creating duplicate tlb1 entry · f3c266a9
      Laurentiu Tudor authored
      [ Upstream commit aa411334 ]
      
      In the current implementation, the call to loadcam_multi() is wrapped
      between switch_to_as1() and restore_to_as0() calls so, when it tries
      to create its own temporary AS=1 TLB1 entry, it ends up duplicating
      the existing one created by switch_to_as1(). Add a check to skip
      creating the temporary entry if already running in AS=1.
      
      Fixes: d9e1831a ("powerpc/85xx: Load all early TLB entries at once")
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
      Acked-by: Scott Wood <oss@buserror.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200123111914.2565-1-laurentiu.tudor@nxp.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipmi: fix hung processes in __get_guid() · 62cd9aa3
      Wen Yang authored
      [ Upstream commit 32830a05 ]
      
      The wait_event() function is used to detect command completion.
      When send_guid_cmd() returns an error, smi_send() has not been
      called to send data. Therefore, wait_event() should not be used
      on the error path, otherwise it will cause the following warning:
      
      [ 1361.588808] systemd-udevd   D    0  1501   1436 0x00000004
      [ 1361.588813]  ffff883f4b1298c0 0000000000000000 ffff883f4b188000 ffff887f7e3d9f40
      [ 1361.677952]  ffff887f64bd4280 ffffc90037297a68 ffffffff8173ca3b ffffc90000000010
      [ 1361.767077]  00ffc90037297ad0 ffff887f7e3d9f40 0000000000000286 ffff883f4b188000
      [ 1361.856199] Call Trace:
      [ 1361.885578]  [<ffffffff8173ca3b>] ? __schedule+0x23b/0x780
      [ 1361.951406]  [<ffffffff8173cfb6>] schedule+0x36/0x80
      [ 1362.010979]  [<ffffffffa071f178>] get_guid+0x118/0x150 [ipmi_msghandler]
      [ 1362.091281]  [<ffffffff810d5350>] ? prepare_to_wait_event+0x100/0x100
      [ 1362.168533]  [<ffffffffa071f755>] ipmi_register_smi+0x405/0x940 [ipmi_msghandler]
      [ 1362.258337]  [<ffffffffa0230ae9>] try_smi_init+0x529/0x950 [ipmi_si]
      [ 1362.334521]  [<ffffffffa022f350>] ? std_irq_setup+0xd0/0xd0 [ipmi_si]
      [ 1362.411701]  [<ffffffffa0232bd2>] init_ipmi_si+0x492/0x9e0 [ipmi_si]
      [ 1362.487917]  [<ffffffffa0232740>] ? ipmi_pci_probe+0x280/0x280 [ipmi_si]
      [ 1362.568219]  [<ffffffff810021a0>] do_one_initcall+0x50/0x180
      [ 1362.636109]  [<ffffffff812231b2>] ? kmem_cache_alloc_trace+0x142/0x190
      [ 1362.714330]  [<ffffffff811b2ae1>] do_init_module+0x5f/0x200
      [ 1362.781208]  [<ffffffff81123ca8>] load_module+0x1898/0x1de0
      [ 1362.848069]  [<ffffffff811202e0>] ? __symbol_put+0x60/0x60
      [ 1362.913886]  [<ffffffff8130696b>] ? security_kernel_post_read_file+0x6b/0x80
      [ 1362.998514]  [<ffffffff81124465>] SYSC_finit_module+0xe5/0x120
      [ 1363.068463]  [<ffffffff81124465>] ? SYSC_finit_module+0xe5/0x120
      [ 1363.140513]  [<ffffffff811244be>] SyS_finit_module+0xe/0x10
      [ 1363.207364]  [<ffffffff81003c04>] do_syscall_64+0x74/0x180
      
      Fixes: 50c812b2 ("[PATCH] ipmi: add full sysfs support")
      Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: openipmi-developer@lists.sourceforge.net
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # 2.6.17-
      Message-Id: <20200403090408.58745-1-wenyang@linux.alibaba.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm: Remove PageReserved manipulation from drm_pci_alloc · 9abfa51e
      Chris Wilson authored
      [ Upstream commit ea36ec86 ]
      
      drm_pci_alloc/drm_pci_free are very thin wrappers around the core dma
      facilities, and we have no special reason within the drm layer to behave
      differently. In particular, since
      
      commit de09d31d
      Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Date:   Fri Jan 15 16:51:42 2016 -0800
      
          page-flags: define PG_reserved behavior on compound pages
      
          As far as I can see there's no users of PG_reserved on compound pages.
          Let's use PF_NO_COMPOUND here.
      
      it has been illegal to combine GFP_COMP with SetPageReserved, so let's
      stop doing both and leave the dma layer to its own devices.
      
      Reported-by: Taketo Kabe
      Bug: https://gitlab.freedesktop.org/drm/intel/issues/1027
      Fixes: de09d31d ("page-flags: define PG_reserved behavior on compound pages")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: <stable@vger.kernel.org> # v4.5+
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200202171635.4039044-1-chris@chris-wilson.co.uk
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/dp_mst: Fix clearing payload state on topology disable · 80e21c3e
      Lyude Paul authored
      [ Upstream commit 8732fe46 ]
      
      The issues caused by:
      
      commit 64e62bdf ("drm/dp_mst: Remove VCPI while disabling topology
      mgr")
      
      prompted me to take a closer look at how we clear the payload state in
      general when disabling the topology, and it turns out there are actually
      two subtle issues here.
      
      The first is that we're not grabbing &mgr.payload_lock when clearing the
      payloads in drm_dp_mst_topology_mgr_set_mst(). Seeing as the canonical
      lock order is &mgr.payload_lock -> &mgr.lock (because we always want
      &mgr.lock to be the inner-most lock so topology validation always
      works), this makes perfect sense. It also means that -technically- there
      could be racing between someone calling
      drm_dp_mst_topology_mgr_set_mst() to disable the topology, along with a
      modeset occurring that's modifying the payload state at the same time.
      
      The second is the more obvious issue that Wayne Lin discovered, that
      we're not clearing proposed_payloads when disabling the topology.
      
      I actually can't see any obvious places where the racing caused by the
      first issue would break something, and it could be that some of our
      higher-level locks already prevent this by happenstance, but better safe
      than sorry. So, let's make it so that drm_dp_mst_topology_mgr_set_mst()
      first grabs &mgr.payload_lock followed by &mgr.lock so that we never
      race when modifying the payload state. Then, we also clear
      proposed_payloads to fix the original issue of enabling a new topology
      with a dirty payload state. This doesn't clear any of the drm_dp_vcpi
      structures, but those are getting destroyed along with the ports anyway.
      
      Changes since v1:
      * Use sizeof(mgr->payloads[0])/sizeof(mgr->proposed_vcpis[0]) instead -
        vsyrjala
      
      Cc: Sean Paul <sean@poorly.run>
      Cc: Wayne Lin <Wayne.Lin@amd.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200122194321.14953-1-lyude@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Btrfs: fix crash during unmount due to race with delayed inode workers · 68a417b5
      Filipe Manana authored
      [ Upstream commit f0cc2cd7 ]
      
      During unmount we can have a job from the delayed inode items work queue
      still running, that can lead to at least two bad things:
      
      1) A crash, because the worker can try to create a transaction just
         after the fs roots were freed;
      
      2) A transaction leak, because the worker can create a transaction
         before the fs roots are freed and just after we committed the last
         transaction and after we stopped the transaction kthread.
      
      A stack trace example of the crash:
      
       [79011.691214] kernel BUG at lib/radix-tree.c:982!
       [79011.692056] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
       [79011.693180] CPU: 3 PID: 1394 Comm: kworker/u8:2 Tainted: G        W         5.6.0-rc2-btrfs-next-54 #2
       (...)
       [79011.696789] Workqueue: btrfs-delayed-meta btrfs_work_helper [btrfs]
       [79011.697904] RIP: 0010:radix_tree_tag_set+0xe7/0x170
       (...)
       [79011.702014] RSP: 0018:ffffb3c84a317ca0 EFLAGS: 00010293
       [79011.702949] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
       [79011.704202] RDX: ffffb3c84a317cb0 RSI: ffffb3c84a317ca8 RDI: ffff8db3931340a0
       [79011.705463] RBP: 0000000000000005 R08: 0000000000000005 R09: ffffffff974629d0
       [79011.706756] R10: ffffb3c84a317bc0 R11: 0000000000000001 R12: ffff8db393134000
       [79011.708010] R13: ffff8db3931340a0 R14: ffff8db393134068 R15: 0000000000000001
       [79011.709270] FS:  0000000000000000(0000) GS:ffff8db3b6a00000(0000) knlGS:0000000000000000
       [79011.710699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [79011.711710] CR2: 00007f22c2a0a000 CR3: 0000000232ad4005 CR4: 00000000003606e0
       [79011.712958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [79011.714205] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [79011.715448] Call Trace:
       [79011.715925]  record_root_in_trans+0x72/0xf0 [btrfs]
       [79011.716819]  btrfs_record_root_in_trans+0x4b/0x70 [btrfs]
       [79011.717925]  start_transaction+0xdd/0x5c0 [btrfs]
       [79011.718829]  btrfs_async_run_delayed_root+0x17e/0x2b0 [btrfs]
       [79011.719915]  btrfs_work_helper+0xaa/0x720 [btrfs]
       [79011.720773]  process_one_work+0x26d/0x6a0
       [79011.721497]  worker_thread+0x4f/0x3e0
       [79011.722153]  ? process_one_work+0x6a0/0x6a0
       [79011.722901]  kthread+0x103/0x140
       [79011.723481]  ? kthread_create_worker_on_cpu+0x70/0x70
       [79011.724379]  ret_from_fork+0x3a/0x50
       (...)
      
      The following diagram shows a sequence of steps that lead to the crash
      during unmount of the filesystem:
      
              CPU 1                                             CPU 2                                CPU 3
      
       btrfs_punch_hole()
         btrfs_btree_balance_dirty()
           btrfs_balance_delayed_items()
             --> sees
                 fs_info->delayed_root->items
                 with value 200, which is greater
                 than
                 BTRFS_DELAYED_BACKGROUND (128)
                 and smaller than
                 BTRFS_DELAYED_WRITEBACK (512)
             btrfs_wq_run_delayed_node()
               --> queues a job for
                   fs_info->delayed_workers to run
                   btrfs_async_run_delayed_root()
      
                                                                                                  btrfs_async_run_delayed_root()
                                                                                                    --> job queued by CPU 1
      
                                                                                                    --> starts picking and running
                                                                                                        delayed nodes from the
                                                                                                        prepare_list list
      
                                                       close_ctree()
      
                                                         btrfs_delete_unused_bgs()
      
                                                         btrfs_commit_super()
      
                                                           btrfs_join_transaction()
                                                             --> gets transaction N
      
                                                           btrfs_commit_transaction(N)
                                                             --> set transaction state
                                                        to TRANS_STATE_COMMIT_START
      
                                                                                                   btrfs_first_prepared_delayed_node()
                                                                                                     --> picks delayed node X through
                                                                                                         the prepared_list list
      
                                                             btrfs_run_delayed_items()
      
                                                               btrfs_first_delayed_node()
                                                                 --> also picks delayed node X
                                                                     but through the node_list
                                                                     list
      
                                                               __btrfs_commit_inode_delayed_items()
                                                                  --> runs all delayed items from
                                                                      this node and drops the
                                                                      node's item count to 0
                                                                      through call to
                                                                      btrfs_release_delayed_inode()
      
                                                               --> finishes running any remaining
                                                                   delayed nodes
      
                                                             --> finishes transaction commit
      
                                                         --> stops cleaner and transaction threads
      
                                                         btrfs_free_fs_roots()
                                                           --> frees all roots and removes them
                                                               from the radix tree
                                                               fs_info->fs_roots_radix
      
                                                                                                   btrfs_join_transaction()
                                                                                                     start_transaction()
                                                                                                       btrfs_record_root_in_trans()
                                                                                                         record_root_in_trans()
                                                                                                           radix_tree_tag_set()
                                                                                                             --> crashes because
                                                                                                                 the root is not in
                                                                                                                 the radix tree
                                                                                                                 anymore
      
      If the worker is able to call btrfs_join_transaction() before the unmount
      task frees the fs roots, we end up leaking a transaction and all its
      resources, since after the call to btrfs_commit_super() and stopping the
      transaction kthread, we don't expect to have any transaction open anymore.
      
      When this situation happens the worker has a delayed node that has no
      more items to run, since the task calling btrfs_run_delayed_items(),
      which is doing a transaction commit, picks the same node and runs all
      its items first.
      
      We can not wait for the worker to complete when running delayed items
      through btrfs_run_delayed_items(), because we call that function in
      several phases of a transaction commit, and that could cause a deadlock
      because the worker calls btrfs_join_transaction() and the task doing the
      transaction commit may have already set the transaction state to
      TRANS_STATE_COMMIT_DOING.
      
      Also it's not possible to get into a situation where only some of the
      items of a delayed node are added to the fs/subvolume tree in the current
      transaction and the remaining ones in the next transaction, because when
      running the items of a delayed inode we lock its mutex, effectively
      waiting for the worker if the worker is running the items of the delayed
      node already.
      
      Since this can only cause issues when unmounting a filesystem, fix it in
      a simple way by waiting for any jobs on the delayed workers queue before
      calling btrfs_commit_super() at close_ctree(). This works because at this
      point no one can call btrfs_btree_balance_dirty() or
      btrfs_balance_delayed_items(), and if we end up waiting for any worker to
      complete, btrfs_commit_super() will commit the transaction created by the
      worker.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/64/tm: Don't let userspace set regs->trap via sigreturn · 71064eba
      Michael Ellerman authored
      commit c7def7fb upstream.
      
      In restore_tm_sigcontexts() we take the trap value directly from the
      user sigcontext with no checking:
      
      	err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]);
      
      This means we can be in the kernel with an arbitrary regs->trap value.
      
      Although that's not immediately problematic, there is a risk we could
      trigger one of the uses of CHECK_FULL_REGS():
      
      	#define CHECK_FULL_REGS(regs)	BUG_ON(regs->trap & 1)
      
      It can also cause us to unnecessarily save non-volatile GPRs again in
      save_nvgprs(), which shouldn't be problematic but is still wrong.
      
      It's also possible it could trick the syscall restart machinery, which
      relies on regs->trap not being == 0xc00 (see 9a81c16b ("powerpc:
      fix double syscall restarts")), though I haven't been able to make
      that happen.
      
      Finally it doesn't match the behaviour of the non-TM case, in
      restore_sigcontext() which zeroes regs->trap.
      
      So change restore_tm_sigcontexts() to zero regs->trap.
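
The effect of zeroing the field can be modelled outside the kernel. This is a sketch under assumptions: regs_sketch and the helper functions are hypothetical; only the "trap & 1" condition is taken from the CHECK_FULL_REGS() definition quoted above.

```c
/* Hypothetical, reduced stand-in for struct pt_regs. */
struct regs_sketch {
    unsigned long trap;
};

/* Mirrors the condition tested by the CHECK_FULL_REGS() macro quoted above. */
static int full_regs_check_would_fire(const struct regs_sketch *regs)
{
    return (regs->trap & 1) != 0;
}

/* Before the fix: the trap value comes straight from the user sigcontext. */
static void restore_unchecked(struct regs_sketch *regs, unsigned long user_trap)
{
    regs->trap = user_trap;
}

/* After the fix: the user-supplied value is ignored and trap is zeroed. */
static void restore_zeroed(struct regs_sketch *regs, unsigned long user_trap)
{
    (void)user_trap;
    regs->trap = 0;
}

/* Runs one restore variant and reports whether the check would trigger. */
static int check_fires_after(void (*restore)(struct regs_sketch *, unsigned long),
                             unsigned long user_trap)
{
    struct regs_sketch regs = { 0 };

    restore(&regs, user_trap);
    return full_regs_check_would_fire(&regs);
}
```

Any user value with the low bit set trips the check in the unchecked variant, while the zeroing variant is immune regardless of what the sigcontext contains.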
      
      This was discovered while testing Nick's upcoming rewrite of the
      syscall entry path. In that series the call to save_nvgprs() prior to
      signal handling (do_notify_resume()) is removed, which leaves the
      low-bit of regs->trap uncleared which can then trigger the FULL_REGS()
      WARNs in setup_tm_sigcontexts().
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Cc: stable@vger.kernel.org # v3.9+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200401023836.3286664-1-mpe@ellerman.id.au
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Kai-Heng Feng's avatar
      libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set · 4b8a7404
      Kai-Heng Feng authored
      commit 8305f72f upstream.
      
      During system resume from suspend, this can be observed on ASM1062 PMP
      controller:
      
      ata10.01: SATA link down (SStatus 0 SControl 330)
      ata10.02: hard resetting link
      ata10.02: SATA link down (SStatus 0 SControl 330)
      ata10.00: configured for UDMA/133
      Kernel panic - not syncing: stack-protector: Kernel
       in: sata_pmp_eh_recover+0xa2b/0xa40
      
      CPU: 2 PID: 230 Comm: scsi_eh_9 Tainted: P OE
      #49-Ubuntu
      Hardware name: System manufacturer System Product
       1001 12/10/2017
      Call Trace:
      dump_stack+0x63/0x8b
      panic+0xe4/0x244
      ? sata_pmp_eh_recover+0xa2b/0xa40
      __stack_chk_fail+0x19/0x20
      sata_pmp_eh_recover+0xa2b/0xa40
      ? ahci_do_softreset+0x260/0x260 [libahci]
      ? ahci_do_hardreset+0x140/0x140 [libahci]
      ? ata_phys_link_offline+0x60/0x60
      ? ahci_stop_engine+0xc0/0xc0 [libahci]
      sata_pmp_error_handler+0x22/0x30
      ahci_error_handler+0x45/0x80 [libahci]
      ata_scsi_port_error_handler+0x29b/0x770
      ? ata_scsi_cmd_error_handler+0x101/0x140
      ata_scsi_error+0x95/0xd0
      ? scsi_try_target_reset+0x90/0x90
      scsi_error_handler+0xd0/0x5b0
      kthread+0x121/0x140
      ? scsi_eh_get_sense+0x200/0x200
      ? kthread_create_worker_on_cpu+0x70/0x70
      ret_from_fork+0x22/0x40
      Kernel Offset: 0xcc00000 from 0xffffffff81000000
      (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      
      Since sata_pmp_eh_recover_pmp() doesn't set rc when ATA_DFLAG_DETACH is
      set, sata_pmp_eh_recover() continues to run. During retry it triggers
      the stack protector.
      
      Set correct rc in sata_pmp_eh_recover_pmp() to let sata_pmp_eh_recover()
      jump to pmp_fail directly.
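      
      The control flow can be sketched in plain C as follows. This is an
      illustration under assumptions, not the libata code: the struct, the
      flag bit, the helper names and the choice of -ENODEV as the error code
      are all hypothetical. The point is only that a failure path which leaves
      rc at 0 looks like success to the caller.

```c
#include <errno.h>

#define FAKE_DFLAG_DETACH (1u << 0)	/* hypothetical flag bit */

struct fake_dev {
	unsigned int flags;
};

/*
 * Returns 0 on success or a negative errno. With set_rc_on_detach == 0
 * the detach path reaches the failure label with rc still 0, which is
 * the behaviour described above.
 */
static int recover_pmp(const struct fake_dev *dev, int set_rc_on_detach)
{
	int rc = 0;

	if (dev->flags & FAKE_DFLAG_DETACH) {
		if (set_rc_on_detach)
			rc = -ENODEV;	/* assumed error code */
		goto fail;
	}
	return 0;

fail:
	return rc;
}

/* Caller: only a non-zero rc makes it bail out to its failure path. */
static int recover(const struct fake_dev *dev, int set_rc_on_detach)
{
	int rc = recover_pmp(dev, set_rc_on_detach);

	if (rc)
		return rc;	/* corresponds to jumping to pmp_fail */
	return 0;		/* would continue with further recovery */
}
```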
      
      BugLink: https://bugs.launchpad.net/bugs/1821434
      Cc: stable@vger.kernel.org
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b8a7404
    • Simon Gander's avatar
      hfsplus: fix crash and filesystem corruption when deleting files · ffdfcd87
      Simon Gander authored
      commit 25efb2ff upstream.
      
      When removing files containing extended attributes, the hfsplus driver may
      remove the wrong entries from the attributes b-tree, causing major
      filesystem damage and in some cases even kernel crashes.
      
      To remove a file, all its extended attributes have to be removed as well.
      The driver does this by looking up all keys in the attributes b-tree with
      the cnid of the file.  Each of these entries then gets deleted using the
      key used for searching, which doesn't contain the attribute's name when it
      should.  Since the key doesn't contain the name, the deletion routine will
      not find the correct entry and instead remove the one in front of it.  If
      parent nodes have to be modified, these become corrupt as well.  This
      causes invalid links and unsorted entries that not even macOS's fsck_hfs
      is able to fix.
      
      To fix this, modify the search key before an entry is deleted from the
      attributes b-tree by copying the found entry's key into the search key,
      therefore ensuring that the correct entry gets removed from the tree.
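      
      A small sketch of why a name-less key hits the wrong entry. It assumes,
      for illustration only, that a lookup positions on the last record whose
      key is not greater than the search key; struct attr_key, key_cmp() and
      find_le() are hypothetical and use a sorted array in place of the
      b-tree.

```c
#include <string.h>

/* Hypothetical attribute key: file id plus attribute name. */
struct attr_key {
	unsigned int cnid;
	char name[16];
};

/* Orders keys by cnid first, then by attribute name. */
static int key_cmp(const struct attr_key *a, const struct attr_key *b)
{
	if (a->cnid != b->cnid)
		return a->cnid < b->cnid ? -1 : 1;
	return strcmp(a->name, b->name);
}

/* Index of the last entry whose key is <= search, or -1 if there is none. */
static int find_le(const struct attr_key *tree, int n,
		   const struct attr_key *search)
{
	int i, pos = -1;

	for (i = 0; i < n; i++)
		if (key_cmp(&tree[i], search) <= 0)
			pos = i;
	return pos;
}
```

      With a key that carries only the cnid, the lookup lands on the record in
      front of the file's first attribute; after copying the found entry's key
      into the search key it lands on the entry itself.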
      
      Signed-off-by: Simon Gander <simon@tuxera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200327155541.1521-1-simon@tuxera.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffdfcd87
    • Oliver O'Halloran's avatar
      cpufreq: powernv: Fix use-after-free · 88a8e3c5
      Oliver O'Halloran authored
      commit d0a72efa upstream.
      
      The cpufreq driver has a use-after-free that we can hit if:
      
      a) There's an OCC message pending when the notifier is registered, and
      b) The cpufreq driver fails to register with the core.
      
      When a) occurs the notifier schedules a workqueue item to handle the
      message. The backing work_struct is located on chips[].throttle and
      when b) happens we clean up by freeing the array. Once we get to
      the (now free) queued item, the kernel crashes.
      
      Fixes: c5e29ea7 ("cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200206062622.28235-1-oohall@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      88a8e3c5
    • Eric Biggers's avatar
      kmod: make request_module() return an error when autoloading is disabled · f5fbbb67
      Eric Biggers authored
      commit d7d27cfc upstream.
      
      Patch series "module autoloading fixes and cleanups", v5.
      
      This series fixes a bug where request_module() was reporting success to
      kernel code when module autoloading had been completely disabled via
      'echo > /proc/sys/kernel/modprobe'.
      
      It also addresses the issues raised on the original thread
      (https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
      by documenting the modprobe sysctl, adding a self-test for the empty path
      case, and downgrading a user-reachable WARN_ONCE().
      
      This patch (of 4):
      
      It's long been possible to disable kernel module autoloading completely
      (while still allowing manual module insertion) by setting
      /proc/sys/kernel/modprobe to the empty string.
      
      This can be preferable to setting it to a nonexistent file since it
      avoids the overhead of an attempted execve(), avoids potential
      deadlocks, and avoids the call to security_kernel_module_request() and
      thus on SELinux-based systems eliminates the need to write SELinux rules
      to dontaudit module_request.
      
      However, when module autoloading is disabled in this way,
      request_module() returns 0.  This is broken because callers expect 0 to
      mean that the module was successfully loaded.
      
      Apparently this was never noticed because this method of disabling
      module autoloading isn't used much, and also most callers don't use the
      return value of request_module() since it's always necessary to check
      whether the module registered its functionality or not anyway.
      
      But improperly returning 0 can indeed confuse a few callers, for example
      get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:
      
      	if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
      		fs = __get_fs_type(name, len);
      		WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
      	}
      
      This is easily reproduced with:
      
      	echo > /proc/sys/kernel/modprobe
      	mount -t NONEXISTENT none /
      
      It causes:
      
      	request_module fs-NONEXISTENT succeeded, but still no fs?
      	WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
      	[...]
      
      This should actually use pr_warn_once() rather than WARN_ONCE(), since
      it's also user-reachable if userspace immediately unloads the module.
      Regardless, request_module() should correctly return an error when it
      fails.  So let's make it return -ENOENT, which matches the error when
      the modprobe binary doesn't exist.
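      
      The intended behaviour can be sketched in userspace C. The variable and
      function names below are hypothetical stand-ins for the sysctl value and
      for request_module(); the sketch only shows the early return when the
      helper path is empty.

```c
#include <errno.h>

/* Hypothetical stand-in for the value of /proc/sys/kernel/modprobe. */
static char modprobe_path_sketch[256] = "/sbin/modprobe";

/* Returns 0 if the helper would be run, or -ENOENT if autoloading is off. */
static int request_module_sketch(const char *name)
{
	(void)name;

	/* Empty path: autoloading is disabled, so report a failure. */
	if (modprobe_path_sketch[0] == '\0')
		return -ENOENT;

	/* A real implementation would run the usermode helper here. */
	return 0;
}
```

      A caller that treats 0 as "module loaded" then no longer reaches the
      "succeeded, but still no fs?" branch when autoloading is disabled.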
      
      I've also sent patches to document and test this case.
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Jessica Yu <jeyu@kernel.org>
      Acked-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Ben Hutchings <benh@debian.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
      Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5fbbb67
    • Hans de Goede's avatar
      Input: i8042 - add Acer Aspire 5738z to nomux list · 21d22884
      Hans de Goede authored
      commit ebc68ced upstream.
      
      The Acer Aspire 5738z has a button to disable (and re-enable) the
      touchpad next to the touchpad.
      
      When this button is pressed a LED underneath indicates that the touchpad
      is disabled (and an event is sent to userspace and GNOME shows its
      touchpad enabled / disabled OSD thingie).
      
      So far so good, but after re-enabling the touchpad it no longer works.
      
      The laptop does not have an external ps2 port, so mux mode is not needed
      and disabling mux mode fixes the touchpad no longer working after toggling
      it off and back on again, so let's add this laptop model to the nomux list.
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Link: https://lore.kernel.org/r/20200331123947.318908-1-hdegoede@redhat.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      21d22884
    • Michael Mueller's avatar
      s390/diag: fix display of diagnose call statistics · 1ad66322
      Michael Mueller authored
      commit 6c7c851f upstream.
      
      Show the full diag statistic table and not just parts of it.
      
      The issue surfaced in a KVM guest with a number of vcpus
      defined smaller than NR_DIAG_STAT.
      
      Fixes: 1ec2772e ("s390/diag: add a statistic for diagnose calls")
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ad66322
    • Changwei Ge's avatar
      ocfs2: no need try to truncate file beyond i_size · 75ace9f0
      Changwei Ge authored
      commit 783fda85 upstream.
      
      With fallocate(2) in FALLOC_FL_PUNCH_HOLE mode, the offset can exceed
      the inode size.  Ocfs2 currently doesn't allow an offset beyond the
      inode size.  This restriction is not necessary and violates
      fallocate(2) semantics.
      
      If fallocate(2) offset is beyond inode size, just return success and do
      nothing further.
      
      Otherwise, ocfs2 will crash the kernel.
      
        kernel BUG at fs/ocfs2//alloc.c:7264!
         ocfs2_truncate_inline+0x20f/0x360 [ocfs2]
         ocfs2_remove_inode_range+0x23c/0xcb0 [ocfs2]
         __ocfs2_change_file_space+0x4a5/0x650 [ocfs2]
         ocfs2_fallocate+0x83/0xa0 [ocfs2]
         vfs_fallocate+0x148/0x230
         SyS_fallocate+0x48/0x80
         do_syscall_64+0x79/0x170
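      
      A hedged sketch of the early-return rule described above, in userspace
      C. struct punch_plan and plan_punch_hole() are hypothetical; treating an
      offset equal to the inode size as a no-op is also an assumption made for
      this illustration.

```c
#include <stdint.h>

/* Hypothetical result: whether any truncation work should be attempted. */
struct punch_plan {
	int do_truncate;
	uint64_t start;
	uint64_t len;
};

/* Always returns 0 (success); plan tells the caller whether to do work. */
static int plan_punch_hole(uint64_t i_size, uint64_t offset, uint64_t len,
			   struct punch_plan *plan)
{
	plan->do_truncate = 0;
	plan->start = 0;
	plan->len = 0;

	/* Nothing is allocated at or past EOF: succeed without doing work. */
	if (offset >= i_size)
		return 0;

	plan->do_truncate = 1;
	plan->start = offset;
	plan->len = len;
	return 0;
}
```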
      
      Signed-off-by: Changwei Ge <chge@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200407082754.17565-1-chge@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      75ace9f0
    • Qian Cai's avatar
      ext4: fix a data race at inode->i_blocks · 9945c406
      Qian Cai authored
      commit 28936b62 upstream.
      
      inode->i_blocks could be accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in ext4_do_update_inode [ext4] / inode_add_bytes
      
       write to 0xffff9a00d4b982d0 of 8 bytes by task 22100 on cpu 118:
        inode_add_bytes+0x65/0xf0
        __inode_add_bytes at fs/stat.c:689
        (inlined by) inode_add_bytes at fs/stat.c:702
        ext4_mb_new_blocks+0x418/0xca0 [ext4]
        ext4_ext_map_blocks+0x1a6b/0x27b0 [ext4]
        ext4_map_blocks+0x1a9/0x950 [ext4]
        _ext4_get_block+0xfc/0x270 [ext4]
        ext4_get_block_unwritten+0x33/0x50 [ext4]
        __block_write_begin_int+0x22e/0xae0
        __block_write_begin+0x39/0x50
        ext4_write_begin+0x388/0xb50 [ext4]
        ext4_da_write_begin+0x35f/0x8f0 [ext4]
        generic_perform_write+0x15d/0x290
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb05
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       read to 0xffff9a00d4b982d0 of 8 bytes by task 8 on cpu 65:
        ext4_do_update_inode+0x4a0/0xf60 [ext4]
        ext4_inode_blocks_set at fs/ext4/inode.c:4815
        ext4_mark_iloc_dirty+0xaf/0x160 [ext4]
        ext4_mark_inode_dirty+0x129/0x3e0 [ext4]
        ext4_convert_unwritten_extents+0x253/0x2d0 [ext4]
        ext4_convert_unwritten_io_end_vec+0xc5/0x150 [ext4]
        ext4_end_io_rsv_work+0x22c/0x350 [ext4]
        process_one_work+0x54f/0xb90
        worker_thread+0x80/0x5f0
        kthread+0x1cd/0x1f0
        ret_from_fork+0x27/0x50
      
       4 locks held by kworker/u256:0/8:
        #0: ffff9a025abc4328 ((wq_completion)ext4-rsv-conversion){+.+.}, at: process_one_work+0x443/0xb90
        #1: ffffab5a862dbe20 ((work_completion)(&ei->i_rsv_conversion_work)){+.+.}, at: process_one_work+0x443/0xb90
        #2: ffff9a025a9d0f58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
        #3: ffff9a00d4b985d8 (&(&ei->i_raw_lock)->rlock){+.+.}, at: ext4_do_update_inode+0xaa/0xf60 [ext4]
       irq event stamp: 3009267
       hardirqs last  enabled at (3009267): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
       hardirqs last disabled at (3009266): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
       softirqs last  enabled at (3009230): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
       softirqs last disabled at (3009223): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 65 PID: 8 Comm: kworker/u256:0 Tainted: G L 5.6.0-rc2-next-20200221+ #7
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
       Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work [ext4]
      
      The plain read is outside of inode->i_lock critical section which
      results in a data race. Fix it by adding READ_ONCE() there.
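      
      For readers unfamiliar with the pattern, here is a simplified userspace
      approximation. READ_ONCE_SKETCH is not the kernel macro, only a
      volatile-cast stand-in that relies on the GNU C __typeof__ extension,
      and struct fake_inode is hypothetical. It marks the lockless read as
      intentional and asks the compiler to perform it as a single access; it
      does not add locking or ordering.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel macro (uses GNU C __typeof__). */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical inode with only the field of interest. */
struct fake_inode {
	uint64_t i_blocks;
};

/* Takes one snapshot of i_blocks without holding any lock. */
static uint64_t blocks_snapshot(struct fake_inode *inode)
{
	return READ_ONCE_SKETCH(inode->i_blocks);
}
```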
      
      Link: https://lore.kernel.org/r/20200222043258.2279-1-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9945c406
    • Nathan Chancellor's avatar
      rtc: omap: Use define directive for PIN_CONFIG_ACTIVE_HIGH · b580a800
      Nathan Chancellor authored
      commit c5015652 upstream.
      
      Clang warns when one enumerated type is implicitly converted to another:
      
      drivers/rtc/rtc-omap.c:574:21: warning: implicit conversion from
      enumeration type 'enum rtc_pin_config_param' to different enumeration
      type 'enum pin_config_param' [-Wenum-conversion]
              {"ti,active-high", PIN_CONFIG_ACTIVE_HIGH, 0},
              ~                  ^~~~~~~~~~~~~~~~~~~~~~
      drivers/rtc/rtc-omap.c:579:12: warning: implicit conversion from
      enumeration type 'enum rtc_pin_config_param' to different enumeration
      type 'enum pin_config_param' [-Wenum-conversion]
              PCONFDUMP(PIN_CONFIG_ACTIVE_HIGH, "input active high", NULL, false),
              ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/linux/pinctrl/pinconf-generic.h:163:11: note: expanded from
      macro 'PCONFDUMP'
              .param = a, .display = b, .format = c, .has_arg = d     \
                       ^
      2 warnings generated.
      
      It is expected that pinctrl drivers can extend pin_config_param because
      of the gap between PIN_CONFIG_END and PIN_CONFIG_MAX so this conversion
      isn't an issue. Most drivers that take advantage of this define the
      PIN_CONFIG variables as constants, rather than enumerated values. Do the
      same thing here so that Clang no longer warns.
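      
      The difference between the two approaches can be sketched as below. All
      names and numeric values are hypothetical miniatures, not the pinctrl
      headers: the driver-specific parameter is a plain constant placed in the
      gap after the generic enum's END marker, so no second enumerated type is
      involved in the assignment.

```c
/* Hypothetical miniature of a generic parameter enum with a reserved gap. */
enum pin_cfg_param_sketch {
	PIN_CFG_BIAS_DISABLE_SKETCH,
	PIN_CFG_END_SKETCH = 0x7F,
	PIN_CFG_MAX_SKETCH = 0xFF,
};

/* Driver-specific parameter as a constant, not as another enum type. */
#define PIN_CFG_ACTIVE_HIGH_SKETCH (PIN_CFG_END_SKETCH + 1)

struct param_map_sketch {
	const char *property;
	enum pin_cfg_param_sketch param;
};

/* Integer constant to enum: no enum-to-different-enum conversion here. */
static const struct param_map_sketch map_sketch[] = {
	{ "ti,active-high", PIN_CFG_ACTIVE_HIGH_SKETCH },
};
```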
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/144
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b580a800
    • Fredrik Strupe's avatar
      arm64: armv8_deprecated: Fix undef_hook mask for thumb setend · 3e2dadfe
      Fredrik Strupe authored
      commit fc226601 upstream.
      
      For thumb instructions, call_undef_hook() in traps.c first reads a u16,
      and if the u16 indicates a T32 instruction (u16 >= 0xe800), a second
      u16 is read, which then makes up the lower half-word of a T32
      instruction. For T16 instructions, the second u16 is not read,
      which makes the resulting u32 opcode always have the upper half set to
      0.
      
      However, having the upper half of instr_mask in the undef_hook set to 0
      masks out the upper half of all thumb instructions - both T16 and T32.
      This results in trapped T32 instructions with the lower half-word equal
      to the T16 encoding of setend (b650) being matched, even though the upper
      half-word is not 0000 and thus indicates a T32 opcode.
      
      An example of such a T32 instruction is eaa0b650, which should raise a
      SIGILL since T32 instructions with an eaa prefix are unallocated as per
      Arm ARM, but instead works as a SETEND because the second half-word is set
      to b650.
      
      This patch fixes the issue by extending instr_mask to include the
      upper u32 half, which will still match T16 instructions where the upper
      half is 0, but not T32 instructions.
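      
      The effect of the mask width can be checked with a few lines of C. The
      struct and helper are hypothetical, and the mask and value constants
      used in the usage note are assumptions chosen to be consistent with the
      description above (T16 setend encoded as b650, with one operand bit
      masked out).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of an undef hook's matching fields. */
struct hook_sketch {
	uint32_t instr_mask;
	uint32_t instr_val;
};

/* A hook matches when the masked opcode equals the expected value. */
static bool hook_matches(const struct hook_sketch *h, uint32_t instr)
{
	return (instr & h->instr_mask) == h->instr_val;
}
```

      With a mask of 0x0000fff7 and a value of 0x0000b650, the T32 opcode
      eaa0b650 matches because its upper half-word is ignored. With a mask of
      0xfffffff7 it no longer matches, while the T16 opcode 0000b650 still
      does.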
      
      Fixes: 2d888f48 ("arm64: Emulate SETEND for AArch32 tasks")
      Cc: <stable@vger.kernel.org> # 4.0.x-
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e2dadfe
    • Steffen Maier's avatar
      scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point · 42246805
      Steffen Maier authored
      commit 819732be upstream.
      
      v2.6.27 commit cc8c2829 ("[SCSI] zfcp: Automatically attach remote
      ports") introduced zfcp automatic port scan.
      
      Before that, the user had to use the sysfs attribute "port_add" of an FCP
      device (adapter) to add and open remote (target) ports, even for the remote
      peer port in point-to-point topology. That code path did a proper port open
      recovery trigger taking the erp_lock.
      
      Since above commit, a new helper function zfcp_erp_open_ptp_port()
      performed an UNlocked port open recovery trigger. This can race with other
      parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt
      e.g. adapter->erp_total_count or adapter->erp_ready_head.
      
      As already found for fabric topology in v4.17 commit fa89adba ("scsi:
      zfcp: fix infinite iteration on ERP ready list"), there was an endless loop
      during tracing of rport (un)block.  A subsequent v4.18 commit 9e156c54
      ("scsi: zfcp: assert that the ERP lock is held when tracing a recovery
      trigger") introduced a lockdep assertion for that case.
      
      As a side effect, that lockdep assertion now uncovered the unlocked code
      path for PtP. It is from within an adapter ERP action:
      
      zfcp_erp_strategy[1479]  intentionally DROPs erp lock around
                               zfcp_erp_strategy_do_action()
      zfcp_erp_strategy_do_action[1441]      NO erp lock
      zfcp_erp_adapter_strategy[876]         NO erp lock
      zfcp_erp_adapter_strategy_open[855]    NO erp lock
      zfcp_erp_adapter_strategy_open_fsf[806]NO erp lock
      zfcp_erp_adapter_strat_fsf_xconf[772]  erp lock only around
                                             zfcp_erp_action_to_running(),
                                             BUT *_not_* around
                                             zfcp_erp_enqueue_ptp_port()
      zfcp_erp_enqueue_ptp_port[728]         BUG: *_not_* taking erp lock
      _zfcp_erp_port_reopen[432]             assumes to be called with erp lock
      zfcp_erp_action_enqueue[314]           assumes to be called with erp lock
      zfcp_dbf_rec_trig[288]                 _checks_ to be called with erp lock:
      	lockdep_assert_held(&adapter->erp_lock);
      
      It causes the following lockdep warning:
      
      WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288
                                  zfcp_dbf_rec_trig+0x16a/0x188
      no locks held by zfcperp0.0.17c0/775.
      
      Fix this by using the proper locked recovery trigger helper function.
      
      Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com
      Fixes: cc8c2829 ("[SCSI] zfcp: Automatically attach remote ports")
      Cc: <stable@vger.kernel.org> #v2.6.27+
      Reviewed-by: Jens Remus <jremus@linux.ibm.com>
      Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
      Signed-off-by: Steffen Maier <maier@linux.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      42246805
    • Shetty, Harshini X (EXT-Sony Mobile)'s avatar
      dm verity fec: fix memory leak in verity_fec_dtr · 4c02b23a
      Shetty, Harshini X (EXT-Sony Mobile) authored
      commit 75fa6019 upstream.
      
      Fix below kmemleak detected in verity_fec_ctr. output_pool is
      allocated for each dm-verity-fec device. But it is not freed when
      dm-table for the verity target is removed. Hence free the output
      mempool in destructor function verity_fec_dtr.
      
      unreferenced object 0xffffffffa574d000 (size 4096):
        comm "init", pid 1667, jiffies 4294894890 (age 307.168s)
        hex dump (first 32 bytes):
          8e 36 00 98 66 a8 0b 9b 00 00 00 00 00 00 00 00  .6..f...........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000060e82407>] __kmalloc+0x2b4/0x340
          [<00000000dd99488f>] mempool_kmalloc+0x18/0x20
          [<000000002560172b>] mempool_init_node+0x98/0x118
          [<000000006c3574d2>] mempool_init+0x14/0x20
          [<0000000008cb266e>] verity_fec_ctr+0x388/0x3b0
          [<000000000887261b>] verity_ctr+0x87c/0x8d0
          [<000000002b1e1c62>] dm_table_add_target+0x174/0x348
          [<000000002ad89eda>] table_load+0xe4/0x328
          [<000000001f06f5e9>] dm_ctl_ioctl+0x3b4/0x5a0
          [<00000000bee5fbb7>] do_vfs_ioctl+0x5dc/0x928
          [<00000000b475b8f5>] __arm64_sys_ioctl+0x70/0x98
          [<000000005361e2e8>] el0_svc_common+0xa0/0x158
          [<000000001374818f>] el0_svc_handler+0x6c/0x88
          [<000000003364e9f4>] el0_svc+0x8/0xc
          [<000000009d84cec9>] 0xffffffffffffffff
      
      Fixes: a739ff3f ("dm verity: add support for forward error correction")
      Depends-on: 6f1c819c ("dm: convert to bioset_init()/mempool_init()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Harshini Shetty <harshini.x.shetty@sony.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c02b23a
    • Alexander Duyck's avatar
      mm: Use fixed constant in page_frag_alloc instead of size + 1 · b59a2e0a
      Alexander Duyck authored
      commit 86447726 upstream.
      
      This patch replaces the size + 1 value introduced with the recent fix for 1
      byte allocs with a constant value.
      
      The idea here is to reduce code overhead as the previous logic would have
      to read size into a register, then increment it, and write it back to
      whatever field was being used. By using a constant we can avoid those
      memory reads and arithmetic operations in favor of just encoding the
      maximum value into the operation itself.
      
      Fixes: 2c2ade81 ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b59a2e0a
    • Anssi Hannula's avatar
      tools: gpio: Fix out-of-tree build regression · bd075802
      Anssi Hannula authored
      commit 82f04bfe upstream.
      
      Commit 0161a94e ("tools: gpio: Correctly add make dependencies for
      gpio_utils") added a make rule for gpio-utils-in.o but used $(output)
      instead of the correct $(OUTPUT) for the output directory, breaking
      out-of-tree build (O=xx) with the following error:
      
        No rule to make target 'out/tools/gpio/gpio-utils-in.o', needed by 'out/tools/gpio/lsgpio-in.o'.  Stop.
      
      Fix that.
      
      Fixes: 0161a94e ("tools: gpio: Correctly add make dependencies for gpio_utils")
      Cc: <stable@vger.kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
      Link: https://lore.kernel.org/r/20200325103154.32235-1-anssi.hannula@bitwise.fi
      Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd075802
    • Zhenzhong Duan's avatar
      x86/speculation: Remove redundant arch_smt_update() invocation · f2a02afc
      Zhenzhong Duan authored
      commit 34d66caf upstream.
      
      With commit a74cfffb ("x86/speculation: Rework SMT state change"),
      arch_smt_update() is invoked from each individual CPU hotplug function.
      
      Therefore the extra arch_smt_update() call in the sysfs SMT control is
      redundant.
      
      Fixes: a74cfffb ("x86/speculation: Rework SMT state change")
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <konrad.wilk@oracle.com>
      Cc: <dwmw@amazon.co.uk>
      Cc: <bp@suse.de>
      Cc: <srinivas.eeda@oracle.com>
      Cc: <peterz@infradead.org>
      Cc: <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/e2e064f2-e8ef-42ca-bf4f-76b612964752@default
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2a02afc
    • Takashi Iwai's avatar
      ALSA: hda: Initialize power_state field properly · c91df26c
      Takashi Iwai authored
      commit 183ab39e upstream.
      
      The recent commit 98081ca6 ("ALSA: hda - Record the current power
      state before suspend/resume calls") made the HD-audio driver store
      the PM state in power_state field.  This forgot, however, the
      initialization at power up.  Although the codec drivers usually don't
      need to refer to this field in the normal operation, let's initialize
      it properly for consistency.
      
      Fixes: 98081ca6 ("ALSA: hda - Record the current power state before suspend/resume calls")
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c91df26c
    • Rosioru Dragos's avatar
      crypto: mxs-dcp - fix scatterlist linearization for hash · e367644b
      Rosioru Dragos authored
      commit fa03481b upstream.
      
      Incorrect traversal of the scatterlist during the linearization phase
      led to computing the hash value of the wrong input buffer.
      The new implementation uses scatterwalk_map_and_copy()
      to address this issue.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 15b59e7c ("crypto: mxs - Add Freescale MXS DCP driver")
      Signed-off-by: Rosioru Dragos <dragos.rosioru@nxp.com>
      Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e367644b
    • Josef Bacik's avatar
      btrfs: drop block from cache on error in relocation · c7b7daf9
      Josef Bacik authored
      commit 8e19c973 upstream.
      
      If we have an error while building the backref tree in relocation we'll
      process all the pending edges and then free the node.  However if we
      integrated some edges into the cache we'll lose our link to those edges
      by simply freeing this node, which means we'll leak memory and
      references to any roots that we've found.
      
      Instead we need to use remove_backref_node(), which walks through all of
      the edges that are still linked to this node and frees them up and
      drops any root references we may be holding.
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7b7daf9
    • Vitaly Kuznetsov's avatar
      KVM: VMX: fix crash cleanup when KVM wasn't used · b2f7d0ad
      Vitaly Kuznetsov authored
      commit dbef2808 upstream.
      
      If KVM wasn't used at all before we crash the cleanup procedure fails with
       BUG: unable to handle page fault for address: ffffffffffffffc8
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 23215067 P4D 23215067 PUD 23217067 PMD 0
       Oops: 0000 [#8] SMP PTI
       CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G      D           5.6.0-rc2+ #823
       RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel]
      
      The root cause is that the loaded_vmcss_on_cpu list is not yet
      initialized: it is initialized in hardware_enable(), which only runs
      when we start a VM.
      
      Previously, we used to have a bitmap of enabled CPUs, and that was
      masking the issue.
      
      Initialize the loaded_vmcss_on_cpu list earlier, right before we assign
      the crash_vmclear_loaded_vmcss pointer. The blocked_vcpu_on_cpu list and
      blocked_vcpu_on_cpu_lock are moved along with it for consistency.
      
      Fixes: 31603d4f ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2f7d0ad
    • Sean Christopherson's avatar
      KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support · b1bbaee4
      Sean Christopherson authored
      commit 31603d4f upstream.
      
      VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
      interrupted a KVM update of the percpu in-use VMCS list.
      
      Because NMIs are not blocked by disabling IRQs, it's possible that
      crash_vmclear_local_loaded_vmcss() could be called while the percpu list
      of VMCSes is being modified, e.g. in the middle of list_add() in
      vmx_vcpu_load_vmcs().  This potential corner case was called out in the
      original commit[*], but the analysis of its impact was wrong.
      
      Skipping the VMCLEARs is wrong because it all but guarantees that a
      loaded, and therefore cached, VMCS will live across kexec and corrupt
      memory in the new kernel.  Corruption will occur because the CPU's VMCS
      cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
      memory on its eviction will overwrite random memory in the new kernel.
      The VMCS will live because the NMI shootdown also disables VMX, i.e. the
      in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
      VMCS cache on VMXOFF.
      
      Furthermore, interrupting list_add() and list_del() is safe due to
      crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
      ensures the new entry is not visible to forward iteration unless the
      entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
      pointer could be observed if the NMI shootdown interrupted list_del() or
      list_add(), but list_for_each_entry() does not consume ->prev.
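      
      The forward-iteration argument above can be sketched in plain C. This
      is a userspace model, not kernel code: struct node, list_add_partial()
      and count_forward() are hypothetical names, and the "steps" parameter
      is an illustrative stand-in for an NMI arriving part-way through the
      insertion. The store order follows the one described above, with the
      prev->next store performed last.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node {
    struct node *next, *prev;
};

static void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Insert new_node right after head, performing only the first `steps`
 * of the four pointer stores. steps == 4 is a complete insertion;
 * smaller values model an interruption part-way through.
 */
static void list_add_partial(struct node *new_node, struct node *head, int steps)
{
    struct node *prev = head;
    struct node *next = head->next;

    assert(steps >= 0 && steps <= 4);

    if (steps >= 1) next->prev = new_node;
    if (steps >= 2) new_node->next = next;
    if (steps >= 3) new_node->prev = prev;
    /* The kernel performs this final store with WRITE_ONCE(). */
    if (steps >= 4) prev->next = new_node;
}

/* Count entries by following ->next only, never reading ->prev. */
static int count_forward(const struct node *head)
{
    int n = 0;
    const struct node *pos;

    for (pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}
```

      With one entry already on the list, a forward walk in this model sees
      either the old list (any interrupted insertion) or the complete new
      list, never a half-linked entry, because only ->prev pointers can be
      stale before the final store.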
      
      In addition to removing the temporary disabling of VMCLEAR, open code
      loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
      the VMCS is deleted from the list only after it's been VMCLEAR'd.
      Deleting the VMCS before VMCLEAR would allow a race where the NMI
      shootdown could arrive between list_del() and vmcs_clear() and thus
      neither flow would execute a successful VMCLEAR.  Alternatively, more
      code could be moved into loaded_vmcs_init(), but that gets rather silly
      as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
      and would need to work around the list_del().
      
      Update the smp_*() comments related to the list manipulation, and
      opportunistically reword them to improve clarity.
      
      [*] https://patchwork.kernel.org/patch/1675731/#3720461
      
      Fixes: 8f536b76 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1bbaee4
    • Sean Christopherson's avatar
      KVM: x86: Allocate new rmap and large page tracking when moving memslot · 5163dcd3
      Sean Christopherson authored
      commit edd4fa37 upstream.
      
      Reallocate a rmap array and recalculate large page compatibility when
      moving an existing memslot to correctly handle the alignment properties
      of the new memslot.  The number of rmap entries required at each level
      is dependent on the alignment of the memslot's base gfn with respect to
      that level, e.g. moving a large-page aligned memslot so that it becomes
      unaligned will increase the number of rmap entries needed at the now
      unaligned level.
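      
      The dependence on alignment can be illustrated with a small C sketch.
      It is modelled on how a per-level index is derived from a gfn, but the
      helper names and the assumption of 9 gfn bits per level (x86-64
      paging, level 1 = 4K pages, level 2 = 2M pages) are illustrative
      rather than taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumption: each large-page level covers 9 more gfn bits than the last. */
static unsigned int hpage_gfn_shift(int level)
{
    assert(level >= 1);
    return (unsigned int)(level - 1) * 9;
}

/* Index of gfn's entry in the per-level array of a slot starting at base_gfn. */
static uint64_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
{
    return (gfn >> hpage_gfn_shift(level)) -
           (base_gfn >> hpage_gfn_shift(level));
}

/* Entries needed at this level: index of the slot's last gfn, plus one. */
static uint64_t entries_needed(gfn_t base_gfn, uint64_t npages, int level)
{
    assert(npages > 0);
    return gfn_to_index(base_gfn + npages - 1, base_gfn, level) + 1;
}
```

      Under these assumptions a 0x400-page slot at base gfn 0x200 needs 2
      entries at level 2, while the same slot moved to base gfn 0x201 needs
      3, so an array sized for the old base is one entry too short.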
      
      Not updating the rmap array is the most obvious bug, as KVM accesses
      garbage data beyond the end of the rmap.  KVM interprets the bad data as
      pointers, leading to non-canonical #GPs, unexpected #PFs, etc...
      
        general protection fault: 0000 [#1] SMP
        CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
        Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
        RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246
        RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012
        RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570
        RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8
        R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000
        R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8
        FS:  00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0
        Call Trace:
         kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
         __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
         kvm_set_memory_region+0x45/0x60 [kvm]
         kvm_vm_ioctl+0x30f/0x920 [kvm]
         do_vfs_ioctl+0xa1/0x620
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x4c/0x170
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f7ed9911f47
        Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
        RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47
        RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004
        RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700
        R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000
        R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000
        Modules linked in: kvm_intel kvm irqbypass
        ---[ end trace 0c5f570b3358ca89 ]---
      
      The disallow_lpage tracking is more subtle.  Failure to update results
      in KVM creating large pages when it shouldn't, either due to stale data
      or again due to indexing beyond the end of the metadata arrays, which
      can lead to memory corruption and/or leaking data to guest/userspace.
      
      Note, the arrays for the old memslot are freed by the unconditional call
      to kvm_free_memslot() in __kvm_set_memory_region().
      
      Fixes: 05da4558 ("KVM: MMU: large page support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5163dcd3
    • David Hildenbrand's avatar
      KVM: s390: vsie: Fix delivery of addressing exceptions · 59106356
      David Hildenbrand authored
      commit 4d4cee96 upstream.
      
      Whenever we get an -EFAULT, we failed to read in guest 2 physical
      address space. Such addressing exceptions are reported via a program
      intercept to the nested hypervisor.
      
      Since we faked the intercept, we have to return to guest 2. Instead,
      right now we would be returning -EFAULT from the intercept handler,
      eventually crashing the VM. The correct thing to do is to return 1, as
      rc == 1 is the internal representation of "we have to go back into g2".
      
      Addressing exceptions can only happen if the g2->g3 page tables
      reference invalid g2 addresses (say, either a table or the final page is
      not accessible), so something that basically never happens in sane
      environments.
      
      Identified by manual code inspection.
      
      Fixes: a3508fbe ("KVM: s390: vsie: initial support for nested virtualization")
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Link: https://lore.kernel.org/r/20200403153050.20569-3-david@redhat.com
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      [borntraeger@de.ibm.com: fix patch description]
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      59106356
    • David Hildenbrand's avatar
      KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks · 34fbbaef
      David Hildenbrand authored
      commit a1d032a4 upstream.
      
      In case we have a region 1 the following calculation
      (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11)
      results in 64. As shifts beyond the size are undefined the compiler is
      free to use instructions like sllg. sllg will only use 6 bits of the
      shift value (here 64) resulting in no shift at all. That means that ALL
      addresses will be rejected.
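      
      The arithmetic can be checked with a short C sketch. The ASCE_TYPE_*
      constants below are assumed values for the two-bit type field, and
      effective_shift() and addr_out_of_range() are hypothetical helpers:
      the guard shown is one way to avoid the undefined shift, not
      necessarily the change made by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings of the table type held in bits 2-3 of the ASCE. */
#define ASCE_TYPE_MASK    0x0cUL
#define ASCE_TYPE_SEGMENT 0x00UL
#define ASCE_TYPE_REGION3 0x04UL
#define ASCE_TYPE_REGION2 0x08UL
#define ASCE_TYPE_REGION1 0x0cUL

/* Number of address bits covered by the table type encoded in the ASCE. */
static unsigned int asce_addr_bits(unsigned long asce)
{
    return 31 + (unsigned int)((asce & ASCE_TYPE_MASK) >> 2) * 11;
}

/* Shift count as seen by an instruction that honours only the low 6 bits. */
static unsigned int effective_shift(unsigned int bits)
{
    return bits & 63;
}

/*
 * Returns 1 if addr lies beyond what the ASCE can address.
 * A 64-bit range covers every address, so that case never shifts.
 */
static int addr_out_of_range(uint64_t addr, unsigned long asce)
{
    unsigned int bits = asce_addr_bits(asce);

    assert(bits <= 64);
    if (bits >= 64)
        return 0;
    return addr >= (UINT64_C(1) << bits);
}
```

      For a region 1 ASCE the computed width is 64 and the effective shift
      count becomes 0, which is why an unguarded "addr >> bits" check treats
      every non-zero address as out of range.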
      
      This can result in endless loops, e.g. when prefix cannot get mapped.
      
      Fixes: 4be130a0 ("s390/mm: add shadow gmap support")
      Tested-by: Janosch Frank <frankja@linux.ibm.com>
      Reported-by: Janosch Frank <frankja@linux.ibm.com>
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Link: https://lore.kernel.org/r/20200403153050.20569-2-david@redhat.com
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      [borntraeger@de.ibm.com: fix patch description, remove WARN_ON_ONCE]
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      34fbbaef
    • Thomas Gleixner's avatar
      x86/entry/32: Add missing ASM_CLAC to general_protection entry · acbc191c
      Thomas Gleixner authored
      commit 3d51507f upstream.
      
      All exception entry points must have ASM_CLAC right at the
      beginning. The general_protection entry is missing one.
      
      Fixes: e59d1b0a ("x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200225220216.219537887@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      acbc191c
    • Eric W. Biederman's avatar
      signal: Extend exec_id to 64bits · 110012a2
      Eric W. Biederman authored
      commit d1e7fd64 upstream.
      
      Replace the 32bit exec_id with a 64bit exec_id to make it impossible
      to wrap the exec_id counter.  With care an attacker can cause exec_id
      wrap and send arbitrary signals to a newly exec'd parent.  This
      bypasses the signal sending checks if the parent changes their
      credentials during exec.
      
      The severity of this problem can be seen in my limited testing: with
      a 32bit exec_id it can take as little as 19s to exec 65536 times,
      which means that it can take as little as 14 days to wrap a 32bit
      exec_id.  Adam Zabrocki has succeeded in wrapping the self_exec_id in 7
      days.  Even my slower timing is within the uptime of a typical server,
      which means self_exec_id is simply a speed bump today, and if exec
      gets noticeably faster self_exec_id won't even be a speed bump.
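      
      The 14-day figure follows directly from the measured rate. A small C
      check of the arithmetic, using hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed for `total_execs` execs, given a measured batch timing. */
static uint64_t seconds_to_wrap(uint64_t total_execs, uint64_t batch_execs,
                                uint64_t batch_seconds)
{
    assert(batch_execs != 0);
    return total_execs / batch_execs * batch_seconds;
}

/* Truncate a duration in seconds to whole days. */
static uint64_t whole_days(uint64_t seconds)
{
    return seconds / 86400;
}
```

      Wrapping a 32bit counter takes 2^32 execs, i.e. 65536 batches of
      65536 execs at 19s each, or 1245184 seconds, which is a little over
      14 days. At the same rate a 64bit counter would take 2^32 times
      longer.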
      
      Extending self_exec_id to 64bits introduces a problem on 32bit
      architectures where reading self_exec_id is no longer atomic and can
      take two read instructions.  This means that it is possible to hit
      a window where the read value of exec_id does not match the written
      value.  So with very lucky timing this still remains exploitable
      after this change.
      
      I have changed the update of exec_id on exec to use WRITE_ONCE
      and the read of exec_id in do_notify_parent to use READ_ONCE
      to make it clear that there is no locking between these two
      locations.
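      
      The torn-read window on 32bit can be modelled deterministically in
      C. This is a sketch only: struct split_u64 and its helpers are
      hypothetical, and the order in which the two halves are stored (low
      word first here) is an assumption, since the real order depends on
      the compiler and architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit counter held as two 32-bit words, as a 32-bit CPU accesses it. */
struct split_u64 {
    uint32_t lo;
    uint32_t hi;
};

/* Store only the low word of v. */
static void store_lo(struct split_u64 *c, uint64_t v)
{
    assert(c != NULL);
    c->lo = (uint32_t)v;
}

/* Store only the high word of v. */
static void store_hi(struct split_u64 *c, uint64_t v)
{
    assert(c != NULL);
    c->hi = (uint32_t)(v >> 32);
}

/* Combine whatever halves are currently in memory into one value. */
static uint64_t load_both(const struct split_u64 *c)
{
    assert(c != NULL);
    return ((uint64_t)c->hi << 32) | c->lo;
}
```

      If the counter moves from 0xffffffff to 0x100000000 and a reader runs
      between the two half stores, it observes 0 in this model, a value
      that matches neither the old nor the new exec_id.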
      
      Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
      Fixes: 2.3.23pre2
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      110012a2