1. 09 Jul, 2020 15 commits
    • Misono Tomohiro's avatar
      hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add() · a56fe733
      Misono Tomohiro authored
      [ Upstream commit 8b97f992 ]
      
      Although it rarely happens, we should call free_capabilities()
      if error happens after read_capabilities() to free allocated strings.
      
      Fixes: de584afa ("hwmon driver for ACPI 4.0 power meters")
      Signed-off-by: default avatarMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      Link: https://lore.kernel.org/r/20200625043242.31175-1-misono.tomohiro@jp.fujitsu.comSigned-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a56fe733
    • Chu Lin's avatar
      hwmon: (max6697) Make sure the OVERT mask is set correctly · 25c99651
      Chu Lin authored
      [ Upstream commit 016983d1 ]
      
      Per the datasheet for max6697, OVERT mask and ALERT mask are different.
      For example, the 7th bit of OVERT is the local channel but for alert
      mask, the 6th bit is the local channel. Therefore, we can't apply the
      same mask for both registers. In addition to that, the max6697 driver
      is supposed to be compatibale with different models. I manually went over
      all the listed chips and made sure all chip types have the same layout.
      
      Testing;
          mask value of 0x9 should map to 0x44 for ALERT and 0x84 for OVERT.
          I used iotool to read the reg value back to verify. I only tested this
          change on max6581.
      
      Reference:
      https://datasheets.maximintegrated.com/en/ds/MAX6581.pdf
      https://datasheets.maximintegrated.com/en/ds/MAX6697.pdf
      https://datasheets.maximintegrated.com/en/ds/MAX6699.pdfSigned-off-by: default avatarChu Lin <linchuyuan@google.com>
      Fixes: 5372d2d7 ("hwmon: Driver for Maxim MAX6697 and compatibles")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      25c99651
    • Rahul Lakkireddy's avatar
      cxgb4: parse TC-U32 key values and masks natively · 69d3042d
      Rahul Lakkireddy authored
      [ Upstream commit 27f78cb2 ]
      
      TC-U32 passes all keys values and masks in __be32 format. The parser
      already expects this and hence pass the value and masks in __be32
      natively to the parser.
      
      Fixes following sparse warnings in several places:
      cxgb4_tc_u32.c:57:21: warning: incorrect type in assignment (different base
      types)
      cxgb4_tc_u32.c:57:21:    expected unsigned int [usertype] val
      cxgb4_tc_u32.c:57:21:    got restricted __be32 [usertype] val
      cxgb4_tc_u32_parse.h:48:24: warning: cast to restricted __be32
      
      Fixes: 2e8aad7b ("cxgb4: add parser to translate u32 filters to internal spec")
      Signed-off-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      69d3042d
    • Shile Zhang's avatar
      sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds · 0c199348
      Shile Zhang authored
      [ Upstream commit 975e155e ]
      
      We added the 'sched_rr_timeslice_ms' SCHED_RR tuning knob in this commit:
      
        ce0dbbbb ("sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice")
      
      ... which name suggests to users that it's in milliseconds, while in reality
      it's being set in milliseconds but the result is shown in jiffies.
      
      This is obviously confusing when HZ is not 1000, it makes it appear like the
      value set failed, such as HZ=100:
      
        root# echo 100 > /proc/sys/kernel/sched_rr_timeslice_ms
        root# cat /proc/sys/kernel/sched_rr_timeslice_ms
        10
      
      Fix this to be milliseconds all around.
      Signed-off-by: default avatarShile Zhang <shile.zhang@nokia.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1485612049-20923-1-git-send-email-shile.zhang@nokia.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0c199348
    • Herbert Xu's avatar
      crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock() · 04d462a6
      Herbert Xu authored
      commit 34c86f4c upstream.
      
      The locking in af_alg_release_parent is broken as the BH socket
      lock can only be taken if there is a code-path to handle the case
      where the lock is owned by process-context.  Instead of adding
      such handling, we can fix this by changing the ref counts to
      atomic_t.
      
      This patch also modifies the main refcnt to include both normal
      and nokey sockets.  This way we don't have to fudge the nokey
      ref count when a socket changes from nokey to normal.
      
      Credits go to Mauricio Faria de Oliveira who diagnosed this bug
      and sent a patch for it:
      
      https://lore.kernel.org/linux-crypto/20200605161657.535043-1-mfo@canonical.com/Reported-by: default avatarBrian Moyles <bmoyles@netflix.com>
      Reported-by: default avatarMauricio Faria de Oliveira <mfo@canonical.com>
      Fixes: 37f96694 ("crypto: af_alg - Use bh_lock_sock in...")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      04d462a6
    • Douglas Anderson's avatar
      kgdb: Avoid suspicious RCU usage warning · 2968445b
      Douglas Anderson authored
      [ Upstream commit 440ab9e1 ]
      
      At times when I'm using kgdb I see a splat on my console about
      suspicious RCU usage.  I managed to come up with a case that could
      reproduce this that looked like this:
      
        WARNING: suspicious RCU usage
        5.7.0-rc4+ #609 Not tainted
        -----------------------------
        kernel/pid.c:395 find_task_by_pid_ns() needs rcu_read_lock() protection!
      
        other info that might help us debug this:
      
          rcu_scheduler_active = 2, debug_locks = 1
        3 locks held by swapper/0/1:
         #0: ffffff81b6b8e988 (&dev->mutex){....}-{3:3}, at: __device_attach+0x40/0x13c
         #1: ffffffd01109e9e8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x20c/0x7ac
         #2: ffffffd01109ea90 (dbg_slave_lock){....}-{2:2}, at: kgdb_cpu_enter+0x3ec/0x7ac
      
        stack backtrace:
        CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #609
        Hardware name: Google Cheza (rev3+) (DT)
        Call trace:
         dump_backtrace+0x0/0x1b8
         show_stack+0x1c/0x24
         dump_stack+0xd4/0x134
         lockdep_rcu_suspicious+0xf0/0x100
         find_task_by_pid_ns+0x5c/0x80
         getthread+0x8c/0xb0
         gdb_serial_stub+0x9d4/0xd04
         kgdb_cpu_enter+0x284/0x7ac
         kgdb_handle_exception+0x174/0x20c
         kgdb_brk_fn+0x24/0x30
         call_break_hook+0x6c/0x7c
         brk_handler+0x20/0x5c
         do_debug_exception+0x1c8/0x22c
         el1_sync_handler+0x3c/0xe4
         el1_sync+0x7c/0x100
         rpmh_rsc_probe+0x38/0x420
         platform_drv_probe+0x94/0xb4
         really_probe+0x134/0x300
         driver_probe_device+0x68/0x100
         __device_attach_driver+0x90/0xa8
         bus_for_each_drv+0x84/0xcc
         __device_attach+0xb4/0x13c
         device_initial_probe+0x18/0x20
         bus_probe_device+0x38/0x98
         device_add+0x38c/0x420
      
      If I understand properly we should just be able to blanket kgdb under
      one big RCU read lock and the problem should go away.  We'll add it to
      the beast-of-a-function known as kgdb_cpu_enter().
      
      With this I no longer get any splats and things seem to work fine.
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Link: https://lore.kernel.org/r/20200602154729.v2.1.I70e0d4fd46d5ed2aaf0c98a355e8e1b7a5bb7e4e@changeidSigned-off-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2968445b
    • Zqiang's avatar
      usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect · 4ea5c909
      Zqiang authored
      [ Upstream commit 28ebeb8d ]
      
      BUG: memory leak
      unreferenced object 0xffff888055046e00 (size 256):
        comm "kworker/2:9", pid 2570, jiffies 4294942129 (age 1095.500s)
        hex dump (first 32 bytes):
          00 70 04 55 80 88 ff ff 18 bb 5a 81 ff ff ff ff  .p.U......Z.....
          f5 96 78 81 ff ff ff ff 37 de 8e 81 ff ff ff ff  ..x.....7.......
        backtrace:
          [<00000000d121dccf>] kmemleak_alloc_recursive
      include/linux/kmemleak.h:43 [inline]
          [<00000000d121dccf>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<00000000d121dccf>] slab_alloc_node mm/slub.c:2786 [inline]
          [<00000000d121dccf>] slab_alloc mm/slub.c:2794 [inline]
          [<00000000d121dccf>] kmem_cache_alloc_trace+0x15e/0x2d0 mm/slub.c:2811
          [<000000005c3c3381>] kmalloc include/linux/slab.h:555 [inline]
          [<000000005c3c3381>] usbtest_probe+0x286/0x19d0
      drivers/usb/misc/usbtest.c:2790
          [<000000001cec6910>] usb_probe_interface+0x2bd/0x870
      drivers/usb/core/driver.c:361
          [<000000007806c118>] really_probe+0x48d/0x8f0 drivers/base/dd.c:551
          [<00000000a3308c3e>] driver_probe_device+0xfc/0x2a0 drivers/base/dd.c:724
          [<000000003ef66004>] __device_attach_driver+0x1b6/0x240
      drivers/base/dd.c:831
          [<00000000eee53e97>] bus_for_each_drv+0x14e/0x1e0 drivers/base/bus.c:431
          [<00000000bb0648d0>] __device_attach+0x1f9/0x350 drivers/base/dd.c:897
          [<00000000838b324a>] device_initial_probe+0x1a/0x20 drivers/base/dd.c:944
          [<0000000030d501c1>] bus_probe_device+0x1e1/0x280 drivers/base/bus.c:491
          [<000000005bd7adef>] device_add+0x131d/0x1c40 drivers/base/core.c:2504
          [<00000000a0937814>] usb_set_configuration+0xe84/0x1ab0
      drivers/usb/core/message.c:2030
          [<00000000e3934741>] generic_probe+0x6a/0xe0 drivers/usb/core/generic.c:210
          [<0000000098ade0f1>] usb_probe_device+0x90/0xd0
      drivers/usb/core/driver.c:266
          [<000000007806c118>] really_probe+0x48d/0x8f0 drivers/base/dd.c:551
          [<00000000a3308c3e>] driver_probe_device+0xfc/0x2a0 drivers/base/dd.c:724
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarKyungtae Kim <kt0755@gmail.com>
      Signed-off-by: default avatarZqiang <qiang.zhang@windriver.com>
      Link: https://lore.kernel.org/r/20200612035210.20494-1-qiang.zhang@windriver.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4ea5c909
    • Qian Cai's avatar
      mm/slub: fix stack overruns with SLUB_STATS · ba9950ac
      Qian Cai authored
      [ Upstream commit a68ee057 ]
      
      There is no need to copy SLUB_STATS items from root memcg cache to new
      memcg cache copies.  Doing so could result in stack overruns because the
      store function only accepts 0 to clear the stat and returns an error for
      everything else while the show method would print out the whole stat.
      
      Then, the mismatch of the lengths returns from show and store methods
      happens in memcg_propagate_slab_attrs():
      
      	else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf))
      		buf = mbuf;
      
      max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64]
      in show_stat() later where a bounch of sprintf() would overrun the stack
      variable.  Fix it by always allocating a page of buffer to be used in
      show_stat() if SLUB_STATS=y which should only be used for debug purpose.
      
        # echo 1 > /sys/kernel/slab/fs_cache/shrink
        BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0
        Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251
      
        Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
        Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
        Call Trace:
          number+0x421/0x6e0
          vsnprintf+0x451/0x8e0
          sprintf+0x9e/0xd0
          show_stat+0x124/0x1d0
          alloc_slowpath_show+0x13/0x20
          __kmem_cache_create+0x47a/0x6b0
      
        addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame:
         process_one_work+0x0/0xb90
      
        this frame has 1 object:
         [32, 72) 'lockdep_map'
      
        Memory state around the buggy address:
         ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
                                                               ^
         ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00
         ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ==================================================================
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0
        Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
        Call Trace:
          __kmem_cache_create+0x6ac/0x6b0
      
      Fixes: 107dab5c ("slub: slub-specific propagation changes")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Glauber Costa <glauber@scylladb.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ba9950ac
    • Dongli Zhang's avatar
      mm/slub.c: fix corrupted freechain in deactivate_slab() · 96b01209
      Dongli Zhang authored
      [ Upstream commit 52f23478 ]
      
      The slub_debug is able to fix the corrupted slab freelist/page.
      However, alloc_debug_processing() only checks the validity of current
      and next freepointer during allocation path.  As a result, once some
      objects have their freepointers corrupted, deactivate_slab() may lead to
      page fault.
      
      Below is from a test kernel module when 'slub_debug=PUF,kmalloc-128
      slub_nomerge'.  The test kernel corrupts the freepointer of one free
      object on purpose.  Unfortunately, deactivate_slab() does not detect it
      when iterating the freechain.
      
        BUG: unable to handle page fault for address: 00000000123456f8
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        ... ...
        RIP: 0010:deactivate_slab.isra.92+0xed/0x490
        ... ...
        Call Trace:
         ___slab_alloc+0x536/0x570
         __slab_alloc+0x17/0x30
         __kmalloc+0x1d9/0x200
         ext4_htree_store_dirent+0x30/0xf0
         htree_dirblock_to_tree+0xcb/0x1c0
         ext4_htree_fill_tree+0x1bc/0x2d0
         ext4_readdir+0x54f/0x920
         iterate_dir+0x88/0x190
         __x64_sys_getdents+0xa6/0x140
         do_syscall_64+0x49/0x170
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Therefore, this patch adds extra consistency check in deactivate_slab().
      Once an object's freepointer is corrupted, all following objects
      starting at this object are isolated.
      
      [akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n]
      Signed-off-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Joe Jin <joe.jin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      96b01209
    • Tuomas Tynkkynen's avatar
      usbnet: smsc95xx: Fix use-after-free after removal · 3cc5b831
      Tuomas Tynkkynen authored
      [ Upstream commit b835a71e ]
      
      Syzbot reports an use-after-free in workqueue context:
      
      BUG: KASAN: use-after-free in mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737
       mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737
       __smsc95xx_mdio_read drivers/net/usb/smsc95xx.c:217 [inline]
       smsc95xx_mdio_read+0x583/0x870 drivers/net/usb/smsc95xx.c:278
       check_carrier+0xd1/0x2e0 drivers/net/usb/smsc95xx.c:644
       process_one_work+0x777/0xf90 kernel/workqueue.c:2274
       worker_thread+0xa8f/0x1430 kernel/workqueue.c:2420
       kthread+0x2df/0x300 kernel/kthread.c:255
      
      It looks like that smsc95xx_unbind() is freeing the structures that are
      still in use by the concurrently running workqueue callback. Thus switch
      to using cancel_delayed_work_sync() to ensure the work callback really
      is no longer active.
      
      Reported-by: syzbot+29dc7d4ae19b703ff947@syzkaller.appspotmail.com
      Signed-off-by: default avatarTuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3cc5b831
    • Borislav Petkov's avatar
      EDAC/amd64: Read back the scrub rate PCI register on F15h · 5960439a
      Borislav Petkov authored
      [ Upstream commit ee470bb2 ]
      
      Commit:
      
        da92110d ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")
      
      added support for F15h, model 0x60 CPUs but in doing so, missed to read
      back SCRCTRL PCI config register on F15h CPUs which are *not* model
      0x60. Add that read so that doing
      
        $ cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate
      
      can show the previously set DRAM scrub rate.
      
      Fixes: da92110d ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")
      Reported-by: default avatarAnders Andersson <pipatron@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> #v4.4..
      Link: https://lkml.kernel.org/r/CAKkunMbNWppx_i6xSdDHLseA2QQmGJqj_crY=NF-GZML5np4Vw@mail.gmail.comSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      5960439a
    • Hugh Dickins's avatar
      mm: fix swap cache node allocation mask · b647889f
      Hugh Dickins authored
      [ Upstream commit 243bce09 ]
      
      Chris Murphy reports that a slightly overcommitted load, testing swap
      and zram along with i915, splats and keeps on splatting, when it had
      better fail less noisily:
      
        gnome-shell: page allocation failure: order:0,
        mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE),
        nodemask=(null),cpuset=/,mems_allowed=0
        CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1
        Call Trace:
          dump_stack+0x64/0x88
          warn_alloc.cold+0x75/0xd9
          __alloc_pages_slowpath.constprop.0+0xcfa/0xd30
          __alloc_pages_nodemask+0x2df/0x320
          alloc_slab_page+0x195/0x310
          allocate_slab+0x3c5/0x440
          ___slab_alloc+0x40c/0x5f0
          __slab_alloc+0x1c/0x30
          kmem_cache_alloc+0x20e/0x220
          xas_nomem+0x28/0x70
          add_to_swap_cache+0x321/0x400
          __read_swap_cache_async+0x105/0x240
          swap_cluster_readahead+0x22c/0x2e0
          shmem_swapin+0x8e/0xc0
          shmem_swapin_page+0x196/0x740
          shmem_getpage_gfp+0x3a2/0xa60
          shmem_read_mapping_page_gfp+0x32/0x60
          shmem_get_pages+0x155/0x5e0 [i915]
          __i915_gem_object_get_pages+0x68/0xa0 [i915]
          i915_vma_pin+0x3fe/0x6c0 [i915]
          eb_add_vma+0x10b/0x2c0 [i915]
          i915_gem_do_execbuffer+0x704/0x3430 [i915]
          i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915]
          drm_ioctl_kernel+0x86/0xd0 [drm]
          drm_ioctl+0x206/0x390 [drm]
          ksys_ioctl+0x82/0xc0
          __x64_sys_ioctl+0x16/0x20
          do_syscall_64+0x5b/0xf0
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported on 5.7, but it goes back really to 3.1: when
      shmem_read_mapping_page_gfp() was implemented for use by i915, and
      allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but
      missed swapin's "& GFP_KERNEL" mask for page tree node allocation in
      __read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits
      from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils
      Fixes: 68da9f05 ("tmpfs: pass gfp to shmem_getpage_gfp")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: default avatarMatthew Wilcox (Oracle) <willy@infradead.org>
      Reported-by: default avatarChris Murphy <lists@colorremedies.com>
      Analyzed-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Analyzed-by: default avatarMatthew Wilcox <willy@infradead.org>
      Tested-by: default avatarChris Murphy <lists@colorremedies.com>
      Cc: <stable@vger.kernel.org>	[3.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b647889f
    • Filipe Manana's avatar
      btrfs: fix data block group relocation failure due to concurrent scrub · b6a357f5
      Filipe Manana authored
      [ Upstream commit 432cd2a1 ]
      
      When running relocation of a data block group while scrub is running in
      parallel, it is possible that the relocation will fail and abort the
      current transaction with an -EINVAL error:
      
         [134243.988595] BTRFS info (device sdc): found 14 extents, stage: move data extents
         [134243.999871] ------------[ cut here ]------------
         [134244.000741] BTRFS: Transaction aborted (error -22)
         [134244.001692] WARNING: CPU: 0 PID: 26954 at fs/btrfs/ctree.c:1071 __btrfs_cow_block+0x6a7/0x790 [btrfs]
         [134244.003380] Modules linked in: btrfs blake2b_generic xor raid6_pq (...)
         [134244.012577] CPU: 0 PID: 26954 Comm: btrfs Tainted: G        W         5.6.0-rc7-btrfs-next-58 #5
         [134244.014162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
         [134244.016184] RIP: 0010:__btrfs_cow_block+0x6a7/0x790 [btrfs]
         [134244.017151] Code: 48 c7 c7 (...)
         [134244.020549] RSP: 0018:ffffa41607863888 EFLAGS: 00010286
         [134244.021515] RAX: 0000000000000000 RBX: ffff9614bdfe09c8 RCX: 0000000000000000
         [134244.022822] RDX: 0000000000000001 RSI: ffffffffb3d63980 RDI: 0000000000000001
         [134244.024124] RBP: ffff961589e8c000 R08: 0000000000000000 R09: 0000000000000001
         [134244.025424] R10: ffffffffc0ae5955 R11: 0000000000000000 R12: ffff9614bd530d08
         [134244.026725] R13: ffff9614ced41b88 R14: ffff9614bdfe2a48 R15: 0000000000000000
         [134244.028024] FS:  00007f29b63c08c0(0000) GS:ffff9615ba600000(0000) knlGS:0000000000000000
         [134244.029491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         [134244.030560] CR2: 00007f4eb339b000 CR3: 0000000130d6e006 CR4: 00000000003606f0
         [134244.031997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         [134244.033153] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         [134244.034484] Call Trace:
         [134244.034984]  btrfs_cow_block+0x12b/0x2b0 [btrfs]
         [134244.035859]  do_relocation+0x30b/0x790 [btrfs]
         [134244.036681]  ? do_raw_spin_unlock+0x49/0xc0
         [134244.037460]  ? _raw_spin_unlock+0x29/0x40
         [134244.038235]  relocate_tree_blocks+0x37b/0x730 [btrfs]
         [134244.039245]  relocate_block_group+0x388/0x770 [btrfs]
         [134244.040228]  btrfs_relocate_block_group+0x161/0x2e0 [btrfs]
         [134244.041323]  btrfs_relocate_chunk+0x36/0x110 [btrfs]
         [134244.041345]  btrfs_balance+0xc06/0x1860 [btrfs]
         [134244.043382]  ? btrfs_ioctl_balance+0x27c/0x310 [btrfs]
         [134244.045586]  btrfs_ioctl_balance+0x1ed/0x310 [btrfs]
         [134244.045611]  btrfs_ioctl+0x1880/0x3760 [btrfs]
         [134244.049043]  ? do_raw_spin_unlock+0x49/0xc0
         [134244.049838]  ? _raw_spin_unlock+0x29/0x40
         [134244.050587]  ? __handle_mm_fault+0x11b3/0x14b0
         [134244.051417]  ? ksys_ioctl+0x92/0xb0
         [134244.052070]  ksys_ioctl+0x92/0xb0
         [134244.052701]  ? trace_hardirqs_off_thunk+0x1a/0x1c
         [134244.053511]  __x64_sys_ioctl+0x16/0x20
         [134244.054206]  do_syscall_64+0x5c/0x280
         [134244.054891]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
         [134244.055819] RIP: 0033:0x7f29b51c9dd7
         [134244.056491] Code: 00 00 00 (...)
         [134244.059767] RSP: 002b:00007ffcccc1dd08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
         [134244.061168] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f29b51c9dd7
         [134244.062474] RDX: 00007ffcccc1dda0 RSI: 00000000c4009420 RDI: 0000000000000003
         [134244.063771] RBP: 0000000000000003 R08: 00005565cea4b000 R09: 0000000000000000
         [134244.065032] R10: 0000000000000541 R11: 0000000000000202 R12: 00007ffcccc2060a
         [134244.066327] R13: 00007ffcccc1dda0 R14: 0000000000000002 R15: 00007ffcccc1dec0
         [134244.067626] irq event stamp: 0
         [134244.068202] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
         [134244.069351] hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
         [134244.070909] softirqs last  enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
         [134244.072392] softirqs last disabled at (0): [<0000000000000000>] 0x0
         [134244.073432] ---[ end trace bd7c03622e0b0a99 ]---
      
      The -EINVAL error comes from the following chain of function calls:
      
        __btrfs_cow_block() <-- aborts the transaction
          btrfs_reloc_cow_block()
            replace_file_extents()
              get_new_location() <-- returns -EINVAL
      
      When relocating a data block group, for each allocated extent of the block
      group, we preallocate another extent (at prealloc_file_extent_cluster()),
      associated with the data relocation inode, and then dirty all its pages.
      These preallocated extents have, and must have, the same size that extents
      from the data block group being relocated have.
      
      Later before we start the relocation stage that updates pointers (bytenr
      field of file extent items) to point to the the new extents, we trigger
      writeback for the data relocation inode. The expectation is that writeback
      will write the pages to the previously preallocated extents, that it
      follows the NOCOW path. That is generally the case, however, if a scrub
      is running it may have turned the block group that contains those extents
      into RO mode, in which case writeback falls back to the COW path.
      
      However in the COW path instead of allocating exactly one extent with the
      expected size, the allocator may end up allocating several smaller extents
      due to free space fragmentation - because we tell it at cow_file_range()
      that the minimum allocation size can match the filesystem's sector size.
      This later breaks the relocation's expectation that an extent associated
      to a file extent item in the data relocation inode has the same size as
      the respective extent pointed by a file extent item in another tree - in
      this case the extent to which the relocation inode poins to is smaller,
      causing relocation.c:get_new_location() to return -EINVAL.
      
      For example, if we are relocating a data block group X that has a logical
      address of X and the block group has an extent allocated at the logical
      address X + 128KiB with a size of 64KiB:
      
      1) At prealloc_file_extent_cluster() we allocate an extent for the data
         relocation inode with a size of 64KiB and associate it to the file
         offset 128KiB (X + 128KiB - X) of the data relocation inode. This
         preallocated extent was allocated at block group Z;
      
      2) A scrub running in parallel turns block group Z into RO mode and
         starts scrubing its extents;
      
      3) Relocation triggers writeback for the data relocation inode;
      
      4) When running delalloc (btrfs_run_delalloc_range()), we try first the
         NOCOW path because the data relocation inode has BTRFS_INODE_PREALLOC
         set in its flags. However, because block group Z is in RO mode, the
         NOCOW path (run_delalloc_nocow()) falls back into the COW path, by
         calling cow_file_range();
      
      5) At cow_file_range(), in the first iteration of the while loop we call
         btrfs_reserve_extent() to allocate a 64KiB extent and pass it a minimum
         allocation size of 4KiB (fs_info->sectorsize). Due to free space
         fragmentation, btrfs_reserve_extent() ends up allocating two extents
         of 32KiB each, each one on a different iteration of that while loop;
      
      6) Writeback of the data relocation inode completes;
      
      7) Relocation proceeds and ends up at relocation.c:replace_file_extents(),
         with a leaf which has a file extent item that points to the data extent
         from block group X, that has a logical address (bytenr) of X + 128KiB
         and a size of 64KiB. Then it calls get_new_location(), which does a
         lookup in the data relocation tree for a file extent item starting at
         offset 128KiB (X + 128KiB - X) and belonging to the data relocation
         inode. It finds a corresponding file extent item, however that item
         points to an extent that has a size of 32KiB, which doesn't match the
         expected size of 64KiB, resuling in -EINVAL being returned from this
         function and propagated up to __btrfs_cow_block(), which aborts the
         current transaction.
      
      To fix this make sure that at cow_file_range() when we call the allocator
      we pass it a minimum allocation size corresponding the desired extent size
      if the inode belongs to the data relocation tree, otherwise pass it the
      filesystem's sector size as the minimum allocation size.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b6a357f5
    • Anand Jain's avatar
      btrfs: cow_file_range() num_bytes and disk_num_bytes are same · 8714d9b4
      Anand Jain authored
      [ Upstream commit 3752d22f ]
      
      This patch deletes local variable disk_num_bytes as its value
      is same as num_bytes in the function cow_file_range().
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8714d9b4
    • Filipe Manana's avatar
      btrfs: fix a block group ref counter leak after failure to remove block group · a39010dc
      Filipe Manana authored
      [ Upstream commit 9fecd132 ]
      
      When removing a block group, if we fail to delete the block group's item
      from the extent tree, we jump to the 'out' label and end up decrementing
      the block group's reference count once only (by 1), resulting in a counter
      leak because the block group at that point was already removed from the
      block group cache rbtree - so we have to decrement the reference count
      twice, once for the rbtree and once for our lookup at the start of the
      function.
      
      There is a second bug where if removing the free space tree entries (the
      call to remove_block_group_free_space()) fails we end up jumping to the
      'out_put_group' label but end up decrementing the reference count only
      once, when we should have done it twice, since we have already removed
      the block group from the block group cache rbtree. This happens because
      the reference count decrement for the rbtree reference happens after
      attempting to remove the free space tree entries, which is far away from
      the place where we remove the block group from the rbtree.
      
      To make things less error prone, decrement the reference count for the
      rbtree immediately after removing the block group from it. This also
      eleminates the need for two different exit labels on error, renaming
      'out_put_label' to just 'out' and removing the old 'out'.
      
      Fixes: f6033c5e ("btrfs: fix block group leak when removing fails")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a39010dc
  2. 30 Jun, 2020 25 commits
    • Sasha Levin's avatar
      Linux 4.9.229 · df4674d8
      Sasha Levin authored
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      df4674d8
    • Greg Kroah-Hartman's avatar
      Revert "tty: hvc: Fix data abort due to race in hvc_open" · c496fa72
      Greg Kroah-Hartman authored
      commit cf9c9445 upstream.
      
      This reverts commit e2bd1dcb.
      
      In discussion on the mailing list, it has been determined that this is
      not the correct type of fix for this issue.  Revert it so that we can do
      this correctly.
      Reported-by: default avatarJiri Slaby <jslaby@suse.cz>
      Reported-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/r/20200428032601.22127-1-rananta@codeaurora.org
      Cc: Raghavendra Rao Ananta <rananta@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c496fa72
    • Zheng Bin's avatar
      xfs: add agf freeblocks verify in xfs_agf_verify · 2bda23ca
      Zheng Bin authored
      [ Upstream commit d0c7feaf ]
      
      We recently used fuzz(hydra) to test XFS and automatically generate
      tmp.img(XFS v5 format, but some metadata is wrong)
      
      xfs_repair information(just one AG):
      agf_freeblks 0, counted 3224 in ag 0
      agf_longest 536874136, counted 3224 in ag 0
      sb_fdblocks 613, counted 3228
      
      Test as follows:
      mount tmp.img tmpdir
      cp file1M tmpdir
      sync
      
      In 4.19-stable, sync will stuck, the reason is:
      xfs_mountfs
        xfs_check_summary_counts
          if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
             XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
             !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
      	return 0;  -->just return, incore sb_fdblocks still be 613
          xfs_initialize_perag_data
      
      cp file1M tmpdir -->ok(write file to pagecache)
      sync -->stuck(write pagecache to disk)
      xfs_map_blocks
        xfs_iomap_write_allocate
          while (count_fsb != 0) {
            nimaps = 0;
            while (nimaps == 0) { --> endless loop
               nimaps = 1;
               xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again
      xfs_bmapi_write
        xfs_bmap_alloc
          xfs_bmap_btalloc
            xfs_alloc_vextent
              xfs_alloc_fix_freelist
                xfs_alloc_space_available -->fail(agf_freeblks is 0)
      
      In linux-next, sync not stuck, cause commit c2b31643 ("xfs:
      use the latest extent at writeback delalloc conversion time") remove
      the above while, dmesg is as follows:
      [   55.250114] XFS (loop0): page discard on page ffffea0008bc7380, inode 0x1b0c, offset 0.
      
      Users do not know why this page is discard, the better soultion is:
      1. Like xfs_repair, make sure sb_fdblocks is equal to counted
      (xfs_initialize_perag_data did this, who is not called at this mount)
      2. Add agf verify, if fail, will tell users to repair
      
      This patch use the second soultion.
      Signed-off-by: default avatarZheng Bin <zhengbin13@huawei.com>
      Signed-off-by: default avatarRen Xudong <renxudong1@huawei.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2bda23ca
    • Olga Kornievskaia's avatar
      NFSv4 fix CLOSE not waiting for direct IO compeletion · 5c85c81e
      Olga Kornievskaia authored
      commit d03727b2 upstream.
      
      Figuring out the root case for the REMOVE/CLOSE race and
      suggesting the solution was done by Neil Brown.
      
      Currently what happens is that direct IO calls hold a reference
      on the open context which is decremented as an asynchronous task
      in the nfs_direct_complete(). Before reference is decremented,
      control is returned to the application which is free to close the
      file. When close is being processed, it decrements its reference
      on the open_context but since directIO still holds one, it doesn't
      sent a close on the wire. It returns control to the application
      which is free to do other operations. For instance, it can delete a
      file. Direct IO is finally releasing its reference and triggering
      an asynchronous close. Which races with the REMOVE. On the server,
      REMOVE can be processed before the CLOSE, failing the REMOVE with
      EACCES as the file is still opened.
      Signed-off-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Suggested-by: default avatarNeil Brown <neilb@suse.com>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c85c81e
    • Trond Myklebust's avatar
      pNFS/flexfiles: Fix list corruption if the mirror count changes · 86dfbc17
      Trond Myklebust authored
      commit 8b040137 upstream.
      
      If the mirror count changes in the new layout we pick up inside
      ff_layout_pg_init_write(), then we can end up adding the
      request to the wrong mirror and corrupting the mirror->pg_list.
      
      Fixes: d600ad1f ("NFS41: pop some layoutget errors to application")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      86dfbc17
    • Chuck Lever's avatar
      SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment() · a20cecec
      Chuck Lever authored
      commit 89a3c9f5 upstream.
      
      @subbuf is an output parameter of xdr_buf_subsegment(). A survey of
      call sites shows that @subbuf is always uninitialized before
      xdr_buf_segment() is invoked by callers.
      
      There are some execution paths through xdr_buf_subsegment() that do
      not set all of the fields in @subbuf, leaving some pointer fields
      containing garbage addresses. Subsequent processing of that buffer
      then results in a page fault.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a20cecec
    • Vasily Averin's avatar
      sunrpc: fixed rollback in rpc_gssd_dummy_populate() · 04e6e060
      Vasily Averin authored
      commit b7ade381 upstream.
      
      __rpc_depopulate(gssd_dentry) was lost on error path
      
      cc: stable@vger.kernel.org
      Fixes: commit 4b9a445e ("sunrpc: create a new dummy pipe for gssd to hold open")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      04e6e060
    • Denis Efremov's avatar
      drm/radeon: fix fb_div check in ni_init_smc_spll_table() · 6d3468bd
      Denis Efremov authored
      commit 35f760b4 upstream.
      
      clk_s is checked twice in a row in ni_init_smc_spll_table().
      fb_div should be checked instead.
      
      Fixes: 69e0b57a ("drm/radeon/kms: add dpm support for cayman (v5)")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDenis Efremov <efremov@linux.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d3468bd
    • Masami Hiramatsu's avatar
      tracing: Fix event trigger to accept redundant spaces · 587287ce
      Masami Hiramatsu authored
      commit 6784bead upstream.
      
      Fix the event trigger to accept redundant spaces in
      the trigger input.
      
      For example, these return -EINVAL
      
      echo " traceon" > events/ftrace/print/trigger
      echo "traceon  if common_pid == 0" > events/ftrace/print/trigger
      echo "disable_event:kmem:kmalloc " > events/ftrace/print/trigger
      
      But these are hard to find what is wrong.
      
      To fix this issue, use skip_spaces() to remove spaces
      in front of actual tokens, and set NULL if there is no
      token.
      
      Link: http://lkml.kernel.org/r/159262476352.185015.5261566783045364186.stgit@devnote2
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 85f2b082 ("tracing: Add basic event trigger framework")
      Reviewed-by: default avatarTom Zanussi <zanussi@kernel.org>
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      587287ce
    • Jiping Ma's avatar
      arm64: perf: Report the PC value in REGS_ABI_32 mode · 661a5532
      Jiping Ma authored
      commit 8dfe804a upstream.
      
      A 32-bit perf querying the registers of a compat task using REGS_ABI_32
      will receive zeroes from w15, when it expects to find the PC.
      
      Return the PC value for register dwarf register 15 when returning register
      values for a compat task to perf.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarJiping Ma <jiping.ma2@windriver.com>
      Link: https://lore.kernel.org/r/1589165527-188401-1-git-send-email-jiping.ma2@windriver.com
      [will: Shuffled code and added a comment]
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      661a5532
    • Junxiao Bi's avatar
      ocfs2: fix panic on nfs server over ocfs2 · d09a61a9
      Junxiao Bi authored
      commit e5a15e17 upstream.
      
      The following kernel panic was captured when running nfs server over
      ocfs2, at that time ocfs2_test_inode_bit() was checking whether one
      inode locating at "blkno" 5 was valid, that is ocfs2 root inode, its
      "suballoc_slot" was OCFS2_INVALID_SLOT(65535) and it was allocted from
      //global_inode_alloc, but here it wrongly assumed that it was got from per
      slot inode alloctor which would cause array overflow and trigger kernel
      panic.
      
        BUG: unable to handle kernel paging request at 0000000000001088
        IP: [<ffffffff816f6898>] _raw_spin_lock+0x18/0xf0
        PGD 1e06ba067 PUD 1e9e7d067 PMD 0
        Oops: 0002 [#1] SMP
        CPU: 6 PID: 24873 Comm: nfsd Not tainted 4.1.12-124.36.1.el6uek.x86_64 #2
        Hardware name: Huawei CH121 V3/IT11SGCA1, BIOS 3.87 02/02/2018
        RIP: _raw_spin_lock+0x18/0xf0
        RSP: e02b:ffff88005ae97908  EFLAGS: 00010206
        RAX: ffff88005ae98000 RBX: 0000000000001088 RCX: 0000000000000000
        RDX: 0000000000020000 RSI: 0000000000000009 RDI: 0000000000001088
        RBP: ffff88005ae97928 R08: 0000000000000000 R09: ffff880212878e00
        R10: 0000000000007ff0 R11: 0000000000000000 R12: 0000000000001088
        R13: ffff8800063c0aa8 R14: ffff8800650c27d0 R15: 000000000000ffff
        FS:  0000000000000000(0000) GS:ffff880218180000(0000) knlGS:ffff880218180000
        CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000001088 CR3: 00000002033d0000 CR4: 0000000000042660
        Call Trace:
          igrab+0x1e/0x60
          ocfs2_get_system_file_inode+0x63/0x3a0 [ocfs2]
          ocfs2_test_inode_bit+0x328/0xa00 [ocfs2]
          ocfs2_get_parent+0xba/0x3e0 [ocfs2]
          reconnect_path+0xb5/0x300
          exportfs_decode_fh+0xf6/0x2b0
          fh_verify+0x350/0x660 [nfsd]
          nfsd4_putfh+0x4d/0x60 [nfsd]
          nfsd4_proc_compound+0x3d3/0x6f0 [nfsd]
          nfsd_dispatch+0xe0/0x290 [nfsd]
          svc_process_common+0x412/0x6a0 [sunrpc]
          svc_process+0x123/0x210 [sunrpc]
          nfsd+0xff/0x170 [nfsd]
          kthread+0xcb/0xf0
          ret_from_fork+0x61/0x90
        Code: 83 c2 02 0f b7 f2 e8 18 dc 91 ff 66 90 eb bf 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 53 0f 1f 44 00 00 48 89 fb ba 00 00 02 00 <f0> 0f c1 17 89 d0 45 31 e4 45 31 ed c1 e8 10 66 39 d0 41 89 c6
        RIP   _raw_spin_lock+0x18/0xf0
        CR2: 0000000000001088
        ---[ end trace 7264463cd1aac8f9 ]---
        Kernel panic - not syncing: Fatal exception
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-4-junxiao.bi@oracle.comSigned-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d09a61a9
    • Junxiao Bi's avatar
      ocfs2: fix value of OCFS2_INVALID_SLOT · 5257ba36
      Junxiao Bi authored
      commit 9277f833 upstream.
      
      In the ocfs2 disk layout, slot number is 16 bits, but in ocfs2
      implementation, slot number is 32 bits.  Usually this will not cause any
      issue, because slot number is converted from u16 to u32, but
      OCFS2_INVALID_SLOT was defined as -1, when an invalid slot number from
      disk was obtained, its value was (u16)-1, and it was converted to u32.
      Then the following checking in get_local_system_inode will be always
      skipped:
      
       static struct inode **get_local_system_inode(struct ocfs2_super *osb,
                                                     int type,
                                                     u32 slot)
       {
       	BUG_ON(slot == OCFS2_INVALID_SLOT);
      	...
       }
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-5-junxiao.bi@oracle.comSigned-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5257ba36
    • Junxiao Bi's avatar
      ocfs2: load global_inode_alloc · 8b3f6157
      Junxiao Bi authored
      commit 7569d3c7 upstream.
      
      Set global_inode_alloc as OCFS2_FIRST_ONLINE_SYSTEM_INODE, that will
      make it load during mount.  It can be used to test whether some
      global/system inodes are valid.  One use case is that nfsd will test
      whether root inode is valid.
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-3-junxiao.bi@oracle.comSigned-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b3f6157
    • Waiman Long's avatar
      mm/slab: use memzero_explicit() in kzfree() · b581a119
      Waiman Long authored
      commit 8982ae52 upstream.
      
      The kzfree() function is normally used to clear some sensitive
      information, like encryption keys, in the buffer before freeing it back to
      the pool.  Memset() is currently used for buffer clearing.  However
      unlikely, there is still a non-zero probability that the compiler may
      choose to optimize away the memory clearing especially if LTO is being
      used in the future.
      
      To make sure that this optimization will never happen,
      memzero_explicit(), which is introduced in v3.18, is now used in
      kzfree() to future-proof it.
      
      Link: http://lkml.kernel.org/r/20200616154311.12314-2-longman@redhat.com
      Fixes: 3ef0e5ba ("slab: introduce kzfree()")
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b581a119
    • Xiaoyao Li's avatar
      KVM: X86: Fix MSR range of APIC registers in X2APIC mode · e3bca0f3
      Xiaoyao Li authored
      commit bf10bd0b upstream.
      
      Only MSR address range 0x800 through 0x8ff is architecturally reserved
      and dedicated for accessing APIC registers in x2APIC mode.
      
      Fixes: 0105d1a5 ("KVM: x2apic interface to lapic")
      Signed-off-by: default avatarXiaoyao Li <xiaoyao.li@intel.com>
      Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3bca0f3
    • Nathan Chancellor's avatar
      ACPI: sysfs: Fix pm_profile_attr type · 2523f245
      Nathan Chancellor authored
      commit e6d701dc upstream.
      
      When running a kernel with Clang's Control Flow Integrity implemented,
      there is a violation that happens when accessing
      /sys/firmware/acpi/pm_profile:
      
      $ cat /sys/firmware/acpi/pm_profile
      0
      
      $ dmesg
      ...
      [   17.352564] ------------[ cut here ]------------
      [   17.352568] CFI failure (target: acpi_show_profile+0x0/0x8):
      [   17.352572] WARNING: CPU: 3 PID: 497 at kernel/cfi.c:29 __cfi_check_fail+0x33/0x40
      [   17.352573] Modules linked in:
      [   17.352575] CPU: 3 PID: 497 Comm: cat Tainted: G        W         5.7.0-microsoft-standard+ #1
      [   17.352576] RIP: 0010:__cfi_check_fail+0x33/0x40
      [   17.352577] Code: 48 c7 c7 50 b3 85 84 48 c7 c6 50 0a 4e 84 e8 a4 d8 60 00 85 c0 75 02 5b c3 48 c7 c7 dc 5e 49 84 48 89 de 31 c0 e8 7d 06 eb ff <0f> 0b 5b c3 00 00 cc cc 00 00 cc cc 00 85 f6 74 25 41 b9 ea ff ff
      [   17.352577] RSP: 0018:ffffaa6dc3c53d30 EFLAGS: 00010246
      [   17.352578] RAX: 331267e0c06cee00 RBX: ffffffff83d85890 RCX: ffffffff8483a6f8
      [   17.352579] RDX: ffff9cceabbb37c0 RSI: 0000000000000082 RDI: ffffffff84bb9e1c
      [   17.352579] RBP: ffffffff845b2bc8 R08: 0000000000000001 R09: ffff9cceabbba200
      [   17.352579] R10: 000000000000019d R11: 0000000000000000 R12: ffff9cc947766f00
      [   17.352580] R13: ffffffff83d6bd50 R14: ffff9ccc6fa80000 R15: ffffffff845bd328
      [   17.352582] FS:  00007fdbc8d13580(0000) GS:ffff9cce91ac0000(0000) knlGS:0000000000000000
      [   17.352582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   17.352583] CR2: 00007fdbc858e000 CR3: 00000005174d0000 CR4: 0000000000340ea0
      [   17.352584] Call Trace:
      [   17.352586]  ? rev_id_show+0x8/0x8
      [   17.352587]  ? __cfi_check+0x45bac/0x4b640
      [   17.352589]  ? kobj_attr_show+0x73/0x80
      [   17.352590]  ? sysfs_kf_seq_show+0xc1/0x140
      [   17.352592]  ? ext4_seq_options_show.cfi_jt+0x8/0x8
      [   17.352593]  ? seq_read+0x180/0x600
      [   17.352595]  ? sysfs_create_file_ns.cfi_jt+0x10/0x10
      [   17.352596]  ? tlbflush_read_file+0x8/0x8
      [   17.352597]  ? __vfs_read+0x6b/0x220
      [   17.352598]  ? handle_mm_fault+0xa23/0x11b0
      [   17.352599]  ? vfs_read+0xa2/0x130
      [   17.352599]  ? ksys_read+0x6a/0xd0
      [   17.352601]  ? __do_sys_getpgrp+0x8/0x8
      [   17.352602]  ? do_syscall_64+0x72/0x120
      [   17.352603]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   17.352604] ---[ end trace 7b1fa81dc897e419 ]---
      
      When /sys/firmware/acpi/pm_profile is read, sysfs_kf_seq_show is called,
      which in turn calls kobj_attr_show, which gets the ->show callback
      member by calling container_of on attr (casting it to struct
      kobj_attribute) then calls it.
      
      There is a CFI violation because pm_profile_attr is of type
      struct device_attribute but kobj_attr_show calls ->show expecting it
      to be from struct kobj_attribute. CFI checking ensures that function
      pointer types match when doing indirect calls. Fix pm_profile_attr to
      be defined in terms of kobj_attribute so there is no violation or
      mismatch.
      
      Fixes: 362b6460 ("ACPI: Export FADT pm_profile integer value to userspace")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1051Reported-by: default avataryuu ichii <byahu140@heisei.be>
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2523f245
    • Aaron Plattner's avatar
      ALSA: hda: Add NVIDIA codec IDs 9a & 9d through a0 to patch table · 981bcb1d
      Aaron Plattner authored
      commit adb36a82 upstream.
      
      These IDs are for upcoming NVIDIA chips with audio functions that are largely
      similar to the existing ones.
      Signed-off-by: default avatarAaron Plattner <aplattner@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200611180845.39942-1-aplattner@nvidia.comSigned-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      981bcb1d
    • Luis Chamberlain's avatar
      blktrace: break out of blktrace setup on concurrent calls · c032e65b
      Luis Chamberlain authored
      [ Upstream commit 1b0b2836 ]
      
      We use one blktrace per request_queue, that means one per the entire
      disk.  So we cannot run one blktrace on say /dev/vda and then /dev/vda1,
      or just two calls on /dev/vda.
      
      We check for concurrent setup only at the very end of the blktrace setup though.
      
      If we try to run two concurrent blktraces on the same block device the
      second one will fail, and the first one seems to go on. However when
      one tries to kill the first one one will see things like this:
      
      The kernel will show these:
      
      ```
      debugfs: File 'dropped' in directory 'nvme1n1' already present!
      debugfs: File 'msg' in directory 'nvme1n1' already present!
      debugfs: File 'trace0' in directory 'nvme1n1' already present!
      ``
      
      And userspace just sees this error message for the second call:
      
      ```
      blktrace /dev/nvme1n1
      BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
      ```
      
      The first userspace process #1 will also claim that the files
      were taken underneath their nose as well. The files are taken
      away form the first process given that when the second blktrace
      fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN.
      This means that even if go-happy process #1 is waiting for blktrace
      data, we *have* been asked to take teardown the blktrace.
      
      This can easily be reproduced with break-blktrace [0] run_0005.sh test.
      
      Just break out early if we know we're already going to fail, this will
      prevent trying to create the files all over again, which we know still
      exist.
      
      [0] https://github.com/mcgrof/break-blktraceSigned-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c032e65b
    • Masahiro Yamada's avatar
      kbuild: improve cc-option to clean up all temporary files · 48427c3b
      Masahiro Yamada authored
      [ Upstream commit f2f02ebd ]
      
      When cc-option and friends evaluate compiler flags, the temporary file
      $$TMP is created as an output object, and automatically cleaned up.
      The actual file path of $$TMP is .<pid>.tmp, here <pid> is the process
      ID of $(shell ...) invoked from cc-option. (Please note $$$$ is the
      escape sequence of $$).
      
      Such garbage files are cleaned up in most cases, but some compiler flags
      create additional output files.
      
      For example, -gsplit-dwarf creates a .dwo file.
      
      When CONFIG_DEBUG_INFO_SPLIT=y, you will see a bunch of .<pid>.dwo files
      left in the top of build directories. You may not notice them unless you
      do 'ls -a', but the garbage files will increase every time you run 'make'.
      
      This commit changes the temporary object path to .tmp_<pid>/tmp, and
      removes .tmp_<pid> directory when exiting. Separate build artifacts such
      as *.dwo will be cleaned up all together because their file paths are
      usually determined based on the base name of the object.
      
      Another example is -ftest-coverage, which outputs the coverage data into
      <base-name-of-object>.gcno
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      48427c3b
    • Sven Schnelle's avatar
      s390/ptrace: fix setting syscall number · 31a2029d
      Sven Schnelle authored
      [ Upstream commit 873e5a76 ]
      
      When strace wants to update the syscall number, it sets GPR2
      to the desired number and updates the GPR via PTRACE_SETREGSET.
      It doesn't update regs->int_code which would cause the old syscall
      executed on syscall restart. As we cannot change the ptrace ABI and
      don't have a field for the interruption code, check whether the tracee
      is in a syscall and the last instruction was svc. In that case assume
      that the tracer wants to update the syscall number and copy the GPR2
      value to regs->int_code.
      Signed-off-by: default avatarSven Schnelle <svens@linux.ibm.com>
      Signed-off-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      31a2029d
    • Zekun Shen's avatar
      net: alx: fix race condition in alx_remove · 70a5b8b3
      Zekun Shen authored
      [ Upstream commit e89df5c4 ]
      
      There is a race condition exist during termination. The path is
      alx_stop and then alx_remove. An alx_schedule_link_check could be called
      before alx_stop by interrupt handler and invoke alx_link_check later.
      Alx_stop frees the napis, and alx_remove cancels any pending works.
      If any of the work is scheduled before termination and invoked before
      alx_remove, a null-ptr-deref occurs because both expect alx->napis[i].
      
      This patch fix the race condition by moving cancel_work_sync functions
      before alx_free_napis inside alx_stop. Because interrupt handler can call
      alx_schedule_link_check again, alx_free_irq is moved before
      cancel_work_sync calls too.
      Signed-off-by: default avatarZekun Shen <bruceshenzk@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      70a5b8b3
    • Ye Bin's avatar
      ata/libata: Fix usage of page address by page_address in ata_scsi_mode_select_xlat function · 5fef1b31
      Ye Bin authored
      [ Upstream commit f650ef61 ]
      
      BUG: KASAN: use-after-free in ata_scsi_mode_select_xlat+0x10bd/0x10f0
      drivers/ata/libata-scsi.c:4045
      Read of size 1 at addr ffff88803b8cd003 by task syz-executor.6/12621
      
      CPU: 1 PID: 12621 Comm: syz-executor.6 Not tainted 4.19.95 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.10.2-1ubuntu1 04/01/2014
      Call Trace:
      __dump_stack lib/dump_stack.c:77 [inline]
      dump_stack+0xac/0xee lib/dump_stack.c:118
      print_address_description+0x60/0x223 mm/kasan/report.c:253
      kasan_report_error mm/kasan/report.c:351 [inline]
      kasan_report mm/kasan/report.c:409 [inline]
      kasan_report.cold+0xae/0x2d8 mm/kasan/report.c:393
      ata_scsi_mode_select_xlat+0x10bd/0x10f0 drivers/ata/libata-scsi.c:4045
      ata_scsi_translate+0x2da/0x680 drivers/ata/libata-scsi.c:2035
      __ata_scsi_queuecmd drivers/ata/libata-scsi.c:4360 [inline]
      ata_scsi_queuecmd+0x2e4/0x790 drivers/ata/libata-scsi.c:4409
      scsi_dispatch_cmd+0x2ee/0x6c0 drivers/scsi/scsi_lib.c:1867
      scsi_queue_rq+0xfd7/0x1990 drivers/scsi/scsi_lib.c:2170
      blk_mq_dispatch_rq_list+0x1e1/0x19a0 block/blk-mq.c:1186
      blk_mq_do_dispatch_sched+0x147/0x3d0 block/blk-mq-sched.c:108
      blk_mq_sched_dispatch_requests+0x427/0x680 block/blk-mq-sched.c:204
      __blk_mq_run_hw_queue+0xbc/0x200 block/blk-mq.c:1308
      __blk_mq_delay_run_hw_queue+0x3c0/0x460 block/blk-mq.c:1376
      blk_mq_run_hw_queue+0x152/0x310 block/blk-mq.c:1413
      blk_mq_sched_insert_request+0x337/0x6c0 block/blk-mq-sched.c:397
      blk_execute_rq_nowait+0x124/0x320 block/blk-exec.c:64
      blk_execute_rq+0xc5/0x112 block/blk-exec.c:101
      sg_scsi_ioctl+0x3b0/0x6a0 block/scsi_ioctl.c:507
      sg_ioctl+0xd37/0x23f0 drivers/scsi/sg.c:1106
      vfs_ioctl fs/ioctl.c:46 [inline]
      file_ioctl fs/ioctl.c:501 [inline]
      do_vfs_ioctl+0xae6/0x1030 fs/ioctl.c:688
      ksys_ioctl+0x76/0xa0 fs/ioctl.c:705
      __do_sys_ioctl fs/ioctl.c:712 [inline]
      __se_sys_ioctl fs/ioctl.c:710 [inline]
      __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:710
      do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45c479
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89
      f7 48
      89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f
      83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fb0e9602c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007fb0e96036d4 RCX: 000000000045c479
      RDX: 0000000020000040 RSI: 0000000000000001 RDI: 0000000000000003
      RBP: 000000000076bfc0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 000000000000046d R14: 00000000004c6e1a R15: 000000000076bfcc
      
      Allocated by task 12577:
      set_track mm/kasan/kasan.c:460 [inline]
      kasan_kmalloc mm/kasan/kasan.c:553 [inline]
      kasan_kmalloc+0xbf/0xe0 mm/kasan/kasan.c:531
      __kmalloc+0xf3/0x1e0 mm/slub.c:3749
      kmalloc include/linux/slab.h:520 [inline]
      load_elf_phdrs+0x118/0x1b0 fs/binfmt_elf.c:441
      load_elf_binary+0x2de/0x4610 fs/binfmt_elf.c:737
      search_binary_handler fs/exec.c:1654 [inline]
      search_binary_handler+0x15c/0x4e0 fs/exec.c:1632
      exec_binprm fs/exec.c:1696 [inline]
      __do_execve_file.isra.0+0xf52/0x1a90 fs/exec.c:1820
      do_execveat_common fs/exec.c:1866 [inline]
      do_execve fs/exec.c:1883 [inline]
      __do_sys_execve fs/exec.c:1964 [inline]
      __se_sys_execve fs/exec.c:1959 [inline]
      __x64_sys_execve+0x8a/0xb0 fs/exec.c:1959
      do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 12577:
      set_track mm/kasan/kasan.c:460 [inline]
      __kasan_slab_free+0x129/0x170 mm/kasan/kasan.c:521
      slab_free_hook mm/slub.c:1370 [inline]
      slab_free_freelist_hook mm/slub.c:1397 [inline]
      slab_free mm/slub.c:2952 [inline]
      kfree+0x8b/0x1a0 mm/slub.c:3904
      load_elf_binary+0x1be7/0x4610 fs/binfmt_elf.c:1118
      search_binary_handler fs/exec.c:1654 [inline]
      search_binary_handler+0x15c/0x4e0 fs/exec.c:1632
      exec_binprm fs/exec.c:1696 [inline]
      __do_execve_file.isra.0+0xf52/0x1a90 fs/exec.c:1820
      do_execveat_common fs/exec.c:1866 [inline]
      do_execve fs/exec.c:1883 [inline]
      __do_sys_execve fs/exec.c:1964 [inline]
      __se_sys_execve fs/exec.c:1959 [inline]
      __x64_sys_execve+0x8a/0xb0 fs/exec.c:1959
      do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff88803b8ccf00
      which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 259 bytes inside of
      512-byte region [ffff88803b8ccf00, ffff88803b8cd100)
      The buggy address belongs to the page:
      page:ffffea0000ee3300 count:1 mapcount:0 mapping:ffff88806cc03080
      index:0xffff88803b8cc780 compound_mapcount: 0
      flags: 0x100000000008100(slab|head)
      raw: 0100000000008100 ffffea0001104080 0000000200000002 ffff88806cc03080
      raw: ffff88803b8cc780 00000000800c000b 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
      ffff88803b8ccf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ffff88803b8ccf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff88803b8cd000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ^
      ffff88803b8cd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ffff88803b8cd100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      You can refer to "https://www.lkml.org/lkml/2019/1/17/474" reproduce
      this error.
      
      The exception code is "bd_len = p[3];", "p" value is ffff88803b8cd000
      which belongs to the cache kmalloc-512 of size 512. The "page_address(sg_page(scsi_sglist(scmd)))"
      maybe from sg_scsi_ioctl function "buffer" which allocated by kzalloc, so "buffer"
      may not page aligned.
      This also looks completely buggy on highmem systems and really needs to use a
      kmap_atomic.      --Christoph Hellwig
      To address above bugs, Paolo Bonzini advise to simpler to just make a char array
      of size CACHE_MPAGE_LEN+8+8+4-2(or just 64 to make it easy), use sg_copy_to_buffer
      to copy from the sglist into the buffer, and workthere.
      Signed-off-by: default avatarYe Bin <yebin10@huawei.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5fef1b31
    • Juri Lelli's avatar
      sched/core: Fix PI boosting between RT and DEADLINE tasks · 37b31f4f
      Juri Lelli authored
      [ Upstream commit 740797ce ]
      
      syzbot reported the following warning:
      
       WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
       enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
      
      At deadline.c:628 we have:
      
       623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
       624 {
       625 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
       626 	struct rq *rq = rq_of_dl_rq(dl_rq);
       627
       628 	WARN_ON(dl_se->dl_boosted);
       629 	WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
              [...]
           }
      
      Which means that setup_new_dl_entity() has been called on a task
      currently boosted. This shouldn't happen though, as setup_new_dl_entity()
      is only called when the 'dynamic' deadline of the new entity
      is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
      condition.
      
      Digging through the PI code I noticed that what above might in fact happen
      if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
      first branch of boosting conditions we check only if a pi_task 'dynamic'
      deadline is earlier than mutex holder's and in this case we set mutex
      holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
      initialized if such tasks get boosted at some point (or if they become
      DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
      to 0 and this verifies the aforementioned condition.
      
      Fix it by checking that the potential donor task is actually (even if
      temporary because in turn boosted) running at DEADLINE priority before
      using its 'dynamic' deadline value.
      
      Fixes: 2d3d891d ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
      Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
      Signed-off-by: default avatarJuri Lelli <juri.lelli@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Reviewed-by: default avatarDaniel Bristot de Oliveira <bristot@redhat.com>
      Tested-by: default avatarDaniel Wagner <dwagner@suse.de>
      Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomainSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      37b31f4f
    • Russell King's avatar
      netfilter: ipset: fix unaligned atomic access · 6423d903
      Russell King authored
      [ Upstream commit 71502846 ]
      
      When using ip_set with counters and comment, traffic causes the kernel
      to panic on 32-bit ARM:
      
      Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>]
      Unhandled fault: alignment exception (0x221) at 0xea08133c
      PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]
      
      The problem occurs when we try to update the 64-bit counters - the
      faulting address above is not 64-bit aligned.  The problem occurs
      due to the way elements are allocated, for example:
      
      	set->dsize = ip_set_elem_len(set, tb, 0, 0);
      	map = ip_set_alloc(sizeof(*map) + elements * set->dsize);
      
      If the element has a requirement for a member to be 64-bit aligned,
      and set->dsize is not a multiple of 8, but is a multiple of four,
      then every odd numbered elements will be misaligned - and hitting
      an atomic64_add() on that element will cause the kernel to panic.
      
      ip_set_elem_len() must return a size that is rounded to the maximum
      alignment of any extension field stored in the element.  This change
      ensures that is the case.
      
      Fixes: 95ad1f4a ("netfilter: ipset: Fix extension alignment")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Acked-by: default avatarJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6423d903
    • Dan Carpenter's avatar
      usb: gadget: udc: Potential Oops in error handling code · 19a197ff
      Dan Carpenter authored
      [ Upstream commit e55f3c37 ]
      
      If this is in "transceiver" mode the the ->qwork isn't required and is
      a NULL pointer.  This can lead to a NULL dereference when we call
      destroy_workqueue(udc->qwork).
      
      Fixes: 3517c31a ("usb: gadget: mv_udc: use devm_xxx for probe")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      19a197ff