1. 13 Nov, 2018 7 commits
    • Helge Deller's avatar
      parisc: Fix map_pages() to not overwrite existing pte entries · d35b161d
      Helge Deller authored
      commit 3c229b3f upstream.
      
      Fix a long-existing small nasty bug in the map_pages() implementation which
      leads to overwriting already written pte entries with zero, *if* map_pages() is
      called a second time with an end address which isn't aligned on a pmd boundry.
      This happens for example if we want to remap only the text segment read/write
      in order to run alternative patching on the code. Exiting the loop when we
      reach the end address fixes this.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d35b161d
    • John David Anglin's avatar
      parisc: Fix address in HPMC IVA · 662d2aef
      John David Anglin authored
      commit 1138b671 upstream.
      
      Helge noticed that the address of the os_hpmc handler was not being
      correctly calculated in the hpmc macro.  As a result, PDCE_CHECK would
      fail to call os_hpmc:
      
      <Cpu2> e800009802e00000  0000000000000000  CC_ERR_CHECK_HPMC
      <Cpu2> 37000f7302e00000  8040004000000000  CC_ERR_CPU_CHECK_SUMMARY
      <Cpu2> f600105e02e00000  fffffff0f0c00000  CC_MC_HPMC_MONARCH_SELECTED
      <Cpu2> 140003b202e00000  000000000000000b  CC_ERR_HPMC_STATE_ENTRY
      <Cpu2> 5600100b02e00000  00000000000001a0  CC_MC_OS_HPMC_LEN_ERR
      <Cpu2> 5600106402e00000  fffffff0f0438e70  CC_MC_BR_TO_OS_HPMC_FAILED
      <Cpu2> e800009802e00000  0000000000000000  CC_ERR_CHECK_HPMC
      <Cpu2> 37000f7302e00000  8040004000000000  CC_ERR_CPU_CHECK_SUMMARY
      <Cpu2> 4000109f02e00000  0000000000000000  CC_MC_HPMC_INITIATED
      <Cpu2> 4000101902e00000  0000000000000000  CC_MC_MULTIPLE_HPMCS
      <Cpu2> 030010d502e00000  0000000000000000  CC_CPU_STOP
      
      The address problem can be seen by dumping the fault vector:
      
      0000000040159000 <fault_vector_20>:
          40159000:   63 6f 77 73     stb r15,-2447(dp)
          40159004:   20 63 61 6e     ldil L%b747000,r3
          40159008:   20 66 6c 79     ldil L%-1c3b3000,r3
              ...
          40159020:   08 00 02 40     nop
          40159024:   20 6e 60 02     ldil L%15d000,r3
          40159028:   34 63 00 00     ldo 0(r3),r3
          4015902c:   e8 60 c0 02     bv,n r0(r3)
          40159030:   08 00 02 40     nop
          40159034:   00 00 00 00     break 0,0
          40159038:   c0 00 70 00     bb,*< r0,sar,40159840 <fault_vector_20+0x840>
          4015903c:   00 00 00 00     break 0,0
      
      Location 40159038 should contain the physical address of os_hpmc:
      
      000000004015d000 <os_hpmc>:
          4015d000:   08 1a 02 43     copy r26,r3
          4015d004:   01 c0 08 a4     mfctl iva,r4
          4015d008:   48 85 00 68     ldw 34(r4),r5
      
      This patch moves the address setup into initialize_ivt to resolve the
      above problem.  I tested the change by dumping the HPMC entry after setup:
      
      0000000040209020:  8000240
      0000000040209024: 206a2004
      0000000040209028: 34630ac0
      000000004020902c: e860c002
      0000000040209030:  8000240
      0000000040209034: 1bdddce6
      0000000040209038:   15d000
      000000004020903c:      1a0
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      662d2aef
    • Jan Glauber's avatar
      ipmi: Fix timer race with module unload · 9aba7ddf
      Jan Glauber authored
      commit 0711e8c1 upstream.
      
      Please note that below oops is from an older kernel, but the same
      race seems to be present in the upstream kernel too.
      
      ---8<---
      
      The following panic was encountered during removing the ipmi_ssif
      module:
      
      [ 526.352555] Unable to handle kernel paging request at virtual address ffff000006923090
      [ 526.360464] Mem abort info:
      [ 526.363257] ESR = 0x86000007
      [ 526.366304] Exception class = IABT (current EL), IL = 32 bits
      [ 526.372221] SET = 0, FnV = 0
      [ 526.375269] EA = 0, S1PTW = 0
      [ 526.378405] swapper pgtable: 4k pages, 48-bit VAs, pgd = 000000008ae60416
      [ 526.385185] [ffff000006923090] *pgd=000000bffcffe803, *pud=000000bffcffd803, *pmd=0000009f4731a003, *pte=0000000000000000
      [ 526.396141] Internal error: Oops: 86000007 [#1] SMP
      [ 526.401008] Modules linked in: nls_iso8859_1 ipmi_devintf joydev input_leds ipmi_msghandler shpchp sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear i2c_smbus hid_generic usbhid uas hid usb_storage ast aes_ce_blk i2c_algo_bit aes_ce_cipher qede ttm crc32_ce ptp crct10dif_ce drm_kms_helper ghash_ce syscopyarea sha2_ce sysfillrect sysimgblt pps_core fb_sys_fops sha256_arm64 sha1_ce mpt3sas qed drm raid_class ahci scsi_transport_sas libahci gpio_xlp i2c_xlp9xx aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 [last unloaded: ipmi_ssif]
      [ 526.468085] CPU: 125 PID: 0 Comm: swapper/125 Not tainted 4.15.0-35-generic #38~lp1775396+build.1
      [ 526.476942] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL022 08/14/2018
      [ 526.484932] pstate: 00400009 (nzcv daif +PAN -UAO)
      [ 526.489713] pc : 0xffff000006923090
      [ 526.493198] lr : call_timer_fn+0x34/0x178
      [ 526.497194] sp : ffff000009b0bdd0
      [ 526.500496] x29: ffff000009b0bdd0 x28: 0000000000000082
      [ 526.505796] x27: 0000000000000002 x26: ffff000009515188
      [ 526.511096] x25: ffff000009515180 x24: ffff0000090f1018
      [ 526.516396] x23: ffff000009519660 x22: dead000000000200
      [ 526.521696] x21: ffff000006923090 x20: 0000000000000100
      [ 526.526995] x19: ffff809eeb466a40 x18: 0000000000000000
      [ 526.532295] x17: 000000000000000e x16: 0000000000000007
      [ 526.537594] x15: 0000000000000000 x14: 071c71c71c71c71c
      [ 526.542894] x13: 0000000000000000 x12: 0000000000000000
      [ 526.548193] x11: 0000000000000001 x10: ffff000009b0be88
      [ 526.553493] x9 : 0000000000000000 x8 : 0000000000000005
      [ 526.558793] x7 : ffff80befc1f8528 x6 : 0000000000000020
      [ 526.564092] x5 : 0000000000000040 x4 : 0000000020001b20
      [ 526.569392] x3 : 0000000000000000 x2 : ffff809eeb466a40
      [ 526.574692] x1 : ffff000006923090 x0 : ffff809eeb466a40
      [ 526.579992] Process swapper/125 (pid: 0, stack limit = 0x000000002eb50acc)
      [ 526.586854] Call trace:
      [ 526.589289] 0xffff000006923090
      [ 526.592419] expire_timers+0xc8/0x130
      [ 526.596070] run_timer_softirq+0xec/0x1b0
      [ 526.600070] __do_softirq+0x134/0x328
      [ 526.603726] irq_exit+0xc8/0xe0
      [ 526.606857] __handle_domain_irq+0x6c/0xc0
      [ 526.610941] gic_handle_irq+0x84/0x188
      [ 526.614679] el1_irq+0xe8/0x180
      [ 526.617822] cpuidle_enter_state+0xa0/0x328
      [ 526.621993] cpuidle_enter+0x34/0x48
      [ 526.625564] call_cpuidle+0x44/0x70
      [ 526.629040] do_idle+0x1b8/0x1f0
      [ 526.632256] cpu_startup_entry+0x2c/0x30
      [ 526.636174] secondary_start_kernel+0x11c/0x130
      [ 526.640694] Code: bad PC value
      [ 526.643800] ---[ end trace d020b0b8417c2498 ]---
      [ 526.648404] Kernel panic - not syncing: Fatal exception in interrupt
      [ 526.654778] SMP: stopping secondary CPUs
      [ 526.658734] Kernel Offset: disabled
      [ 526.662211] CPU features: 0x5800c38
      [ 526.665688] Memory Limit: none
      [ 526.668768] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      Prevent mod_timer from arming a timer that was already removed by
      del_timer during module unload.
      Signed-off-by: default avatarJan Glauber <jglauber@cavium.com>
      Cc: <stable@vger.kernel.org> # 3.19
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9aba7ddf
    • Maciej S. Szmigiero's avatar
      pcmcia: Implement CLKRUN protocol disabling for Ricoh bridges · c52a368d
      Maciej S. Szmigiero authored
      commit 95691e3e upstream.
      
      Currently, "disable_clkrun" yenta_socket module parameter is only
      implemented for TI CardBus bridges.
      Add also an implementation for Ricoh bridges that have the necessary
      setting documented in publicly available datasheets.
      
      Tested on a RL5C476II with a Sunrich C-160 CardBus NIC that doesn't work
      correctly unless the CLKRUN protocol is disabled.
      
      Let's also make it clear in its description that the "disable_clkrun"
      module parameter only works on these two previously mentioned brands of
      CardBus bridges.
      Signed-off-by: default avatarMaciej S. Szmigiero <mail@maciej.szmigiero.name>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c52a368d
    • Hou Tao's avatar
      jffs2: free jffs2_sb_info through jffs2_kill_sb() · 6a6b4164
      Hou Tao authored
      commit 92e2921f upstream.
      
      When an invalid mount option is passed to jffs2, jffs2_parse_options()
      will fail and jffs2_sb_info will be freed, but then jffs2_sb_info will
      be used (use-after-free) and freeed (double-free) in jffs2_kill_sb().
      
      Fix it by removing the buggy invocation of kfree() when getting invalid
      mount options.
      
      Fixes: 92abc475 ("jffs2: implement mount option parsing and compression overriding")
      Cc: stable@kernel.org
      Signed-off-by: default avatarHou Tao <houtao1@huawei.com>
      Reviewed-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a6b4164
    • Dmitry Bazhenov's avatar
      hwmon: (pmbus) Fix page count auto-detection. · 4f173e69
      Dmitry Bazhenov authored
      commit e7c6a556 upstream.
      
      Devices with compatible="pmbus" field have zero initial page count,
      and pmbus_clear_faults() being called before the page count auto-
      detection does not actually clear faults because it depends on the
      page count. Non-cleared faults in its turn may fail the subsequent
      page count auto-detection.
      
      This patch fixes this problem by calling pmbus_clear_fault_page()
      for currently set page and calling pmbus_clear_faults() after the
      page count was detected.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Bazhenov <bazhenov.dn@gmail.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f173e69
    • Tang Junhui's avatar
      bcache: fix miss key refill->end in writeback · 0d38b808
      Tang Junhui authored
      commit 2d6cb6ed upstream.
      
      refill->end record the last key of writeback, for example, at the first
      time, keys (1,128K) to (1,1024K) are flush to the backend device, but
      the end key (1,1024K) is not included, since the bellow code:
      	if (bkey_cmp(k, refill->end) >= 0) {
      		ret = MAP_DONE;
      		goto out;
      	}
      And in the next time when we refill writeback keybuf again, we searched
      key start from (1,1024K), and got a key bigger than it, so the key
      (1,1024K) missed.
      This patch modify the above code, and let the end key to be included to
      the writeback key buffer.
      Signed-off-by: default avatarTang Junhui <tang.junhui.linux@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d38b808
  2. 10 Nov, 2018 33 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.136 · 0bb1a5e5
      Greg Kroah-Hartman authored
      0bb1a5e5
    • Thomas Gleixner's avatar
      posix-timers: Sanitize overrun handling · 65cb24de
      Thomas Gleixner authored
      [ Upstream commit 78c9c4df ]
      
      The posix timer overrun handling is broken because the forwarding functions
      can return a huge number of overruns which does not fit in an int. As a
      consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
      random number generators.
      
      The k_clock::timer_forward() callbacks return a 64 bit value now. Make
      k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal
      accounting is correct. 3Remove the temporary (int) casts.
      
      Add a helper function which clamps the overrun value returned to user space
      via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
      between 0 and INT_MAX. INT_MAX is an indicator for user space that the
      overrun value has been clamped.
      Reported-by: default avatarTeam OWL337 <icytxw@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
      [florian: Make patch apply to v4.9.135]
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      65cb24de
    • Christophe Leroy's avatar
      net: fs_enet: do not call phy_stop() in interrupts · b0b77fb6
      Christophe Leroy authored
      [ Upstream commit f8b39039 ]
      
      In case of TX timeout, fs_timeout() calls phy_stop(), which
      triggers the following BUG_ON() as we are in interrupt.
      
      [92708.199889] kernel BUG at drivers/net/phy/mdio_bus.c:482!
      [92708.204985] Oops: Exception in kernel mode, sig: 5 [#1]
      [92708.210119] PREEMPT
      [92708.212107] CMPC885
      [92708.214216] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G        W       4.9.61 #39
      [92708.223227] task: c60f0a40 task.stack: c6104000
      [92708.227697] NIP: c02a84bc LR: c02a947c CTR: c02a93d8
      [92708.232614] REGS: c6105c70 TRAP: 0700   Tainted: G        W        (4.9.61)
      [92708.241193] MSR: 00021032 <ME,IR,DR,RI>[92708.244818]   CR: 24000822  XER: 20000000
      [92708.248767]
      GPR00: c02a947c c6105d20 c60f0a40 c62b4c00 00000005 0000001f c069aad8 0001a688
      GPR08: 00000007 00000100 c02a93d8 00000000 000005fc 00000000 c6213240 c06338e4
      GPR16: 00000001 c06330d4 c0633094 00000000 c0680000 c6104000 c6104000 00000000
      GPR24: 00000200 00000000 ffffffff 00000004 00000078 00009032 00000000 c62b4c00
      NIP [c02a84bc] mdiobus_read+0x20/0x74
      [92708.281517] LR [c02a947c] kszphy_config_intr+0xa4/0xc4
      [92708.286547] Call Trace:
      [92708.288980] [c6105d20] [c6104000] 0xc6104000 (unreliable)
      [92708.294339] [c6105d40] [c02a947c] kszphy_config_intr+0xa4/0xc4
      [92708.300098] [c6105d50] [c02a5330] phy_stop+0x60/0x9c
      [92708.305007] [c6105d60] [c02c84d0] fs_timeout+0xdc/0x110
      [92708.310197] [c6105d80] [c035cd48] dev_watchdog+0x268/0x2a0
      [92708.315593] [c6105db0] [c0060288] call_timer_fn+0x34/0x17c
      [92708.321014] [c6105dd0] [c00605f0] run_timer_softirq+0x21c/0x2e4
      [92708.326887] [c6105e50] [c001e19c] __do_softirq+0xf4/0x2f4
      [92708.332207] [c6105eb0] [c001e3c8] run_ksoftirqd+0x2c/0x40
      [92708.337560] [c6105ec0] [c003b420] smpboot_thread_fn+0x1f0/0x258
      [92708.343405] [c6105ef0] [c003745c] kthread+0xbc/0xd0
      [92708.348217] [c6105f40] [c000c400] ret_from_kernel_thread+0x5c/0x64
      [92708.354275] Instruction dump:
      [92708.357207] 7c0803a6 bbc10018 38210020 4e800020 7c0802a6 9421ffe0 54290024 bfc10018
      [92708.364865] 90010024 7c7f1b78 81290008 552902ee <0f090000> 3bc3002c 7fc3f378 90810008
      [92708.372711] ---[ end trace 42b05441616fafd7 ]---
      
      This patch moves fs_timeout() actions into an async worker.
      
      Fixes: commit 48257c4f ("Add fs_enet ethernet network driver, for several embedded platforms")
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b0b77fb6
    • Nathan Chancellor's avatar
      x86/time: Correct the attribute on jiffies' definition · b462075e
      Nathan Chancellor authored
      commit 53c13ba8 upstream.
      
      Clang warns that the declaration of jiffies in include/linux/jiffies.h
      doesn't match the definition in arch/x86/time/kernel.c:
      
      arch/x86/kernel/time.c:29:42: warning: section does not match previous declaration [-Wsection]
      __visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
                                               ^
      ./include/linux/cache.h:49:4: note: expanded from macro '__cacheline_aligned'
                       __section__(".data..cacheline_aligned")))
                       ^
      ./include/linux/jiffies.h:81:31: note: previous attribute is here
      extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffies;
                                    ^
      ./arch/x86/include/asm/cache.h:20:2: note: expanded from macro '__cacheline_aligned_in_smp'
              __page_aligned_data
              ^
      ./include/linux/linkage.h:39:29: note: expanded from macro '__page_aligned_data'
      #define __page_aligned_data     __section(.data..page_aligned) __aligned(PAGE_SIZE)
                                      ^
      ./include/linux/compiler_attributes.h:233:56: note: expanded from macro '__section'
      #define __section(S)                    __attribute__((__section__(#S)))
                                                             ^
      1 warning generated.
      
      The declaration was changed in commit 7c30f352 ("jiffies.h: declare
      jiffies and jiffies_64 with ____cacheline_aligned_in_smp") but wasn't
      updated here. Make them match so Clang no longer warns.
      
      Fixes: 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181013005311.28617-1-natechancellor@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b462075e
    • Peter Zijlstra's avatar
      x86/percpu: Fix this_cpu_read() · 4fad9fd1
      Peter Zijlstra authored
      commit b59167ac upstream.
      
      Eric reported that a sequence count loop using this_cpu_read() got
      optimized out. This is wrong, this_cpu_read() must imply READ_ONCE()
      because the interface is IRQ-safe, therefore an interrupt can have
      changed the per-cpu value.
      
      Fixes: 7c3576d2 ("[PATCH] i386: Convert PDA into the percpu section")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: hpa@zytor.com
      Cc: eric.dumazet@gmail.com
      Cc: bp@alien8.de
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181011104019.748208519@infradead.orgSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fad9fd1
    • Phil Auld's avatar
      sched/fair: Fix throttle_list starvation with low CFS quota · bc1fccc7
      Phil Auld authored
      commit baa9be4f upstream.
      
      With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
      distribute_cfs_runtime may not empty the throttled_list before it runs
      out of runtime to distribute. In that case, due to the change from
      c06f04c7 to put throttled entries at the head of the list, later entries
      on the list will starve.  Essentially, the same X processes will get pulled
      off the list, given CPU time and then, when expired, get put back on the
      head of the list where distribute_cfs_runtime will give runtime to the same
      set of processes leaving the rest.
      
      Fix the issue by setting a bit in struct cfs_bandwidth when
      distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
      decide to put the throttled entry on the tail or the head of the list.  The
      bit is set/cleared by the callers of distribute_cfs_runtime while they hold
      cfs_bandwidth->lock.
      
      This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
      the live system. In some cases you can simply look at the throttled list and
      see the later entries are not changing:
      
        crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
          1     ffff90b56cb2d200  -976050
          2     ffff90b56cb2cc00  -484925
          3     ffff90b56cb2bc00  -658814
          4     ffff90b56cb2ba00  -275365
          5     ffff90b166a45600  -135138
          6     ffff90b56cb2da00  -282505
          7     ffff90b56cb2e000  -148065
          8     ffff90b56cb2fa00  -872591
          9     ffff90b56cb2c000  -84687
         10     ffff90b56cb2f000  -87237
         11     ffff90b166a40a00  -164582
      
        crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
          1     ffff90b56cb2d200  -994147
          2     ffff90b56cb2cc00  -306051
          3     ffff90b56cb2bc00  -961321
          4     ffff90b56cb2ba00  -24490
          5     ffff90b166a45600  -135138
          6     ffff90b56cb2da00  -282505
          7     ffff90b56cb2e000  -148065
          8     ffff90b56cb2fa00  -872591
          9     ffff90b56cb2c000  -84687
         10     ffff90b56cb2f000  -87237
         11     ffff90b166a40a00  -164582
      
      Sometimes it is easier to see by finding a process getting starved and looking
      at the sched_info:
      
        crash> task ffff8eb765994500 sched_info
        PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
          sched_info = {
            pcount = 8,
            run_delay = 697094208,
            last_arrival = 240260125039,
            last_queued = 240260327513
          },
        crash> task ffff8eb765994500 sched_info
        PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
          sched_info = {
            pcount = 8,
            run_delay = 697094208,
            last_arrival = 240260125039,
            last_queued = 240260327513
          },
      Signed-off-by: default avatarPhil Auld <pauld@redhat.com>
      Reviewed-by: default avatarBen Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: c06f04c7 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
      Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csbSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc1fccc7
    • Mikhail Nikiforov's avatar
      Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15IGM · 3ddf3c21
      Mikhail Nikiforov authored
      commit 13c1c5e4 upstream.
      
      Add ELAN061C to the ACPI table to support Elan touchpad found in Lenovo
      IdeaPad 330-15IGM.
      Signed-off-by: default avatarMikhail Nikiforov <jackxviichaos@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ddf3c21
    • Alan Stern's avatar
      USB: fix the usbfs flag sanitization for control transfers · bdbb426f
      Alan Stern authored
      commit 665c365a upstream.
      
      Commit 7a68d9fb ("USB: usbdevfs: sanitize flags more") checks the
      transfer flags for URBs submitted from userspace via usbfs.  However,
      the check for whether the USBDEVFS_URB_SHORT_NOT_OK flag should be
      allowed for a control transfer was added in the wrong place, before
      the code has properly determined the direction of the control
      transfer.  (Control transfers are special because for them, the
      direction is set by the bRequestType byte of the Setup packet rather
      than direction bit of the endpoint address.)
      
      This patch moves code which sets up the allow_short flag for control
      transfers down after is_in has been set to the correct value.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-and-tested-by: syzbot+24a30223a4b609bb802e@syzkaller.appspotmail.com
      Fixes: 7a68d9fb ("USB: usbdevfs: sanitize flags more")
      CC: Oliver Neukum <oneukum@suse.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdbb426f
    • Gustavo A. R. Silva's avatar
      usb: gadget: storage: Fix Spectre v1 vulnerability · 4121be59
      Gustavo A. R. Silva authored
      commit 9ae24af3 upstream.
      
      num can be indirectly controlled by user-space, hence leading to
      a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      drivers/usb/gadget/function/f_mass_storage.c:3177 fsg_lun_make() warn:
      potential spectre issue 'fsg_opts->common->luns' [r] (local cap)
      
      Fix this by sanitizing num before using it to index
      fsg_opts->common->luns
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Acked-by: default avatarFelipe Balbi <felipe.balbi@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4121be59
    • Tobias Herzog's avatar
      cdc-acm: correct counting of UART states in serial state notification · 25c1b59c
      Tobias Herzog authored
      commit f976d0e5 upstream.
      
      The usb standard ("Universal Serial Bus Class Definitions for Communication
      Devices") distiguishes between "consistent signals" (DSR, DCD), and
      "irregular signals" (break, ring, parity error, framing error, overrun).
      The bits of "irregular signals" are set, if this error/event occurred on
      the device side and are immeadeatly unset, if the serial state notification
      was sent.
      Like other drivers of real serial ports do, just the occurence of those
      events should be counted in serial_icounter_struct (but no 1->0
      transitions).
      Signed-off-by: default avatarTobias Herzog <t-herzog@gmx.de>
      Acked-by: default avatarOliver Neukum <oneukum@suse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25c1b59c
    • Gustavo A. R. Silva's avatar
      IB/ucm: Fix Spectre v1 vulnerability · 1fcfb1d4
      Gustavo A. R. Silva authored
      commit 0295e395 upstream.
      
      hdr.cmd can be indirectly controlled by user-space, hence leading to
      a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      drivers/infiniband/core/ucm.c:1127 ib_ucm_write() warn: potential
      spectre issue 'ucm_cmd_table' [r] (local cap)
      
      Fix this by sanitizing hdr.cmd before using it to index
      ucm_cmd_table.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fcfb1d4
    • Gustavo A. R. Silva's avatar
      RDMA/ucma: Fix Spectre v1 vulnerability · eacbd9c5
      Gustavo A. R. Silva authored
      commit a3671a4f upstream.
      
      hdr.cmd can be indirectly controlled by user-space, hence leading to
      a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential
      spectre issue 'ucma_cmd_table' [r] (local cap)
      
      Fix this by sanitizing hdr.cmd before using it to index
      ucm_cmd_table.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eacbd9c5
    • Kai-Heng Feng's avatar
      drm/edid: Add 6 bpc quirk for BOE panel in HP Pavilion 15-n233sl · f1b2b868
      Kai-Heng Feng authored
      commit 0711a43b upstream.
      
      There's another panel that reports "DFP 1.x compliant TMDS" but it
      supports 6bpc instead of 8 bpc.
      
      Apply 6 bpc quirk for the panel to fix it.
      
      BugLink: https://bugs.launchpad.net/bugs/1794387
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181002152911.4370-1-kai.heng.feng@canonical.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f1b2b868
    • Gustavo A. R. Silva's avatar
      ptp: fix Spectre v1 vulnerability · 4dd400ed
      Gustavo A. R. Silva authored
      commit efa61c8c upstream.
      
      pin_index can be indirectly controlled by user-space, hence leading
      to a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      drivers/ptp/ptp_chardev.c:253 ptp_ioctl() warn: potential spectre issue
      'ops->pin_config' [r] (local cap)
      
      Fix this by sanitizing pin_index before using it to index
      ops->pin_config, and before passing it as an argument to
      function ptp_set_pinfunc(), in which it is used to index
      info->pin_config.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4dd400ed
    • Al Viro's avatar
      cachefiles: fix the race between cachefiles_bury_object() and rmdir(2) · 186c5856
      Al Viro authored
      commit 169b8033 upstream.
      
      the victim might've been rmdir'ed just before the lock_rename();
      unlike the normal callers, we do not look the source up after the
      parents are locked - we know it beforehand and just recheck that it's
      still the child of what used to be its parent.  Unfortunately,
      the check is too weak - we don't spot a dead directory since its
      ->d_parent is unchanged, dentry is positive, etc.  So we sail all
      the way to ->rename(), with hosting filesystems _not_ expecting
      to be asked renaming an rmdir'ed subdirectory.
      
      The fix is easy, fortunately - the lock on parent is sufficient for
      making IS_DEADDIR() on child safe.
      
      Cc: stable@vger.kernel.org
      Fixes: 9ae326a6 (CacheFiles: A cache that backs onto a mounted filesystem)
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      186c5856
    • Brian Foster's avatar
      xfs: truncate transaction does not modify the inobt · 9bb68aaf
      Brian Foster authored
      [ Upstream commit a606ebdb ]
      
      The truncate transaction does not ever modify the inode btree, but
      includes an associated log reservation. Update
      xfs_calc_itruncate_reservation() to remove the reservation
      associated with inobt updates.
      
      [Amir:	This commit was merged for kernel v4.16 and a twin commit was
      	merged for xfsprogs v4.16. As a result, a small xfs filesystem
      	formatted with features -m rmapbt=1,reflink=1 using mkfs.xfs
      	version >= v4.16 cannot be mounted with kernel < v4.16.
      
      	For example, xfstests generic/17{1,2,3} format a small fs and
      	when trying to mount it, they fail with an assert on this very
      	demonic line:
      
       XFS (vdc): Log size 3075 blocks too small, minimum size is 3717 blocks
       XFS (vdc): AAIEEE! Log failed size checks. Abort!
       XFS: Assertion failed: 0, file: src/linux/fs/xfs/xfs_log.c, line: 666
      
      	The simple solution for stable kernels is to apply this patch,
      	because mkfs.xfs v4.16 is already in the wild, so we have to
      	assume that xfs filesystems with a "too small" log exist.
      	Regardless, xfsprogs maintainers should also consider reverting
      	the twin patch to stop creating those filesystems for the sake
      	of users with unpatched kernels.]
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Cc: <stable@vger.kernel.org> # v4.9+
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: default avatarDarrick J . Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9bb68aaf
    • Linus Walleij's avatar
      gpio: mxs: Get rid of external API call · ee74e356
      Linus Walleij authored
      [ Upstream commit 833eacc7 ]
      
      The MXS driver was calling back into the GPIO API from
      its irqchip. This is not very elegant, as we are a driver,
      let's just shortcut back into the gpio_chip .get() function
      instead.
      
      This is a tricky case since the .get() callback is not in
      this file, instead assigned by bgpio_init(). Calling the
      function direcly in the gpio_chip is however the lesser
      evil.
      
      Cc: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: Janusz Uzycki <j.uzycki@elproma.com.pl>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ee74e356
    • Ard Biesheuvel's avatar
      ahci: don't ignore result code of ahci_reset_controller() · 29872c3e
      Ard Biesheuvel authored
      [ Upstream commit d312fefe ]
      
      ahci_pci_reset_controller() calls ahci_reset_controller(), which may
      fail, but ignores the result code and always returns success. This
      may result in failures like below
      
        ahci 0000:02:00.0: version 3.0
        ahci 0000:02:00.0: enabling device (0000 -> 0003)
        ahci 0000:02:00.0: SSS flag set, parallel bus scan disabled
        ahci 0000:02:00.0: controller reset failed (0xffffffff)
        ahci 0000:02:00.0: failed to stop engine (-5)
          ... repeated many times ...
        ahci 0000:02:00.0: failed to stop engine (-5)
        Unable to handle kernel paging request at virtual address ffff0000093f9018
          ...
        PC is at ahci_stop_engine+0x5c/0xd8 [libahci]
        LR is at ahci_deinit_port.constprop.12+0x1c/0xc0 [libahci]
          ...
        [<ffff000000a17014>] ahci_stop_engine+0x5c/0xd8 [libahci]
        [<ffff000000a196b4>] ahci_deinit_port.constprop.12+0x1c/0xc0 [libahci]
        [<ffff000000a197d8>] ahci_init_controller+0x80/0x168 [libahci]
        [<ffff000000a260f8>] ahci_pci_init_controller+0x60/0x68 [ahci]
        [<ffff000000a26f94>] ahci_init_one+0x75c/0xd88 [ahci]
        [<ffff000008430324>] local_pci_probe+0x3c/0xb8
        [<ffff000008431728>] pci_device_probe+0x138/0x170
        [<ffff000008585e54>] driver_probe_device+0x2dc/0x458
        [<ffff0000085860e4>] __driver_attach+0x114/0x118
        [<ffff000008583ca8>] bus_for_each_dev+0x60/0xa0
        [<ffff000008585638>] driver_attach+0x20/0x28
        [<ffff0000085850b0>] bus_add_driver+0x1f0/0x2a8
        [<ffff000008586ae0>] driver_register+0x60/0xf8
        [<ffff00000842f9b4>] __pci_register_driver+0x3c/0x48
        [<ffff000000a3001c>] ahci_pci_driver_init+0x1c/0x1000 [ahci]
        [<ffff000008083918>] do_one_initcall+0x38/0x120
      
      where an obvious hardware level failure results in an unnecessary 15 second
      delay and a subsequent crash.
      
      So record the result code of ahci_reset_controller() and relay it, rather
      than ignoring it.
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      29872c3e
    • Jia-Ju Bai's avatar
      crypto: shash - Fix a sleep-in-atomic bug in shash_setkey_unaligned · 79d47dd6
      Jia-Ju Bai authored
      [ Upstream commit 9039f3ef ]
      
      The SCTP program may sleep under a spinlock, and the function call path is:
      sctp_generate_t3_rtx_event (acquire the spinlock)
        sctp_do_sm
          sctp_side_effects
            sctp_cmd_interpreter
              sctp_make_init_ack
                sctp_pack_cookie
                  crypto_shash_setkey
                    shash_setkey_unaligned
                      kmalloc(GFP_KERNEL)
      
      For the same reason, the orinoco driver may sleep in interrupt handler,
      and the function call path is:
      orinoco_rx_isr_tasklet
        orinoco_rx
          orinoco_mic
            crypto_shash_setkey
              shash_setkey_unaligned
                kmalloc(GFP_KERNEL)
      
      To fix it, GFP_KERNEL is replaced with GFP_ATOMIC.
      This bug is found by my static analysis tool and my code review.
      Signed-off-by: default avatarJia-Ju Bai <baijiaju1990@163.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      79d47dd6
    • Sasha Levin's avatar
      Revert "x86/mm: Expand static page table for fixmap space" · eba69ae2
      Sasha Levin authored
      This reverts commit 3a8304b7, which was
      upstream commit 05ab1d8a.
      
      Ben Hutchings writes:
      
      This backport is incorrect.  The part that updated __startup_64() in
      arch/x86/kernel/head64.c was dropped, presumably because that function
      doesn't exist in 4.9.  However that seems to be an essential of the
      fix.  In 4.9 the startup_64 routine in arch/x86/kernel/head_64.S would
      need to be changed instead.
      
      I also found that this introduces new boot-time warnings on some
      systems if CONFIG_DEBUG_WX is enabled.
      
      So, unless someone provides fixes for those issues, I think this should
      be reverted for the 4.9 branch.
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      eba69ae2
    • Stefano Brivio's avatar
      ip6_tunnel: Fix encapsulation layout · ced272d8
      Stefano Brivio authored
      [ Upstream commit d4d576f5 ]
      
      Commit 058214a4 ("ip6_tun: Add infrastructure for doing
      encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
      the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
      Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.
      
      As long as the option didn't actually end up in generated packets, this
      wasn't an issue. Then commit 89a23c8b ("ip6_tunnel: Fix missing tunnel
      encapsulation limit option") fixed sending of this option, and the
      resulting layout, e.g. for FoU, is:
      
      .-------------------.------------.----------.-------------------.----- - -
      | Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
      '-------------------'------------'----------'-------------------'----- - -
      
      Needless to say, FoU and GUE (at least) won't work over IPv6. The option
      is appended by default, and I couldn't find a way to disable it with the
      current iproute2.
      
      Turn this into a more reasonable:
      
      .-------------------.----------.------------.-------------------.----- - -
      | Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
      '-------------------'----------'------------'-------------------'----- - -
      
      With this, and with 84dad559 ("udp6: fix encap return code for
      resubmitting"), FoU and GUE work again over IPv6.
      
      Fixes: 058214a4 ("ip6_tun: Add infrastructure for doing encapsulation")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ced272d8
    • Ido Schimmel's avatar
      rtnetlink: Disallow FDB configuration for non-Ethernet device · 9819741a
      Ido Schimmel authored
      [ Upstream commit da715775 ]
      
      When an FDB entry is configured, the address is validated to have the
      length of an Ethernet address, but the device for which the address is
      configured can be of any type.
      
      The above can result in the use of uninitialized memory when the address
      is later compared against existing addresses since 'dev->addr_len' is
      used and it may be greater than ETH_ALEN, as with ip6tnl devices.
      
      Fix this by making sure that FDB entries are only configured for
      Ethernet devices.
      
      BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
      CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x14b/0x190 lib/dump_stack.c:113
        kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
        __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
        memcmp+0x11d/0x180 lib/string.c:863
        dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
        ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
        rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
        rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
        netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
        rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
        netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
        netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
        netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x440ee9
      Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
      R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
        kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
        kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
        slab_post_alloc_hook mm/slab.h:446 [inline]
        slab_alloc_node mm/slub.c:2718 [inline]
        __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:996 [inline]
        netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
        netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      v2:
      * Make error message more specific (David)
      
      Fixes: 090096bf ("net: generic fdb support for drivers without ndo_fdb_<op>")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9819741a
    • Dimitris Michailidis's avatar
      net: fix pskb_trim_rcsum_slow() with odd trim offset · 0c49b5e5
      Dimitris Michailidis authored
      [ Upstream commit d55bef50 ]
      
      We've been getting checksum errors involving small UDP packets, usually
      59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
      has been complaining that HW is providing bad checksums. Turns out the
      problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98
      ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").
      
      The source of the problem is that when the bytes we are trimming start
      at an odd address, as in the case of the 1 padding byte above,
      skb_checksum() returns a byte-swapped value. We cannot just combine this
      with skb->csum using csum_sub(). We need to use csum_block_sub() here
      that takes into account the parity of the start address and handles the
      swapping.
      
      Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Signed-off-by: default avatarDimitris Michailidis <dmichail@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c49b5e5
    • Cong Wang's avatar
      net: drop skb on failure in ip_check_defrag() · 0376b839
      Cong Wang authored
      [ Upstream commit 7de414a9 ]
      
      Most callers of pskb_trim_rcsum() simply drop the skb when
      it fails, however, ip_check_defrag() still continues to pass
      the skb up to stack. This is suspicious.
      
      In ip_check_defrag(), after we learn the skb is an IP fragment,
      passing the skb to callers makes no sense, because callers expect
      fragments are defrag'ed on success. So, dropping the skb when we
      can't defrag it is reasonable.
      
      Note, prior to commit 88078d98, this is not a big problem as
      checksum will be fixed up anyway. After it, the checksum is not
      correct on failure.
      
      Found this during code review.
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0376b839
    • Tobias Jungel's avatar
      bonding: fix length of actor system · 210c21f1
      Tobias Jungel authored
      [ Upstream commit 414dd6fb ]
      
      The attribute IFLA_BOND_AD_ACTOR_SYSTEM is sent to user space having the
      length of sizeof(bond->params.ad_actor_system) which is 8 byte. This
      patch aligns the length to ETH_ALEN to have the same MAC address exposed
      as using sysfs.
      
      Fixes: f87fda00 ("bonding: prevent out of bound accesses")
      Signed-off-by: default avatarTobias Jungel <tobias.jungel@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      210c21f1
    • Wenwen Wang's avatar
      ethtool: fix a privilege escalation bug · f0223d1f
      Wenwen Wang authored
      [ Upstream commit 58f5bbe3 ]
      
      In dev_ethtool(), the eth command 'ethcmd' is firstly copied from the
      use-space buffer 'useraddr' and checked to see whether it is
      ETHTOOL_PERQUEUE. If yes, the sub-command 'sub_cmd' is further copied from
      the user space. Otherwise, 'sub_cmd' is the same as 'ethcmd'. Next,
      according to 'sub_cmd', a permission check is enforced through the function
      ns_capable(). For example, the permission check is required if 'sub_cmd' is
      ETHTOOL_SCOALESCE, but it is not necessary if 'sub_cmd' is
      ETHTOOL_GCOALESCE, as suggested in the comment "Allow some commands to be
      done by anyone". The following execution invokes different handlers
      according to 'ethcmd'. Specifically, if 'ethcmd' is ETHTOOL_PERQUEUE,
      ethtool_set_per_queue() is called. In ethtool_set_per_queue(), the kernel
      object 'per_queue_opt' is copied again from the user-space buffer
      'useraddr' and 'per_queue_opt.sub_command' is used to determine which
      operation should be performed. Given that the buffer 'useraddr' is in the
      user space, a malicious user can race to change the sub-command between the
      two copies. In particular, the attacker can supply ETHTOOL_PERQUEUE and
      ETHTOOL_GCOALESCE to bypass the permission check in dev_ethtool(). Then
      before ethtool_set_per_queue() is called, the attacker changes
      ETHTOOL_GCOALESCE to ETHTOOL_SCOALESCE. In this way, the attacker can
      bypass the permission check and execute ETHTOOL_SCOALESCE.
      
      This patch enforces a check in ethtool_set_per_queue() after the second
      copy from 'useraddr'. If the sub-command is different from the one obtained
      in the first copy in dev_ethtool(), an error code EINVAL will be returned.
      
      Fixes: f38d138a ("net/ethtool: support set coalesce per queue")
      Signed-off-by: default avatarWenwen Wang <wang6495@umn.edu>
      Reviewed-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0223d1f
    • Jason Wang's avatar
      vhost: Fix Spectre V1 vulnerability · 242e6f52
      Jason Wang authored
      [ Upstream commit ff002269 ]
      
      The idx in vhost_vring_ioctl() was controlled by userspace, hence a
      potential exploitation of the Spectre variant 1 vulnerability.
      
      Fixing this by sanitizing idx before using it to index d->vqs.
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      242e6f52
    • Marcelo Ricardo Leitner's avatar
      sctp: fix race on sctp_id2asoc · 1b0bb7e5
      Marcelo Ricardo Leitner authored
      [ Upstream commit b336deca ]
      
      syzbot reported an use-after-free involving sctp_id2asoc.  Dmitry Vyukov
      helped to root cause it and it is because of reading the asoc after it
      was freed:
      
              CPU 1                       CPU 2
      (working on socket 1)            (working on socket 2)
      	                         sctp_association_destroy
      sctp_id2asoc
         spin lock
           grab the asoc from idr
         spin unlock
                                         spin lock
      				     remove asoc from idr
      				   spin unlock
      				   free(asoc)
         if asoc->base.sk != sk ... [*]
      
      This can only be hit if trying to fetch asocs from different sockets. As
      we have a single IDR for all asocs, in all SCTP sockets, their id is
      unique on the system. An application can try to send stuff on an id
      that matches on another socket, and the if in [*] will protect from such
      usage. But it didn't consider that as that asoc may belong to another
      socket, it may be freed in parallel (read: under another socket lock).
      
      We fix it by moving the checks in [*] into the protected region. This
      fixes it because the asoc cannot be freed while the lock is held.
      
      Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
      Acked-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b0bb7e5
    • Heiner Kallweit's avatar
      r8169: fix NAPI handling under high load · 01a2ff11
      Heiner Kallweit authored
      [ Upstream commit 6b839b6c ]
      
      rtl_rx() and rtl_tx() are called only if the respective bits are set
      in the interrupt status register. Under high load NAPI may not be
      able to process all data (work_done == budget) and it will schedule
      subsequent calls to the poll callback.
      rtl_ack_events() however resets the bits in the interrupt status
      register, therefore subsequent calls to rtl8169_poll() won't call
      rtl_rx() and rtl_tx() - chip interrupts are still disabled.
      
      Fix this by calling rtl_rx() and rtl_tx() independent of the bits
      set in the interrupt status register. Both functions will detect
      if there's nothing to do for them.
      
      Fixes: da78dbff ("r8169: remove work from irq handler.")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01a2ff11
    • Sean Tranchetti's avatar
      net: udp: fix handling of CHECKSUM_COMPLETE packets · 33424e7c
      Sean Tranchetti authored
      [ Upstream commit db4f1be3 ]
      
      Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
      incorrect for any packet that has an incorrect checksum value.
      
      udp4/6_csum_init() will both make a call to
      __skb_checksum_validate_complete() to initialize/validate the csum
      field when receiving a CHECKSUM_COMPLETE packet. When this packet
      fails validation, skb->csum will be overwritten with the pseudoheader
      checksum so the packet can be fully validated by software, but the
      skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
      the stack can later warn the user about their hardware spewing bad
      checksums. Unfortunately, leaving the SKB in this state can cause
      problems later on in the checksum calculation.
      
      Since the the packet is still marked as CHECKSUM_COMPLETE,
      udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
      from skb->csum instead of adding it, leaving us with a garbage value
      in that field. Once we try to copy the packet to userspace in the
      udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
      to checksum the packet data and add it in the garbage skb->csum value
      to perform our final validation check.
      
      Since the value we're validating is not the proper checksum, it's possible
      that the folded value could come out to 0, causing us not to drop the
      packet. Instead, we believe that the packet was checksummed incorrectly
      by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
      to warn the user with netdev_rx_csum_fault(skb->dev);
      
      Unfortunately, since this is the UDP path, skb->dev has been overwritten
      by skb->dev_scratch and is no longer a valid pointer, so we end up
      reading invalid memory.
      
      This patch addresses this problem in two ways:
      	1) Do not use the dev pointer when calling netdev_rx_csum_fault()
      	   from skb_copy_and_csum_datagram_msg(). Since this gets called
      	   from the UDP path where skb->dev has been overwritten, we have
      	   no way of knowing if the pointer is still valid. Also for the
      	   sake of consistency with the other uses of
      	   netdev_rx_csum_fault(), don't attempt to call it if the
      	   packet was checksummed by software.
      
      	2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
      	   If we receive a packet that's CHECKSUM_COMPLETE that fails
      	   verification (i.e. skb->csum_valid == 0), check who performed
      	   the calculation. It's possible that the checksum was done in
      	   software by the network stack earlier (such as Netfilter's
      	   CONNTRACK module), and if that says the checksum is bad,
      	   we can drop the packet immediately instead of waiting until
      	   we try and copy it to userspace. Otherwise, we need to
      	   mark the SKB as CHECKSUM_NONE, since the skb->csum field
      	   no longer contains the full packet checksum after the
      	   call to __skb_checksum_validate_complete().
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Fixes: c84d9490 ("udp: copy skb->truesize in the first cache line")
      Cc: Sam Kumar <samanthakumar@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarSean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33424e7c
    • Niklas Cassel's avatar
      net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules · 8b9a62c5
      Niklas Cassel authored
      [ Upstream commit 30549aab ]
      
      When building stmmac, it is only possible to select CONFIG_DWMAC_GENERIC,
      or any of the glue drivers, when CONFIG_STMMAC_PLATFORM is set.
      The only exception is CONFIG_STMMAC_PCI.
      
      When calling of_mdiobus_register(), it will call our ->reset()
      callback, which is set to stmmac_mdio_reset().
      
      Most of the code in stmmac_mdio_reset() is protected by a
      "#if defined(CONFIG_STMMAC_PLATFORM)", which will evaluate
      to false when CONFIG_STMMAC_PLATFORM=m.
      
      Because of this, the phy reset gpio will only be pulled when
      stmmac is built as built-in, but not when built as modules.
      
      Fix this by using "#if IS_ENABLED()" instead of "#if defined()".
      Signed-off-by: default avatarNiklas Cassel <niklas.cassel@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b9a62c5
    • Wenwen Wang's avatar
      net: socket: fix a missing-check bug · f57ef24f
      Wenwen Wang authored
      [ Upstream commit b6168562 ]
      
      In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
      statement to see whether it is necessary to pre-process the ethtool
      structure, because, as mentioned in the comment, the structure
      ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
      is allocated through compat_alloc_user_space(). One thing to note here is
      that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
      partially determined by 'rule_cnt', which is actually acquired from the
      user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
      get_user(). After 'rxnfc' is allocated, the data in the original user-space
      buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
      including the 'rule_cnt' field. However, after this copy, no check is
      re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user
      race to change the value in the 'compat_rxnfc->rule_cnt' between these two
      copies. Through this way, the attacker can bypass the previous check on
      'rule_cnt' and inject malicious data. This can cause undefined behavior of
      the kernel and introduce potential security risk.
      
      This patch avoids the above issue via copying the value acquired by
      get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.
      Signed-off-by: default avatarWenwen Wang <wang6495@umn.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f57ef24f
    • Jakub Kicinski's avatar
      net: sched: gred: pass the right attribute to gred_change_table_def() · 3628c3dd
      Jakub Kicinski authored
      [ Upstream commit 38b4f18d ]
      
      gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
      and expects it will be able to interpret its contents as
      struct tc_gred_sopt.  Pass the correct gred attribute, instead of
      TCA_OPTIONS.
      
      This bug meant the table definition could never be changed after
      Qdisc was initialized (unless whatever TCA_OPTIONS contained both
      passed netlink validation and was a valid struct tc_gred_sopt...).
      
      Old behaviour:
      $ ip link add type dummy
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      RTNETLINK answers: Invalid argument
      
      Now:
      $ ip link add type dummy
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      
      Fixes: f62d6b93 ("[PKT_SCHED]: GRED: Use central VQ change procedure")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3628c3dd