1. 11 Aug, 2019 32 commits
    • Thomas Gleixner's avatar
      x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS · 2224e894
      Thomas Gleixner authored
      commit f36cf386 upstream.
      
      Intel provided the following information:
      
       On all current Atom processors, instructions that use a segment register
       value (e.g. a load or store) will not speculatively execute before the
       last writer of that segment retires. Thus they will not use a
       speculatively written segment value.
      
      That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
      entry paths can be excluded from the extra LFENCE if PTI is disabled.
      
      Create a separate bug flag for the through SWAPGS speculation and mark all
      out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
      are excluded from the whole mitigation mess anyway.
      Reported-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Reviewed-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      [bwh: Backported to 4.4:
       - There's no whitelist entry (or any support) for Hygon CPUs
       - Adjust context, indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2224e894
    • Josh Poimboeuf's avatar
      x86/entry/64: Use JMP instead of JMPQ · 6583ecce
      Josh Poimboeuf authored
      commit 64dbc122 upstream.
      
      Somehow the swapgs mitigation entry code patch ended up with a JMPQ
      instruction instead of JMP, where only the short jump is needed.  Some
      assembler versions apparently fail to optimize JMPQ into a two-byte JMP
      when possible, instead always using a 7-byte JMP with relocation.  For
      some reason that makes the entry code explode with a #GP during boot.
      
      Change it back to "JMP" as originally intended.
      
      Fixes: 18ec54fd ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations")
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6583ecce
    • Josh Poimboeuf's avatar
      x86/speculation: Enable Spectre v1 swapgs mitigations · 90d45f08
      Josh Poimboeuf authored
      commit a2059825 upstream.
      
      The previous commit added macro calls in the entry code which mitigate the
      Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
      enabled.  Enable those features where applicable.
      
      The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
      
      There are different features which can affect the risk of attack:
      
      - When FSGSBASE is enabled, unprivileged users are able to place any
        value in GS, using the wrgsbase instruction.  This means they can
        write a GS value which points to any value in kernel space, which can
        be useful with the following gadget in an interrupt/exception/NMI
        handler:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	// dependent load or store based on the value of %reg
      	// for example: mov %(reg1), %reg2
      
        If an interrupt is coming from user space, and the entry code
        speculatively skips the swapgs (due to user branch mistraining), it
        may speculatively execute the GS-based load and a subsequent dependent
        load or store, exposing the kernel data to an L1 side channel leak.
      
        Note that, on Intel, a similar attack exists in the above gadget when
        coming from kernel space, if the swapgs gets speculatively executed to
        switch back to the user GS.  On AMD, this variant isn't possible
        because swapgs is serializing with respect to future GS-based
        accesses.
      
        NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
      	doesn't exist quite yet.
      
      - When FSGSBASE is disabled, the issue is mitigated somewhat because
        unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
        restricts GS values to user space addresses only.  That means the
        gadget would need an additional step, since the target kernel address
        needs to be read from user space first.  Something like:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	mov (%reg1), %reg2
      	// dependent load or store based on the value of %reg2
      	// for example: mov %(reg2), %reg3
      
        It's difficult to audit for this gadget in all the handlers, so while
        there are no known instances of it, it's entirely possible that it
        exists somewhere (or could be introduced in the future).  Without
        tooling to analyze all such code paths, consider it vulnerable.
      
        Effects of SMAP on the !FSGSBASE case:
      
        - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
          susceptible to Meltdown), the kernel is prevented from speculatively
          reading user space memory, even L1 cached values.  This effectively
          disables the !FSGSBASE attack vector.
      
        - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
          still prevents the kernel from speculatively reading user space
          memory.  But it does *not* prevent the kernel from reading the
          user value from L1, if it has already been cached.  This is probably
          only a small hurdle for an attacker to overcome.
      
      Thanks to Dave Hansen for contributing the speculative_smap() function.
      
      Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
      is serializing on AMD.
      
      [ tglx: Fixed the USER fence decision and polished the comment as suggested
        	by Dave Hansen ]
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarDave Hansen <dave.hansen@intel.com>
      [bwh: Backported to 4.9:
       - Check for X86_FEATURE_KAISER instead of X86_FEATURE_PTI
       - mitigations= parameter is x86-only here
       - Adjust filename, context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90d45f08
    • Josh Poimboeuf's avatar
      x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations · e90ec5e2
      Josh Poimboeuf authored
      commit 18ec54fd upstream.
      
      Spectre v1 isn't only about array bounds checks.  It can affect any
      conditional checks.  The kernel entry code interrupt, exception, and NMI
      handlers all have conditional swapgs checks.  Those may be problematic in
      the context of Spectre v1, as kernel code can speculatively run with a user
      GS.
      
      For example:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg
      	mov (%reg), %reg1
      
      When coming from user space, the CPU can speculatively skip the swapgs, and
      then do a speculative percpu load using the user GS value.  So the user can
      speculatively force a read of any kernel value.  If a gadget exists which
      uses the percpu value as an address in another load/store, then the
      contents of the kernel value may become visible via an L1 side channel
      attack.
      
      A similar attack exists when coming from kernel space.  The CPU can
      speculatively do the swapgs, causing the user GS to get used for the rest
      of the speculative window.
      
      The mitigation is similar to a traditional Spectre v1 mitigation, except:
      
        a) index masking isn't possible; because the index (percpu offset)
           isn't user-controlled; and
      
        b) an lfence is needed in both the "from user" swapgs path and the
           "from kernel" non-swapgs path (because of the two attacks described
           above).
      
      The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
      CR3 write when PTI is enabled.  Since CR3 writes are serializing, the
      lfences can be skipped in those cases.
      
      On the other hand, the kernel entry swapgs paths don't depend on PTI.
      
      To avoid unnecessary lfences for the user entry case, create two separate
      features for alternative patching:
      
        X86_FEATURE_FENCE_SWAPGS_USER
        X86_FEATURE_FENCE_SWAPGS_KERNEL
      
      Use these features in entry code to patch in lfences where needed.
      
      The features aren't enabled yet, so there's no functional change.
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarDave Hansen <dave.hansen@intel.com>
      [bwh: Backported to 4.9:
       - Assign the CPU feature bits from word 7
       - Add FENCE_SWAPGS_KERNEL_ENTRY to NMI entry, since it does not
         use paranoid_entry
       - Include <asm/cpufeatures.h> in calling.h
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e90ec5e2
    • Ben Hutchings's avatar
      x86: cpufeatures: Sort feature word 7 · 7092a21c
      Ben Hutchings authored
      This will make it clearer which bits are allocated, in case we need to
      assign more feature bits for later backports.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7092a21c
    • Lukas Wunner's avatar
      spi: bcm2835: Fix 3-wire mode if DMA is enabled · a89b44e4
      Lukas Wunner authored
      commit 8d8bef50 upstream.
      
      Commit 6935224d ("spi: bcm2835: enable support of 3-wire mode")
      added 3-wire support to the BCM2835 SPI driver by setting the REN bit
      (Read Enable) in the CS register when receiving data.  The REN bit puts
      the transmitter in high-impedance state.  The driver recognizes that
      data is to be received by checking whether the rx_buf of a transfer is
      non-NULL.
      
      Commit 3ecd37ed ("spi: bcm2835: enable dma modes for transfers
      meeting certain conditions") subsequently broke 3-wire support because
      it set the SPI_MASTER_MUST_RX flag which causes spi_map_msg() to replace
      rx_buf with a dummy buffer if it is NULL.  As a result, rx_buf is
      *always* non-NULL if DMA is enabled.
      
      Reinstate 3-wire support by not only checking whether rx_buf is non-NULL,
      but also checking that it is not the dummy buffer.
      
      Fixes: 3ecd37ed ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
      Reported-by: default avatarNuno Sá <nuno.sa@analog.com>
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org # v4.2+
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Acked-by: default avatarStefan Wahren <wahrenst@gmx.net>
      Link: https://lore.kernel.org/r/328318841455e505370ef8ecad97b646c033dc8a.1562148527.git.lukas@wunner.deSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a89b44e4
    • xiao jin's avatar
      block: blk_init_allocated_queue() set q->fq as NULL in the fail case · c1919916
      xiao jin authored
      commit 54648cf1 upstream.
      
      We find the memory use-after-free issue in __blk_drain_queue()
      on the kernel 4.14. After read the latest kernel 4.18-rc6 we
      think it has the same problem.
      
      Memory is allocated for q->fq in the blk_init_allocated_queue().
      If the elevator init function called with error return, it will
      run into the fail case to free the q->fq.
      
      Then the __blk_drain_queue() uses the same memory after the free
      of the q->fq, it will lead to the unpredictable event.
      
      The patch is to set q->fq as NULL in the fail case of
      blk_init_allocated_queue().
      
      Fixes: commit 7c94e1c1 ("block: introduce blk_flush_queue to drive flush machinery")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: default avatarxiao jin <jin.xiao@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      [groeck: backport to v4.4.y/v4.9.y (context change)]
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarAlessio Balsini <balsini@android.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1919916
    • Sudarsana Reddy Kalluru's avatar
      bnx2x: Disable multi-cos feature. · 7c465327
      Sudarsana Reddy Kalluru authored
      [ Upstream commit d1f0b5dc ]
      
      Commit 3968d389 ("bnx2x: Fix Multi-Cos.") which enabled multi-cos
      feature after prolonged time in driver added some regression causing
      numerous issues (sudden reboots, tx timeout etc.) reported by customers.
      We plan to backout this commit and submit proper fix once we have root
      cause of issues reported with this feature enabled.
      
      Fixes: 3968d389 ("bnx2x: Fix Multi-Cos.")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarManish Chopra <manishc@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c465327
    • Cong Wang's avatar
      ife: error out when nla attributes are empty · 264e020e
      Cong Wang authored
      [ Upstream commit c8ec4632 ]
      
      act_ife at least requires TCA_IFE_PARMS, so we have to bail out
      when there is no attribute passed in.
      
      Reported-by: syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com
      Fixes: ef6980b6 ("introduce IFE action")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      264e020e
    • Haishuang Yan's avatar
      ip6_tunnel: fix possible use-after-free on xmit · dfb98836
      Haishuang Yan authored
      [ Upstream commit 01f5bffa ]
      
      ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
      can cause a possible use-after-free accessing iph/ipv6h pointer
      since the packet will be 'uncloned' running pskb_expand_head if
      it is a cloned gso skb.
      
      Fixes: 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
      Signed-off-by: default avatarHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfb98836
    • Arnd Bergmann's avatar
      compat_ioctl: pppoe: fix PPPOEIOCSFWD handling · 00a8794f
      Arnd Bergmann authored
      [ Upstream commit 055d8824 ]
      
      Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
      linux-2.5.69 along with hundreds of other commands, but was always broken
      sincen only the structure is compatible, but the command number is not,
      due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
      sockaddr_pppox)), which is different on 64-bit architectures.
      
      Guillaume Nault adds:
      
        And the implementation was broken until 2016 (see 29e73269 ("pppoe:
        fix reference counting in PPPoE proxy")), and nobody ever noticed. I
        should probably have removed this ioctl entirely instead of fixing it.
        Clearly, it has never been used.
      
      Fix it by adding a compat_ioctl handler for all pppoe variants that
      translates the command number and then calls the regular ioctl function.
      
      All other ioctl commands handled by pppoe are compatible between 32-bit
      and 64-bit, and require compat_ptr() conversion.
      
      This should apply to all stable kernels.
      Acked-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00a8794f
    • Taras Kondratiuk's avatar
      tipc: compat: allow tipc commands without arguments · 695074c7
      Taras Kondratiuk authored
      [ Upstream commit 4da5f001 ]
      
      Commit 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit")
      broke older tipc tools that use compat interface (e.g. tipc-config from
      tipcutils package):
      
      % tipc-config -p
      operation not supported
      
      The commit started to reject TIPC netlink compat messages that do not
      have attributes. It is too restrictive because some of such messages are
      valid (they don't need any arguments):
      
      % grep 'tx none' include/uapi/linux/tipc_config.h
      #define  TIPC_CMD_NOOP              0x0000    /* tx none, rx none */
      #define  TIPC_CMD_GET_MEDIA_NAMES   0x0002    /* tx none, rx media_name(s) */
      #define  TIPC_CMD_GET_BEARER_NAMES  0x0003    /* tx none, rx bearer_name(s) */
      #define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
      #define  TIPC_CMD_GET_REMOTE_MNG    0x4003    /* tx none, rx unsigned */
      #define  TIPC_CMD_GET_MAX_PORTS     0x4004    /* tx none, rx unsigned */
      #define  TIPC_CMD_GET_NETID         0x400B    /* tx none, rx unsigned */
      #define  TIPC_CMD_NOT_NET_ADMIN     0xC001    /* tx none, rx none */
      
      This patch relaxes the original fix and rejects messages without
      arguments only if such arguments are expected by a command (reg_type is
      non zero).
      
      Fixes: 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTaras Kondratiuk <takondra@cisco.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      695074c7
    • Jia-Ju Bai's avatar
      net: sched: Fix a possible null-pointer dereference in dequeue_func() · 4c6f0d6b
      Jia-Ju Bai authored
      [ Upstream commit 051c7b39 ]
      
      In dequeue_func(), there is an if statement on line 74 to check whether
      skb is NULL:
          if (skb)
      
      When skb is NULL, it is used on line 77:
          prefetch(&skb->end);
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, skb->end is used when skb is not NULL.
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Fixes: 76e3cc12 ("codel: Controlled Delay AQM")
      Signed-off-by: default avatarJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c6f0d6b
    • Mark Zhang's avatar
      net/mlx5: Use reversed order when unregister devices · 03eb042d
      Mark Zhang authored
      [ Upstream commit 08aa5e7d ]
      
      When lag is active, which is controlled by the bonded mlx5e netdev, mlx5
      interface unregestering must happen in the reverse order where rdma is
      unregistered (unloaded) first, to guarantee all references to the lag
      context in hardware is removed, then remove mlx5e netdev interface which
      will cleanup the lag context from hardware.
      
      Without this fix during destroy of LAG interface, we observed following
      errors:
       * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
         status bad parameter(0x3), syndrome (0xe4ac33)
       * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
         status bad parameter(0x3), syndrome (0xa5aee8).
      
      Fixes: a31208b1 ("net/mlx5_core: New init and exit flow for mlx5_core")
      Reviewed-by: default avatarParav Pandit <parav@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarMark Zhang <markz@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03eb042d
    • Jiri Pirko's avatar
      net: fix ifindex collision during namespace removal · 78922ae6
      Jiri Pirko authored
      [ Upstream commit 55b40dbf ]
      
      Commit aca51397 ("netns: Fix arbitrary net_device-s corruptions
      on net_ns stop.") introduced a possibility to hit a BUG in case device
      is returning back to init_net and two following conditions are met:
      1) dev->ifindex value is used in a name of another "dev%d"
         device in init_net.
      2) dev->name is used by another device in init_net.
      
      Under real life circumstances this is hard to get. Therefore this has
      been present happily for over 10 years. To reproduce:
      
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip netns add ns1
      $ ip -n ns1 link add dummy1ns1 type dummy
      $ ip -n ns1 link add dummy2ns1 type dummy
      $ ip link set enp0s2 netns ns1
      $ ip -n ns1 link set enp0s2 name dummy0
      [  100.858894] virtio_net virtio0 dummy0: renamed from enp0s2
      $ ip link add dev4 type dummy
      $ ip -n ns1 a
      1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff
      3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff
      4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff
      $ ip netns del ns1
      [  158.717795] default_device_exit: failed to move dummy0 to init_net: -17
      [  158.719316] ------------[ cut here ]------------
      [  158.720591] kernel BUG at net/core/dev.c:9824!
      [  158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18
      [  158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  158.727508] Workqueue: netns cleanup_net
      [  158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.750638] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.752944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  158.762758] Call Trace:
      [  158.763882]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.766148]  ? devlink_nl_cmd_set_doit+0x520/0x520
      [  158.768034]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.769870]  ops_exit_list.isra.0+0xa8/0x150
      [  158.771544]  cleanup_net+0x446/0x8f0
      [  158.772945]  ? unregister_pernet_operations+0x4a0/0x4a0
      [  158.775294]  process_one_work+0xa1a/0x1740
      [  158.776896]  ? pwq_dec_nr_in_flight+0x310/0x310
      [  158.779143]  ? do_raw_spin_lock+0x11b/0x280
      [  158.780848]  worker_thread+0x9e/0x1060
      [  158.782500]  ? process_one_work+0x1740/0x1740
      [  158.784454]  kthread+0x31b/0x420
      [  158.786082]  ? __kthread_create_on_node+0x3f0/0x3f0
      [  158.788286]  ret_from_fork+0x3a/0x50
      [  158.789871] ---[ end trace defd6c657c71f936 ]---
      [  158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.829899] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.834923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fix this by checking if a device with the same name exists in init_net
      and fallback to original code - dev%d to allocate name - in case it does.
      
      This was found using syzkaller.
      
      Fixes: aca51397 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.")
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78922ae6
    • Nikolay Aleksandrov's avatar
      net: bridge: mcast: don't delete permanent entries when fast leave is enabled · 6d9235bd
      Nikolay Aleksandrov authored
      [ Upstream commit 5c725b6b ]
      
      When permanent entries were introduced by the commit below, they were
      exempt from timing out and thus igmp leave wouldn't affect them unless
      fast leave was enabled on the port which was added before permanent
      entries existed. It shouldn't matter if fast leave is enabled or not
      if the user added a permanent entry it shouldn't be deleted on igmp
      leave.
      
      Before:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      $
      
      After:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      Fixes: ccb1c31a ("bridge: add flags to distinguish permanent mdb entires")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d9235bd
    • Nikolay Aleksandrov's avatar
      net: bridge: delete local fdb on device init failure · 0e08ee57
      Nikolay Aleksandrov authored
      [ Upstream commit d7bae09f ]
      
      On initialization failure we have to delete the local fdb which was
      inserted due to the default pvid creation. This problem has been present
      since the inception of default_pvid. Note that currently there are 2 cases:
      1) in br_dev_init() when br_multicast_init() fails
      2) if register_netdevice() fails after calling ndo_init()
      
      This patch takes care of both since br_vlan_flush() is called on both
      occasions. Also the new fdb delete would be a no-op on normal bridge
      device destruction since the local fdb would've been already flushed by
      br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
      called last when adding a port thus nothing can fail after it.
      
      Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e08ee57
    • Gustavo A. R. Silva's avatar
      atm: iphase: Fix Spectre v1 vulnerability · b5641e51
      Gustavo A. R. Silva authored
      [ Upstream commit ea443e5e ]
      
      board is controlled by user-space, hence leading to a potential
      exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      drivers/atm/iphase.c:2765 ia_ioctl() warn: potential spectre issue 'ia_dev' [r] (local cap)
      drivers/atm/iphase.c:2774 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2782 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2816 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2823 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2830 ia_ioctl() warn: potential spectre issue '_ia_dev' [r] (local cap)
      drivers/atm/iphase.c:2845 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2856 ia_ioctl() warn: possible spectre second half.  'iadev'
      
      Fix this by sanitizing board before using it to index ia_dev and _ia_dev
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5641e51
    • Ilya Dryomov's avatar
      22395a3e
    • Josh Poimboeuf's avatar
      objtool: Add rewind_stack_do_exit() to the noreturn list · 180019b0
      Josh Poimboeuf authored
      commit 4fa5ecda upstream.
      
      This fixes the following warning seen on GCC 7.3:
      
        arch/x86/kernel/dumpstack.o: warning: objtool: oops_end() falls through to next function show_regs()
      Reported-by: default avatarkbuild test robot <lkp@intel.com>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/3418ebf5a5a9f6ed7e80954c741c0b904b67b5dc.1554398240.git.jpoimboe@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      180019b0
    • Josh Poimboeuf's avatar
      objtool: Add machine_real_restart() to the noreturn list · 495dace5
      Josh Poimboeuf authored
      commit 684fb246 upstream.
      
      machine_real_restart() is annotated as '__noreturn", so add it to the
      objtool noreturn list.  This fixes the following warning with clang and
      CONFIG_CC_OPTIMIZE_FOR_SIZE=y:
      
        arch/x86/kernel/reboot.o: warning: objtool: native_machine_emergency_restart() falls through to next function machine_power_off()
      Reported-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Reviewed-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Link: https://lkml.kernel.org/r/791712792aa4431bdd55bf1beb33a169ddf3b4a2.1529423255.git.jpoimboe@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      495dace5
    • Greg Kroah-Hartman's avatar
      IB: directly cast the sockaddr union to aockaddr · d41d78cc
      Greg Kroah-Hartman authored
      Like commit 641114d2 ("RDMA: Directly cast the sockaddr union to
      sockaddr") we need to quiet gcc 9 from warning about this crazy union.
      That commit did not fix all of the warnings in 4.19 and older kernels
      because the logic in roce_resolve_route_from_path() was rewritten
      between 4.19 and 5.2 when that change happened.
      
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d41d78cc
    • Jason Gunthorpe's avatar
      RDMA: Directly cast the sockaddr union to sockaddr · 7d5750c0
      Jason Gunthorpe authored
      commit 641114d2 upstream.
      
      gcc 9 now does allocation size tracking and thinks that passing the member
      of a union and then accessing beyond that member's bounds is an overflow.
      
      Instead of using the union member, use the entire union with a cast to
      get to the sockaddr. gcc will now know that the memory extends the full
      size of the union.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d5750c0
    • Sebastian Parschauer's avatar
      HID: Add quirk for HP X1200 PIXART OEM mouse · 41052c98
      Sebastian Parschauer authored
      commit 49869d2e upstream.
      
      The PixArt OEM mice are known for disconnecting every minute in
      runlevel 1 or 3 if they are not always polled. So add quirk
      ALWAYS_POLL for this one as well.
      
      Jonathan Teh (@jonathan-teh) reported and tested the quirk.
      Reference: https://github.com/sriemer/fix-linux-mouse/issues/15Signed-off-by: default avatarSebastian Parschauer <s.parschauer@gmx.de>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41052c98
    • Aaron Armstrong Skomra's avatar
      HID: wacom: fix bit shift for Cintiq Companion 2 · 0cb65f3a
      Aaron Armstrong Skomra authored
      commit 693c3dab upstream.
      
      The bit indicating BTN_6 on this device is overshifted
      by 2 bits, resulting in the incorrect button being
      reported.
      
      Also fix copy-paste mistake in comments.
      Signed-off-by: default avatarAaron Armstrong Skomra <aaron.skomra@wacom.com>
      Reviewed-by: default avatarPing Cheng <ping.cheng@wacom.com>
      Link: https://github.com/linuxwacom/xf86-input-wacom/issues/71
      Fixes: c7f0522a ("HID: wacom: Slim down wacom_intuos_pad processing")
      Cc: <stable@vger.kernel.org> # v4.5+
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cb65f3a
    • Eric Dumazet's avatar
      tcp: be more careful in tcp_fragment() · 06fac082
      Eric Dumazet authored
      [ Upstream commit b617158d ]
      
      Some applications set tiny SO_SNDBUF values and expect
      TCP to just work. Recent patches to address CVE-2019-11478
      broke them in case of losses, since retransmits might
      be prevented.
      
      We should allow these flows to make progress.
      
      This patch allows the first and last skb in retransmit queue
      to be split even if memory limits are hit.
      
      It also adds the some room due to the fact that tcp_sendmsg()
      and tcp_sendpage() might overshoot sk_wmem_queued by about one full
      TSO skb (64KB size). Note this allowance was already present
      in stable backports for kernels < 4.15
      
      Note for < 4.15 backports :
       tcp_rtx_queue_tail() will probably look like :
      
      static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
      {
      	struct sk_buff *skb = tcp_send_head(sk);
      
      	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
      }
      
      Fixes: f070ef2a ("tcp: tcp_fragment() should apply sane memory limits")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrew Prout <aprout@ll.mit.edu>
      Tested-by: default avatarAndrew Prout <aprout@ll.mit.edu>
      Tested-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Tested-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Cc: Jonathan Looney <jtl@netflix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      06fac082
    • Will Deacon's avatar
      arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG} · 3c5dbb95
      Will Deacon authored
      commit 147b9635 upstream.
      
      If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have
      their architecturally maximum values, which defeats the use of
      FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous
      machines.
      
      Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively
      saturate at zero.
      
      Fixes: 3c739b57 ("arm64: Keep track of CPU feature registers")
      Cc: <stable@vger.kernel.org> # 4.9.y only
      Reviewed-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3c5dbb95
    • Will Deacon's avatar
      arm64: cpufeature: Fix CTR_EL0 field definitions · e364e9a2
      Will Deacon authored
      commit be68a8aa upstream.
      
      Our field definitions for CTR_EL0 suffer from a number of problems:
      
        - The IDC and DIC fields are missing, which causes us to enable CTR
          trapping on CPUs with either of these returning non-zero values.
      
        - The ERG is FTR_LOWER_SAFE, whereas it should be treated like CWG as
          FTR_HIGHER_SAFE so that applications can use it to avoid false sharing.
      
        - [nit] A RES1 field is described as "RAO"
      
      This patch updates the CTR_EL0 field definitions to fix these issues.
      
      Cc: <stable@vger.kernel.org> # 4.9.y only
      Cc: Shanker Donthineni <shankerd@codeaurora.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e364e9a2
    • Adam Ford's avatar
      ARM: dts: logicpd-som-lv: Fix Audio Mute · 5f085ef8
      Adam Ford authored
      [ Upstream commit 95e59fc3 ]
      
      The Audio has worked, but the mute pin has a weak pulldown which alows
      some of the audio signal to pass very quietly.  This patch fixes
      that so the mute pin is actively driven high for mute or low for normal
      operation.
      
      Fixes: ab8dd3ae ("ARM: DTS: Add minimal Support for Logic
      PD DM3730 SOM-LV")
      Signed-off-by: default avatarAdam Ford <aford173@gmail.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5f085ef8
    • Adam Ford's avatar
      ARM: dts: Add pinmuxing for i2c2 and i2c3 for LogicPD torpedo · 7fc1e8a0
      Adam Ford authored
      [ Upstream commit a135a392 ]
      
      Since I2C1 and I2C4 have explicit pinmuxing set, let's be on the
      safe side and set the pin muxing for I2C2 and I2C3.
      Signed-off-by: default avatarAdam Ford <aford173@gmail.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7fc1e8a0
    • Adam Ford's avatar
      ARM: dts: Add pinmuxing for i2c2 and i2c3 for LogicPD SOM-LV · 5770e696
      Adam Ford authored
      [ Upstream commit 5fe3c0fa ]
      
      Since I2C1 and I2C4 have explicit pinmuxing set, let's be on the
      safe side and set the pin muxing for I2C2 and I2C3.
      Signed-off-by: default avatarAdam Ford <aford173@gmail.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5770e696
    • Hannes Reinecke's avatar
      scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure · 792f95e7
      Hannes Reinecke authored
      commit 023358b1 upstream.
      
      Gcc-9 complains for a memset across pointer boundaries, which happens as
      the code tries to allocate a flexible array on the stack.  Turns out we
      cannot do this without relying on gcc-isms, so with this patch we'll embed
      the fc_rport_priv structure into fcoe_rport, can use the normal
      'container_of' outcast, and will only have to do a memset over one
      structure.
      Signed-off-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      792f95e7
  2. 06 Aug, 2019 8 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.188 · fa897d17
      Greg Kroah-Hartman authored
      fa897d17
    • Vlastimil Babka's avatar
      x86, mm, gup: prevent get_page() race with munmap in paravirt guest · d73af797
      Vlastimil Babka authored
      The x86 version of get_user_pages_fast() relies on disabled interrupts to
      synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
      a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
      releases the page. As TLB flush is done synchronously via IPI disabling
      interrupts blocks the page release, and get_page(), which assumes existing
      reference on page, is thus safe.
      However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is
      no blocking thanks to disabled interrupts, and get_page() can succeed on a page
      that was already freed or even reused.
      
      We have recently seen this happen with our 4.4 and 4.12 based kernels, with
      userspace (java) that exits a thread, where mm_release() performs a futex_wake()
      on tsk->clear_child_tid, and another thread in parallel unmaps the page where
      tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
      immediately releases the page again, while it's already on a freelist. Symptoms
      include a bad page state warning, general protection faults acessing a poisoned
      list prev/next pointer in the freelist, or free page pcplists of two cpus joined
      together in a single list. Oscar has also reproduced this scenario, with a
      patch inserting delays before the get_page() to make the race window larger.
      
      Fix this by removing the dependency on TLB flush interrupts the same way as the
      generic get_user_pages_fast() code by using page_cache_add_speculative() and
      revalidating the PTE contents after pinning the page. Mainline is safe since
      4.13 where the x86 gup code was removed in favor of the common code. Accessing
      the page table itself safely also relies on disabled interrupts and TLB flush
      IPIs that don't happen with hypercalls, which was acknowledged in commit
      9e52fc2b ("x86/mm: Enable RCU based page table freeing
      (CONFIG_HAVE_RCU_TABLE_FREE=y)"). That commit with follups should also be
      backported for full safety, although our reproducer didn't hit a problem
      without that backport.
      Reproduced-by: default avatarOscar Salvador <osalvador@suse.de>
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d73af797
    • Josh Poimboeuf's avatar
      objtool: Support GCC 9 cold subfunction naming scheme · 410b20dd
      Josh Poimboeuf authored
      commit bcb6fb5d upstream.
      
      Starting with GCC 8, a lot of unlikely code was moved out of line to
      "cold" subfunctions in .text.unlikely.
      
      For example, the unlikely bits of:
      
        irq_do_set_affinity()
      
      are moved out to the following subfunction:
      
        irq_do_set_affinity.cold.49()
      
      Starting with GCC 9, the numbered suffix has been removed.  So in the
      above example, the cold subfunction is instead:
      
        irq_do_set_affinity.cold()
      
      Tweak the objtool subfunction detection logic so that it detects both
      GCC 8 and GCC 9 naming schemes.
      Reported-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/015e9544b1f188d36a7f02fa31e9e95629aa5f50.1541040800.git.jpoimboe@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      410b20dd
    • Miguel Ojeda's avatar
      include/linux/module.h: copy __init/__exit attrs to init/cleanup_module · 2c34c215
      Miguel Ojeda authored
      commit a6e60d84 upstream.
      
      The upcoming GCC 9 release extends the -Wmissing-attributes warnings
      (enabled by -Wall) to C and aliases: it warns when particular function
      attributes are missing in the aliases but not in their target.
      
      In particular, it triggers for all the init/cleanup_module
      aliases in the kernel (defined by the module_init/exit macros),
      ending up being very noisy.
      
      These aliases point to the __init/__exit functions of a module,
      which are defined as __cold (among other attributes). However,
      the aliases themselves do not have the __cold attribute.
      
      Since the compiler behaves differently when compiling a __cold
      function as well as when compiling paths leading to calls
      to __cold functions, the warning is trying to point out
      the possibly-forgotten attribute in the alias.
      
      In order to keep the warning enabled, we decided to silence
      this case. Ideally, we would mark the aliases directly
      as __init/__exit. However, there are currently around 132 modules
      in the kernel which are missing __init/__exit in their init/cleanup
      functions (either because they are missing, or for other reasons,
      e.g. the functions being called from somewhere else); and
      a section mismatch is a hard error.
      
      A conservative alternative was to mark the aliases as __cold only.
      However, since we would like to eventually enforce __init/__exit
      to be always marked,  we chose to use the new __copy function
      attribute (introduced by GCC 9 as well to deal with this).
      With it, we copy the attributes used by the target functions
      into the aliases. This way, functions that were not marked
      as __init/__exit won't have their aliases marked either,
      and therefore there won't be a section mismatch.
      
      Note that the warning would go away marking either the extern
      declaration, the definition, or both. However, we only mark
      the definition of the alias, since we do not want callers
      (which only see the declaration) to be compiled as if the function
      was __cold (and therefore the paths leading to those calls
      would be assumed to be unlikely).
      
      Link: https://lore.kernel.org/lkml/259986242.BvXPX32bHu@devpool35/
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/lkml/20190123173707.GA16603@gmail.com/
      Link: https://lore.kernel.org/lkml/20190206175627.GA20399@gmail.com/Suggested-by: default avatarMartin Sebor <msebor@gcc.gnu.org>
      Acked-by: default avatarJessica Yu <jeyu@kernel.org>
      Signed-off-by: default avatarMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c34c215
    • Miguel Ojeda's avatar
      Backport minimal compiler_attributes.h to support GCC 9 · fe584436
      Miguel Ojeda authored
      This adds support for __copy to v4.9.y so that we can use it in
      init/exit_module to avoid -Werror=missing-attributes errors on GCC 9.
      
      Link: https://lore.kernel.org/lkml/259986242.BvXPX32bHu@devpool35/
      Cc: <stable@vger.kernel.org>
      Suggested-by: default avatarRolf Eike Beer <eb@emlix.com>
      Signed-off-by: default avatarMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe584436
    • Jean Delvare's avatar
      eeprom: at24: make spd world-readable again · 0b326994
      Jean Delvare authored
      commit 25e5ef30 upstream.
      
      The integration of the at24 driver into the nvmem framework broke the
      world-readability of spd EEPROMs. Fix it.
      Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Cc: stable@vger.kernel.org
      Fixes: 57d15550 ("eeprom: at24: extend driver to plug into the NVMEM framework")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Bartosz Golaszewski <brgl@bgdev.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarBartosz Golaszewski <bgolaszewski@baylibre.com>
      [Bartosz: backported the patch to older branches]
      Signed-off-by: default avatarBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b326994
    • Andrea Arcangeli's avatar
      coredump: fix race condition between collapse_huge_page() and core dumping · d4fc64c9
      Andrea Arcangeli authored
      commit 59ea6d06 upstream.
      
      When fixing the race conditions between the coredump and the mmap_sem
      holders outside the context of the process, we focused on
      mmget_not_zero()/get_task_mm() callers in 04f5866e ("coredump: fix
      race condition between mmget_not_zero()/get_task_mm() and core
      dumping"), but those aren't the only cases where the mmap_sem can be
      taken outside of the context of the process as Michal Hocko noticed
      while backporting that commit to older -stable kernels.
      
      If mmgrab() is called in the context of the process, but then the
      mm_count reference is transferred outside the context of the process,
      that can also be a problem if the mmap_sem has to be taken for writing
      through that mm_count reference.
      
      khugepaged registration calls mmgrab() in the context of the process,
      but the mmap_sem for writing is taken later in the context of the
      khugepaged kernel thread.
      
      collapse_huge_page() after taking the mmap_sem for writing doesn't
      modify any vma, so it's not obvious that it could cause a problem to the
      coredump, but it happens to modify the pmd in a way that breaks an
      invariant that pmd_trans_huge_lock() relies upon.  collapse_huge_page()
      needs the mmap_sem for writing just to block concurrent page faults that
      call pmd_trans_huge_lock().
      
      Specifically the invariant that "!pmd_trans_huge()" cannot become a
      "pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.
      
      The coredump will call __get_user_pages() without mmap_sem for reading,
      which eventually can invoke a lockless page fault which will need a
      functional pmd_trans_huge_lock().
      
      So collapse_huge_page() needs to use mmget_still_valid() to check it's
      not running concurrently with the coredump...  as long as the coredump
      can invoke page faults without holding the mmap_sem for reading.
      
      This has "Fixes: khugepaged" to facilitate backporting, but in my view
      it's more a bug in the coredump code that will eventually have to be
      rewritten to stop invoking page faults without the mmap_sem for reading.
      So the long term plan is still to drop all mmget_still_valid().
      
      Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
      Fixes: ba76149f ("thp: khugepaged")
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [Ajay: Just adjusted to apply on v4.9]
      Signed-off-by: default avatarAjay Kaher <akaher@vmware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4fc64c9
    • Ajay Kaher's avatar
      infiniband: fix race condition between infiniband mlx4, mlx5 driver and core dumping · 91c3a6ce
      Ajay Kaher authored
      This patch is the extension of following upstream commit to fix
      the race condition between get_task_mm() and core dumping
      for IB->mlx4 and IB->mlx5 drivers:
      
      commit 04f5866e ("coredump: fix race condition between
      mmget_not_zero()/get_task_mm() and core dumping")'
      
      Thanks to Jason for pointing this.
      Signed-off-by: default avatarAjay Kaher <akaher@vmware.com>
      Reviewed-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91c3a6ce