1. 22 Jul, 2018 27 commits
    • Or Gerlitz's avatar
      net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager · c3994f4f
      Or Gerlitz authored
      [ Upstream commit 0efc8562 ]
      
      In smartnic env, the host (PF) driver might not be an e-switch
      manager, hence the FW will err on driver attempts to deal with
      setting/unsetting the eswitch and as a result the overall setup
      of sriov will fail.
      
      Fix that by avoiding the operation if e-switch management is not
      allowed for this driver instance. While here, move to use the
      correct name for the esw manager capability name.
      
      Fixes: 81848731 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reported-by: default avatarGuy Kushnir <guyk@mellanox.com>
      Reviewed-by: default avatarEli Cohen <eli@melloanox.com>
      Tested-by: default avatarEli Cohen <eli@melloanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3994f4f
    • Or Gerlitz's avatar
      net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager · b216867c
      Or Gerlitz authored
      [ Upstream commit 8ffd569a ]
      
      The check for cpu hit statistics was not returning immediate false for
      any non vport rep netdev and hence we crashed (say on mlx5 probed VFs) if
      user-space tool was calling into any possible netdev in the system.
      
      Fix that by doing a proper check before dereferencing.
      
      Fixes: 1d447a39 ('net/mlx5e: Extendable vport representor netdev private data')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reported-by: default avatarEli Cohen <eli@melloanox.com>
      Reviewed-by: default avatarEli Cohen <eli@melloanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b216867c
    • Or Gerlitz's avatar
      net/mlx5e: Avoid dealing with vport representors if not being e-switch manager · 1d8dda44
      Or Gerlitz authored
      [ Upstream commit 733d3e54 ]
      
      In smartnic env, the host (PF) driver might not be an e-switch
      manager, hence the switchdev mode representors are running on
      the embedded cpu (EC) and not at the host.
      
      As such, we should avoid dealing with vport representors if
      not being esw manager.
      
      While here, make sure to disallow eswitch switchdev related
      setups through devlink if we are not esw managers.
      
      Fixes: cb67b832 ('net/mlx5e: Introduce SRIOV VF representors')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: default avatarEli Cohen <eli@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d8dda44
    • Harini Katakam's avatar
      net: macb: Fix ptp time adjustment for large negative delta · f389c17b
      Harini Katakam authored
      [ Upstream commit 64d7839a ]
      
      When delta passed to gem_ptp_adjtime is negative, the sign is
      maintained in the ns_to_timespec64 conversion. Hence timespec_add
      should be used directly. timespec_sub will just subtract the negative
      value thus increasing the time difference.
      Signed-off-by: default avatarHarini Katakam <harini.katakam@xilinx.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f389c17b
    • Sabrina Dubroca's avatar
      net: fix use-after-free in GRO with ESP · b364a914
      Sabrina Dubroca authored
      [ Upstream commit 603d4cf8 ]
      
      Since the addition of GRO for ESP, gro_receive can consume the skb and
      return -EINPROGRESS. In that case, the lower layer GRO handler cannot
      touch the skb anymore.
      
      Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted
      some of the gro_receive handlers that can lead to ESP's gro_receive so
      that they wouldn't access the skb when -EINPROGRESS is returned, but
      missed other spots, mainly in tunneling protocols.
      
      This patch finishes the conversion to using skb_gro_flush_final(), and
      adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
      GUE.
      
      Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b364a914
    • Eric Dumazet's avatar
      net: dccp: switch rx_tstamp_last_feedback to monotonic clock · fb6b1466
      Eric Dumazet authored
      [ Upstream commit 0ce4e70f ]
      
      To compute delays, better not use time of the day which can
      be changed by admins or malicious programs.
      
      Also change ccid3_first_li() to use s64 type for delta variable
      to avoid potential overflows.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: dccp@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb6b1466
    • Eric Dumazet's avatar
      net: dccp: avoid crash in ccid3_hc_rx_send_feedback() · a3225a83
      Eric Dumazet authored
      [ Upstream commit 74174fe5 ]
      
      On fast hosts or malicious bots, we trigger a DCCP_BUG() which
      seems excessive.
      
      syzbot reported :
      
      BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback()
      CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline]
       ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793
       ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline]
       dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180
       dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378
       dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
       sk_backlog_rcv include/net/sock.h:914 [inline]
       __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517
       dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875
       ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215
       NF_HOOK include/linux/netfilter.h:287 [inline]
       ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256
       dst_input include/net/dst.h:450 [inline]
       ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396
       NF_HOOK include/linux/netfilter.h:287 [inline]
       ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492
       __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       process_backlog+0x219/0x760 net/core/dev.c:5373
       napi_poll net/core/dev.c:5771 [inline]
       net_rx_action+0x7da/0x1980 net/core/dev.c:5837
       __do_softirq+0x2e8/0xb17 kernel/softirq.c:284
       run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
       smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
       kthread+0x345/0x410 kernel/kthread.c:240
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: dccp@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3225a83
    • Jesper Dangaard Brouer's avatar
      ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing · a2e53d69
      Jesper Dangaard Brouer authored
      [ Upstream commit ad088ec4 ]
      
      The driver was combining the XDP_TX tail flush and XDP_REDIRECT
      map flushing (xdp_do_flush_map).  This is suboptimal, these two
      flush operations should be kept separate.
      
      Fixes: 11393cc9 ("xdp: Add batching support to redirect map")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2e53d69
    • Xin Long's avatar
      ipvlan: fix IFLA_MTU ignored on NEWLINK · f5a42d63
      Xin Long authored
      [ Upstream commit 30877961 ]
      
      Commit 296d4856 ("ipvlan: inherit MTU from master device") adjusted
      the mtu from the master device when creating a ipvlan device, but it
      would also override the mtu value set in rtnl_create_link. It causes
      IFLA_MTU param not to take effect.
      
      So this patch is to not adjust the mtu if IFLA_MTU param is set when
      creating a ipvlan device.
      
      Fixes: 296d4856 ("ipvlan: inherit MTU from master device")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5a42d63
    • Eric Biggers's avatar
      ipv6: sr: fix passing wrong flags to crypto_alloc_shash() · d10c0baa
      Eric Biggers authored
      [ Upstream commit fc9c2029 ]
      
      The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags,
      not 'gfp_t'.  So don't pass GFP_KERNEL to it.
      
      Fixes: bf355b8d ("ipv6: sr: add core files for SR HMAC support")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d10c0baa
    • Stephen Hemminger's avatar
      hv_netvsc: split sub-channel setup into async and sync · e34e92d8
      Stephen Hemminger authored
      [ Upstream commit 3ffe64f1 ]
      
      When doing device hotplug the sub channel must be async to avoid
      deadlock issues because device is discovered in softirq context.
      
      When doing changes to MTU and number of channels, the setup
      must be synchronous to avoid races such as when MTU and device
      settings are done in a single ip command.
      Reported-by: default avatarThomas Walker <Thomas.Walker@twosigma.com>
      Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug")
      Fixes: 732e4985 ("netvsc: fix race on sub channel creation")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e34e92d8
    • Gustavo A. R. Silva's avatar
      atm: zatm: Fix potential Spectre v1 · 43c9207d
      Gustavo A. R. Silva authored
      [ Upstream commit ced9e191 ]
      
      pool can be indirectly controlled by user-space, hence leading to
      a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue
      'zatm_dev->pool_info' (local cap)
      
      Fix this by sanitizing pool before using it to index
      zatm_dev->pool_info
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43c9207d
    • David Woodhouse's avatar
      atm: Preserve value of skb->truesize when accounting to vcc · f93d6593
      David Woodhouse authored
      [ Upstream commit 9bbe60a6 ]
      
      ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
      which they are to be sent. But it doesn't take ownership of those
      packets from the sock (if any) which originally owned them. They should
      remain owned by their actual sender until they've left the box.
      
      There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
      for certain skbs, precisely to avoid messing up sk_wmem_alloc
      accounting. Ideally that hack would cover the ATM use case too, but it
      doesn't — skbs which aren't owned by any sock, for example PPP control
      frames, still get their truesize adjusted when the low-level ATM driver
      adds headroom.
      
      This has always been an issue, it seems. The truesize of a packet
      increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
      for normal traffic, only for control frames. So I think we just got away
      with it, and we probably needed to send 2GiB of LCP echo frames before
      the misaccounting would ever have caused a problem and caused
      atm_may_send() to start refusing packets.
      
      Commit 14afee4b ("net: convert sock.sk_wmem_alloc from atomic_t to
      refcount_t") did exactly what it was intended to do, and turned this
      mostly-theoretical problem into a real one, causing PPPoATM to fail
      immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
      starts refusing to allow new packets.
      
      The least intrusive solution to this problem is to stash the value of
      skb->truesize that was accounted to the VCC, in a new member of the
      ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
      value instead of the then-current value of skb->truesize.
      
      Fixes: 158f323b ("net: adjust skb->truesize in pskb_expand_head()")
      Signed-off-by: default avatarDavid Woodhouse <dwmw2@infradead.org>
      Tested-by: default avatarKevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f93d6593
    • Sabrina Dubroca's avatar
      alx: take rtnl before calling __alx_open from resume · c62e2f08
      Sabrina Dubroca authored
      [ Upstream commit bc800e8b ]
      
      The __alx_open function can be called from ndo_open, which is called
      under RTNL, or from alx_resume, which isn't. Since commit d768319c,
      we're calling the netif_set_real_num_{tx,rx}_queues functions, which
      need to be called under RTNL.
      
      This is similar to commit 0c2cc02e ("igb: Move the calls to set the
      Tx and Rx queues into igb_open").
      
      Fixes: d768319c ("alx: enable multiple tx queues")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c62e2f08
    • Christian Lamparter's avatar
      crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak · 03bb9187
      Christian Lamparter authored
      commit 5d59ad6e upstream.
      
      If one of the later memory allocations in rypto4xx_build_pdr()
      fails: dev->pdr (and/or) dev->pdr_uinfo wouldn't be freed.
      
      crypto4xx_build_sdr() has the same issue with dev->sdr.
      Signed-off-by: default avatarChristian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarAmit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03bb9187
    • Christian Lamparter's avatar
      crypto: crypto4xx - remove bad list_del · 996a6a39
      Christian Lamparter authored
      commit a728a196 upstream.
      
      alg entries are only added to the list, after the registration
      was successful. If the registration failed, it was never added
      to the list in the first place.
      Signed-off-by: default avatarChristian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarAmit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      996a6a39
    • Jaehoon Chung's avatar
      PCI: exynos: Fix a potential init_clk_resources NULL pointer dereference · dc3782a3
      Jaehoon Chung authored
      commit b5d6bc90 upstream.
      
      In order to avoid triggering a NULL pointer dereference in
      exynos_pcie_probe() a check must be put in place to detect if
      the init_clk_resources hook is initialized before calling it.
      
      Add the respective function pointer check in exynos_pcie_probe().
      Signed-off-by: default avatarJaehoon Chung <jh80.chung@samsung.com>
      [lorenzo.pieralisi@arm.com: rewrote the commit log]
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarAmit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc3782a3
    • Jonas Gorski's avatar
      bcm63xx_enet: do not write to random DMA channel on BCM6345 · b1c3ce0c
      Jonas Gorski authored
      commit d6213c1f upstream.
      
      The DMA controller regs actually point to DMA channel 0, so the write to
      ENETDMA_CFG_REG will actually modify a random DMA channel.
      
      Since DMA controller registers do not exist on BCM6345, guard the write
      with the usual check for dma_has_sram.
      Signed-off-by: default avatarJonas Gorski <jonas.gorski@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarAmit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1c3ce0c
    • Jonas Gorski's avatar
      bcm63xx_enet: correct clock usage · b913a05a
      Jonas Gorski authored
      commit 9c86b846 upstream.
      
      Check the return code of prepare_enable and change one last instance of
      enable only to prepare_enable. Also properly disable and release the
      clock in error paths and on remove for enetsw.
      Signed-off-by: default avatarJonas Gorski <jonas.gorski@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarAmit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b913a05a
    • alex chen's avatar
      ocfs2: ip_alloc_sem should be taken in ocfs2_get_block() · 1ccab2bf
      alex chen authored
      commit 3e4c56d4 upstream.
      
      ip_alloc_sem should be taken in ocfs2_get_block() when reading file in
      DIRECT mode to prevent concurrent access to extent tree with
      ocfs2_dio_end_io_write(), which may cause BUGON in the following
      situation:
      
      read file 'A'                                  end_io of writing file 'A'
      vfs_read
       __vfs_read
        ocfs2_file_read_iter
         generic_file_read_iter
          ocfs2_direct_IO
           __blockdev_direct_IO
            do_blockdev_direct_IO
             do_direct_IO
              get_more_blocks
               ocfs2_get_block
                ocfs2_extent_map_get_blocks
                 ocfs2_get_clusters
                  ocfs2_get_clusters_nocache()
                   ocfs2_search_extent_list
                    return the index of record which
                    contains the v_cluster, that is
                    v_cluster > rec[i]->e_cpos.
                                                      ocfs2_dio_end_io
                                                       ocfs2_dio_end_io_write
                                                        down_write(&oi->ip_alloc_sem);
                                                        ocfs2_mark_extent_written
                                                         ocfs2_change_extent_flag
                                                          ocfs2_split_extent
                                                           ...
                                                       --> modify the rec[i]->e_cpos, resulting
                                                           in v_cluster < rec[i]->e_cpos.
                   BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))
      
      [alex.chen@huawei.com: v3]
        Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com
      Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com
      Fixes: c15471f7 ("ocfs2: fix sparse file & data ordering issue in direct io")
      Signed-off-by: default avatarAlex Chen <alex.chen@huawei.com>
      Reviewed-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Reviewed-by: default avatarGang He <ghe@suse.com>
      Acked-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Salvatore Bonaccorso <carnil@debian.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ccab2bf
    • alex chen's avatar
      ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent · c59a8f13
      alex chen authored
      commit 853bc26a upstream.
      
      The subsystem.su_mutex is required while accessing the item->ci_parent,
      otherwise, NULL pointer dereference to the item->ci_parent will be
      triggered in the following situation:
      
      add node                     delete node
      sys_write
       vfs_write
        configfs_write_file
         o2nm_node_store
          o2nm_node_local_write
                                   do_rmdir
                                    vfs_rmdir
                                     configfs_rmdir
                                      mutex_lock(&subsys->su_mutex);
                                      unlink_obj
                                       item->ci_group = NULL;
                                       item->ci_parent = NULL;
      	 to_o2nm_cluster_from_node
      	  node->nd_item.ci_parent->ci_parent
      	  BUG since of NULL pointer dereference to nd_item.ci_parent
      
      Moreover, the o2nm_cluster also should be protected by the
      subsystem.su_mutex.
      
      [alex.chen@huawei.com: v2]
        Link: http://lkml.kernel.org/r/59EEAA69.9080703@huawei.com
      Link: http://lkml.kernel.org/r/59E9B36A.10700@huawei.comSigned-off-by: default avatarAlex Chen <alex.chen@huawei.com>
      Reviewed-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Salvatore Bonaccorso <carnil@debian.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c59a8f13
    • Chuck Lever's avatar
      xprtrdma: Fix corner cases when handling device removal · f5778c2d
      Chuck Lever authored
      commit 25524288 upstream.
      
      Michal Kalderon has found some corner cases around device unload
      with active NFS mounts that I didn't have the imagination to test
      when xprtrdma device removal was added last year.
      
      - The ULP device removal handler is responsible for deallocating
        the PD. That wasn't clear to me initially, and my own testing
        suggested it was not necessary, but that is incorrect.
      
      - The transport destruction path can no longer assume that there
        is a valid ID.
      
      - When destroying a transport, ensure that ib_free_cq() is not
        invoked on a CQ that was already released.
      Reported-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
      Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA from ...")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5778c2d
    • Prashanth Prakash's avatar
      cpufreq / CPPC: Set platform specific transition_delay_us · 1083a7e8
      Prashanth Prakash authored
      commit d4f3388a upstream.
      
      Add support to specify platform specific transition_delay_us instead
      of using the transition delay derived from PCC.
      
      With commit 3d41386d (cpufreq: CPPC: Use transition_delay_us
      depending transition_latency) we are setting transition_delay_us
      directly and not applying the LATENCY_MULTIPLIER. Because of that,
      on Qualcomm Centriq we can end up with a very high rate of frequency
      change requests when using the schedutil governor (default
      rate_limit_us=10 compared to an earlier value of 10000).
      
      The PCC subspace describes the rate at which the platform can accept
      commands on the CPPC's PCC channel. This includes read and write
      command on the PCC channel that can be used for reasons other than
      frequency transitions. Moreover the same PCC subspace can be used by
      multiple freq domains and deriving transition_delay_us from it as we
      do now can be sub-optimal.
      
      Moreover if a platform does not use PCC for desired_perf register then
      there is no way to compute the transition latency or the delay_us.
      
      CPPC does not have a standard defined mechanism to get the transition
      rate or the latency at the moment.
      
      Given the above limitations, it is simpler to have a platform specific
      transition_delay_us and rely on PCC derived value only if a platform
      specific value is not available.
      Signed-off-by: default avatarPrashanth Prakash <pprakash@codeaurora.org>
      Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
      Fixes: 3d41386d (cpufreq: CPPC: Use transition_delay_us depending transition_latency)
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1083a7e8
    • Filipe Manana's avatar
      Btrfs: fix duplicate extents after fsync of file with prealloc extents · 61a9f6b7
      Filipe Manana authored
      commit 31d11b83 upstream.
      
      In commit 471d557a ("Btrfs: fix loss of prealloc extents past i_size
      after fsync log replay"), on fsync,  we started to always log all prealloc
      extents beyond an inode's i_size in order to avoid losing them after a
      power failure. However under some cases this can lead to the log replay
      code to create duplicate extent items, with different lengths, in the
      extent tree. That happens because, as of that commit, we can now log
      extent items based on extent maps that are not on the "modified" list
      of extent maps of the inode's extent map tree. Logging extent items based
      on extent maps is used during the fast fsync path to save time and for
      this to work reliably it requires that the extent maps are not merged
      with other adjacent extent maps - having the extent maps in the list
      of modified extents gives such guarantee.
      
      Consider the following example, captured during a long run of fsstress,
      which illustrates this problem.
      
      We have inode 271, in the filesystem tree (root 5), for which all of the
      following operations and discussion apply to.
      
      A buffered write starts at offset 312391 with a length of 933471 bytes
      (end offset at 1245862). At this point we have, for this inode, the
      following extent maps with the their field values:
      
      em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613,
            block_len 0, orig_block_len 0
      em B, start 40960, orig_start 40960, len 376832, block_start 1106399232,
            block_len 376832, orig_block_len 376832
      em C, start 417792, orig_start 417792, len 782336, block_start
            18446744073709551613, block_len 0, orig_block_len 0
      em D, start 1200128, orig_start 1200128, len 835584, block_start
            1106776064, block_len 835584, orig_block_len 835584
      em E, start 2035712, orig_start 2035712, len 245760, block_start
            1107611648, block_len 245760, orig_block_len 245760
      
      Extent map A corresponds to a hole and extent maps D and E correspond to
      preallocated extents.
      
      Extent map D ends where extent map E begins (1106776064 + 835584 =
      1107611648), but these extent maps were not merged because they are in
      the inode's list of modified extent maps.
      
      An fsync against this inode is made, which triggers the fast path
      (BTRFS_INODE_NEEDS_FULL_SYNC is not set). This fsync triggers writeback
      of the data previously written using buffered IO, and when the respective
      ordered extent finishes, btrfs_drop_extents() is called against the
      (aligned) range 311296..1249279. This causes a split of extent map D at
      btrfs_drop_extent_cache(), replacing extent map D with a new extent map
      D', also added to the list of modified extents,  with the following
      values:
      
      em D', start 1249280, orig_start of 1200128,
             block_start 1106825216 (= 1106776064 + 1249280 - 1200128),
             orig_block_len 835584,
             block_len 786432 (835584 - (1249280 - 1200128))
      
      Then, during the fast fsync, btrfs_log_changed_extents() is called and
      extent maps D' and E are removed from the list of modified extents. The
      flag EXTENT_FLAG_LOGGING is also set on them. After the extents are logged
      clear_em_logging() is called on each of them, and that makes extent map E
      to be merged with extent map D' (try_merge_map()), resulting in D' being
      deleted and E adjusted to:
      
      em E, start 1249280, orig_start 1200128, len 1032192,
            block_start 1106825216, block_len 1032192,
            orig_block_len 245760
      
      A direct IO write at offset 1847296 and length of 360448 bytes (end offset
      at 2207744) starts, and at that moment the following extent maps exist for
      our inode:
      
      em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613,
            block_len 0, orig_block_len 0
      em B, start 40960, orig_start 40960, len 270336, block_start 1106399232,
            block_len 270336, orig_block_len 376832
      em C, start 311296, orig_start 311296, len 937984, block_start 1112842240,
            block_len 937984, orig_block_len 937984
      em E (prealloc), start 1249280, orig_start 1200128, len 1032192,
            block_start 1106825216, block_len 1032192, orig_block_len 245760
      
      The dio write results in drop_extent_cache() being called twice. The first
      time for a range that starts at offset 1847296 and ends at offset 2035711
      (length of 188416), which results in a double split of extent map E,
      replacing it with two new extent maps:
      
      em F, start 1249280, orig_start 1200128, block_start 1106825216,
            block_len 598016, orig_block_len 598016
      em G, start 2035712, orig_start 1200128, block_start 1107611648,
            block_len 245760, orig_block_len 1032192
      
      It also creates a new extent map that represents a part of the requested
      IO (through create_io_em()):
      
      em H, start 1847296, len 188416, block_start 1107423232, block_len 188416
      
      The second call to drop_extent_cache() has a range with a start offset of
      2035712 and end offset of 2207743 (length of 172032). This leads to
      replacing extent map G with a new extent map I with the following values:
      
      em I, start 2207744, orig_start 1200128, block_start 1107783680,
            block_len 73728, orig_block_len 1032192
      
      It also creates a new extent map that represents the second part of the
      requested IO (through create_io_em()):
      
      em J, start 2035712, len 172032, block_start 1107611648, block_len 172032
      
      The dio write set the inode's i_size to 2207744 bytes.
      
      After the dio write the inode has the following extent maps:
      
      em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613,
            block_len 0, orig_block_len 0
      em B, start 40960, orig_start 40960, len 270336, block_start 1106399232,
            block_len 270336, orig_block_len 376832
      em C, start 311296, orig_start 311296, len 937984, block_start 1112842240,
            block_len 937984, orig_block_len 937984
      em F, start 1249280, orig_start 1200128, len 598016,
            block_start 1106825216, block_len 598016, orig_block_len 598016
      em H, start 1847296, orig_start 1200128, len 188416,
            block_start 1107423232, block_len 188416, orig_block_len 835584
      em J, start 2035712, orig_start 2035712, len 172032,
            block_start 1107611648, block_len 172032, orig_block_len 245760
      em I, start 2207744, orig_start 1200128, len 73728,
            block_start 1107783680, block_len 73728, orig_block_len 1032192
      
      Now do some change to the file, like adding a xattr for example and then
      fsync it again. This triggers a fast fsync path, and as of commit
      471d557a ("Btrfs: fix loss of prealloc extents past i_size after fsync
      log replay"), we use the extent map I to log a file extent item because
      it's a prealloc extent and it starts at an offset matching the inode's
      i_size. However when we log it, we create a file extent item with a value
      for the disk byte location that is wrong, as can be seen from the
      following output of "btrfs inspect-internal dump-tree":
      
       item 1 key (271 EXTENT_DATA 2207744) itemoff 3782 itemsize 53
           generation 22 type 2 (prealloc)
           prealloc data disk byte 1106776064 nr 1032192
           prealloc data offset 1007616 nr 73728
      
      Here the disk byte value corresponds to calculation based on some fields
      from the extent map I:
      
        1106776064 = block_start (1107783680) - 1007616 (extent_offset)
        extent_offset = 2207744 (start) - 1200128 (orig_start) = 1007616
      
      The disk byte value of 1106776064 clashes with disk byte values of the
      file extent items at offsets 1249280 and 1847296 in the fs tree:
      
              item 6 key (271 EXTENT_DATA 1249280) itemoff 3568 itemsize 53
                      generation 20 type 2 (prealloc)
                      prealloc data disk byte 1106776064 nr 835584
                      prealloc data offset 49152 nr 598016
              item 7 key (271 EXTENT_DATA 1847296) itemoff 3515 itemsize 53
                      generation 20 type 1 (regular)
                      extent data disk byte 1106776064 nr 835584
                      extent data offset 647168 nr 188416 ram 835584
                      extent compression 0 (none)
              item 8 key (271 EXTENT_DATA 2035712) itemoff 3462 itemsize 53
                      generation 20 type 1 (regular)
                      extent data disk byte 1107611648 nr 245760
                      extent data offset 0 nr 172032 ram 245760
                      extent compression 0 (none)
              item 9 key (271 EXTENT_DATA 2207744) itemoff 3409 itemsize 53
                      generation 20 type 2 (prealloc)
                      prealloc data disk byte 1107611648 nr 245760
                      prealloc data offset 172032 nr 73728
      
      Instead of the disk byte value of 1106776064, the value of 1107611648
      should have been logged. Also the data offset value should have been
      172032 and not 1007616.
      After a log replay we end up getting two extent items in the extent tree
      with different lengths, one of 835584, which is correct and existed
      before the log replay, and another one of 1032192 which is wrong and is
      based on the logged file extent item:
      
       item 12 key (1106776064 EXTENT_ITEM 835584) itemoff 3406 itemsize 53
          refs 2 gen 15 flags DATA
          extent data backref root 5 objectid 271 offset 1200128 count 2
       item 13 key (1106776064 EXTENT_ITEM 1032192) itemoff 3353 itemsize 53
          refs 1 gen 22 flags DATA
          extent data backref root 5 objectid 271 offset 1200128 count 1
      
      Obviously this leads to many problems and a filesystem check reports many
      errors:
      
       (...)
       checking extents
       Extent back ref already exists for 1106776064 parent 0 root 5 owner 271 offset 1200128 num_refs 1
       extent item 1106776064 has multiple extent items
       ref mismatch on [1106776064 835584] extent item 2, found 3
       Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 2 wanted 1 back 0x55b1d0ad7680
       Backref 1106776064 root 5 owner 271 offset 1200128 num_refs 0 not found in extent tree
       Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 1 wanted 0 back 0x55b1d0ad4e70
       Backref bytes do not match extent backref, bytenr=1106776064, ref bytes=835584, backref bytes=1032192
       backpointer mismatch on [1106776064 835584]
       checking free space cache
       block group 1103101952 has wrong amount of free space
       failed to load free space cache for block group 1103101952
       checking fs roots
       (...)
      
      So fix this by logging the prealloc extents beyond the inode's i_size
      based on searches in the subvolume tree instead of the extent maps.
      
      Fixes: 471d557a ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay")
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61a9f6b7
    • Nick Desaulniers's avatar
      x86/paravirt: Make native_save_fl() extern inline · edefb935
      Nick Desaulniers authored
      commit d0a8d937 upstream.
      
      native_save_fl() is marked static inline, but by using it as
      a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.
      
      paravirt's use of native_save_fl() also requires that no GPRs other than
      %rax are clobbered.
      
      Compilers have different heuristics which they use to emit stack guard
      code, the emittance of which can break paravirt's callee saved assumption
      by clobbering %rcx.
      
      Marking a function definition extern inline means that if this version
      cannot be inlined, then the out-of-line version will be preferred. By
      having the out-of-line version be implemented in assembly, it cannot be
      instrumented with a stack protector, which might violate custom calling
      conventions that code like paravirt rely on.
      
      The semantics of extern inline has changed since gnu89. This means that
      folks using GCC versions >= 5.1 may see symbol redefinition errors at
      link time for subdirs that override KBUILD_CFLAGS (making the C standard
      used implicit) regardless of this patch. This has been cleaned up
      earlier in the patch set, but is left as a note in the commit message
      for future travelers.
      
      Reports:
       https://lkml.org/lkml/2018/5/7/534
       https://github.com/ClangBuiltLinux/linux/issues/16
      
      Discussion:
       https://bugs.llvm.org/show_bug.cgi?id=37512
       https://lkml.org/lkml/2018/5/24/1371
      
      Thanks to the many folks that participated in the discussion.
      Debugged-by: default avatarAlistair Strachan <astrachan@google.com>
      Debugged-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Suggested-by: default avatarArnd Bergmann <arnd@arndb.de>
      Suggested-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Suggested-by: default avatarTom Stellar <tstellar@redhat.com>
      Reported-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Acked-by: default avatarJuergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@redhat.com
      Cc: akataria@vmware.com
      Cc: akpm@linux-foundation.org
      Cc: andrea.parri@amarulasolutions.com
      Cc: ard.biesheuvel@linaro.org
      Cc: aryabinin@virtuozzo.com
      Cc: astrachan@google.com
      Cc: boris.ostrovsky@oracle.com
      Cc: brijesh.singh@amd.com
      Cc: caoj.fnst@cn.fujitsu.com
      Cc: geert@linux-m68k.org
      Cc: ghackmann@google.com
      Cc: gregkh@linuxfoundation.org
      Cc: jan.kiszka@siemens.com
      Cc: jarkko.sakkinen@linux.intel.com
      Cc: joe@perches.com
      Cc: jpoimboe@redhat.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: kstewart@linuxfoundation.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-kbuild@vger.kernel.org
      Cc: manojgupta@google.com
      Cc: mawilcox@microsoft.com
      Cc: michal.lkml@markovi.net
      Cc: mjg59@google.com
      Cc: mka@chromium.org
      Cc: pombredanne@nexb.com
      Cc: rientjes@google.com
      Cc: rostedt@goodmis.org
      Cc: thomas.lendacky@amd.com
      Cc: tweek@google.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: yamada.masahiro@socionext.com
      Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulniers@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edefb935
    • H. Peter Anvin's avatar
      x86/asm: Add _ASM_ARG* constants for argument registers to <asm/asm.h> · 92e50158
      H. Peter Anvin authored
      commit 0e2e1600 upstream.
      
      i386 and x86-64 uses different registers for arguments; make them
      available so we don't have to #ifdef in the actual code.
      
      Native size and specified size (q, l, w, b) versions are provided.
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Acked-by: default avatarJuergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@redhat.com
      Cc: akataria@vmware.com
      Cc: akpm@linux-foundation.org
      Cc: andrea.parri@amarulasolutions.com
      Cc: ard.biesheuvel@linaro.org
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: astrachan@google.com
      Cc: boris.ostrovsky@oracle.com
      Cc: brijesh.singh@amd.com
      Cc: caoj.fnst@cn.fujitsu.com
      Cc: geert@linux-m68k.org
      Cc: ghackmann@google.com
      Cc: gregkh@linuxfoundation.org
      Cc: jan.kiszka@siemens.com
      Cc: jarkko.sakkinen@linux.intel.com
      Cc: joe@perches.com
      Cc: jpoimboe@redhat.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: kstewart@linuxfoundation.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-kbuild@vger.kernel.org
      Cc: manojgupta@google.com
      Cc: mawilcox@microsoft.com
      Cc: michal.lkml@markovi.net
      Cc: mjg59@google.com
      Cc: mka@chromium.org
      Cc: pombredanne@nexb.com
      Cc: rientjes@google.com
      Cc: rostedt@goodmis.org
      Cc: thomas.lendacky@amd.com
      Cc: tstellar@redhat.com
      Cc: tweek@google.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: yamada.masahiro@socionext.com
      Link: http://lkml.kernel.org/r/20180621162324.36656-3-ndesaulniers@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92e50158
    • Nick Desaulniers's avatar
      compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations · 779145a6
      Nick Desaulniers authored
      commit d03db2bc upstream.
      
      Functions marked extern inline do not emit an externally visible
      function when the gnu89 C standard is used. Some KBUILD Makefiles
      overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without
      an explicit C standard specified, the default is gnu11. Since c99, the
      semantics of extern inline have changed such that an externally visible
      function is always emitted. This can lead to multiple definition errors
      of extern inline functions at link time of compilation units whose build
      files have removed an explicit C standard compiler flag for users of GCC
      5.1+ or Clang.
      Suggested-by: default avatarArnd Bergmann <arnd@arndb.de>
      Suggested-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Suggested-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Acked-by: default avatarJuergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@redhat.com
      Cc: akataria@vmware.com
      Cc: akpm@linux-foundation.org
      Cc: andrea.parri@amarulasolutions.com
      Cc: ard.biesheuvel@linaro.org
      Cc: aryabinin@virtuozzo.com
      Cc: astrachan@google.com
      Cc: boris.ostrovsky@oracle.com
      Cc: brijesh.singh@amd.com
      Cc: caoj.fnst@cn.fujitsu.com
      Cc: geert@linux-m68k.org
      Cc: ghackmann@google.com
      Cc: gregkh@linuxfoundation.org
      Cc: jan.kiszka@siemens.com
      Cc: jarkko.sakkinen@linux.intel.com
      Cc: jpoimboe@redhat.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: kstewart@linuxfoundation.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-kbuild@vger.kernel.org
      Cc: manojgupta@google.com
      Cc: mawilcox@microsoft.com
      Cc: michal.lkml@markovi.net
      Cc: mjg59@google.com
      Cc: mka@chromium.org
      Cc: pombredanne@nexb.com
      Cc: rientjes@google.com
      Cc: rostedt@goodmis.org
      Cc: sedat.dilek@gmail.com
      Cc: thomas.lendacky@amd.com
      Cc: tstellar@redhat.com
      Cc: tweek@google.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: yamada.masahiro@socionext.com
      Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      779145a6
  2. 17 Jul, 2018 13 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.14.56 · cff26c95
      Greg Kroah-Hartman authored
      cff26c95
    • Jaegeuk Kim's avatar
      f2fs: give message and set need_fsck given broken node id · eab3a341
      Jaegeuk Kim authored
      commit a4f843bd upstream.
      
      syzbot hit the following crash on upstream commit
      83beed7b (Fri Apr 20 17:56:32 2018 +0000)
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
      syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=d154ec99402c6f628887
      
      C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5414336294027264
      syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5471683234234368
      Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5436660795834368
      Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
      compiler: gcc (GCC) 8.0.1 20180413 (experimental)
      
      IMPORTANT: if you fix the bug, please add the following tag to the commit:
      Reported-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com
      It will help syzbot understand when the bug is fixed. See footer for details.
      If you forward the report, please keep this part and the footer.
      
      F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
      F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
      F2FS-fs (loop0): invalid crc value
      ------------[ cut here ]------------
      kernel BUG at fs/f2fs/node.c:1185!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4549 Comm: syzkaller704305 Not tainted 4.17.0-rc1+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185
      RSP: 0018:ffff8801d960e820 EFLAGS: 00010293
      RAX: ffff8801d88205c0 RBX: 0000000000000003 RCX: ffffffff82f6cc06
      RDX: 0000000000000000 RSI: ffffffff82f6d5e8 RDI: 0000000000000004
      RBP: ffff8801d960ec30 R08: ffff8801d88205c0 R09: ffffed003b5e46c2
      R10: 0000000000000003 R11: 0000000000000003 R12: ffff8801a86e00c0
      R13: 0000000000000001 R14: ffff8801a86e0530 R15: ffff8801d9745240
      FS:  000000000072c880(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3d403209b8 CR3: 00000001d8f3f000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       get_node_page fs/f2fs/node.c:1237 [inline]
       truncate_xattr_node+0x152/0x2e0 fs/f2fs/node.c:1014
       remove_inode_page+0x200/0xaf0 fs/f2fs/node.c:1039
       f2fs_evict_inode+0xe86/0x1710 fs/f2fs/inode.c:547
       evict+0x4a6/0x960 fs/inode.c:557
       iput_final fs/inode.c:1519 [inline]
       iput+0x62d/0xa80 fs/inode.c:1545
       f2fs_fill_super+0x5f4e/0x7bf0 fs/f2fs/super.c:2849
       mount_bdev+0x30c/0x3e0 fs/super.c:1164
       f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
       mount_fs+0xae/0x328 fs/super.c:1267
       vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
       vfs_kern_mount fs/namespace.c:1027 [inline]
       do_new_mount fs/namespace.c:2518 [inline]
       do_mount+0x564/0x3070 fs/namespace.c:2848
       ksys_mount+0x12d/0x140 fs/namespace.c:3064
       __do_sys_mount fs/namespace.c:3078 [inline]
       __se_sys_mount fs/namespace.c:3075 [inline]
       __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x443dea
      RSP: 002b:00007ffcc7882368 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443dea
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffcc7882370
      RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a
      R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004
      R13: 0000000000402ce0 R14: 0000000000000000 R15: 0000000000000000
      RIP: __get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: ffff8801d960e820
      ---[ end trace 4edbeb71f002bb76 ]---
      
      Reported-and-tested-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eab3a341
    • Tetsuo Handa's avatar
      loop: remember whether sysfs_create_group() was done · d2c18ad1
      Tetsuo Handa authored
      commit d3349b6b upstream.
      
      syzbot is hitting WARN() triggered by memory allocation fault
      injection [1] because loop module is calling sysfs_remove_group()
      when sysfs_create_group() failed.
      Fix this by remembering whether sysfs_create_group() succeeded.
      
      [1] https://syzkaller.appspot.com/bug?id=3f86c0edf75c86d2633aeb9dd69eccc70bc7e90bSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarsyzbot <syzbot+9f03168400f56df89dbc6f1751f4458fe739ff29@syzkaller.appspotmail.com>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Renamed sysfs_ready -> sysfs_inited.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2c18ad1
    • Leon Romanovsky's avatar
      RDMA/ucm: Mark UCM interface as BROKEN · e8484443
      Leon Romanovsky authored
      commit 7a8690ed upstream.
      
      In commit 357d23c811a7 ("Remove the obsolete libibcm library")
      in rdma-core [1], we removed obsolete library which used the
      /dev/infiniband/ucmX interface.
      
      Following multiple syzkaller reports about non-sanitized
      user input in the UCMA module, the short audit reveals the same
      issues in UCM module too.
      
      It is better to disable this interface in the kernel,
      before syzkaller team invests time and energy to harden
      this unused interface.
      
      [1] https://github.com/linux-rdma/rdma-core/pull/279Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8484443
    • Tetsuo Handa's avatar
      PM / hibernate: Fix oops at snapshot_write() · 140eae92
      Tetsuo Handa authored
      commit fc14eebf upstream.
      
      syzbot is reporting NULL pointer dereference at snapshot_write() [1].
      This is because data->handle is zero-cleared by ioctl(SNAPSHOT_FREE).
      Fix this by checking data_of(data->handle) != NULL before using it.
      
      [1] https://syzkaller.appspot.com/bug?id=828a3c71bd344a6de8b6a31233d51a72099f27fdSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarsyzbot <syzbot+ae590932da6e45d6564d@syzkaller.appspotmail.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      140eae92
    • Theodore Ts'o's avatar
      loop: add recursion validation to LOOP_CHANGE_FD · 6f9f5797
      Theodore Ts'o authored
      commit d2ac838e upstream.
      
      Refactor the validation code used in LOOP_SET_FD so it is also used in
      LOOP_CHANGE_FD.  Otherwise it is possible to construct a set of loop
      devices that all refer to each other.  This can lead to a infinite
      loop in starting with "while (is_loop_device(f)) .." in loop_set_fd().
      
      Fix this by refactoring out the validation code and using it for
      LOOP_CHANGE_FD as well as LOOP_SET_FD.
      
      Reported-by: syzbot+4349872271ece473a7c91190b68b4bac7c5dbc87@syzkaller.appspotmail.com
      Reported-by: syzbot+40bd32c4d9a3cc12a339@syzkaller.appspotmail.com
      Reported-by: syzbot+769c54e66f994b041be7@syzkaller.appspotmail.com
      Reported-by: syzbot+0a89a9ce473936c57065@syzkaller.appspotmail.com
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f9f5797
    • Florian Westphal's avatar
      netfilter: x_tables: initialise match/target check parameter struct · 348b32aa
      Florian Westphal authored
      commit c568503e upstream.
      
      syzbot reports following splat:
      
      BUG: KMSAN: uninit-value in ebt_stp_mt_check+0x24b/0x450
       net/bridge/netfilter/ebt_stp.c:162
       ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162
       xt_check_match+0x1438/0x1650 net/netfilter/x_tables.c:506
       ebt_check_match net/bridge/netfilter/ebtables.c:372 [inline]
       ebt_check_entry net/bridge/netfilter/ebtables.c:702 [inline]
      
      The uninitialised access is
         xt_mtchk_param->nft_compat
      
      ... which should be set to 0.
      Fix it by zeroing the struct beforehand, same for tgchk.
      
      ip(6)tables targetinfo uses c99-style initialiser, so no change
      needed there.
      
      Reported-by: syzbot+da4494182233c23a5fcf@syzkaller.appspotmail.com
      Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      348b32aa
    • Eric Dumazet's avatar
      netfilter: nf_queue: augment nfqa_cfg_policy · e5ee20c6
      Eric Dumazet authored
      commit ba062ebb upstream.
      
      Three attributes are currently not verified, thus can trigger KMSAN
      warnings such as :
      
      BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
      BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
      BUG: KMSAN: uninit-value in nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
      CPU: 1 PID: 4521 Comm: syz-executor120 Not tainted 4.17.0+ #5
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117
       __msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620
       __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
       __fswab32 include/uapi/linux/swab.h:59 [inline]
       nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
       nfnetlink_rcv_msg+0xb2e/0xc80 net/netfilter/nfnetlink.c:212
       netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448
       nfnetlink_rcv+0x2fe/0x680 net/netfilter/nfnetlink.c:513
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x43fd59
      RSP: 002b:00007ffde0e30d28 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd59
      RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401680
      R13: 0000000000401710 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
       kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2753 [inline]
       __kmalloc_node_track_caller+0xb35/0x11b0 mm/slub.c:4395
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:988 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
       netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: fdb694a0 ("netfilter: Add fail-open support")
      Fixes: 829e17a1 ("[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5ee20c6
    • Oleg Nesterov's avatar
      uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn() · 00323226
      Oleg Nesterov authored
      commit 90718e32 upstream.
      
      insn_get_length() has the side-effect of processing the entire instruction
      but only if it was decoded successfully, otherwise insn_complete() can fail
      and in this case we need to just return an error without warning.
      
      Reported-by: syzbot+30d675e3ca03c1c351e7@syzkaller.appspotmail.com
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: syzkaller-bugs@googlegroups.com
      Link: https://lkml.kernel.org/lkml/20180518162739.GA5559@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00323226
    • Eric Biggers's avatar
      crypto: x86/salsa20 - remove x86 salsa20 implementations · 19f39eff
      Eric Biggers authored
      commit b7b73cd5 upstream.
      
      The x86 assembly implementations of Salsa20 use the frame base pointer
      register (%ebp or %rbp), which breaks frame pointer convention and
      breaks stack traces when unwinding from an interrupt in the crypto code.
      Recent (v4.10+) kernels will warn about this, e.g.
      
      WARNING: kernel stack regs at 00000000a8291e69 in syzkaller047086:4677 has bad 'bp' value 000000001077994c
      [...]
      
      But after looking into it, I believe there's very little reason to still
      retain the x86 Salsa20 code.  First, these are *not* vectorized
      (SSE2/SSSE3/AVX2) implementations, which would be needed to get anywhere
      close to the best Salsa20 performance on any remotely modern x86
      processor; they're just regular x86 assembly.  Second, it's still
      unclear that anyone is actually using the kernel's Salsa20 at all,
      especially given that now ChaCha20 is supported too, and with much more
      efficient SSSE3 and AVX2 implementations.  Finally, in benchmarks I did
      on both Intel and AMD processors with both gcc 8.1.0 and gcc 4.9.4, the
      x86_64 salsa20-asm is actually slightly *slower* than salsa20-generic
      (~3% slower on Skylake, ~10% slower on Zen), while the i686 salsa20-asm
      is only slightly faster than salsa20-generic (~15% faster on Skylake,
      ~20% faster on Zen).  The gcc version made little difference.
      
      So, the x86_64 salsa20-asm is pretty clearly useless.  That leaves just
      the i686 salsa20-asm, which based on my tests provides a 15-20% speed
      boost.  But that's without updating the code to not use %ebp.  And given
      the maintenance cost, the small speed difference vs. salsa20-generic,
      the fact that few people still use i686 kernels, the doubt that anyone
      is even using the kernel's Salsa20 at all, and the fact that a SSE2
      implementation would almost certainly be much faster on any remotely
      modern x86 processor yet no one has cared enough to add one yet, I don't
      think it's worthwhile to keep.
      
      Thus, just remove both the x86_64 and i686 salsa20-asm implementations.
      
      Reported-by: syzbot+ffa3a158337bbc01ff09@syzkaller.appspotmail.com
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      19f39eff
    • Keith Busch's avatar
      nvme-pci: Remap CMB SQ entries on every controller reset · 2a017ea2
      Keith Busch authored
      commit 815c6704 upstream.
      
      The controller memory buffer is remapped into a kernel address on each
      reset, but the driver was setting the submission queue base address
      only on the very first queue creation. The remapped address is likely to
      change after a reset, so accessing the old address will hit a kernel bug.
      
      This patch fixes that by setting the queue's CMB base address each time
      the queue is created.
      
      Fixes: f63572df ("nvme: unmap CMB and remove sysfs file in reset path")
      Reported-by: default avatarChristian Black <christian.d.black@intel.com>
      Cc: Jon Derrick <jonathan.derrick@intel.com>
      Cc: <stable@vger.kernel.org> # 4.9+
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarScott Bauer <scott.bauer@intel.com>
      Reviewed-by: default avatarJon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a017ea2
    • Juergen Gross's avatar
      xen: setup pv irq ops vector earlier · 54ca2776
      Juergen Gross authored
      commit 0ce0bba4 upstream.
      
      Setting pv_irq_ops for Xen PV domains should be done as early as
      possible in order to support e.g. very early printk() usage.
      
      The same applies to xen_vcpu_info_reset(0), as it is needed for the
      pv irq ops.
      
      Move the call of xen_setup_machphys_mapping() after initializing the
      pv functions as it contains a WARN_ON(), too.
      
      Remove the no longer necessary conditional in xen_init_irq_ops()
      from PVH V1 times to make clear this is a PV only function.
      
      Cc: <stable@vger.kernel.org> # 4.14
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54ca2776
    • Steve Wise's avatar
      iw_cxgb4: correctly enforce the max reg_mr depth · f47f1f97
      Steve Wise authored
      commit 7b72717a upstream.
      
      The code was mistakenly using the length of the page array memory instead
      of the depth of the page array.
      
      This would cause MR creation to fail in some cases.
      
      Fixes: 8376b86d ("iw_cxgb4: Support the new memory registration API")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f47f1f97