1. 31 Aug, 2016 1 commit
    • Shaohua Li's avatar
      raid5-cache: fix a deadlock in superblock write · 8e018c21
      Shaohua Li authored
      There is a potential deadlock in superblock write. Discard could zero data, so
      before discard we must make sure superblock is updated to new log tail.
      Updating superblock (either directly call md_update_sb() or depend on md
      thread) must hold reconfig mutex. On the other hand, raid5_quiesce is called
      with reconfig_mutex hold. The first step of raid5_quiesce() is waitting for all
      IO finish, hence waitting for reclaim thread, while reclaim thread is calling
      this function and waitting for reconfig mutex. So there is a deadlock. We
      workaround this issue with a trylock. The downside of the solution is we could
      miss discard if we can't take reconfig mutex. But this should happen rarely
      (mainly in raid array stop), so miss discard shouldn't be a big problem.
      
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      8e018c21
  2. 30 Aug, 2016 9 commits
    • Linus Torvalds's avatar
      Merge tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md · 86a16798
      Linus Torvalds authored
      Pull MD fixes from Shaohua Li:
       "This includes several bug fixes:
      
         - Alexey Obitotskiy fixed a hang for faulty raid5 array with external
           management
      
         - Song Liu fixed two raid5 journal related bugs
      
         - Tomasz Majchrzak fixed a bad block recording issue and an
           accounting issue for raid10
      
         - ZhengYuan Liu fixed an accounting issue for raid5
      
         - I fixed a potential race condition and memory leak with DIF/DIX
           enabled
      
         - other trival fixes"
      
      * tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        raid5: avoid unnecessary bio data set
        raid5: fix memory leak of bio integrity data
        raid10: record correct address of bad block
        md-cluster: fix error return code in join()
        r5cache: set MD_JOURNAL_CLEAN correctly
        md: don't print the same repeated messages about delayed sync operation
        md: remove obsolete ret in md_start_sync
        md: do not count journal as spare in GET_ARRAY_INFO
        md: Prevent IO hold during accessing to faulty raid5 array
        MD: hold mddev lock to change bitmap location
        raid5: fix incorrectly counter of conf->empty_inactive_list_nr
        raid10: increment write counter after bio is split
      86a16798
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 0cf21c66
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights include:
      
        Stable patches:
         - Fix a refcount leak in nfs_callback_up_net
         - Fix an Oopsable condition when the flexfile pNFS driver connection
           to the DS fails
         - Fix an Oopsable condition in NFSv4.1 server callback races
         - Ensure pNFS clients stop doing I/O to the DS if their lease has
           expired, as required by the NFSv4.1 protocol
      
        Bugfixes:
         - Fix potential looping in the NFSv4.x migration code
         - Patch series to close callback races for OPEN, LAYOUTGET and
           LAYOUTRETURN
         - Silence WARN_ON when NFSv4.1 over RDMA is in use
         - Fix a LAYOUTCOMMIT race in the pNFS/blocks client
         - Fix pNFS timeout issues when the DS fails"
      
      * tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFSv4.x: Fix a refcount leak in nfs_callback_up_net
        NFS4: Avoid migration loops
        pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
        NFSv4.1: Remove obsolete and incorrrect assignment in nfs4_callback_sequence
        NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
        NFSv4.1: Defer bumping the slot sequence number until we free the slot
        NFSv4.1: Delay callback processing when there are referring triples
        NFSv4.1: Fix Oopsable condition in server callback races
        SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use
        pnfs/blocklayout: update last_write_offset atomically with extents
        pNFS: The client must not do I/O to the DS if it's lease has expired
        pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls
        pNFS/flexfiles: Set reasonable default retrans values for the data channel
        NFS: Allow the mount option retrans=0
        pNFS/flexfiles: Fix layoutstat periodic reporting
      0cf21c66
    • Josh Poimboeuf's avatar
      mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 0d025d27
      Josh Poimboeuf authored
      There are three usercopy warnings which are currently being silenced for
      gcc 4.6 and newer:
      
      1) "copy_from_user() buffer size is too small" compile warning/error
      
         This is a static warning which happens when object size and copy size
         are both const, and copy size > object size.  I didn't see any false
         positives for this one.  So the function warning attribute seems to
         be working fine here.
      
         Note this scenario is always a bug and so I think it should be
         changed to *always* be an error, regardless of
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
      
      2) "copy_from_user() buffer size is not provably correct" compile warning
      
         This is another static warning which happens when I enable
         __compiletime_object_size() for new compilers (and
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
         is const, but copy size is *not*.  In this case there's no way to
         compare the two at build time, so it gives the warning.  (Note the
         warning is a byproduct of the fact that gcc has no way of knowing
         whether the overflow function will be called, so the call isn't dead
         code and the warning attribute is activated.)
      
         So this warning seems to only indicate "this is an unusual pattern,
         maybe you should check it out" rather than "this is a bug".
      
         I get 102(!) of these warnings with allyesconfig and the
         __compiletime_object_size() gcc check removed.  I don't know if there
         are any real bugs hiding in there, but from looking at a small
         sample, I didn't see any.  According to Kees, it does sometimes find
         real bugs.  But the false positive rate seems high.
      
      3) "Buffer overflow detected" runtime warning
      
         This is a runtime warning where object size is const, and copy size >
         object size.
      
      All three warnings (both static and runtime) were completely disabled
      for gcc 4.6 with the following commit:
      
        2fb0815c ("gcc4: disable __compiletime_object_size for GCC 4.6+")
      
      That commit mistakenly assumed that the false positives were caused by a
      gcc bug in __compiletime_object_size().  But in fact,
      __compiletime_object_size() seems to be working fine.  The false
      positives were instead triggered by #2 above.  (Though I don't have an
      explanation for why the warnings supposedly only started showing up in
      gcc 4.6.)
      
      So remove warning #2 to get rid of all the false positives, and re-enable
      warnings #1 and #3 by reverting the above commit.
      
      Furthermore, since #1 is a real bug which is detected at compile time,
      upgrade it to always be an error.
      
      Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
      needed.
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0d025d27
    • Linus Torvalds's avatar
      Merge branch 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · d8dc020c
      Linus Torvalds authored
      Pull libata fixes from Tejun Heo:
       "Two libata driver specific fixes for v4.8-rc4.  Nothing too scary"
      
      * 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        pata_ninja32: Avoid corrupting status flags
        ahci: disable correct irq for dummy ports
      d8dc020c
    • Linus Torvalds's avatar
      Merge branch 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 748e7fc2
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
       "Two fixes for cgroup.
      
         - There still was a hole in enforcing cpuset rules, fixed by Li.
      
         - The recent switch to global percpu_rwseom for threadgroup locking
           revealed a couple issues in how percpu_rwsem is implemented and
           used by cgroup.  Balbir found that the read locking section was too
           wide unnecessarily including operations which can often depend on
           IOs.  With percpu_rwsem updates (coming through a different tree)
           and reduction of read locking section, all the reported locking
           latency issues, including the android one, are resolved.
      
        It looks like we can keep global percpu_rwsem locking for now.  If
        there actually are cases which can't be resolved, we can go back to
        more complex per-signal_struct locking"
      
      * 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
        cpuset: make sure new tasks conform to the current config of the cpuset
      748e7fc2
    • Alan Cox's avatar
      pata_ninja32: Avoid corrupting status flags · 9ebae9e4
      Alan Cox authored
      Ninja32 needs to set some flags to indicate it does 32bit IO. However it currently assigns this which
      loses the initializing flag and causes a warning spew. Fix it to use a logical or as is intended.
      Signed-off-by: default avatarAlan Cox <alan@linux.intel.com>
      Tested-by: default avatarEllmar Stelnberger <estellnb@elstel.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      9ebae9e4
    • Trond Myklebust's avatar
      NFSv4.x: Fix a refcount leak in nfs_callback_up_net · 98b0f80c
      Trond Myklebust authored
      On error, the callers expect us to return without bumping
      nn->cb_users[].
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Cc: stable@vger.kernel.org # v3.7+
      98b0f80c
    • Benjamin Coddington's avatar
      NFS4: Avoid migration loops · 52442f9b
      Benjamin Coddington authored
      If a server returns itself as a location while migrating, the client may
      end up getting stuck attempting to migrate twice to the same server.  Catch
      this by checking if the nfs_client found is the same as the existing
      client.  For the other two callers to nfs4_set_client, the nfs_client will
      always be ERR_PTR(-EINVAL).
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      52442f9b
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus-v4.8-rc5' of... · e4e98c46
      Linus Torvalds authored
      Merge tag 'hwmon-for-linus-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fix from Guenter Roeck:
       "Add missing sysfs attribute group terminator to it87 driver"
      
      * tag 'hwmon-for-linus-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (it87) Add missing sysfs attribute group terminator
      e4e98c46
  3. 29 Aug, 2016 24 commits
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · b8927721
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Fix bugs that could cause kernel deadlocks or file system corruption
        while moving xattrs to expand the extended inode.
      
        Also add some sanity checks to the block group descriptors to make
        sure we don't end up overwriting the superblock"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: avoid deadlock when expanding inode size
        ext4: properly align shifted xattrs when expanding inodes
        ext4: fix xattr shifting when expanding inodes part 2
        ext4: fix xattr shifting when expanding inodes
        ext4: validate that metadata blocks do not overlap superblock
        ext4: reserve xattr index for the Hurd
      b8927721
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1f6a563e
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Segregate namespaces properly in conntrack dumps, from Liping Zhang.
      
       2) tcp listener refcount fix in netfilter tproxy, from Eric Dumazet.
      
       3) Fix timeouts in qed driver due to xmit_more, from Yuval Mintz.
      
       4) Fix use-after-free in tcp_xmit_retransmit_queue().
      
       5) Userspace header fixups (use of __u32, missing includes, etc.) from
          Mikko Rapeli.
      
       6) Further refinements to fragmentation wrt gso and tunnels, from
          Shmulik Ladkani.
      
       7) Trigger poll correctly for zero length UDP packets, from Eric
          Dumazet.
      
       8) TCP window scaling fix, also from Eric Dumazet.
      
       9) SLAB_DESTROY_BY_RCU is not relevant any more for UDP sockets.
      
      10) Module refcount leak in qdisc_create_dflt(), from Eric Dumazet.
      
      11) Fix deadlock in cp_rx_poll() of 8139cp driver, from Gao Feng.
      
      12) Memory leak in rhashtable's alloc_bucket_locks(), from Eric Dumazet.
      
      13) Add new device ID to alx driver, from Owen Lin.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
        Add Killer E2500 device ID in alx driver.
        net: smc91x: fix SMC accesses
        Documentation: networking: dsa: Remove platform device TODO
        net/mlx5: Increase number of ethtool steering priorities
        net/mlx5: Add error prints when validate ETS failed
        net/mlx5e: Fix memory leak if refreshing TIRs fails
        net/mlx5e: Add ethtool counter for TX xmit_more
        net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
        net/mlx5e: Don't wait for SQ completions on close
        net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
        net/mlx5e: Don't wait for RQ completions on close
        net/mlx5e: Limit UMR length to the device's limitation
        rhashtable: fix a memory leak in alloc_bucket_locks()
        sfc: fix potential stack corruption from running past stat bitmask
        team: loadbalance: push lacpdus to exact delivery
        net: hns: dereference ppe_cb->ppe_common_cb if it is non-null
        8139cp: Fix one possible deadloop in cp_rx_poll
        i40e: Change some init flow for the client
        Revert "phy: IRQ cannot be shared"
        net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
        ...
      1f6a563e
    • Trond Myklebust's avatar
      pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails · 3dc14735
      Trond Myklebust authored
      If the attempt to connect to a DS fails inside ff_layout_pg_init_read or
      ff_layout_pg_init_write, then we currently end up clearing the layout
      segment carried by the struct nfs_pageio_descriptor, causing an Oops
      when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist.
      
      The fix is to ensure we return the layout and then retry.
      
      Fixes: 446ca219 ("pNFS/flexfiles: When initing reads or writes, we...")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      3dc14735
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v4.8-4' of... · cf4d3779
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v4.8-4' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
      
      Pull x86 platform driver fixes from Darren Hart:
       "Remove module related code from two drivers that are only configurable
        as built-in: intel_pmic_gpio and platform/olpc"
      
      * tag 'platform-drivers-x86-v4.8-4' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
        intel_pmic_gpio: Make explicitly non-modular
        platform/olpc: Make ec explicitly non-modular
      cf4d3779
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 2a90309e
      Linus Torvalds authored
      Pull powerpc fixes from Ben Herrenschmidt:
       "This was meant to be sent early last week, but I has a change pending
        on one of the fixes and other things made me forget all about.  Ugh.
      
        We have some misc fixes for powerpc 4.8.  Some trivial bits and some
        regressions, and a trivial cleanup or two that I saw no point in
        letting rot in patchwork"
      
      * tag 'powerpc-4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc: signals: Discard transaction state from signal frames
        powerpc/powernv : Drop reference added by kset_find_obj()
        powerpc/tm: do not use r13 for tabort_syscall
        powerpc: move hmi.c to arch/powerpc/kvm/
        powerpc: sysdev: cpm: fix gpio save_regs functions
        powerpc/pseries: PACA save area fix for MCE vs MCE
        powerpc/pseries: PACA save area fix for general exception vs MCE
        powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
        powerpc, hotplug: Avoid to touch non-existent cpumasks.
        powerpc: migrate exception table users off module.h and onto extable.h
        powerpc/powernv/pci: fix iterator signedness
        powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
        cxl: use pcibios_free_controller_deferred() when removing vPHBs
        powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner"
        powerpc/512x: Delete unnecessary assignment for the field "owner"
        drivers/macintosh: Delete owner assignment
        powerpc: cputhreads: Add missing include file
      2a90309e
    • Jean Delvare's avatar
      hwmon: (it87) Add missing sysfs attribute group terminator · 3c329263
      Jean Delvare authored
      Attribute array it87_attributes_in lacks its NULL terminator,
      causing random behavior when operating on the attribute group.
      
      Fixes: 52929715 ("hwmon: (it87) Use is_visible for voltage sensors")
      Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      3c329263
    • Paul Gortmaker's avatar
      intel_pmic_gpio: Make explicitly non-modular · da43bf0c
      Paul Gortmaker authored
      The Kconfig entry controlling compilation of this code is:
      
      drivers/platform/x86/Kconfig:config GPIO_INTEL_PMIC
      drivers/platform/x86/Kconfig:   bool "Intel PMIC GPIO support"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Lets remove the couple traces of modular infrastructure use, so that
      when reading the driver there is no doubt it is builtin-only.
      
      We delete the MODULE_LICENSE tag etc. since all that information
      was (or is now) contained at the top of the file in the comments.
      
      We don't replace module.h with init.h since the file already has that.
      
      Cc: Alek Du <alek.du@intel.com>
      Cc: platform-driver-x86@vger.kernel.org
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      da43bf0c
    • Paul Gortmaker's avatar
      platform/olpc: Make ec explicitly non-modular · f48d1496
      Paul Gortmaker authored
      The Kconfig entry controlling compilation of this code is:
      
      arch/x86/Kconfig:config OLPC
      arch/x86/Kconfig:       bool "One Laptop Per Child support"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Lets remove the couple traces of modular infrastructure use, so that
      when reading the driver there is no doubt it is builtin-only.
      
      We delete the MODULE_LICENSE tag etc. since all that information
      was (or is now) contained at the top of the file in the comments.
      
      Cc: platform-driver-x86@vger.kernel.org
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: default avatarAndres Salomon <dilinger@queued.net>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      f48d1496
    • Owen Lin's avatar
      Add Killer E2500 device ID in alx driver. · b99b43bb
      Owen Lin authored
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b99b43bb
    • Russell King's avatar
      net: smc91x: fix SMC accesses · 2fb04fdf
      Russell King authored
      Commit b70661c7 ("net: smc91x: use run-time configuration on all ARM
      machines") broke some ARM platforms through several mistakes.  Firstly,
      the access size must correspond to the following rule:
      
      (a) at least one of 16-bit or 8-bit access size must be supported
      (b) 32-bit accesses are optional, and may be enabled in addition to
          the above.
      
      Secondly, it provides no emulation of 16-bit accesses, instead blindly
      making 16-bit accesses even when the platform specifies that only 8-bit
      is supported.
      
      Reorganise smc91x.h so we can make use of the existing 16-bit access
      emulation already provided - if 16-bit accesses are supported, use
      16-bit accesses directly, otherwise if 8-bit accesses are supported,
      use the provided 16-bit access emulation.  If neither, BUG().  This
      exactly reflects the driver behaviour prior to the commit being fixed.
      
      Since the conversion incorrectly cut down the available access sizes on
      several platforms, we also need to go through every platform and fix up
      the overly-restrictive access size: Arnd assumed that if a platform can
      perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
      size needed to be specified - not so, all available access sizes must
      be specified.
      
      This likely fixes some performance regressions in doing this: if a
      platform does not support 8-bit accesses, 8-bit accesses have been
      emulated by performing a 16-bit read-modify-write access.
      
      Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
      accesses, which was broken by the original commit.
      
      Fixes: b70661c7 ("net: smc91x: use run-time configuration on all ARM machines")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Tested-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2fb04fdf
    • Florian Fainelli's avatar
      Documentation: networking: dsa: Remove platform device TODO · 7d13eca0
      Florian Fainelli authored
      Since commit 83c0afae ("net: dsa: Add new binding implementation"),
      the shortcomings of the dsa platform device have been addressed, remove
      that TODO item.
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7d13eca0
    • David S. Miller's avatar
      Merge branch 'mlx5-series' · e4d986a8
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox 100G mlx5 fixes 2016-08-29
      
      This series contains some bug fixes for the mlx5 core and mlx5
      ethernet driver.
      
      From Saeed, Fix UMR to consider hardware translation table field
      size limitation when calculating the maximum number of MTTs required
      by the driver.  Three patches to speed-up netdevice close time by
      serializing channel (SQs & RQs) destruction rather than issuing and
      waiting for hardware interrupts to free them.
      
      From Eran, Fix ethtool ring parameter reporting for striding RQ layout.
      Add error prints on ETS validation failure.
      
      From Kamal, Fix memory leak on error flow.
      
      From Maor, Fix ethtool steering priorities number.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4d986a8
    • Maor Gottlieb's avatar
      net/mlx5: Increase number of ethtool steering priorities · e5835f28
      Maor Gottlieb authored
      Ethtool has 11 flow tables, each flow table has its own priority.
      Increase the number of priorities to be aligned with the number of flow
      tables.
      
      Fixes: 1174fce8 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
      Signed-off-by: default avatarMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5835f28
    • Eran Ben Elisha's avatar
      net/mlx5: Add error prints when validate ETS failed · 1722b969
      Eran Ben Elisha authored
      Upon set ETS failure due to user invalid input, add error prints to
      specify the exact error to the user.
      
      Fixes: cdcf1121 ('net/mlx5e: Validate BW weight values of ETS')
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1722b969
    • Kamal Heib's avatar
      net/mlx5e: Fix memory leak if refreshing TIRs fails · bf50082c
      Kamal Heib authored
      Free 'in' command object also when mlx5_core_modify_tir fails.
      
      Fixes: 724b2aa1 ("net/mlx5e: TIRs management refactoring")
      Signed-off-by: default avatarKamal Heib <kamalh@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf50082c
    • Tariq Toukan's avatar
      net/mlx5e: Add ethtool counter for TX xmit_more · c8cf78fe
      Tariq Toukan authored
      Add a counter in ethtool for the number of times that
      TX xmit_more was used.
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8cf78fe
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ · cc8e9ebf
      Eran Ben Elisha authored
      The driver RQ has two possible configurations: striding RQ and
      non-striding RQ.  Until this patch, the driver always reported the
      number of hardware WQEs (ring descriptors). For non striding RQ
      configuration, this was OK since we have one WQE per pending packet
      For striding RQ, multiple packets can fit into one WQE. For better
      user experience we normalize the rx_pending parameter (size of wqe/mtu)
      as the average ring size in case of striding RQ.
      
      Fixes: 461017cb ('net/mlx5e: Support RX multi-packet WQE ...')
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc8e9ebf
    • Saeed Mahameed's avatar
      net/mlx5e: Don't wait for SQ completions on close · 6e8dd6d6
      Saeed Mahameed authored
      Instead of asking the firmware to flush the SQ (Send Queue) via
      asynchronous completions when moved to error, we handle SQ flush
      manually (mlx5e_free_tx_descs) same as we did when SQ flush got
      timed out or on tx_timeout.
      
      This will reduce SQs flush time and speedup interface down procedure.
      
      Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
      critical code locality.
      
      Fixes: 29429f33 ('net/mlx5e: Timeout if SQ doesn't flush during close')
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e8dd6d6
    • Saeed Mahameed's avatar
      net/mlx5e: Don't post fragmented MPWQE when RQ is disabled · 8484f9ed
      Saeed Mahameed authored
      ICO (Internal control operations) SQ (Send Queue) is closed/disabled
      after RQ (Receive Queue).  After RQ is closed an ICO SQ completion
      might post a fragmented MPWQE (Multi Packet Work Queue Element) into
      that RQ.
      
      As on regular RQ post, check if we are allowed to post to that
      RQ (RQ is enabled). Cleanup in-progress UMR MPWQE on mlx5e_free_rx_descs
      if needed.
      
      Fixes: bc77b240 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8484f9ed
    • Saeed Mahameed's avatar
      net/mlx5e: Don't wait for RQ completions on close · f2fde18c
      Saeed Mahameed authored
      This will significantly reduce receive queue flush time on interface
      down.
      
      Instead of asking the firmware to flush the RQ (Receive Queue) via
      asynchronous completions when moved to error, we handle RQ flush
      manually (mlx5e_free_rx_descs) same as we did when RQ flush got timed
      out.
      
      This will reduce RQs flush time and speedup interface down procedure
      (ifconfig down) from 6 sec to 0.3 sec on a 48 cores system.
      
      Moved mlx5e_free_rx_descs en_main.c where it is needed, to keep en_rx.c
      free form non critical data path code for better code locality.
      
      Fixes: 6cd392a0 ('net/mlx5e: Handle RQ flush in error cases')
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2fde18c
    • Saeed Mahameed's avatar
      net/mlx5e: Limit UMR length to the device's limitation · fe4c988b
      Saeed Mahameed authored
      ConnectX-4 UMR (User Memory Region) MTT translation table offset in WQE
      is limited to U16_MAX, before this patch we ignored that limitation and
      requested the maximum possible UMR translation length that the netdev
      might need (MAX channels * MAX pages per channel).
      In case of a system with #cores > 32 and when linear WQE allocation fails,
      falling back to using UMR WQEs will cause the RQ (Receive Queue) to get
      stuck.
      
      Here we limit UMR length to min(U16_MAX, max required pages) (while
      considering the required alignments) on driver load, by default U16_MAX is
      sufficient since the default RX rings value guarantees that we are in
      range, dynamically (on set_ringparam/set_channels) we will check if the
      new required UMR length (num mtts) is still in range, if not, fail the
      request.
      
      Fixes: bc77b240 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe4c988b
    • Cyril Bur's avatar
      powerpc: signals: Discard transaction state from signal frames · 78a3e888
      Cyril Bur authored
      Userspace can begin and suspend a transaction within the signal
      handler which means they might enter sys_rt_sigreturn() with the
      processor in suspended state.
      
      sys_rt_sigreturn() wants to restore process context (which may have
      been in a transaction before signal delivery). To do this it must
      restore TM SPRS. To achieve this, any transaction initiated within the
      signal frame must be discarded in order to be able to restore TM SPRs
      as TM SPRs can only be manipulated non-transactionally..
      >From the PowerPC ISA:
        TM Bad Thing Exception [Category: Transactional Memory]
         An attempt is made to execute a mtspr targeting a TM register in
         other than Non-transactional state.
      
      Not doing so results in a TM Bad Thing:
      [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]
      [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)
      [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
      [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
      [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE
       nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
       xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter
       ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm
       uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure
       scsi_transport_sas bnx2x ipr mdio libcrc32c
      [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34
      [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000
      [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000
      [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700   Not tainted (4.7.0)
      [12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280  XER: 20000000
      [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033
      GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0
      GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000
      GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000
      GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000
      GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0
      [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
      [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
      [12045.223630] Call Trace:
      [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0
      [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108
      [12045.223806] Instruction dump:
      [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
      [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
      [12045.224074] ---[ end trace cb8002ee240bae76 ]---
      
      It isn't clear exactly if there is really a use case for userspace
      returning with a suspended transaction, however, doing so doesn't (on
      its own) constitute a bad frame. As such, this patch simply discards
      the transactional state of the context calling the sigreturn and
      continues.
      Reported-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Tested-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Reviewed-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Acked-by: default avatarSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      78a3e888
    • Mukesh Ojha's avatar
      powerpc/powernv : Drop reference added by kset_find_obj() · a9cbf0b2
      Mukesh Ojha authored
      In a situation, where Linux kernel gets notified about duplicate error log
      from OPAL, it is been observed that kernel fails to remove sysfs entries
      (/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is because,
      we currently search the error log/dump kobject in the kset list via
      'kset_find_obj()' routine. Which eventually increment the reference count
      by one, once it founds the kobject.
      
      So, unless we decrement the reference count by one after it found the kobject,
      we would not be able to release the kobject properly later.
      
      This patch adds the 'kobject_put()' which was missing earlier.
      Signed-off-by: default avatarMukesh Ojha <mukesh02@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarVasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a9cbf0b2
    • Nicholas Piggin's avatar
      powerpc/tm: do not use r13 for tabort_syscall · cc7786d3
      Nicholas Piggin authored
      tabort_syscall runs with RI=1, so a nested recoverable machine
      check will load the paca into r13 and overwrite what we loaded
      it with, because exceptions returning to privileged mode do not
      restore r13.
      
      Fixes: b4b56f9e (powerpc/tm: Abort syscalls in active transactions)
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNick Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cc7786d3
  4. 28 Aug, 2016 6 commits