1. 09 Jul, 2014 10 commits
    • rcu: Simplify priority boosting by putting rt_mutex in rcu_node · abaa93d9
      Paul E. McKenney authored
      RCU priority boosting currently checks for boosting via a pointer in
      task_struct.  However, this is not needed: As Oleg noted, if the
      rt_mutex is placed in the rcu_node instead of on the booster's stack,
      the boostee can simply check it to see if it owns the lock.  This commit
      makes this change, shrinking task_struct by one pointer and the kernel
      by thirteen lines.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Check both root and current rcu_node when setting up future grace period · 48bd8e9b
      Pranith Kumar authored
      The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
      and ->completed twice, once without ACCESS_ONCE() and once with it,
      which is pointless because we hold that rcu_node's ->lock at that point.
      The intent was to check the current rcu_node structure and the root
      rcu_node structure, the latter locklessly with ACCESS_ONCE().  This
      commit therefore makes that change.
      
      The reason that it is safe to locklessly check the root rcu_node's
      ->gpnum and ->completed fields is that we hold the current rcu_node's
      ->lock, which constrains the root rcu_node's ability to change its
      ->gpnum and ->completed fields.  Of course, if there is a single rcu_node
      structure, then rnp_root==rnp, and holding the lock prevents all changes.
      If there is more than one rcu_node structure, then the code updates the
      fields in the following order:
      
      1.	Increment rnp_root->gpnum to start new grace period.
      2.	Increment rnp->gpnum to initialize the current rcu_node,
      	continuing initialization for the new grace period.
      3.	Increment rnp_root->completed to end the current grace period.
      4.	Increment rnp->completed to continue cleaning up after the
      	old grace period.
      
      So there are four possible combinations of relative values of these
      four fields:
      
      N   N   N   N:  RCU idle, new grace period must be initiated.
      		Although rnp_root->gpnum might be incremented immediately
      		after we check, that will just result in unnecessary work.
      		The grace period already started, and we try to start it.
      
      N+1 N   N   N:  RCU grace period just started.  No further change is
      		possible because we hold rnp->lock, so the checks of
      		rnp_root->gpnum and rnp_root->completed are stable.
      		We know that our request for a future grace period will
      		be seen during grace-period cleanup.
      
      N+1 N   N+1 N:  RCU grace period is ongoing.  Because rnp->gpnum is
      		different than rnp->completed, we won't even look at
      		rnp_root->gpnum and rnp_root->completed, so the possible
      		concurrent change to rnp_root->completed does not matter.
      		We know that our request for a future grace period will
      		be seen during grace-period cleanup, which cannot pass
      		this rcu_node because we hold its ->lock.
      
      N+1 N+1 N+1 N:  RCU grace period has ended, but not yet been cleaned up.
      		Because rnp->gpnum is different than rnp->completed, we
      		won't look at rnp_root->gpnum and rnp_root->completed, so
      		the possible concurrent change to rnp_root->completed does
      		not matter.  We know that our request for a future grace
      		period will be seen during grace-period cleanup, which
      		cannot pass this rcu_node because we hold its ->lock.
      
      Therefore, despite initial appearances, the lockless check is safe.
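The check argued for above can be sketched as follows. This is a minimal illustration, not the kernel's code: `rcu_node_sketch`, `gp_in_progress_sketch()` and the simplified `ACCESS_ONCE()` are assumptions made for the example. The four columns of the table are read here as rnp_root->gpnum, rnp_root->completed, rnp->gpnum, rnp->completed, which is an interpretation of the commit message rather than something it states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for ACCESS_ONCE(): forces a single volatile access. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical cut-down rcu_node with only the two counters discussed. */
struct rcu_node_sketch {
	unsigned long gpnum;     /* most recently started grace period */
	unsigned long completed; /* most recently completed grace period */
};

/*
 * Returns true if a grace period appears to be in progress.
 * The caller is assumed to hold rnp's lock, so rnp's fields are read
 * plainly; the root's fields are read locklessly via ACCESS_ONCE().
 */
static bool gp_in_progress_sketch(struct rcu_node_sketch *rnp,
				  struct rcu_node_sketch *rnp_root)
{
	assert(rnp != NULL && rnp_root != NULL);
	return rnp->gpnum != rnp->completed ||
	       ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed);
}
```

Only the all-equal state (N N N N) reports that no grace period is in progress; in the other three states at least one of the two comparisons differs.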
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      [ paulmck: Update comment to say why the lockless check is safe. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Allow post-unlock reference for rt_mutex · dfeb9765
      Paul E. McKenney authored
      The current approach to RCU priority boosting uses an rt_mutex strictly
      for its priority-boosting side effects.  The rt_mutex_init_proxy_locked()
      function is used by the booster to initialize the lock as held by the
      boostee.  The booster then uses rt_mutex_lock() to acquire this rt_mutex,
      which priority-boosts the boostee.  When the boostee reaches the end
      of its outermost RCU read-side critical section, it checks a field in
      its task structure to see whether it has been boosted, and, if so, uses
      rt_mutex_unlock() to release the rt_mutex.  The booster can then go on
      to boost the next task that is blocking the current RCU grace period.
      
      However, reasonable implementations of rt_mutex_unlock() might result in the
      boostee referencing the rt_mutex's data after releasing it, and the
      booster might have re-initialized the rt_mutex between the time that the
      boostee released it and the time that it later referenced it.  This is
      clearly asking for trouble, so this commit introduces a completion that
      forces the booster to wait until the boostee has completely finished with
      the rt_mutex, thus avoiding the case where the booster is re-initializing
      the rt_mutex before the last boostee's last reference to that rt_mutex.
      
      This of course does introduce some overhead, but the priority-boosting
      code paths are miles from any possible fastpath, and the overhead of
      executing the completion will normally be quite small compared to the
      overhead of priority boosting and deboosting, so this should be OK.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Loosen __call_rcu()'s rcu_head alignment constraint · 1146edcb
      Paul E. McKenney authored
      The m68k architecture aligns only to 16-bit boundaries, which can cause
      the align-to-32-bits check in __call_rcu() to trigger.  Because there is
      currently no known potential need for more than one low-order bit, this
      commit loosens the check to 16-bit boundaries.
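The difference between the two alignment checks can be sketched as below. The helper names are hypothetical and the masks (0x3 for 32-bit alignment, 0x1 for 16-bit alignment) are an assumption about how such a check is usually written, not a quote of __call_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stricter check: the two low-order address bits must be clear (32-bit aligned). */
static bool aligned_32bit_sketch(const void *p)
{
	return ((uintptr_t)p & 0x3) == 0;
}

/* Loosened check: only the lowest address bit must be clear (16-bit aligned). */
static bool aligned_16bit_sketch(const void *p)
{
	return ((uintptr_t)p & 0x1) == 0;
}

/* Small self-check using made-up addresses that are never dereferenced. */
static void alignment_selfcheck_sketch(void)
{
	const void *m68k_style = (const void *)(uintptr_t)0x1002;

	assert(aligned_16bit_sketch(m68k_style));  /* passes the loosened check */
	assert(!aligned_32bit_sketch(m68k_style)); /* would trip the older check */
}
```

An address such as 0x1002 is what a 16-bit-aligning architecture can legitimately produce: it keeps one low-order bit free but not two.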
      Reported-by: Greg Ungerer <gerg@uclinux.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    • rcu: Eliminate read-modify-write ACCESS_ONCE() calls · a792563b
      Paul E. McKenney authored
      RCU contains code of the following forms:
      
      	ACCESS_ONCE(x)++;
      	ACCESS_ONCE(x) += y;
      	ACCESS_ONCE(x) -= y;
      
      Now these constructs do operate correctly, but they really result in a
      pair of volatile accesses, one to do the load and another to do the store.
      This can be confusing, as the casual reader might well assume that (for
      example) gcc might generate a memory-to-memory add instruction for each
      of these three cases.  In fact, gcc will do no such thing.  Also, there
      is a good chance that the kernel will move to separate load and store
      variants of ACCESS_ONCE(), and constructs like the above could easily
      confuse both people and scripts attempting to make that sort of change.
      Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
      only need the store to be volatile, so that the read-modify-write form
      might be misleading.
      
      This commit therefore changes the above forms in RCU so that each instance
      of ACCESS_ONCE() either does a load or a store, but not both.  In a few
      cases, ACCESS_ONCE() was not critical, for example, for maintaining
      statistics.  In these cases, ACCESS_ONCE() has been dispensed with
      entirely.
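The three styles discussed above can be contrasted in a small sketch. The `ACCESS_ONCE()` macro here is a simplified stand-in and `counter_sketch` is a made-up variable; which style a given RCU counter actually uses is not shown in this message.

```c
#include <assert.h>

/* Simplified stand-in for ACCESS_ONCE(): forces a single volatile access. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static unsigned long counter_sketch;

/* Old form: looks like one operation, but is a volatile load plus a volatile store. */
static void bump_rmw_sketch(void)
{
	ACCESS_ONCE(counter_sketch)++;
}

/* New form: plain load, and only the store is volatile. */
static void bump_store_only_sketch(void)
{
	ACCESS_ONCE(counter_sketch) = counter_sketch + 1;
}

/* Statistics-style counter: no ACCESS_ONCE() at all. */
static void bump_plain_sketch(void)
{
	counter_sketch++;
}

/* All three forms produce the same value when run single-threaded. */
static void bump_selfcheck_sketch(void)
{
	counter_sketch = 0;
	bump_rmw_sketch();
	bump_store_only_sketch();
	bump_plain_sketch();
	assert(counter_sketch == 3);
}
```

The forms differ only in which accesses the compiler must treat as volatile, so the split form makes that choice explicit to the reader.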
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu · 4da117cf
      Paul E. McKenney authored
      In kernels built with CONFIG_NO_HZ_FULL, tick_do_timer_cpu is constant
      once boot completes.  Thus, there is no need to wrap it in ACCESS_ONCE()
      in code that is built only when CONFIG_NO_HZ_FULL is enabled.  This commit therefore
      removes the redundant ACCESS_ONCE().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    • rcu: Make rcu node arrays static const char * const · b4426b49
      Fabian Frederick authored
      Those two arrays are being passed to lockdep_init_map(), which expects
      const char *, and are stored in lockdep_map the same way.
      
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • signal: Explain local_irq_save() call · c41247e1
      Paul E. McKenney authored
      The explicit local_irq_save() in __lock_task_sighand() is needed to avoid
      a potential deadlock condition, as noted in a841796f (signal:
      align __lock_task_sighand() irq disabling and RCU).  However, someone
      reading the code might be forgiven for concluding that this separate
      local_irq_save() was completely unnecessary.  This commit therefore adds
      a comment referencing the shiny new block comment on rcu_read_unlock().
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
  2. 23 Jun, 2014 2 commits
    • rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832
      Paul E. McKenney authored
      Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
      fixed a problem where a CPU looping in the kernel with but one runnable
      task would give RCU CPU stall warnings, even if the in-kernel loop
      contained cond_resched() calls.  Unfortunately, in so doing, it introduced
      performance regressions in Anton Blanchard's will-it-scale "open1" test.
      The problem appears to be not so much the increased cond_resched() path
      length as an increase in the rate at which grace periods complete, which
      increased per-update grace-period overhead.
      
      This commit takes a different approach to fixing this bug, mainly by
      moving the RCU-visible quiescent state from cond_resched() to
      rcu_note_context_switch(), and by further reducing the check to a
      simple non-zero test of a single per-CPU variable.  However, this
      approach requires that the force-quiescent-state processing send
      resched IPIs to the offending CPUs.  These will be sent only once
      the grace period has reached an age specified by the boot/sysfs
      parameter rcutree.jiffies_till_sched_qs, or once the grace period
      reaches an age halfway to the point at which RCU CPU stall warnings
      will be emitted, whichever comes first.
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
        ktest build robot.  Also fixed smp_mb() comment as noted by
        Oleg Nesterov. ]
      
      Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Export debug_init_rcu_head() and debug_rcu_head_free() · 546a9d85
      Paul E. McKenney authored
      Currently, call_rcu() relies on implicit allocation and initialization
      for the debug-objects handling of RCU callbacks.  If you hammer the
      kernel hard enough with Sasha's modified version of trinity, you can end
      up with the sl*b allocators recursing into themselves via this implicit
      call_rcu() allocation.
      
      This commit therefore exports the debug_init_rcu_head() and
      debug_rcu_head_free() functions, which permits the allocators to allocate
      and pre-initialize the debug-objects information, so that there is no longer
      any need for call_rcu() to do that initialization, which in turn prevents
      the recursion into the memory allocators.
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Looks-good-to: Christoph Lameter <cl@linux.com>
  3. 16 Jun, 2014 4 commits
    • Linux 3.16-rc1 · 7171511e
      Linus Torvalds authored
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a9be2242
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix checksumming regressions, from Tom Herbert.
      
       2) Undo unintentional permissions changes for SCTP rto_alpha and
          rto_beta sysfs knobs, from Daniel Borkmann.
      
       3) VXLAN, like other IP tunnels, should advertise its encapsulation
          size using dev->needed_headroom instead of dev->hard_header_len.
          From Cong Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: sctp: fix permissions for rto_alpha and rto_beta knobs
        vxlan: Checksum fixes
        net: add skb_pop_rcv_encapsulation
        udp: call __skb_checksum_complete when doing full checksum
        net: Fix save software checksum complete
        net: Fix GSO constants to match NETIF flags
        udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
        vxlan: use dev->needed_headroom instead of dev->hard_header_len
        MAINTAINERS: update cxgb4 maintainer
    • Merge tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux · dd1845af
      Linus Torvalds authored
      Pull more clock framework updates from Mike Turquette:
       "This contains the second half of the clk changes for 3.16.
      
        They are simply fixes and code refactoring for the OMAP clock drivers.
        The sunxi clock driver changes include splitting out the one
        mega-driver into several smaller pieces and adding support for the A31
        SoC clocks"
      
      * tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits)
        clk: sunxi: document PRCM clock compatible strings
        clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support
        clk: sun6i: Protect SDRAM gating bit
        clk: sun6i: Protect CPU clock
        clk: sunxi: Rework clock protection code
        clk: sunxi: Move the GMAC clock to a file of its own
        clk: sunxi: Move the 24M oscillator to a file of its own
        clk: sunxi: Remove calls to clk_put
        clk: sunxi: document new A31 USB clock compatible
        clk: sunxi: Implement A31 USB clock
        ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies
        CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies
        ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC)
        CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck
        CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)
        dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings
        ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock
        CLK: TI: gate: add composite interface clock to OMAP2 only build
        ARM: OMAP2: clock: add DT boot support for cpufreq_ck
        CLK: TI: OMAP2: add clock init support
        ...
    • Merge git://git.infradead.org/users/willy/linux-nvme · b55b3902
      Linus Torvalds authored
      Pull NVMe update from Matthew Wilcox:
       "Mostly bugfixes again for the NVMe driver.  I'd like to call out the
        exported tracepoint in the block layer; I believe Keith has cleared
        this with Jens.
      
        We've had a few reports from people who're really pounding on NVMe
        devices at scale, hence the timeout changes (and new module
        parameters), hotplug cpu deadlock, tracepoints, and minor performance
        tweaks"
      
      [ Jens hadn't seen that tracepoint thing, but is ok with it - it will
        end up going away when mq conversion happens ]
      
      * git://git.infradead.org/users/willy/linux-nvme: (22 commits)
        NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.
        NVMe: Use Log Page constants in SCSI emulation
        NVMe: Define Log Page constants
        NVMe: Fix hot cpu notification dead lock
        NVMe: Rename io_timeout to nvme_io_timeout
        NVMe: Use last bytes of f/w rev SCSI Inquiry
        NVMe: Adhere to request queue block accounting enable/disable
        NVMe: Fix nvme get/put queue semantics
        NVMe: Delete NVME_GET_FEAT_TEMP_THRESH
        NVMe: Make admin timeout a module parameter
        NVMe: Make iod bio timeout a parameter
        NVMe: Prevent possible NULL pointer dereference
        NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD)
        NVMe: Update data structures for NVMe 1.2
        NVMe: Enable BUILD_BUG_ON checks
        NVMe: Update namespace and controller identify structures to the 1.1a spec
        NVMe: Flush with data support
        NVMe: Configure support for block flush
        NVMe: Add tracepoints
        NVMe: Protect against badly formatted CQEs
        ...
  4. 15 Jun, 2014 11 commits
    • net: sctp: fix permissions for rto_alpha and rto_beta knobs · b58537a1
      Daniel Borkmann authored
      Commit 3fd091e7 ("[SCTP]: Remove multiple levels of msecs
      to jiffies conversions.") has silently changed permissions for
      rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
      this was to discourage users from tweaking rto_alpha and
      rto_beta knobs in production environments since they are key
      to correctly compute rtt/srtt.
      
      RFC4960 under section 6.3.1. RTO Calculation says regarding
      rto_alpha and rto_beta under rule C3 and C4:
      
        [...]
        C3)  When a new RTT measurement R' is made, set
      
             RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|
      
             and
      
             SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'
      
             Note: The value of SRTT used in the update to RTTVAR
             is its value before updating SRTT itself using the
             second assignment. After the computation, update
             RTO <- SRTT + 4 * RTTVAR.
      
        C4)  When data is in flight and when allowed by rule C5
             below, a new RTT measurement MUST be made each round
             trip. Furthermore, new RTT measurements SHOULD be
             made no more than once per round trip for a given
             destination transport address. There are two reasons
             for this recommendation: First, it appears that
             measuring more frequently often does not in practice
             yield any significant benefit [ALLMAN99]; second,
             if measurements are made more often, then the values
             of RTO.Alpha and RTO.Beta in rule C3 above should be
             adjusted so that SRTT and RTTVAR still adjust to
             changes at roughly the same rate (in terms of how many
             round trips it takes them to reflect new values) as
             they would if making only one measurement per
             round-trip and using RTO.Alpha and RTO.Beta as given
             in rule C3. However, the exact nature of these
             adjustments remains a research issue.
        [...]
      
      While it is discouraged to adjust rto_alpha and rto_beta
      and not further specified how to adjust them, the RFC also
      doesn't explicitly forbid it, but rather gives a RECOMMENDED
      default value (rto_alpha=3, rto_beta=2). We have a couple
      of users relying on the old permissions before they got
      changed. That said, if someone really has the urge to adjust
      them, we could allow it with a warning in the log.
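Rule C3 quoted above can be worked through in a short sketch. It assumes the RFC 4960 recommended values RTO.Alpha = 1/8 and RTO.Beta = 1/4; the sysctl values 3 and 2 mentioned above are taken here to be the corresponding shift exponents (1/2^3 and 1/2^2), which is an assumption about the knobs' encoding. The structure and function names are hypothetical, and floating point is used only for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-transport RTO state, in milliseconds. */
struct rto_state_sketch {
	double srtt;
	double rttvar;
	double rto;
};

/* Apply rule C3 for one new RTT measurement r_new. */
static void rto_update_sketch(struct rto_state_sketch *s, double r_new)
{
	const double alpha = 1.0 / 8.0; /* assumed RTO.Alpha */
	const double beta = 1.0 / 4.0;  /* assumed RTO.Beta */
	double diff;

	assert(s != NULL);

	/* RTTVAR is updated first, using the SRTT value from before this update. */
	diff = s->srtt - r_new;
	if (diff < 0)
		diff = -diff;
	s->rttvar = (1.0 - beta) * s->rttvar + beta * diff;

	s->srtt = (1.0 - alpha) * s->srtt + alpha * r_new;
	s->rto = s->srtt + 4.0 * s->rttvar;
}
```

Starting from SRTT = 100 and RTTVAR = 50, a measurement R' = 140 gives RTTVAR = 0.75 * 50 + 0.25 * 40 = 47.5, SRTT = 0.875 * 100 + 0.125 * 140 = 105, and RTO = 105 + 4 * 47.5 = 295.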
      
      Fixes: 3fd091e7 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'csum_fixes' · e4f7ae93
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      Fixes related to some recent checksum modifications.
      
      - Fix GSO constants to match NETIF flags
      - Fix logic in saving checksum complete in __skb_checksum_complete
      - Call __skb_checksum_complete from UDP if we are checksumming over
        whole packet in order to save checksum.
      - Fixes to VXLAN to work correctly with checksum complete
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Checksum fixes · f79b064c
      Tom Herbert authored
      Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet
      header to work properly with checksum complete.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add skb_pop_rcv_encapsulation · e5eb4e30
      Tom Herbert authored
      This function is used by UDP encapsulation protocols in RX when
      crossing encapsulation boundary. If ip_summed is set to
      CHECKSUM_UNNECESSARY and encapsulation is not set, change to
      CHECKSUM_NONE since the checksum has not been validated within the
      encapsulation. Clears csum_valid by the same rationale.
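The behavior described above can be modeled with a toy packet structure. Everything here is a hypothetical stand-in (`pkt_sketch`, the `CSUM_*_SKETCH` values, the function name); it mirrors only the two effects the message states and is not the kernel's sk_buff code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ip_summed states named in the message. */
enum csum_state_sketch {
	CSUM_NONE_SKETCH,
	CSUM_UNNECESSARY_SKETCH,
	CSUM_COMPLETE_SKETCH
};

/* Toy packet carrying only the fields the message talks about. */
struct pkt_sketch {
	enum csum_state_sketch ip_summed;
	bool encapsulation;
	bool csum_valid;
};

/* Called when RX processing crosses an encapsulation boundary. */
static void pop_rcv_encapsulation_sketch(struct pkt_sketch *p)
{
	assert(p != NULL);

	/* The outer "unnecessary" verdict does not cover the inner packet. */
	if (p->ip_summed == CSUM_UNNECESSARY_SKETCH && !p->encapsulation)
		p->ip_summed = CSUM_NONE_SKETCH;

	/* The same reasoning applies to a previously recorded valid checksum. */
	p->csum_valid = false;
}
```

A packet marked unnecessary without the encapsulation flag is downgraded, while one with the flag set keeps its state; csum_valid is cleared in both cases.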
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: call __skb_checksum_complete when doing full checksum · bbdff225
      Tom Herbert authored
      In __udp_lib_checksum_complete, check whether the checksum is being done
      over all the data (len is equal to skb->len) and, if so, call
      __skb_checksum_complete instead of __skb_checksum_complete_head. This
      allows the checksum to be saved in checksum complete.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix save software checksum complete · 46fb51eb
      Tom Herbert authored
      Geert reported issues regarding checksum complete and UDP.
      The logic introduced in commit 7e3cead5
      ("net: Save software checksum complete") is not correct.
      
      This patch:
      1) Restores code in __skb_checksum_complete_head except for setting
         CHECKSUM_UNNECESSARY. This function may be calculating checksum on
         something less than skb->len.
      2) Adds saving checksum to __skb_checksum_complete. The full packet
         checksum 0..skb->len is calculated without adding in pseudo header.
         This value is saved in skb->csum and then the pseudo header is added
         to that to derive the checksum for validation.
      3) In both __skb_checksum_complete_head and __skb_checksum_complete,
         set skb->csum_valid to whether checksum of zero was computed. This
         allows skb_csum_unnecessary to return true without changing to
         CHECKSUM_UNNECESSARY which was done previously.
      4) Copy new csum related bits in __copy_skb_header.
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix GSO constants to match NETIF flags · 4b28252c
      Tom Herbert authored
      Joseph Gasparakis reported that VXLAN GSO offload stopped working with
      i40e device after recent UDP changes. The problem is that the
      SKB_GSO_* bits are out of sync with the corresponding NETIF flags. This
      patch fixes that. Also, we add BUILD_BUG_ONs in net_gso_ok for several
      GSO constants that were missing to avoid the problem in the future.
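The idea of pinning two sets of constants together at build time can be sketched with C11 static assertions. All names and bit values below are made up for the illustration; they are not the kernel's SKB_GSO_* or NETIF_F_* definitions, and `static_assert` stands in for BUILD_BUG_ON.

```c
#include <assert.h>

/* Hypothetical shift between GSO type bits and device feature bits. */
#define GSO_SHIFT_SKETCH 16

/* Hypothetical device feature flags. */
enum {
	FEAT_TSO_SKETCH = 1 << 16,
	FEAT_UFO_SKETCH = 1 << 17
};

/* Hypothetical per-packet GSO type bits, expected to mirror the flags above. */
enum {
	GSO_TCPV4_SKETCH = 1 << 0,
	GSO_UDP_SKETCH = 1 << 1
};

/* The build fails if either pair drifts out of sync. */
static_assert(GSO_TCPV4_SKETCH == (FEAT_TSO_SKETCH >> GSO_SHIFT_SKETCH),
	      "GSO_TCPV4_SKETCH out of sync with FEAT_TSO_SKETCH");
static_assert(GSO_UDP_SKETCH == (FEAT_UFO_SKETCH >> GSO_SHIFT_SKETCH),
	      "GSO_UDP_SKETCH out of sync with FEAT_UFO_SKETCH");

/* With the constants in sync, a plain shift maps a GSO type to its feature bit. */
static unsigned long gso_type_to_feature_sketch(unsigned int gso_type)
{
	return (unsigned long)gso_type << GSO_SHIFT_SKETCH;
}
```

Because the mapping is a bare shift, a missing assertion lets a reordered constant silently select the wrong feature bit, which is the kind of failure the message describes.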
      Reported-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · abf04af7
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "This is just a couple of drivers (hpsa and lpfc) that got left out for
        further testing in linux-next.  We also have one fix to a prior
        submission (qla2xxx sparse)"
      
      * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (36 commits)
        qla2xxx: fix sparse warnings introduced by previous target mode t10-dif patch
        lpfc: Update lpfc version to driver version 10.2.8001.0
        lpfc: Fix ExpressLane priority setup
        lpfc: mark old devices as obsolete
        lpfc: Fix for initializing RRQ bitmap
        lpfc: Fix for cleaning up stale ring flag and sp_queue_event entries
        lpfc: Update lpfc version to driver version 10.2.8000.0
        lpfc: Update Copyright on changed files from 8.3.45 patches
        lpfc: Update Copyright on changed files
        lpfc: Fixed locking for scsi task management commands
        lpfc: Convert runtime references to old xlane cfg param to fof cfg param
        lpfc: Fix FW dump using sysfs
        lpfc: Fix SLI4 s abort loop to process all FCP rings and under ring_lock
        lpfc: Fixed kernel panic in lpfc_abort_handler
        lpfc: Fix locking for postbufq when freeing
        lpfc: Fix locking for lpfc_hba_down_post
        lpfc: Fix dynamic transitions of FirstBurst from on to off
        hpsa: fix handling of hpsa_volume_offline return value
        hpsa: return -ENOMEM not -1 on kzalloc failure in hpsa_get_device_id
        hpsa: remove messages about volume status VPD inquiry page not supported
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 16d52ef7
      Linus Torvalds authored
      Pull more btrfs updates from Chris Mason:
       "This has a few fixes since our last pull and a new ioctl for doing
        btree searches from userland.  It's very similar to the existing
        ioctl, but lets us return larger items back down to the app"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: fix error handling in create_pending_snapshot
        btrfs: fix use of uninit "ret" in end_extent_writepage()
        btrfs: free ulist in qgroup_shared_accounting() error path
        Btrfs: fix qgroups sanity test crash or hang
        btrfs: prevent RCU warning when dereferencing radix tree slot
        Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting
        btrfs: new ioctl TREE_SEARCH_V2
        btrfs: tree_search, search_ioctl: direct copy to userspace
        btrfs: new function read_extent_buffer_to_user
        btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW
        btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer
        btrfs: tree_search, search_ioctl: accept varying buffer
        btrfs: tree_search: eliminate redundant nr_items check
    • Merge git://git.kvack.org/~bcrl/aio-next · a311c480
      Linus Torvalds authored
      Pull aio fix and cleanups from Ben LaHaise:
       "This consists of a couple of code cleanups plus a minor bug fix"
      
      * git://git.kvack.org/~bcrl/aio-next:
        aio: cleanup: flatten kill_ioctx()
        aio: report error from io_destroy() when threads race in io_destroy()
        fs/aio.c: Remove ctx parameter in kiocb_cancel
    • fix __swap_writepage() compile failure on old gcc versions · 05064084
      Al Viro authored
      Tetsuo Handa wrote:
       "Commit 62a8067a ("bio_vec-backed iov_iter") introduced an unnamed
        union inside a struct which gcc-4.4.7 cannot handle.  Name the unnamed
        union as u in order to fix build failure"
      
      Let's do this instead: there is only one place in the entire tree that
      steps into this breakage.  Anon structs and unions work in older gcc
      versions; as a matter of fact, we have those in the tree - see e.g.
      struct ieee80211_tx_info in include/net/mac80211.h
      
      What doesn't work is handling their initializers:
      
      struct {
      	int a;
      	union {
      		int b;
      		char c;
      	};
      } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}};
      
      is the obvious syntax for initializer, perfectly fine for C11 and
      handled correctly by gcc-4.7 or later.
      
      Earlier versions, though, break on it - declaration is fine and so's
      access to fields (i.e.  x[0].c = 'a'; would produce the right code), but
      members of the anon structs and unions are not inserted into the right
      namespace.  Tellingly, those older versions will not barf on struct {int
      a; struct {int a;};}; - looks like they just have it hacked up somewhere
      around the handling of .  and -> instead of doing the right thing.
      
      The easiest way to deal with that crap is to turn initialization of
      those fields (in the only place where we have such initializer of
      iov_iter) into plain assignment.
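The workaround described above can be sketched on the same example structure. `item_sketch` and `fill_items_sketch()` are hypothetical names; the point is only that member access through an anonymous union is plain assignment, which the message says older gcc versions handle correctly.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the example above: a struct with an anonymous union. */
struct item_sketch {
	int a;
	union {
		int b;
		char c;
	};
};

/*
 * Instead of a designated initializer naming .b or .c, which the message
 * says older gcc mishandles, assign the members after declaration.
 */
static void fill_items_sketch(struct item_sketch x[2])
{
	assert(x != NULL);

	x[0].a = 1;
	x[0].c = 'a';
	x[1].a = 0;
	x[1].b = 1;
}
```

The result matches what the designated initializer would have produced, at the cost of moving initialization into executable code.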
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 14 Jun, 2014 4 commits
  6. 13 Jun, 2014 9 commits
    • udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup · 63c6f81c
      Eric Dumazet authored
      It's too easy to add thousands of UDP sockets on a particular bucket,
      and slow down an innocent multicast receiver.
      
      Early demux is supposed to be an optimization, we should avoid spending
      too much time in it.
      
      It is interesting to note that __udp4_lib_demux_lookup() only tries to
      match the first socket in the chain.
      
      10 is the threshold we already have in __udp4_lib_lookup() to switch
      to secondary hash.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: David Held <drheld@google.com>
      Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63c6f81c
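      The idea in the message above, giving up on the early lookup when the
      hash chain is longer than the threshold of 10, can be illustrated with
      a small userspace sketch. The types and the function name here are
      hypothetical and much simpler than the kernel's UDP hash tables.

```c
#include <assert.h>
#include <stddef.h>

#define CHAIN_SCAN_LIMIT 10	/* threshold quoted in the commit message */

/* Hypothetical, simplified hash-bucket types. */
struct sock_entry {
	unsigned short port;
	struct sock_entry *next;
};

struct bucket {
	unsigned int count;		/* number of entries on the chain */
	struct sock_entry *head;
};

/* Optimistic lookup: refuse to walk a chain that is too long, so the
 * caller falls back to the regular (non-early) path instead. */
static struct sock_entry *early_demux_lookup(const struct bucket *b,
					     unsigned short port)
{
	assert(b != NULL);
	if (b->count > CHAIN_SCAN_LIMIT)
		return NULL;

	for (struct sock_entry *s = b->head; s; s = s->next)
		if (s->port == port)
			return s;
	return NULL;
}
```

      Returning NULL here does not mean "no such socket"; it only means the
      optimization declined to run, which is acceptable because early demux
      is an optional fast path.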
    • Cong Wang's avatar
      vxlan: use dev->needed_headroom instead of dev->hard_header_len · 2853af6a
      Cong Wang authored
      When we mirror packets from a vxlan tunnel to another device,
      the mirror device should see the same packets (that is, without
      the outer header). Because the vxlan tunnel sets dev->hard_header_len,
      tcf_mirred() resets the mac header back to the outer mac, so the mirror
      device actually sees packets with outer headers.
      
      The vxlan tunnel should set dev->needed_headroom instead of
      dev->hard_header_len, as other IP tunnels do. This fixes
      the above problem.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: stephen hemminger <stephen@networkplumber.org>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2853af6a
    • Dimitris Michailidis's avatar
      MAINTAINERS: update cxgb4 maintainer · 56f16c74
      Dimitris Michailidis authored
      Hari's been doing the patch submissions for a while now and he'll be
      taking over as maintainer.
      Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56f16c74
    • Andy Lutomirski's avatar
      x86/vdso: Fix vdso_install · a934fb5b
      Andy Lutomirski authored
      "make vdso_install" installs unstripped versions of the vdso objects
      for the benefit of the debugger.  This was broken by checkin:
      
      6f121e54 x86, vdso: Reimplement vdso.so preparation in build-time C
      
      The filenames are different now, so update the Makefile to cope.
      
      This still installs the 64-bit vdso as vdso64.so.  We believe this
      will be okay, as the only known user is a patched gdb which is known
      to use build-ids, but if it turns out to be a problem we may have to
      add a link.
      
      Inspired by a patch from Sam Ravnborg.
      Acked-by: Sam Ravnborg <sam@ravnborg.org>
      Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
      Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/b10299edd8ba98d17e07dafcd895b8ecf4d99eff.1402586707.git.luto@amacapital.net
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      a934fb5b
    • Dan McLeran's avatar
      NVMe: Fix START_STOP_UNIT Scsi->NVMe translation. · b8e08084
      Dan McLeran authored
      This patch contains several fixes for SCSI START_STOP_UNIT. The previous
      code did not account for signed vs. unsigned arithmetic, which resulted
      in an invalid lowest power state calculation when the device only supports
      1 power state.
      
      The code for Power Condition == 2 (Idle) was not following the spec. The
      spec calls for setting the device to specific power states, depending
      upon Power Condition Modifier, without accounting for the number of
      power states supported by the device.
      
      The code for Power Condition == 3 (Standby) was using a hard-coded '0'
      which is replaced with the macro POWER_STATE_0.
      Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
      Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      b8e08084
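      The signed vs. unsigned pitfall mentioned in the message above can be
      illustrated generically. This sketch assumes a zero-based count of
      power states (0 meaning one state) and hypothetical function names;
      it is not the driver's actual calculation.

```c
#include <assert.h>
#include <limits.h>

/* Naive version: with an unsigned operand, 0 - 1 wraps around to
 * UINT_MAX instead of going negative. */
static unsigned int lowest_state_naive(unsigned int zero_based_count)
{
	return zero_based_count - 1u;
}

/* Safer version: do the subtraction in signed arithmetic and clamp
 * the result at state 0. */
static unsigned int lowest_state_clamped(unsigned char zero_based_count)
{
	int lowest = (int)zero_based_count - 1;

	return lowest < 0 ? 0u : (unsigned int)lowest;
}
```

      With a single supported state the naive form yields a huge, invalid
      index, while the clamped form stays at state 0.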
    • Eric Sandeen's avatar
      btrfs: fix error handling in create_pending_snapshot · 47a306a7
      Eric Sandeen authored
      fcebe456 cut and pasted some code to a later point
      in create_pending_snapshot(), but didn't switch
      to the appropriate error handling for this stage
      of the function.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      47a306a7
    • Eric Sandeen's avatar
      btrfs: fix use of uninit "ret" in end_extent_writepage() · 3e2426bd
      Eric Sandeen authored
      If this condition in end_extent_writepage() is false:
      
      	if (tree->ops && tree->ops->writepage_end_io_hook)
      
      we will then test an uninitialized "ret" at:
      
      	ret = ret < 0 ? ret : -EIO;
      
      The test for ret is for the case where ->writepage_end_io_hook
      failed, and we'd choose that ret as the error; but if
      there is no ->writepage_end_io_hook, nothing sets ret.
      
      Initializing ret to 0 should be sufficient; if
      writepage_end_io_hook wasn't set, (!uptodate) means
      non-zero err was passed in, so we choose -EIO in that case.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      3e2426bd
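      The reasoning in the message above (initialize ret to 0 so that the
      later test is well defined when no hook is installed) can be sketched
      in isolation. The function below is a hypothetical reduction, with a
      plain function pointer standing in for ->writepage_end_io_hook.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook that reports a specific failure. */
static int failing_hook(void)
{
	return -ENOMEM;
}

/* Pick the error to report: prefer the hook's own negative return,
 * otherwise fall back to -EIO when the page is not up to date. */
static int end_writepage_error(int uptodate, int (*hook)(void))
{
	int ret = 0;	/* defined even when no hook is installed */

	if (hook) {
		ret = hook();
		if (ret)
			uptodate = 0;
	}

	if (!uptodate)
		ret = ret < 0 ? ret : -EIO;

	return ret;
}
```

      Without the initialization, the ret < 0 test on the no-hook path would
      read an indeterminate value.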
    • Eric Sandeen's avatar
      btrfs: free ulist in qgroup_shared_accounting() error path · d7372780
      Eric Sandeen authored
      If tmp = ulist_alloc(GFP_NOFS) fails, we return without
      freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS)
      and cause a memory leak.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      d7372780
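      The leak pattern described in the message above (a second allocation
      fails and the first one is never freed) and its fix can be shown with
      a userspace sketch. The counting allocator and all names here are
      hypothetical; they stand in for ulist_alloc() and ulist_free().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs;	/* outstanding allocations, for illustration */

/* Hypothetical allocator that can be told to fail. */
static void *counted_alloc(int fail)
{
	void *p = fail ? NULL : malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

struct pair {
	void *first;
	void *second;
};

/* Allocate both members; if the second allocation fails, release the
 * first before returning so nothing is leaked. */
static int pair_alloc(struct pair *p, int fail_second)
{
	assert(p != NULL);
	p->first = counted_alloc(0);
	if (!p->first)
		return -ENOMEM;

	p->second = counted_alloc(fail_second);
	if (!p->second) {
		counted_free(p->first);
		p->first = NULL;
		return -ENOMEM;
	}
	return 0;
}
```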
    • Filipe Manana's avatar
      Btrfs: fix qgroups sanity test crash or hang · b050f9f6
      Filipe Manana authored
      Often when running the qgroups sanity test, a crash or a hang happened.
      This is because the extent buffer the test uses for the root node doesn't
      have a header level explicitly set, making it have a random level value.
      This is a problem when it's not zero for the btrfs_search_slot() calls
      the test ends up doing, resulting in crashes or hangs such as the following:
      
      [ 6454.127192] Btrfs loaded, debug=on, assert=on, integrity-checker=on
      (...)
      [ 6454.127760] BTRFS: selftest: Running qgroup tests
      [ 6454.127964] BTRFS: selftest: Running test_test_no_shared_qgroup
      [ 6454.127966] BTRFS: selftest: Qgroup basic add
      [ 6480.152005] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:5383]
      [ 6480.152005] Modules linked in: btrfs(+) xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc i2c_piix4 i2c_core pcspkr evbug psmouse serio_raw e1000 [last unloaded: btrfs]
      [ 6480.152005] irq event stamp: 188448
      [ 6480.152005] hardirqs last  enabled at (188447): [<ffffffff8168ef5c>] restore_args+0x0/0x30
      [ 6480.152005] hardirqs last disabled at (188448): [<ffffffff81698e6a>] apic_timer_interrupt+0x6a/0x80
      [ 6480.152005] softirqs last  enabled at (188446): [<ffffffff810516cf>] __do_softirq+0x1cf/0x450
      [ 6480.152005] softirqs last disabled at (188441): [<ffffffff81051c25>] irq_exit+0xb5/0xc0
      [ 6480.152005] CPU: 0 PID: 5383 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4
      [ 6480.152005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 6480.152005] task: ffff8802146125a0 ti: ffff8800d0d00000 task.ti: ffff8800d0d00000
      [ 6480.152005] RIP: 0010:[<ffffffff81349a63>]  [<ffffffff81349a63>] __write_lock_failed+0x13/0x20
      [ 6480.152005] RSP: 0018:ffff8800d0d038e8  EFLAGS: 00000287
      [ 6480.152005] RAX: 0000000000000000 RBX: ffffffff8168ef5c RCX: 000005deb8525852
      [ 6480.152005] RDX: 0000000000000000 RSI: 0000000000001d45 RDI: ffff8802105000b8
      [ 6480.152005] RBP: ffff8800d0d038e8 R08: fffffe12710f63db R09: ffffffffa03196fb
      [ 6480.152005] R10: ffff8802146125a0 R11: ffff880214612e28 R12: ffff8800d0d03858
      [ 6480.152005] R13: 0000000000000000 R14: ffff8800d0d00000 R15: ffff8802146125a0
      [ 6480.152005] FS:  00007f14ff804700(0000) GS:ffff880215e00000(0000) knlGS:0000000000000000
      [ 6480.152005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 6480.152005] CR2: 00007fff4df0dac8 CR3: 00000000d1796000 CR4: 00000000000006f0
      [ 6480.152005] Stack:
      [ 6480.152005]  ffff8800d0d03908 ffffffff810ae967 0000000000000001 ffff8802105000b8
      [ 6480.152005]  ffff8800d0d03938 ffffffff8168e57e ffffffffa0319c16 0000000000000007
      [ 6480.152005]  ffff880210500000 ffff880210500100 ffff8800d0d039b8 ffffffffa0319c16
      [ 6480.152005] Call Trace:
      [ 6480.152005]  [<ffffffff810ae967>] do_raw_write_lock+0x47/0xa0
      [ 6480.152005]  [<ffffffff8168e57e>] _raw_write_lock+0x5e/0x80
      [ 6480.152005]  [<ffffffffa0319c16>] ? btrfs_tree_lock+0x116/0x270 [btrfs]
      [ 6480.152005]  [<ffffffffa0319c16>] btrfs_tree_lock+0x116/0x270 [btrfs]
      [ 6480.152005]  [<ffffffffa02b2acb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
      [ 6480.152005]  [<ffffffffa02b81a6>] btrfs_search_slot+0x916/0xa20 [btrfs]
      [ 6480.152005]  [<ffffffff811a727f>] ? create_object+0x23f/0x300
      [ 6480.152005]  [<ffffffffa02b9958>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
      [ 6480.152005]  [<ffffffffa036041a>] insert_normal_tree_ref.constprop.4+0xa2/0x19a [btrfs]
      [ 6480.152005]  [<ffffffffa03605c3>] test_no_shared_qgroup+0xb1/0x1ca [btrfs]
      [ 6480.152005]  [<ffffffff8108cad6>] ? local_clock+0x16/0x30
      [ 6480.152005]  [<ffffffffa035ef8e>] btrfs_test_qgroups+0x1ae/0x1d7 [btrfs]
      [ 6480.152005]  [<ffffffffa03a69d2>] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs]
      [ 6480.152005]  [<ffffffffa03a6a86>] init_btrfs_fs+0xb4/0x153 [btrfs]
      [ 6480.152005]  [<ffffffff81000352>] do_one_initcall+0x102/0x150
      [ 6480.152005]  [<ffffffff8103d223>] ? set_memory_nx+0x43/0x50
      [ 6480.152005]  [<ffffffff81682668>] ? set_section_ro_nx+0x6d/0x74
      [ 6480.152005]  [<ffffffff810d91cc>] load_module+0x1cdc/0x2630
      (...)
      
      Therefore initialize the extent buffer as an empty leaf (level 0).
      
      The issue is easy to reproduce when btrfs is built as a module, via:
      
          $ for ((i = 1; i <= 1000000; i++)); do rmmod btrfs; modprobe btrfs; done
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      b050f9f6