1. 07 Nov, 2014 40 commits
    • David S. Miller's avatar
      sparc64: Do not disable interrupts in nmi_cpu_busy() · a4be156d
      David S. Miller authored
      nmi_cpu_busy() is a SMP function call that just makes sure that all of the
      cpus are spinning using cpu cycles while the NMI test runs.
      
      It does not need to disable IRQs because we just care about NMIs executing
      which will even with 'normal' IRQs disabled.
      
      It is not legal to enable hard IRQs in a SMP cross call, in fact this bug
      triggers the BUG check in irq_work_run_list():
      
      	BUG_ON(!irqs_disabled());
      
      Because now irq_work_run() is invoked from the tail of
      generic_smp_call_function_single_interrupt().
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 58556104)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a4be156d
    • Bryan O'Donoghue's avatar
      usb: pch_udc: usb gadget device support for Intel Quark X1000 · 80007683
      Bryan O'Donoghue authored
      This patch is to enable the USB gadget device for Intel Quark X1000
      Signed-off-by: default avatarBryan O'Donoghue <bryan.odonoghue@intel.com>
      Signed-off-by: default avatarBing Niu <bing.niu@intel.com>
      Signed-off-by: default avatarAlvin (Weike) Chen <alvin.chen@intel.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      
      (cherry picked from commit a68df706)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      80007683
    • Steffen Klassert's avatar
      xfrm: Generate blackhole routes only from route lookup functions · a577c465
      Steffen Klassert authored
      Currently we genarate a blackhole route route whenever we have
      matching policies but can not resolve the states. Here we assume
      that dst_output() is called to kill the balckholed packets.
      Unfortunately this assumption is not true in all cases, so
      it is possible that these packets leave the system unwanted.
      
      We fix this by generating blackhole routes only from the
      route lookup functions, here we can guarantee a call to
      dst_output() afterwards.
      
      Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.")
      Reported-by: default avatarKonstantinos Kolelis <k.kolelis@sirrix.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      
      (cherry picked from commit f92ee619)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a577c465
    • Vlad Yasevich's avatar
      tg3: Work around HW/FW limitations with vlan encapsulated frames · ca657c2a
      Vlad Yasevich authored
      TG3 appears to have an issue performing TSO and checksum offloading
      correclty when the frame has been vlan encapsulated (non-accelrated).
      In these cases, tcp checksum is not correctly updated.
      
      This patch attempts to work around this issue.  After the patch,
      802.1ad vlans start working correctly over tg3 devices.
      
      CC: Prashant Sreedharan <prashant@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 476c1885)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ca657c2a
    • David S. Miller's avatar
      sparc64: Implement __get_user_pages_fast(). · 20c887cf
      David S. Miller authored
      It is not sufficient to only implement get_user_pages_fast(), you
      must also implement the atomic version __get_user_pages_fast()
      otherwise you end up using the weak symbol fallback implementation
      which simply returns zero.
      
      This is dangerous, because it causes the futex code to loop forever
      if transparent hugepages are supported (see get_futex_key()).
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 06090e8e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      20c887cf
    • Dave Kleikamp's avatar
      sparc64: Increase size of boot string to 1024 bytes · 723366c9
      Dave Kleikamp authored
      This is the longest boot string that silo supports.
      Signed-off-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 1cef94c3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      723366c9
    • David S. Miller's avatar
      sparc64: Adjust KTSB assembler to support larger physical addresses. · d9e308a8
      David S. Miller authored
      As currently coded the KTSB accesses in the kernel only support up to
      47 bits of physical addressing.
      
      Adjust the instruction and patching sequence in order to support
      arbitrary 64 bits addresses.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      
      (cherry picked from commit 8c82dc0e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d9e308a8
    • bob picco's avatar
      sparc64: T5 PMU · 12d05c9b
      bob picco authored
      The T5 (niagara5) has different PCR related HV fast trap values and a new
      HV API Group. This patch utilizes these and shares when possible with niagara4.
      
      We use the same sparc_pmu niagara4_pmu. Should there be new effort to
      obtain the MCU perf statistics then this would have to be changed.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 05aa1651)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      12d05c9b
    • David S. Miller's avatar
      sparc64: Do not define thread fpregs save area as zero-length array. · 05946379
      David S. Miller authored
      This breaks the stack end corruption detection facility.
      
      What that facility does it write a magic value to "end_of_stack()"
      and checking to see if it gets overwritten.
      
      "end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
      the beginning of the FPU register save area.
      
      So once the user uses the FPU, the magic value is overwritten and the
      debug checks trigger.
      
      Fix this by making the size explicit.
      
      Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
      are limited to 7 levels of FPU state saves.  So each FPU register set
      is 256 bytes, allocate 256 * 7 for the fpregs area.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit e2653143)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      05946379
    • David S. Miller's avatar
      sparc64: Fix FPU register corruption with AES crypto offload. · 7aba3ba3
      David S. Miller authored
      The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
      key material is preloaded into the FPU registers, and then we loop
      over and over doing the crypt operation, reusing those pre-cooked key
      registers.
      
      There are intervening blkcipher*() calls between the crypt operation
      calls.  And those might perform memcpy() and thus also try to use the
      FPU.
      
      The sparc64 kernel FPU usage mechanism is designed to allow such
      recursive uses, but with a catch.
      
      There has to be a trap between the two FPU using threads of control.
      
      The mechanism works by, when the FPU is already in use by the kernel,
      allocating a slot for FPU saving at trap time.  Then if, within the
      trap handler, we try to use the FPU registers, the pre-trap FPU
      register state is saved into the slot.  Then at trap return time we
      notice this and restore the pre-trap FPU state.
      
      Over the long term there are various more involved ways we can make
      this work, but for a quick fix let's take advantage of the fact that
      the situation where this happens is very limited.
      
      All sparc64 chips that support the crypto instructiosn also are using
      the Niagara4 memcpy routine, and that routine only uses the FPU for
      large copies where we can't get the source aligned properly to a
      multiple of 8 bytes.
      
      We look to see if the FPU is already in use in this context, and if so
      we use the non-large copy path which only uses integer registers.
      
      Furthermore, we also limit this special logic to when we are doing
      kernel copy, rather than a user copy.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit f4da3628)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7aba3ba3
    • David S. Miller's avatar
      sparc64: Fix lockdep warnings on reboot on Ultra-5 · 567e5998
      David S. Miller authored
      Inconsistently, the raw_* IRQ routines do not interact with and update
      the irqflags tracing and lockdep state, whereas the raw_* spinlock
      interfaces do.
      
      This causes problems in p1275_cmd_direct() because we disable hardirqs
      by hand using raw_local_irq_restore() and then do a raw_spin_lock()
      which triggers a lockdep trace because the CPU's hw IRQ state doesn't
      match IRQ tracing's internal software copy of that state.
      
      The CPU's irqs are disabled, yet current->hardirqs_enabled is true.
      
      ====================
      reboot: Restarting system
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240()
      DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      Modules linked in: openpromfs
      CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G        W      3.17.0-dirty #145
      Call Trace:
       [000000000045919c] warn_slowpath_common+0x5c/0xa0
       [0000000000459210] warn_slowpath_fmt+0x30/0x40
       [000000000048f41c] check_flags+0x7c/0x240
       [0000000000493280] lock_acquire+0x20/0x1c0
       [0000000000832b70] _raw_spin_lock+0x30/0x60
       [000000000068f2fc] p1275_cmd_direct+0x1c/0x60
       [000000000068ed28] prom_reboot+0x28/0x40
       [000000000043610c] machine_restart+0x4c/0x80
       [000000000047d2d4] kernel_restart+0x54/0x80
       [000000000047d618] SyS_reboot+0x138/0x200
       [00000000004060b4] linux_sparc_syscall32+0x34/0x60
      ---[ end trace 5c439fe81c05a100 ]---
      possible reason: unannotated irqs-off.
      irq event stamp: 2010267
      hardirqs last  enabled at (2010267): [<000000000049a358>] vprintk_emit+0x4b8/0x580
      hardirqs last disabled at (2010266): [<0000000000499f08>] vprintk_emit+0x68/0x580
      softirqs last  enabled at (2010046): [<000000000045d278>] __do_softirq+0x378/0x4a0
      softirqs last disabled at (2010039): [<000000000042bf08>] do_softirq_own_stack+0x28/0x40
      Resetting ...
      ====================
      
      Use local_* variables of the hw IRQ interfaces so that IRQ tracing sees
      all of our changes.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit bdcf81b6)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      567e5998
    • David S. Miller's avatar
      sparc64: Fix reversed start/end in flush_tlb_kernel_range() · 9a39deda
      David S. Miller authored
      When we have to split up a flush request into multiple pieces
      (in order to avoid the firmware range) we don't specify the
      arguments in the right order for the second piece.
      
      Fix the order, or else we get hangs as the code tries to
      flush "a lot" of entries and we get lockups like this:
      
      [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
      [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
      [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
      [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
      [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000    Not tainted
      [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
      [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
      [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
      [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
      [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
      [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
      [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
      [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
      [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
      [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
      [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
      [ 4423.234193] Call Trace:
      [ 4423.239051]  [0000000000530c24] free_vmap_area_noflush+0x64/0x80
      [ 4423.251029]  [0000000000531a7c] remove_vm_area+0x5c/0x80
      [ 4423.261628]  [0000000000531b80] __vunmap+0x20/0x120
      [ 4423.271352]  [000000000071cf18] n_tty_close+0x18/0x40
      [ 4423.281423]  [00000000007222b0] tty_ldisc_close+0x30/0x60
      [ 4423.292183]  [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
      [ 4423.303120]  [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
      [ 4423.314232]  [0000000000719aa0] __tty_hangup+0x280/0x3c0
      [ 4423.324835]  [0000000000724cb4] pty_close+0x134/0x1a0
      [ 4423.334905]  [000000000071aa24] tty_release+0x104/0x500
      [ 4423.345316]  [00000000005511d0] __fput+0x90/0x1e0
      [ 4423.354701]  [000000000047fa54] task_work_run+0x94/0xe0
      [ 4423.365126]  [0000000000404b44] __handle_signal+0xc/0x2c
      
      Fixes: 4ca9a237 ("sparc64: Guard against flushing openfirmware mappings.")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 473ad7f4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9a39deda
    • Andreas Larsson's avatar
      sparc: Let memset return the address argument · 4b1b701b
      Andreas Larsson authored
      This makes memset follow the standard (instead of returning 0 on success). This
      is needed when certain versions of gcc optimizes around memset calls and assume
      that the address argument is preserved in %o0.
      Signed-off-by: default avatarAndreas Larsson <andreas@gaisler.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 74cad25c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4b1b701b
    • bob picco's avatar
      sparc64: find_node adjustment · efee7787
      bob picco authored
      We have seen an issue with guest boot into LDOM that causes early boot failures
      because of no matching rules for node identitity of the memory. I analyzed this
      on my T4 and concluded there might not be a solution. I saw the issue in
      mainline too when booting into the control/primary domain - with guests
      configured.  Note, this could be a firmware bug on some older machines.
      
      I'll provide a full explanation of the issues below. Should we not find a
      matching BEST latency group for a real address (RA) then we will assume node 0.
      On the T4-2 here with the information provided I can't see an alternative.
      
      Technically the LDOM shown below should match the MBLOCK to the
      favorable latency group. However other factors must be considered too. Were
      the memory controllers configured "fine" grained interleave or "coarse"
      grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
      node?
      
      There has to be at least one Machine Description (MD) "group" and hence one
      NUMA node. The group can have one or more latency groups (lg) - more than one
      memory controller. The current code chooses the smallest latency as the most
      favorable per group. The latency and lg information is in MLGROUP below.
      MBLOCK is the base and size of the RAs for the machine as fetched from OBP
      /memory "available" property. My machine has one MBLOCK but more would be
      possible - with holes?
      
      For a T4-2 the following information has been gathered:
      with LDOM guest
      MEMBLOCK configuration:
       memory size = 0x27f870000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
       memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
       memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
      MBLOCK[0]: base[20000000] size[280000000] offset[0]
      (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
      (note: (RA + offset) & mask = val is the formula to detect a match for the
      memory controller. should there be no match for find_node node, a return
      value of -1 resulted for the node - BAD)
      
      There is one group. It has these forward links
      MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
      MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
      NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
      (note: "val" is the best lg's (smallest latency) "match")
      
      no LDOM guest - bare metal
      MEMBLOCK configuration:
       memory size = 0xfdf2d0000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
       memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
       memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
      MBLOCK[0]: base[20000000] size[fe0000000] offset[0]
      
      there are two groups
      group node[16d5]
      MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
      MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
      NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
      group node[171d]
      MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
      MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
      NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
      (note: for this two "group" bare metal machine, 1/2 memory is in group one's
      lg and 1/2 memory is in group two's lg).
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 3dee9df5)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      efee7787
    • David S. Miller's avatar
      sparc64: Fix corrupted thread fault code. · 4c538a30
      David S. Miller authored
      Every path that ends up at do_sparc64_fault() must install a valid
      FAULT_CODE_* bitmask in the per-thread fault code byte.
      
      Two paths leading to the label winfix_trampoline (which expects the
      FAULT_CODE_* mask in register %g4) were not doing so:
      
      1) For pre-hypervisor TLB protection violation traps, if we took
         the 'winfix_trampoline' path we wouldn't have %g4 initialized
         with the FAULT_CODE_* value yet.  Resulting in using the
         TLB_TAG_ACCESS register address value instead.
      
      2) In the TSB miss path, when we notice that we are going to use a
         hugepage mapping, but we haven't allocated the hugepage TSB yet, we
         still have to take the window fixup case into consideration and
         in that particular path we leave %g4 not setup properly.
      
      Errors on this sort were largely invisible previously, but after
      commit 4ccb9272 ("sparc64: sun4v TLB
      error power off events") we now have a fault_code mask bit
      (FAULT_CODE_BAD_RA) that triggers due to this bug.
      
      FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
      (see #1 above) and thus we get seemingly random bus errors triggered
      for user processes.
      
      Fixes: 4ccb9272 ("sparc64: sun4v TLB error power off events")
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 84bd6d8b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4c538a30
    • bob picco's avatar
      sparc64: sun4v TLB error power off events · dda836fb
      bob picco authored
      [ Upstream commit 4ccb9272 ]
      
      We've witnessed a few TLB events causing the machine to power off because
      of prom_halt. In one case it was some nfs related area during rmmod. Another
      was an mmapper of /dev/mem. A more recent one is an ITLB issue with
      a bad pagesize which could be a hardware bug. Bugs happen but we should
      attempt to not power off the machine and/or hang it when possible.
      
      This is a DTLB error from an mmapper of /dev/mem:
      [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
      SUN4V-DTLB: TPC<0xfffff80100903e6c>
      SUN4V-DTLB: O7[fffff801081979d0]
      SUN4V-DTLB: O7<0xfffff801081979d0>
      SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
      .
      
      This is recent mainline for ITLB:
      [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
      [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
      [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
      [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
      .
      
      Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
      prom_halt() and drop us to OF command prompt "ok". This isn't the case for
      LDOMs and the machine powers off.
      
      For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
      a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
      of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
      one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().
      
      The logic of this patch was partially inspired by David Miller's feedback.
      
      Power off of large sparc64 machines is painful. Plus die_if_kernel provides
      more context. A reset sequence isn't a brief period on large sparc64 but
      better than power-off/power-on sequence.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      (cherry picked from commit ebc43a92)
      dda836fb
    • Daniel Hellstrom's avatar
      sparc32: dma_alloc_coherent must honour gfp flags · 08f002c2
      Daniel Hellstrom authored
      dma_zalloc_coherent() calls dma_alloc_coherent(__GFP_ZERO)
      but the sparc32 implementations sbus_alloc_coherent() and
      pci32_alloc_coherent() doesn't take the gfp flags into
      account.
      
      Tested on the SPARC32/LEON GRETH Ethernet driver which fails
      due to dma_alloc_coherent(__GFP_ZERO) returns non zeroed
      pages.
      Signed-off-by: default avatarDaniel Hellstrom <daniel@gaisler.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit d1105287)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      08f002c2
    • Dmitry Kasatkin's avatar
      ima: fix fallback to use new_sync_read() · 46ec1781
      Dmitry Kasatkin authored
      3.16 commit aad4f8bb
      'switch simple generic_file_aio_read() users to ->read_iter()'
      replaced ->aio_read with ->read_iter in most of the file systems
      and introduced new_sync_read() as a replacement for do_sync_read().
      
      Most of file systems set '->read' and ima_kernel_read is not affected.
      When ->read is not set, this patch adopts fallback call changes from the
      vfs_read.
      Signed-off-by: default avatarDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>  3.16+
      
      (cherry picked from commit 27cd1fc3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      46ec1781
    • Catalin Marinas's avatar
      futex: Ensure get_futex_key_refs() always implies a barrier · 4a015d59
      Catalin Marinas authored
      Commit b0c29f79 (futexes: Avoid taking the hb->lock if there's
      nothing to wake up) changes the futex code to avoid taking a lock when
      there are no waiters. This code has been subsequently fixed in commit
      11d4616b (futex: revert back to the explicit waiter counting code).
      Both the original commit and the fix-up rely on get_futex_key_refs() to
      always imply a barrier.
      
      However, for private futexes, none of the cases in the switch statement
      of get_futex_key_refs() would be hit and the function completes without
      a memory barrier as required before checking the "waiters" in
      futex_wake() -> hb_waiters_pending(). The consequence is a race with a
      thread waiting on a futex on another CPU, allowing the waker thread to
      read "waiters == 0" while the waiter thread to have read "futex_val ==
      locked" (in kernel).
      
      Without this fix, the problem (user space deadlocks) can be seen with
      Android bionic's mutex implementation on an arm64 multi-cluster system.
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarMatteo Franchin <Matteo.Franchin@arm.com>
      Fixes: b0c29f79 (futexes: Avoid taking the hb->lock if there's nothing to wake up)
      Acked-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 76835b0e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4a015d59
    • Sasha Levin's avatar
      kernel: add support for gcc 5 · 5407784b
      Sasha Levin authored
      We're missing include/linux/compiler-gcc5.h which is required now
      because gcc branched off to v5 in trunk.
      
      Just copy the relevant bits out of include/linux/compiler-gcc4.h,
      no new code is added as of now.
      
      This fixes a build error when using gcc 5.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 71458cfc)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5407784b
    • Yann Droneaud's avatar
      fanotify: enable close-on-exec on events' fd when requested in fanotify_init() · a382dac6
      Yann Droneaud authored
      According to commit 80af2588 ("fanotify: groups can specify their
      f_flags for new fd"), file descriptors created as part of file access
      notification events inherit flags from the event_f_flags argument passed
      to syscall fanotify_init(2)[1].
      
      Unfortunately O_CLOEXEC is currently silently ignored.
      
      Indeed, event_f_flags are only given to dentry_open(), which only seems to
      care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
      open_check_o_direct() and O_LARGEFILE in generic_file_open().
      
      It's a pity, since, according to some lookup on various search engines and
      http://codesearch.debian.net/, there's already some userspace code which
      use O_CLOEXEC:
      
      - in systemd's readahead[2]:
      
          fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
      
      - in clsync[3]:
      
          #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
      
          int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
      
      - in examples [4] from "Filesystem monitoring in the Linux
        kernel" article[5] by Aleksander Morgado:
      
          if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                            O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
      
      Additionally, since commit 48149e9d ("fanotify: check file flags
      passed in fanotify_init").  having O_CLOEXEC as part of fanotify_init()
      second argument is expressly allowed.
      
      So it seems expected to set close-on-exec flag on the file descriptors if
      userspace is allowed to request it with O_CLOEXEC.
      
      But Andrew Morton raised[6] the concern that enabling now close-on-exec
      might break existing applications which ask for O_CLOEXEC but expect the
      file descriptor to be inherited across exec().
      
      In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file
      descriptor returned as part of file access notify can break applications
      due to deadlock.  So close-on-exec is needed for most applications.
      
      More, applications asking for close-on-exec are likely expecting it to be
      enabled, relying on O_CLOEXEC being effective.  If not, it might weaken
      their security, as noted by Jan Kara[8].
      
      So this patch replaces call to macro get_unused_fd() by a call to function
      get_unused_fd_flags() with event_f_flags value as argument.  This way
      O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is
      interpreted and close-on-exec get enabled when requested.
      
      [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
      [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
      [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
          https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
      [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
      [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
      [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
      [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
      [8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz
      
      Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.comSigned-off-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Tested-by: default avatarHeinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Mihai Don\u021bu <mihai.dontu@gmail.com>
      Cc: Pádraig Brady <P@draigBrady.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
      Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 0b37e097)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a382dac6
    • Junxiao Bi's avatar
      ocfs2: fix a deadlock while o2net_wq doing direct memory reclaim · a67333d8
      Junxiao Bi authored
      Fix a deadlock problem caused by direct memory reclaim in o2net_wq.  The
      situation is as follows:
      
      1) Receive a connect message from another node, node queues a
         work_struct o2net_listen_work.
      
      2) o2net_wq processes this work and call the following functions:
      
      o2net_wq
      -> o2net_accept_one
        -> sock_create_lite
          -> sock_alloc()
            -> kmem_cache_alloc with GFP_KERNEL
              -> ____cache_alloc_node
                ->__alloc_pages_nodemask
                  -> do_try_to_free_pages
                    -> shrink_slab
                      -> evict
                        -> ocfs2_evict_inode
                          -> ocfs2_drop_lock
                            -> dlmunlock
                              -> o2net_send_message_vec
      
         then o2net_wq wait for the unlock reply from master.
      
      3) tcp layer received the reply, call o2net_data_ready() and queue
         sc_rx_work, waiting o2net_wq to process this work.
      
      4) o2net_wq is a single thread workqueue, it process the work one by
         one.  Right now it is still doing o2net_listen_work and cannot handle
         sc_rx_work.  so we deadlock.
      
      Junxiao Bi's patch "mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set"
      (http://ozlabs.org/~akpm/mmots/broken-out/mm-clear-__gfp_fs-when-pf_memalloc_noio-is-set.patch)
      clears __GFP_FS in memalloc_noio_flags() besides __GFP_IO.  We use
      memalloc_noio_save() to set process flag PF_MEMALLOC_NOIO so that all
      allocations done by this process are done as if GFP_NOIO was specified.
      We are not reentering filesystem while doing memory reclaim.
      Signed-off-by: default avatarjoyce.xue <xuejiufei@huawei.com>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
      
      commit 21caf2fc ("mm: teach mm by current context info to not do I/O
      during memory allocation") introduces PF_MEMALLOC_NOIO flag to avoid doing
      I/O inside memory allocation, __GFP_IO is cleared when this flag is set,
      but __GFP_FS implies __GFP_IO, it should also be cleared.  Or it may still
      run into I/O, like in superblock shrinker.  And this will make the kernel
      run into the deadlock case described in that commit.
      
      See Dave Chinner's comment about io in superblock shrinker:
      
      Filesystem shrinkers do indeed perform IO from the superblock shrinker and
      have for years.  Even clean inodes can require IO before they can be freed
      - e.g.  on an orphan list, need truncation of post-eof blocks, need to
      wait for ordered operations to complete before it can be freed, etc.
      
      IOWs, Ext4, btrfs and XFS all can issue and/or block on arbitrary amounts
      of IO in the superblock shrinker context.  XFS, in particular, has been
      doing transactions and IO from the VFS inode cache shrinker since it was
      first introduced....
      
      Fix this by clearing __GFP_FS in memalloc_noio_flags(), this function has
      masked all the gfp_mask that will be passed into fs for the processes
      setting PF_MEMALLOC_NOIO in the direct reclaim path.
      
      v1 thread at: https://lkml.org/lkml/2014/9/3/32Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: joyce.xue <xuejiufei@huawei.com>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit b246d3d1
      934f3072)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a67333d8
    • Champion Chen's avatar
      Bluetooth: Fix issue with USB suspend in btusb driver · 5dda54e2
      Champion Chen authored
      Suspend could fail for some platforms because
      btusb_suspend==> btusb_stop_traffic ==> usb_kill_anchored_urbs.
      
      When btusb_bulk_complete returns before system suspend and resubmits
      an URB, the system cannot enter suspend state.
      Signed-off-by: default avatarChampion Chen <champion_chen@realsil.com.cn>
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit 85560c4a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5dda54e2
    • Loic Poulain's avatar
      Bluetooth: Fix HCI H5 corrupted ack value · 8c58923d
      Loic Poulain authored
      In this expression: seq = (seq - 1) % 8
      seq (u8) is implicitly converted to an int in the arithmetic operation.
      So if seq value is 0, operation is ((0 - 1) % 8) => (-1 % 8) => -1.
      The new seq value is 0xff which is an invalid ACK value, we expect 0x07.
      It leads to frequent dropped ACK and retransmission.
      Fix this by using '&' binary operator instead of '%'.
      Signed-off-by: default avatarLoic Poulain <loic.poulain@intel.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit 4807b518)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8c58923d
    • Stanislaw Gruszka's avatar
      rt2800: correct BBP1_TX_POWER_CTRL mask · dd84a854
      Stanislaw Gruszka authored
      Two bits control TX power on BBP_R1 register. Correct the mask,
      otherwise we clear additional bit on BBP_R1 register, what can have
      unknown, possible negative effect.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      
      (cherry picked from commit 01f7feea)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      dd84a854
    • Ricardo Ribalda Delgado's avatar
      PCI: Generate uppercase hex for modalias interface class · b9779663
      Ricardo Ribalda Delgado authored
      Some implementations of modprobe fail to load the driver for a PCI device
      automatically because the "interface" part of the modalias from the kernel
      is lowercase, and the modalias from file2alias is uppercase.
      
      The "interface" is the low-order byte of the Class Code, defined in PCI
      r3.0, Appendix D.  Most interface types defined in the spec do not use
      alpha characters, so they won't be affected.  For example, 00h, 01h, 10h,
      20h, etc. are unaffected.
      
      Print the "interface" byte of the Class Code in uppercase hex, as we
      already do for the Vendor ID, Device ID, Class, etc.
      
      [bhelgaas: changelog]
      Signed-off-by: default avatarRicardo Ribalda Delgado <ricardo.ribalda@gmail.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      CC: stable@vger.kernel.org
      
      (cherry picked from commit 89ec3dcf)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b9779663
    • Douglas Lehr's avatar
      PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size · 682043e4
      Douglas Lehr authored
      The Crocodile chip occasionally comes up with 4k and 8k BAR sizes.  Due to
      an erratum, setting the SR-IOV page size causes the physical function BARs
      to expand to the system page size.  Since ppc64 uses 64k pages, when Linux
      tries to assign the smaller resource sizes to the now 64k BARs the address
      will be truncated and the BARs will overlap.
      
      Force Linux to allocate the resource as a full page, which avoids the
      overlap.
      
      [bhelgaas: print expanded resource, too]
      Signed-off-by: default avatarDouglas Lehr <dllehr@us.ibm.com>
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarMilton Miller <miltonm@us.ibm.com>
      CC: stable@vger.kernel.org
      
      (cherry picked from commit 9fe373f9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      682043e4
    • Yinghai Lu's avatar
      PCI: Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() · ff384652
      Yinghai Lu authored
      In 5b285415 ("PCI: Restrict 64-bit prefetchable bridge windows to
      64-bit resources"), we added IORESOURCE_MEM_64 to the mask in
      pci_assign_unassigned_root_bus_resources(), but not to the mask in
      pci_assign_unassigned_bridge_resources().
      
      Add IORESOURCE_MEM_64 to the pci_assign_unassigned_bridge_resources() type
      mask.
      
      Fixes: 5b285415 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v3.16+
      
      (cherry picked from commit d61b0e87)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ff384652
    • Dave Chinner's avatar
      xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly · 161958b4
      Dave Chinner authored
      XFS has been having trouble with stray delayed allocation extents
      beyond EOF for a long time. Recent changes to the collapse range
      code has triggered erroneous EBUSY errors on page invalidtion for
      block size smaller than page size filesystems. These
      have been caused by dirty buffers beyond EOF on a partial page which
      do not get written to disk during a sync.
      
      The issue is that write-ahead in xfs_cluster_write() finds such a
      partial page and handles it by leaving the page dirty but pushing it
      into a writeback state. This used to work just fine, as the
      write_cache_pages() code would then find the dirty partial page in
      the next mapping tree lookup as the dirty tag is still set.
      
      Unfortunately, when we moved to a mark and sweep approach to
      writeback to fix other writeback sync issues, we broken this. THe
      act of marking the page as under writeback now clears the TOWRITE
      tag in the radix tree, even though the page is still dirty. This
      causes the TOWRITE tag to be cleared, and hence the next lookup on
      the mapping tree does not find the dirty partial page and so doesn't
      try to write it again.
      
      This same writeback bug was found recently in ext4 and fixed in
      commit 1c8349a1 ("ext4: fix data integrity sync in ordered mode")
      without communication to the wider filesystem community. We can use
      exactly the same fix here so the TOWRITE flag is not cleared on
      partial page writes.
      
      cc: stable@vger.kernel.org # dependent on 1c8349a1Root-cause-found-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      (cherry picked from commit 0d085a52)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      161958b4
    • Chao Yu's avatar
      ecryptfs: avoid to access NULL pointer when write metadata in xattr · 95fd0ed8
      Chao Yu authored
      Christopher Head 2014-06-28 05:26:20 UTC described:
      "I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
      in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      PGD d7840067 PUD b2c3c067 PMD 0
      Oops: 0002 [#1] SMP
      Modules linked in: nvidia(PO)
      CPU: 3 PID: 3566 Comm: bash Tainted: P           O 3.12.21-gentoo-r1 #2
      Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
      task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
      RIP: 0010:[<ffffffff8110eb39>]  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      RSP: 0018:ffff8800bad71c10  EFLAGS: 00010246
      RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
      RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
      RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
      R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
      FS:  00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
      Stack:
      ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
      ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
      00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
      Call Trace:
      [<ffffffff811826e8>] ? ecryptfs_setxattr+0x40/0x52
      [<ffffffff81185fd5>] ? ecryptfs_write_metadata+0x1b3/0x223
      [<ffffffff81082c2c>] ? should_resched+0x5/0x23
      [<ffffffff8118322b>] ? ecryptfs_initialize_file+0xaf/0xd4
      [<ffffffff81183344>] ? ecryptfs_create+0xf4/0x142
      [<ffffffff810f8c0d>] ? vfs_create+0x48/0x71
      [<ffffffff810f9c86>] ? do_last.isra.68+0x559/0x952
      [<ffffffff810f7ce7>] ? link_path_walk+0xbd/0x458
      [<ffffffff810fa2a3>] ? path_openat+0x224/0x472
      [<ffffffff810fa7bd>] ? do_filp_open+0x2b/0x6f
      [<ffffffff81103606>] ? __alloc_fd+0xd6/0xe7
      [<ffffffff810ee6ab>] ? do_sys_open+0x65/0xe9
      [<ffffffff8157d022>] ? system_call_fastpath+0x16/0x1b
      RIP  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      RSP <ffff8800bad71c10>
      CR2: 0000000000000000
      ---[ end trace df9dba5f1ddb8565 ]---"
      
      If we create a file when we mount with ecryptfs_xattr_metadata option, we will
      encounter a crash in this path:
      ->ecryptfs_create
        ->ecryptfs_initialize_file
          ->ecryptfs_write_metadata
            ->ecryptfs_write_metadata_to_xattr
              ->ecryptfs_setxattr
                ->fsstack_copy_attr_all
      It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
      will be initialized when ecryptfs_initialize_file finish.
      
      So we should skip copying attr from lower inode when the value of ->d_inode is
      invalid.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Cc: stable@vger.kernel.org # v3.2+: b59db43a eCryptfs: Prevent file create race condition
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      
      (cherry picked from commit 35425ea2)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      95fd0ed8
    • Takashi Iwai's avatar
      ALSA: emu10k1: Fix deadlock in synth voice lookup · a027d722
      Takashi Iwai authored
      The emu10k1 voice allocator takes voice_lock spinlock.  When there is
      no empty stream available, it tries to release a voice used by synth,
      and calls get_synth_voice.  The callback function,
      snd_emu10k1_synth_get_voice(), however, also takes the voice_lock,
      thus it deadlocks.
      
      The fix is simply removing the voice_lock holds in
      snd_emu10k1_synth_get_voice(), as this is always called in the
      spinlock context.
      Reported-and-tested-by: default avatarArthur Marsh <arthur.marsh@internode.on.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit 95926035)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a027d722
    • Anatol Pomozov's avatar
      ALSA: pcm: use the same dma mmap codepath both for arm and arm64 · f90bea5a
      Anatol Pomozov authored
      This avoids following kernel crash when try to playback on arm64
      
      [  107.497203] [<ffffffc00046b310>] snd_pcm_mmap_data_fault+0x90/0xd4
      [  107.503405] [<ffffffc0001541ac>] __do_fault+0xb0/0x498
      [  107.508565] [<ffffffc0001576a0>] handle_mm_fault+0x224/0x7b0
      [  107.514246] [<ffffffc000092640>] do_page_fault+0x11c/0x310
      [  107.519738] [<ffffffc000081100>] do_mem_abort+0x38/0x98
      
      Tested: backported to 3.14 and tried to playback on arm64 machine
      Signed-off-by: default avatarAnatol Pomozov <anatol.pomozov@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit a011e213)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f90bea5a
    • Andy Shevchenko's avatar
      spi: dw-mid: terminate ongoing transfers at exit · 2c19514d
      Andy Shevchenko authored
      Do full clean up at exit, means terminate all ongoing DMA transfers.
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit 8e45ef68)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2c19514d
    • Andy Adamson's avatar
      NFSv4.1: Fix an NFSv4.1 state renewal regression · 51033015
      Andy Adamson authored
      Commit 2f60ea6b ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
      not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
      call, on the wire to renew the NFSv4.1 state if the flag was not set.
      
      The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
      (cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
      sometimes does the following:
      
      In normal operation, the only way a future state renewal call is put on the
      wire is via a call to nfs4_schedule_state_renewal, which schedules a
      nfs4_renew_state workqueue task. nfs4_renew_state determines if the
      NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
      which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
      Then the nfs41_proc_async_sequence rpc_release function schedules
      another state remewal via nfs4_schedule_state_renewal.
      
      Without this change we can get into a state where an application stops
      accessing the NFSv4.1 share, state renewal calls stop due to the
      NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
      from this situation is with a clientid re-establishment, once the application
      resumes and the server has timed out the lease and so returns
      NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.
      
      An example application:
      open, lock, write a file.
      
      sleep for 6 * lease (could be less)
      
      ulock, close.
      
      In the above example with NFSv4.1 delegations enabled, without this change,
      there are no OP_SEQUENCE state renewal calls during the sleep, and the
      clientid is recovered due to lease expiration on the close.
      
      This issue does not occur with NFSv4.1 delegations disabled, nor with
      NFSv4.0, with or without delegations enabled.
      Signed-off-by: default avatarAndy Adamson <andros@netapp.com>
      Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
      Fixes: 2f60ea6b (NFSv4: The NFSv4.0 client must send RENEW calls...)
      Cc: stable@vger.kernel.org # 3.2.x
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit d1f456b0)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      51033015
    • Trond Myklebust's avatar
      NFSv4: fix open/lock state recovery error handling · d9e73039
      Trond Myklebust authored
      The current open/lock state recovery unfortunately does not handle errors
      such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
      just proceeds as if the state manager is finished recovering.
      This patch ensures that we loop back, handle higher priority errors
      and complete the open/lock state recovery.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit df817ba3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d9e73039
    • Trond Myklebust's avatar
      NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails · e6e258df
      Trond Myklebust authored
      If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
      CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
      a second time, then the client will currently take this to mean that it must
      declare all locks to be stale, and hence ineligible for reboot recovery.
      
      RFC3530 and RFC5661 both suggest that the client should instead rely on the
      server to respond to inelegible open share, lock and delegation reclaim
      requests with NFS4ERR_NO_GRACE in this situation.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit a4339b7b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e6e258df
    • Willy Tarreau's avatar
      lzo: check for length overrun in variable length encoding. · 8b5e1a97
      Willy Tarreau authored
      This fix ensures that we never meet an integer overflow while adding
      255 while parsing a variable length encoding. It works differently from
      commit 206a81c1 ("lzo: properly check for overruns") because instead of
      ensuring that we don't overrun the input, which is tricky to guarantee
      due to many assumptions in the code, it simply checks that the cumulated
      number of 255 read cannot overflow by bounding this number.
      
      The MAX_255_COUNT is the maximum number of times we can add 255 to a base
      count without overflowing an integer. The multiply will overflow when
      multiplying 255 by more than MAXINT/255. The sum will overflow earlier
      depending on the base count. Since the base count is taken from a u8
      and a few bits, it is safe to assume that it will always be lower than
      or equal to 2*255, thus we can always prevent any overflow by accepting
      two less 255 steps.
      
      This patch also reduces the CPU overhead and actually increases performance
      by 1.1% compared to the initial code, while the previous fix costs 3.1%
      (measured on x86_64).
      
      The fix needs to be backported to all currently supported stable kernels.
      Reported-by: default avatarWillem Pinckaers <willem@lekkertech.net>
      Cc: "Don A. Bailey" <donb@securitymouse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 72cf9012)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8b5e1a97
    • Willy Tarreau's avatar
      Revert "lzo: properly check for overruns" · 85156b42
      Willy Tarreau authored
      This reverts commit 206a81c1 ("lzo: properly check for overruns").
      
      As analysed by Willem Pinckaers, this fix is still incomplete on
      certain rare corner cases, and it is easier to restart from the
      original code.
      Reported-by: default avatarWillem Pinckaers <willem@lekkertech.net>
      Cc: "Don A. Bailey" <donb@securitymouse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit af958a38)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      85156b42
    • Willy Tarreau's avatar
      Documentation: lzo: document part of the encoding · 16fc81f9
      Willy Tarreau authored
      Add a complete description of the LZO format as processed by the
      decompressor. I have not found a public specification of this format
      hence this analysis, which will be used to better understand the code.
      
      Cc: Willem Pinckaers <willem@lekkertech.net>
      Cc: "Don A. Bailey" <donb@securitymouse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit d98a0526)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      16fc81f9
    • Geert Uytterhoeven's avatar
      m68k: Disable/restore interrupts in hwreg_present()/hwreg_write() · e4b3c181
      Geert Uytterhoeven authored
      hwreg_present() and hwreg_write() temporarily change the VBR register to
      another vector table. This table contains a valid bus error handler
      only, all other entries point to arbitrary addresses.
      
      If an interrupt comes in while the temporary table is active, the
      processor will start executing at such an arbitrary address, and the
      kernel will crash.
      
      While most callers run early, before interrupts are enabled, or
      explicitly disable interrupts, Finn Thain pointed out that macsonic has
      one callsite that doesn't, causing intermittent boot crashes.
      There's another unsafe callsite in hilkbd.
      
      Fix this for good by disabling and restoring interrupts inside
      hwreg_present() and hwreg_write().
      
      Explicitly disabling interrupts can be removed from the callsites later.
      Reported-by: default avatarFinn Thain <fthain@telegraphics.com.au>
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit e4dc601b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e4b3c181