1. 18 Jul, 2016 3 commits
    • Al Viro's avatar
      fix d_walk()/non-delayed __d_free() race · 760bd1ad
      Al Viro authored
      commit 3d56c25e upstream.
      
      Ascend-to-parent logics in d_walk() depends on all encountered child
      dentries not getting freed without an RCU delay.  Unfortunately, in
      quite a few cases it is not true, with hard-to-hit oopsable race as
      the result.
      
      Fortunately, the fix is simiple; right now the rule is "if it ever
      been hashed, freeing must be delayed" and changing it to "if it
      ever had a parent, freeing must be delayed" closes that hole and
      covers all cases the old rule used to cover.  Moreover, pipes and
      sockets remain _not_ covered, so we do not introduce RCU delay in
      the cases which are the reason for having that delay conditional
      in the first place.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      760bd1ad
    • Prasun Maiti's avatar
      wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel · 686a230a
      Prasun Maiti authored
      commit 3d5fdff4 upstream.
      
      iwpriv app uses iw_point structure to send data to Kernel. The iw_point
      structure holds a pointer. For compatibility Kernel converts the pointer
      as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers
      may use iw_handler_def.private_args to populate iwpriv commands instead
      of iw_handler_def.private. For those case, the IOCTLs from
      SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl().
      Accordingly when the filled up iw_point structure comes from 32 bit
      iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends
      it to driver. So, the driver may get the invalid data.
      
      The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to
      SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory.
      This patch adds pointer conversion from 32 bit to 64 bit and vice versa,
      if the ioctl comes from 32 bit iwpriv to 64 bit Kernel.
      Signed-off-by: default avatarPrasun Maiti <prasunmaiti87@gmail.com>
      Signed-off-by: default avatarUjjal Roy <royujjal@gmail.com>
      Tested-by: default avatarDibyajyoti Ghosh <dibyajyotig@gmail.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      686a230a
    • Jeff Mahoney's avatar
      ecryptfs: don't allow mmap when the lower fs doesn't support it · 114ffc5d
      Jeff Mahoney authored
      commit f0fe970d upstream.
      
      There are legitimate reasons to disallow mmap on certain files, notably
      in sysfs or procfs.  We shouldn't emulate mmap support on file systems
      that don't offer support natively.
      
      CVE-2016-1583
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Cc: Henry Jensen <hjensen@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      114ffc5d
  2. 12 Jul, 2016 13 commits
    • Helge Deller's avatar
      parisc: Fix pagefault crash in unaligned __get_user() call · 22079c65
      Helge Deller authored
      commit 8b78f260 upstream.
      
      One of the debian buildd servers had this crash in the syslog without
      any other information:
      
       Unaligned handler failed, ret = -2
       clock_adjtime (pid 22578): Unaligned data reference (code 28)
       CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G  E  4.5.0-2-parisc64-smp #1 Debian 4.5.4-1
       task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000
      
            YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
       PSW: 00001000000001001111100000001111 Tainted: G            E
       r00-03  000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0
       r04-07  00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff
       r08-11  0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4
       r12-15  000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b
       r16-19  0000000000028800 0000000000000001 0000000000000070 00000001bde7c218
       r20-23  0000000000000000 00000001bde7c210 0000000000000002 0000000000000000
       r24-27  0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0
       r28-31  0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218
       sr00-03  0000000001200000 0000000001200000 0000000000000000 0000000001200000
       sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
       IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88
        IIR: 0ca0d089    ISR: 0000000001200000  IOR: 00000000fa6f7fff
        CPU:        1   CR30: 00000001bde7c000 CR31: ffffffffffffffff
        ORIG_R28: 00000002369fe628
        IAOQ[0]: compat_get_timex+0x2dc/0x3c0
        IAOQ[1]: compat_get_timex+0x2e0/0x3c0
        RP(r2): compat_get_timex+0x40/0x3c0
       Backtrace:
        [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0
        [<0000000040205024>] syscall_exit+0x0/0x14
      
      This means the userspace program clock_adjtime called the clock_adjtime()
      syscall and then crashed inside the compat_get_timex() function.
      Syscalls should never crash programs, but instead return EFAULT.
      
      The IIR register contains the executed instruction, which disassebles
      into "ldw 0(sr3,r5),r9".
      This load-word instruction is part of __get_user() which tried to read the word
      at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in.  The
      unaligned handler is able to emulate all ldw instructions, but it fails if it
      fails to read the source e.g. because of page fault.
      
      The following program reproduces the problem:
      
      #define _GNU_SOURCE
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/mman.h>
      
      int main(void) {
              /* allocate 8k */
              char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
              /* free second half (upper 4k) and make it invalid. */
              munmap(ptr+4096, 4096);
              /* syscall where first int is unaligned and clobbers into invalid memory region */
              /* syscall should return EFAULT */
              return syscall(__NR_clock_adjtime, 0, ptr+4095);
      }
      
      To fix this issue we simply need to check if the faulting instruction address
      is in the exception fixup table when the unaligned handler failed. If it
      is, call the fixup routine instead of crashing.
      
      While looking at the unaligned handler I found another issue as well: The
      target register should not be modified if the handler was unsuccessful.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      22079c65
    • Thomas Huth's avatar
      powerpc: Use privileged SPR number for MMCR2 · 66c43f10
      Thomas Huth authored
      commit 8dd75ccb upstream.
      
      We are already using the privileged versions of MMCR0, MMCR1
      and MMCRA in the kernel, so for MMCR2, we should better use
      the privileged versions, too, to be consistent.
      
      Fixes: 240686c1 ("powerpc: Initialise PMU related regs on Power8")
      Suggested-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Acked-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      66c43f10
    • Thomas Huth's avatar
      powerpc: Fix definition of SIAR and SDAR registers · 18fca9e1
      Thomas Huth authored
      commit d23fac2b upstream.
      
      The SIAR and SDAR registers are available twice, one time as SPRs
      780 / 781 (unprivileged, but read-only), and one time as the SPRs
      796 / 797 (privileged, but read and write). The Linux kernel code
      currently uses the unprivileged  SPRs - while this is OK for reading,
      writing to that register of course does not work.
      Since the KVM code tries to write to this register, too (see the mtspr
      in book3s_hv_rmhandlers.S), the contents of this register sometimes get
      lost for the guests, e.g. during migration of a VM.
      To fix this issue, simply switch to the privileged SPR numbers instead.
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Acked-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      18fca9e1
    • Russell King's avatar
      ARM: fix PTRACE_SETVFPREGS on SMP systems · 9b22674c
      Russell King authored
      commit e2dfb4b8 upstream.
      
      PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
      reloaded, because it undoes one of the effects of vfp_flush_hwstate().
      
      Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
      an invalid CPU number, but vfp_set() overwrites this with the original
      CPU number, thereby rendering the hardware state as apparently "valid",
      even though the software state is more recent.
      
      Fix this by reverting the previous change.
      
      Fixes: 8130b9d7 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Tested-by: default avatarSimon Marchi <simon.marchi@ericsson.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9b22674c
    • Paolo Bonzini's avatar
      KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS · 0ad9e29d
      Paolo Bonzini authored
      commit d14bdb55 upstream.
      
      MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
      any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
      time, and the next KVM_RUN oopses:
      
         general protection fault: 0000 [#1] SMP
         CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
         Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
         [...]
         Call Trace:
          [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
          [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
          [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
          [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
          [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
         RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
          RSP <ffff88005836bd50>
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
      
              memcpy(&dr,
                     "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
                     "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
                     "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
                     "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
                     48);
              r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
              r[6] = ioctl(r[4], KVM_RUN, 0);
          }
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0ad9e29d
    • Aaro Koskinen's avatar
      drivers: macintosh: rack-meter: limit idle ticks to total ticks · e58b6fc5
      Aaro Koskinen authored
      commit c796d1d9 upstream.
      
      Limit idle ticks to total ticks. This prevents the annoying rackmeter
      leds fully ON / OFF blinking state that happens on fully idling
      G5 Xserve systems.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: Oliver Neukum <oliver@neukum.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e58b6fc5
    • Javier Martinez Canillas's avatar
      macintosh/therm_windtunnel: Export I2C module alias information · a498de21
      Javier Martinez Canillas authored
      commit cb0eefcc upstream.
      
      The I2C core always reports the MODALIAS uevent as "i2c:<client name"
      regardless if the driver was matched using the I2C id_table or the
      of_match_table. So the driver needs to export the I2C table and this
      be built into the module or udev won't have the necessary information
      to auto load the correct module when the device is added.
      Signed-off-by: default avatarJavier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: Oliver Neukum <oliver@neukum.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a498de21
    • Jakub Sitnicki's avatar
      ipv6: Skip XFRM lookup if dst_entry in socket cache is valid · 35291f0c
      Jakub Sitnicki authored
      [ Upstream commit 00bc0ef5 ]
      
      At present we perform an xfrm_lookup() for each UDPv6 message we
      send. The lookup involves querying the flow cache (flow_cache_lookup)
      and, in case of a cache miss, creating an XFRM bundle.
      
      If we miss the flow cache, we can end up creating a new bundle and
      deriving the path MTU (xfrm_init_pmtu) from on an already transformed
      dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
      to xfrm_lookup(). This can happen only if we're caching the dst_entry
      in the socket, that is when we're using a connected UDP socket.
      
      To put it another way, the path MTU shrinks each time we miss the flow
      cache, which later on leads to incorrectly fragmented payload. It can
      be observed with ESPv6 in transport mode:
      
        1) Set up a transformation and lower the MTU to trigger fragmentation
          # ip xfrm policy add dir out src ::1 dst ::1 \
            tmpl src ::1 dst ::1 proto esp spi 1
          # ip xfrm state add src ::1 dst ::1 \
            proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
          # ip link set dev lo mtu 1500
      
        2) Monitor the packet flow and set up an UDP sink
          # tcpdump -ni lo -ttt &
          # socat udp6-listen:12345,fork /dev/null &
      
        3) Send a datagram that needs fragmentation with a connected socket
          # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
          2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
          00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
          00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
          00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
          (^ ICMPv6 Parameter Problem)
          00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
      
        4) Compare it to a non-connected socket
          # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
          00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
          00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
      
      What happens in step (3) is:
      
        1) when connecting the socket in __ip6_datagram_connect(), we
           perform an XFRM lookup, miss the flow cache, create an XFRM
           bundle, and cache the destination,
      
        2) afterwards, when sending the datagram, we perform an XFRM lookup,
           again, miss the flow cache (due to mismatch of flowi6_iif and
           flowi6_oif, which is an issue of its own), and recreate an XFRM
           bundle based on the cached (and already transformed) destination.
      
      To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
      altogether whenever we already have a destination entry cached in the
      socket. This prevents the path MTU shrinkage and brings us on par with
      UDPv4.
      
      The fix also benefits connected PINGv6 sockets, another user of
      ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
      twice.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: default avatarJan Tluka <jtluka@redhat.com>
      Signed-off-by: default avatarJakub Sitnicki <jkbs@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      35291f0c
    • Yuchung Cheng's avatar
      tcp: record TLP and ER timer stats in v6 stats · dd3dbed5
      Yuchung Cheng authored
      [ Upstream commit ce3cf4ec ]
      
      The v6 tcp stats scan do not provide TLP and ER timer information
      correctly like the v4 version . This patch fixes that.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Fixes: eed530b6 ("tcp: early retransmit")
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      dd3dbed5
    • Hannes Frederic Sowa's avatar
      udp: prevent skbs lingering in tunnel socket queues · c66991f0
      Hannes Frederic Sowa authored
      [ Upstream commit e5aed006 ]
      
      In case we find a socket with encapsulation enabled we should call
      the encap_recv function even if just a udp header without payload is
      available. The callbacks are responsible for correctly verifying and
      dropping the packets.
      
      Also, in case the header validation fails for geneve and vxlan we
      shouldn't put the skb back into the socket queue, no one will pick
      them up there.  Instead we can simply discard them in the respective
      encap_recv functions.
      
      [js] 3.12 does not have geneve yet.
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c66991f0
    • Herbert Xu's avatar
      netlink: Fix dump skb leak/double free · 461dbb38
      Herbert Xu authored
      [ Upstream commit 92964c79 ]
      
      When we free cb->skb after a dump, we do it after releasing the
      lock.  This means that a new dump could have started in the time
      being and we'll end up freeing their skb instead of ours.
      
      This patch saves the skb and module before we unlock so we free
      the right memory.
      
      Fixes: 16b304f3 ("netlink: Eliminate kmalloc in netlink dump operation.")
      Reported-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      461dbb38
    • Andrey Ryabinin's avatar
      perf/x86: Fix undefined shift on 32-bit kernels · b67568e4
      Andrey Ryabinin authored
      commit 6d6f2833 upstream.
      
      Jim reported:
      
      	UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
      	shift exponent 35 is too large for 32-bit type 'long unsigned int'
      
      The use of 'unsigned long' type obviously is not correct here, make it
      'unsigned long long' instead.
      Reported-by: default avatarJim Cromie <jim.cromie@gmail.com>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Imre Palik <imrep@amazon.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 2c33645d ("perf/x86: Honor the architectural performance monitoring version")
      Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Kevin Christopher <kevinc@vmware.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b67568e4
    • Palik, Imre's avatar
      perf/x86: Honor the architectural performance monitoring version · ab4801b9
      Palik, Imre authored
      commit 2c33645d upstream.
      
      Architectural performance monitoring, version 1, doesn't support fixed counters.
      
      Currently, even if a hypervisor advertises support for architectural
      performance monitoring version 1, perf may still try to use the fixed
      counters, as the constraints are set up based on the CPU model.
      
      This patch ensures that perf honors the architectural performance monitoring
      version returned by CPUID, and it only uses the fixed counters for version 2
      and above.
      
      (Some of the ideas in this patch came from Peter Zijlstra.)
      Signed-off-by: default avatarImre Palik <imrep@amazon.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Anthony Liguori <aliguori@amazon.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1433767609-1039-1-git-send-email-imrep.amz@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Kevin Christopher <kevinc@vmware.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ab4801b9
  3. 23 Jun, 2016 10 commits
    • David S. Miller's avatar
      sparc64: Fix return from trap window fill crashes. · 9219a052
      David S. Miller authored
      [ Upstream commit 7cafc0b8 ]
      
      We must handle data access exception as well as memory address unaligned
      exceptions from return from trap window fill faults, not just normal
      TLB misses.
      
      Otherwise we can get an OOPS that looks like this:
      
      ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
      CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
      task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
      TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002    Not tainted
      TPC: <do_sparc64_fault+0x5c4/0x700>
      g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
      g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
      o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
      o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
      RPC: <do_sparc64_fault+0x5bc/0x700>
      l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
      l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
      i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
      i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
      I7: <user_rtt_fill_fixup+0x6c/0x7c>
      Call Trace:
       [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c
      
      The window trap handlers are slightly clever, the trap table entries for them are
      composed of two pieces of code.  First comes the code that actually performs
      the window fill or spill trap handling, and then there are three instructions at
      the end which are for exception processing.
      
      The userland register window fill handler is:
      
      	add	%sp, STACK_BIAS + 0x00, %g1;		\
      	ldxa	[%g1 + %g0] ASI, %l0;			\
      	mov	0x08, %g2;				\
      	mov	0x10, %g3;				\
      	ldxa	[%g1 + %g2] ASI, %l1;			\
      	mov	0x18, %g5;				\
      	ldxa	[%g1 + %g3] ASI, %l2;			\
      	ldxa	[%g1 + %g5] ASI, %l3;			\
      	add	%g1, 0x20, %g1;				\
      	ldxa	[%g1 + %g0] ASI, %l4;			\
      	ldxa	[%g1 + %g2] ASI, %l5;			\
      	ldxa	[%g1 + %g3] ASI, %l6;			\
      	ldxa	[%g1 + %g5] ASI, %l7;			\
      	add	%g1, 0x20, %g1;				\
      	ldxa	[%g1 + %g0] ASI, %i0;			\
      	ldxa	[%g1 + %g2] ASI, %i1;			\
      	ldxa	[%g1 + %g3] ASI, %i2;			\
      	ldxa	[%g1 + %g5] ASI, %i3;			\
      	add	%g1, 0x20, %g1;				\
      	ldxa	[%g1 + %g0] ASI, %i4;			\
      	ldxa	[%g1 + %g2] ASI, %i5;			\
      	ldxa	[%g1 + %g3] ASI, %i6;			\
      	ldxa	[%g1 + %g5] ASI, %i7;			\
      	restored;					\
      	retry; nop; nop; nop; nop;			\
      	b,a,pt	%xcc, fill_fixup_dax;			\
      	b,a,pt	%xcc, fill_fixup_mna;			\
      	b,a,pt	%xcc, fill_fixup;
      
      And the way this works is that if any of those memory accesses
      generate an exception, the exception handler can revector to one of
      those final three branch instructions depending upon which kind of
      exception the memory access took.  In this way, the fault handler
      doesn't have to know if it was a spill or a fill that it's handling
      the fault for.  It just always branches to the last instruction in
      the parent trap's handler.
      
      For example, for a regular fault, the code goes:
      
      winfix_trampoline:
      	rdpr	%tpc, %g3
      	or	%g3, 0x7c, %g3
      	wrpr	%g3, %tnpc
      	done
      
      All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
      trap time program counter, we'll get that final instruction in the
      trap handler.
      
      On return from trap, we have to pull the register window in but we do
      this by hand instead of just executing a "restore" instruction for
      several reasons.  The largest being that from Niagara and onward we
      simply don't have enough levels in the trap stack to fully resolve all
      possible exception cases of a window fault when we are already at
      trap level 1 (which we enter to get ready to return from the original
      trap).
      
      This is executed inline via the FILL_*_RTRAP handlers.  rtrap_64.S's
      code branches directly to these to do the window fill by hand if
      necessary.  Now if you look at them, we'll see at the end:
      
      	    ba,a,pt    %xcc, user_rtt_fill_fixup;
      	    ba,a,pt    %xcc, user_rtt_fill_fixup;
      	    ba,a,pt    %xcc, user_rtt_fill_fixup;
      
      And oops, all three cases are handled like a fault.
      
      This doesn't work because each of these trap types (data access
      exception, memory address unaligned, and faults) store their auxiliary
      info in different registers to pass on to the C handler which does the
      real work.
      
      So in the case where the stack was unaligned, the unaligned trap
      handler sets up the arg registers one way, and then we branched to
      the fault handler which expects them setup another way.
      
      So the FAULT_TYPE_* value ends up basically being garbage, and
      randomly would generate the backtrace seen above.
      Reported-by: default avatarNick Alcock <nix@esperi.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9219a052
    • David S. Miller's avatar
      sparc: Harden signal return frame checks. · b3dbf811
      David S. Miller authored
      [ Upstream commit d11c2a0d ]
      
      All signal frames must be at least 16-byte aligned, because that is
      the alignment we explicitly create when we build signal return stack
      frames.
      
      All stack pointers must be at least 8-byte aligned.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b3dbf811
    • David S. Miller's avatar
      sparc64: Take ctx_alloc_lock properly in hugetlb_setup(). · acec837d
      David S. Miller authored
      [ Upstream commit 9ea46abe ]
      
      On cheetahplus chips we take the ctx_alloc_lock in order to
      modify the TLB lookup parameters for the indexed TLBs, which
      are stored in the context register.
      
      This is called with interrupts disabled, however ctx_alloc_lock
      is an IRQ safe lock, therefore we must take acquire/release it
      properly with spin_{lock,unlock}_irq().
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      acec837d
    • Babu Moger's avatar
      sparc/PCI: Fix for panic while enabling SR-IOV · 11885c1b
      Babu Moger authored
      [ Upstream commit d0c31e02 ]
      
      We noticed this panic while enabling SR-IOV in sparc.
      
      mlx4_core: Mellanox ConnectX core driver v2.2-1 (Jan  1 2015)
      mlx4_core: Initializing 0007:01:00.0
      mlx4_core 0007:01:00.0: Enabling SR-IOV with 5 VFs
      mlx4_core: Initializing 0007:01:00.1
      Unable to handle kernel NULL pointer dereference
      insmod(10010): Oops [#1]
      CPU: 391 PID: 10010 Comm: insmod Not tainted
      		4.1.12-32.el6uek.kdump2.sparc64 #1
      TPC: <dma_supported+0x20/0x80>
      I7: <__mlx4_init_one+0x324/0x500 [mlx4_core]>
      Call Trace:
       [00000000104c5ea4] __mlx4_init_one+0x324/0x500 [mlx4_core]
       [00000000104c613c] mlx4_init_one+0xbc/0x120 [mlx4_core]
       [0000000000725f14] local_pci_probe+0x34/0xa0
       [0000000000726028] pci_call_probe+0xa8/0xe0
       [0000000000726310] pci_device_probe+0x50/0x80
       [000000000079f700] really_probe+0x140/0x420
       [000000000079fa24] driver_probe_device+0x44/0xa0
       [000000000079fb5c] __device_attach+0x3c/0x60
       [000000000079d85c] bus_for_each_drv+0x5c/0xa0
       [000000000079f588] device_attach+0x88/0xc0
       [000000000071acd0] pci_bus_add_device+0x30/0x80
       [0000000000736090] virtfn_add.clone.1+0x210/0x360
       [00000000007364a4] sriov_enable+0x2c4/0x520
       [000000000073672c] pci_enable_sriov+0x2c/0x40
       [00000000104c2d58] mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
       [00000000104c49ac] mlx4_load_one+0x42c/0xd40 [mlx4_core]
      Disabling lock debugging due to kernel taint
      Caller[00000000104c5ea4]: __mlx4_init_one+0x324/0x500 [mlx4_core]
      Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
      Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
      Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
      Caller[0000000000726310]: pci_device_probe+0x50/0x80
      Caller[000000000079f700]: really_probe+0x140/0x420
      Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
      Caller[000000000079fb5c]: __device_attach+0x3c/0x60
      Caller[000000000079d85c]: bus_for_each_drv+0x5c/0xa0
      Caller[000000000079f588]: device_attach+0x88/0xc0
      Caller[000000000071acd0]: pci_bus_add_device+0x30/0x80
      Caller[0000000000736090]: virtfn_add.clone.1+0x210/0x360
      Caller[00000000007364a4]: sriov_enable+0x2c4/0x520
      Caller[000000000073672c]: pci_enable_sriov+0x2c/0x40
      Caller[00000000104c2d58]: mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
      Caller[00000000104c49ac]: mlx4_load_one+0x42c/0xd40 [mlx4_core]
      Caller[00000000104c5f90]: __mlx4_init_one+0x410/0x500 [mlx4_core]
      Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
      Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
      Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
      Caller[0000000000726310]: pci_device_probe+0x50/0x80
      Caller[000000000079f700]: really_probe+0x140/0x420
      Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
      Caller[000000000079fb08]: __driver_attach+0x88/0xa0
      Caller[000000000079d90c]: bus_for_each_dev+0x6c/0xa0
      Caller[000000000079f29c]: driver_attach+0x1c/0x40
      Caller[000000000079e35c]: bus_add_driver+0x17c/0x220
      Caller[00000000007a02d4]: driver_register+0x74/0x120
      Caller[00000000007263fc]: __pci_register_driver+0x3c/0x60
      Caller[00000000104f62bc]: mlx4_init+0x60/0xcc [mlx4_core]
      Kernel panic - not syncing: Fatal exception
      Press Stop-A (L1-A) to return to the boot prom
      ---[ end Kernel panic - not syncing: Fatal exception
      
      Details:
      Here is the call sequence
      virtfn_add->__mlx4_init_one->dma_set_mask->dma_supported
      
      The panic happened at line 760(file arch/sparc/kernel/iommu.c)
      
      758 int dma_supported(struct device *dev, u64 device_mask)
      759 {
      760         struct iommu *iommu = dev->archdata.iommu;
      761         u64 dma_addr_mask = iommu->dma_addr_mask;
      762
      763         if (device_mask >= (1UL << 32UL))
      764                 return 0;
      765
      766         if ((device_mask & dma_addr_mask) == dma_addr_mask)
      767                 return 1;
      768
      769 #ifdef CONFIG_PCI
      770         if (dev_is_pci(dev))
      771		return pci64_dma_supported(to_pci_dev(dev), device_mask);
      772 #endif
      773
      774         return 0;
      775 }
      776 EXPORT_SYMBOL(dma_supported);
      
      Same panic happened with Intel ixgbe driver also.
      
      SR-IOV code looks for arch specific data while enabling
      VFs. When VF device is added, driver probe function makes set
      of calls to initialize the pci device. Because the VF device is
      added different way than the normal PF device(which happens via
      of_create_pci_dev for sparc), some of the arch specific initialization
      does not happen for VF device.  That causes panic when archdata is
      accessed.
      
      To fix this, I have used already defined weak function
      pcibios_setup_device to copy archdata from PF to VF.
      Also verified the fix.
      Signed-off-by: default avatarBabu Moger <babu.moger@oracle.com>
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Reviewed-by: default avatarEthan Zhao <ethan.zhao@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      11885c1b
    • David S. Miller's avatar
      sparc64: Fix sparc64_set_context stack handling. · b98a9a51
      David S. Miller authored
      [ Upstream commit 397d1533 ]
      
      Like a signal return, we should use synchronize_user_stack() rather
      than flush_user_windows().
      Reported-by: default avatarIlya Malakhov <ilmalakhovthefirst@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b98a9a51
    • David S. Miller's avatar
      sparc64: Fix bootup regressions on some Kconfig combinations. · 40fd3377
      David S. Miller authored
      [ Upstream commit 49fa5230 ]
      
      The system call tracing bug fix mentioned in the Fixes tag
      below increased the amount of assembler code in the sequence
      of assembler files included by head_64.S
      
      This caused to total set of code to exceed 0x4000 bytes in
      size, which overflows the expression in head_64.S that works
      to place swapper_tsb at address 0x408000.
      
      When this is violated, the TSB is not properly aligned, and
      also the trap table is not aligned properly either.  All of
      this together results in failed boots.
      
      So, do two things:
      
      1) Simplify some code by using ba,a instead of ba/nop to get
         those bytes back.
      
      2) Add a linker script assertion to make sure that if this
         happens again the build will fail.
      
      Fixes: 1a40b953 ("sparc: Fix system call tracing register handling.")
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Reported-by: default avatarJoerg Abraham <joerg.abraham@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      40fd3377
    • Mike Frysinger's avatar
      sparc: Fix system call tracing register handling. · 3428d1d1
      Mike Frysinger authored
      [ Upstream commit 1a40b953 ]
      
      A system call trace trigger on entry allows the tracing
      process to inspect and potentially change the traced
      process's registers.
      
      Account for that by reloading the %g1 (syscall number)
      and %i0-%i5 (syscall argument) values.  We need to be
      careful to revalidate the range of %g1, and reload the
      system call table entry it corresponds to into %l7.
      Reported-by: default avatarMike Frysinger <vapier@gentoo.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Tested-by: default avatarMike Frysinger <vapier@gentoo.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3428d1d1
    • Russell Currey's avatar
      powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge · f856f1da
      Russell Currey authored
      commit 871e178e upstream.
      
      In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
      spec states that values of 9900-9905 can be returned, indicating that
      software should delay for 10^x (where x is the last digit, i.e. 990x)
      milliseconds and attempt the call again. Currently, the kernel doesn't
      know about this, and respecting it fixes some PCI failures when the
      hypervisor is busy.
      
      The delay is capped at 0.2 seconds.
      Signed-off-by: default avatarRussell Currey <ruscur@russell.cc>
      Acked-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f856f1da
    • Ralf Baechle's avatar
      MIPS: Fix 64k page support for 32 bit kernels. · 05546689
      Ralf Baechle authored
      commit d7de4134 upstream.
      
      TASK_SIZE was defined as 0x7fff8000UL which for 64k pages is not a
      multiple of the page size.  Somewhere further down the math fails
      such that executing an ELF binary fails.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Tested-by: default avatarJoshua Henderson <joshua.henderson@microchip.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      05546689
    • Taku Izumi's avatar
      PCI/AER: Clear error status registers during enumeration and restore · 3d4036cb
      Taku Izumi authored
      commit b07461a8 upstream.
      
      AER errors might be recorded when powering-on devices.  These errors can be
      ignored, so firmware usually clears them before the OS enumerates devices.
      However, firmware is not involved when devices are added via hotplug, so
      the OS may discover power-up errors that should be ignored.  The same may
      happen when powering up devices when resuming after suspend.
      
      Clear the AER error status registers during enumeration and resume.
      
      [bhelgaas: changelog, remove repetitive comments]
      Signed-off-by: default avatarTaku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3d4036cb
  4. 15 Jun, 2016 14 commits