1. 17 May, 2018 6 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 58ddfe6c
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
      
       - ARM/ARM64 locking fixes
      
       - x86 fixes: PCID, UMIP, locking
      
       - improved support for recent Windows version that have a 2048 Hz APIC
         timer
      
       - rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME
      
       - better behaved selftests
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
        KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
        KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
        KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity
        KVM: arm/arm64: Properly protect VGIC locks from IRQs
        KVM: X86: Lower the default timer frequency limit to 200us
        KVM: vmx: update sec exec controls for UMIP iff emulating UMIP
        kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled
        KVM: selftests: exit with 0 status code when tests cannot be run
        KVM: hyperv: idr_find needs RCU protection
        x86: Delay skip of emulated hypercall instruction
        KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
      58ddfe6c
    • Linus Torvalds's avatar
      Merge tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 7c9a0fc7
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "We have a core fix in the compat code for covering a potential race
        (double references), but it's a very minor change.
      
        The rest are all small device-specific quirks, as well as a correction
        of the new UAC3 support code"
      
      * tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Use Class Specific EP for UAC3 devices.
        ALSA: hda/realtek - Clevo P950ER ALC1220 Fixup
        ALSA: usb: mixer: volume quirk for CM102-A+/102S+
        ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
        ALSA: control: fix a redundant-copy issue
      7c9a0fc7
    • Michael S. Tsirkin's avatar
      kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8
      Michael S. Tsirkin authored
      KVM_HINTS_DEDICATED seems to be somewhat confusing:
      
      Guest doesn't really care whether it's the only task running on a host
      CPU as long as it's not preempted.
      
      And there are more reasons for Guest to be preempted than host CPU
      sharing, for example, with memory overcommit it can get preempted on a
      memory access, post copy migration can cause preemption, etc.
      
      Let's call it KVM_HINTS_REALTIME which seems to better
      match what guests expect.
      
      Also, the flag most be set on all vCPUs - current guests assume this.
      Note so in the documentation.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      633711e8
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 3e9245c5
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
      
       - a fix for the vfio ccw translation code
      
       - update an incorrect email address in the MAINTAINERS file
      
       - fix a division by zero oops in the cpum_sf code found by trinity
      
       - two fixes for the error handling of the qdio code
      
       - several spectre related patches to convert all left-over indirect
         branches in the kernel to expoline branches
      
       - update defconfigs to avoid warnings due to the netfilter Kconfig
         changes
      
       - avoid several compiler warnings in the kexec_file code for s390
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/qdio: don't release memory in qdio_setup_irq()
        s390/qdio: fix access to uninitialized qdio_q fields
        s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
        s390: use expoline thunks in the BPF JIT
        s390: extend expoline to BC instructions
        s390: remove indirect branch from do_softirq_own_stack
        s390: move spectre sysfs attribute code
        s390/kernel: use expoline for indirect branches
        s390/ftrace: use expoline for indirect branches
        s390/lib: use expoline for indirect branches
        s390/crc32-vx: use expoline for indirect branches
        s390: move expoline assembler macros to a header
        vfio: ccw: fix cleanup if cp_prefetch fails
        s390/kexec_file: add declaration of purgatory related globals
        s390: update defconfigs
        MAINTAINERS: update s390 zcrypt maintainers email address
      3e9245c5
    • Linus Torvalds's avatar
      Merge tag 'selinux-pr-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 305bb552
      Linus Torvalds authored
      Pull SELinux fixes from Paul Moore:
       "A small pull request to fix a few regressions in the SELinux/SCTP code
        with applications that call bind() with AF_UNSPEC/INADDR_ANY.
      
        The individual commit descriptions have more information, but the
        commits themselves should be self explanatory"
      
      * tag 'selinux-pr-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: correctly handle sa_family cases in selinux_sctp_bind_connect()
        selinux: fix address family in bind() and connect() to match address/port
        selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
      305bb552
    • Willy Tarreau's avatar
      proc: do not access cmdline nor environ from file-backed areas · 7f7ccc2c
      Willy Tarreau authored
      proc_pid_cmdline_read() and environ_read() directly access the target
      process' VM to retrieve the command line and environment. If this
      process remaps these areas onto a file via mmap(), the requesting
      process may experience various issues such as extra delays if the
      underlying device is slow to respond.
      
      Let's simply refuse to access file-backed areas in these functions.
      For this we add a new FOLL_ANON gup flag that is passed to all calls
      to access_remote_vm(). The code already takes care of such failures
      (including unmapped areas). Accesses via /proc/pid/mem were not
      changed though.
      
      This was assigned CVE-2018-1120.
      
      Note for stable backports: the patch may apply to kernels prior to 4.11
      but silently miss one location; it must be checked that no call to
      access_remote_vm() keeps zero as the last argument.
      Reported-by: default avatarQualys Security Advisory <qsa@qualys.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7f7ccc2c
  2. 16 May, 2018 3 commits
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · e6506eb2
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Some of the ftrace internal events use a zero for a data size of a
        field event. This is increasingly important for the histogram trigger
        work that is being extended.
      
        While auditing trace events, I found that a couple of the xen events
        were used as just marking that a function was called, by creating a
        static array of size zero. This can play havoc with the tracing
        features if these events are used, because a zero size of a static
        array is denoted as a special nul terminated dynamic array (this is
        what the trace_marker code uses). But since the xen events have no
        size, they are not nul terminated, and unexpected results may occur.
      
        As trace events were never intended on being a marker to denote that a
        function was hit or not, especially since function tracing and kprobes
        can trivially do the same, the best course of action is to simply
        remove these events"
      
      * tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
      e6506eb2
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.17-rc5-vsprintf' of... · 9d38cd06
      Linus Torvalds authored
      Merge tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull memory barrier for from Steven Rostedt:
       "The memory barrier usage in updating the random ptr hash for %p in
        vsprintf is incorrect.
      
        Instead of adding the read memory barrier into vsprintf() which will
        cause a slight degradation to a commonly used function in the kernel
        just to solve a very unlikely race condition that can only happen at
        boot up, change the code from using a variable branch to a
        static_branch.
      
        Not only does this solve the race condition, it actually will improve
        the performance of vsprintf() by removing the conditional branch that
        is only needed at boot"
      
      * tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        vsprintf: Replace memory barrier with static_key for random_ptr_key update
      9d38cd06
    • Steven Rostedt (VMware)'s avatar
      vsprintf: Replace memory barrier with static_key for random_ptr_key update · 85f4f12d
      Steven Rostedt (VMware) authored
      Reviewing Tobin's patches for getting pointers out early before
      entropy has been established, I noticed that there's a lone smp_mb() in
      the code. As with most lone memory barriers, this one appears to be
      incorrectly used.
      
      We currently basically have this:
      
      	get_random_bytes(&ptr_key, sizeof(ptr_key));
      	/*
      	 * have_filled_random_ptr_key==true is dependent on get_random_bytes().
      	 * ptr_to_id() needs to see have_filled_random_ptr_key==true
      	 * after get_random_bytes() returns.
      	 */
      	smp_mb();
      	WRITE_ONCE(have_filled_random_ptr_key, true);
      
      And later we have:
      
      	if (unlikely(!have_filled_random_ptr_key))
      		return string(buf, end, "(ptrval)", spec);
      
      /* Missing memory barrier here. */
      
      	hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);
      
      As the CPU can perform speculative loads, we could have a situation
      with the following:
      
      	CPU0				CPU1
      	----				----
      				   load ptr_key = 0
         store ptr_key = random
         smp_mb()
         store have_filled_random_ptr_key
      
      				   load have_filled_random_ptr_key = true
      
      				    BAD BAD BAD! (you're so bad!)
      
      Because nothing prevents CPU1 from loading ptr_key before loading
      have_filled_random_ptr_key.
      
      But this race is very unlikely, but we can't keep an incorrect smp_mb() in
      place. Instead, replace the have_filled_random_ptr_key with a static_branch
      not_filled_random_ptr_key, that is initialized to true and changed to false
      when we get enough entropy. If the update happens in early boot, the
      static_key is updated immediately, otherwise it will have to wait till
      entropy is filled and this happens in an interrupt handler which can't
      enable a static_key, as that requires a preemptible context. In that case, a
      work_queue is used to enable it, as entropy already took too long to
      establish in the first place waiting a little more shouldn't hurt anything.
      
      The benefit of using the static key is that the unlikely branch in
      vsprintf() now becomes a nop.
      
      Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      85f4f12d
  3. 15 May, 2018 12 commits
  4. 14 May, 2018 18 commits
    • Steven Rostedt (VMware)'s avatar
      tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all} · 45dd9b06
      Steven Rostedt (VMware) authored
      Doing an audit of trace events, I discovered two trace events in the xen
      subsystem that use a hack to create zero data size trace events. This is not
      what trace events are for. Trace events add memory footprint overhead, and
      if all you need to do is see if a function is hit or not, simply make that
      function noinline and use function tracer filtering.
      
      Worse yet, the hack used was:
      
       __array(char, x, 0)
      
      Which creates a static string of zero in length. There's assumptions about
      such constructs in ftrace that this is a dynamic string that is nul
      terminated. This is not the case with these tracepoints and can cause
      problems in various parts of ftrace.
      
      Nuke the trace events!
      
      Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: 95a7d768 ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      45dd9b06
    • Alexey Kodanev's avatar
      selinux: correctly handle sa_family cases in selinux_sctp_bind_connect() · 4152dc91
      Alexey Kodanev authored
      Allow to pass the socket address structure with AF_UNSPEC family for
      compatibility purposes. selinux_socket_bind() will further check it
      for INADDR_ANY and selinux_socket_connect_helper() should return
      EINVAL.
      
      For a bad address family return EINVAL instead of AFNOSUPPORT error,
      i.e. what is expected from SCTP protocol in such case.
      
      Fixes: d452930f ("selinux: Add SCTP support")
      Suggested-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      4152dc91
    • Alexey Kodanev's avatar
      selinux: fix address family in bind() and connect() to match address/port · 88b7d370
      Alexey Kodanev authored
      Since sctp_bindx() and sctp_connectx() can have multiple addresses,
      sk_family can differ from sa_family. Therefore, selinux_socket_bind()
      and selinux_socket_connect_helper(), which process sockaddr structure
      (address and port), should use the address family from that structure
      too, and not from the socket one.
      
      The initialization of the data for the audit record is moved above,
      in selinux_socket_bind(), so that there is no duplicate changes and
      code.
      
      Fixes: d452930f ("selinux: Add SCTP support")
      Suggested-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      88b7d370
    • Alexey Kodanev's avatar
      selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind() · 0f8db8cc
      Alexey Kodanev authored
      Commit d452930f ("selinux: Add SCTP support") breaks compatibility
      with the old programs that can pass sockaddr_in structure with AF_UNSPEC
      and INADDR_ANY to bind(). As a result, bind() returns EAFNOSUPPORT error.
      This was found with LTP/asapi_01 test.
      
      Similar to commit 29c486df ("net: ipv4: relax AF_INET check in
      bind()"), which relaxed AF_INET check for compatibility, add AF_UNSPEC
      case to AF_INET and make sure that the address is INADDR_ANY.
      
      Fixes: d452930f ("selinux: Add SCTP support")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      0f8db8cc
    • David Howells's avatar
      afs: Fix the non-encryption of calls · 4776cab4
      David Howells authored
      Some AFS servers refuse to accept unencrypted traffic, so can't be accessed
      with kAFS.  Set the AF_RXRPC security level to encrypt client calls to deal
      with this.
      
      Note that incoming service calls are set by the remote client and so aren't
      affected by this.
      
      This requires an AF_RXRPC patch to pass the value set by setsockopt to calls
      begun by the kernel.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      4776cab4
    • David Howells's avatar
      afs: Fix CB.CallBack handling · 428edade
      David Howells authored
      The handling of CB.CallBack messages sent by the fileserver to the client
      is broken in that they are currently being processed after the reply has
      been transmitted.
      
      This is not what the fileserver expects, however.  It holds up change
      visibility until the reply comes so as to maintain cache coherency, and so
      expects the client to have to refetch the state on the affected files.
      
      Fix CB.CallBack handling to perform the callback break before sending the
      reply.
      
      The fileserver is free to hold up status fetches issued by other threads on
      the same client that occur in reponse to the callback until any pending
      changes have been committed.
      
      Fixes: d001648e ("rxrpc: Don't expose skbs to in-kernel users [ver #2]")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      428edade
    • David Howells's avatar
      afs: Fix whole-volume callback handling · 68251f0a
      David Howells authored
      It's possible for an AFS file server to issue a whole-volume notification
      that callbacks on all the vnodes in the file have been broken.  This is
      done for R/O and backup volumes (which don't have per-file callbacks) and
      for things like a volume being taken offline.
      
      Fix callback handling to detect whole-volume notifications, to track it
      across operations and to check it during inode validation.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      68251f0a
    • Marc Dionne's avatar
      afs: Fix afs_find_server search loop · f9c1bba3
      Marc Dionne authored
      The code that looks up servers by addresses makes the assumption
      that the list of addresses for a server is sorted.  It exits the
      loop if it finds that the target address is larger than the
      current candidate.  As the list is not currently sorted, this
      can lead to a failure to find a matching server, which can cause
      callbacks from that server to be ignored.
      
      Remove the early exit case so that the complete list is searched.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f9c1bba3
    • David Howells's avatar
      afs: Fix the handling of an unfound server in CM operations · a86b06d1
      David Howells authored
      If the client cache manager operations that need the server record
      (CB.Callback, CB.InitCallBackState, and CB.InitCallBackState3) can't find
      the server record, they abort the call from the file server with
      RX_CALL_DEAD when they should return okay.
      
      Fixes: c35eccb1 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      a86b06d1
    • David Howells's avatar
      afs: Add a tracepoint to record callbacks from unlisted servers · 3709a399
      David Howells authored
      Add a tracepoint to record callbacks from servers for which we don't have a
      record.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      3709a399
    • David Howells's avatar
      afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID · 001ab5a6
      David Howells authored
      Fix the handling of the CB.InitCallBackState3 service call to find the
      record of a server that we're using by looking it up by the UUID passed as
      the parameter rather than by its address (of which it might have many, and
      which may change).
      
      Fixes: c35eccb1 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      001ab5a6
    • David Howells's avatar
      afs: Fix VNOVOL handling in address rotation · 3d9fa911
      David Howells authored
      If a volume location record lists multiple file servers for a volume, then
      it's possible that due to a misconfiguration or a changing configuration
      that one of the file servers doesn't know about it yet and will abort
      VNOVOL.  Currently, the rotation algorithm will stop with EREMOTEIO.
      
      Fix this by moving on to try the next server if VNOVOL is returned.  Once
      all the servers have been tried and the record rechecked, the algorithm
      will stop with EREMOTEIO or ENOMEDIUM.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      3d9fa911
    • David Howells's avatar
      afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility · 684b0f68
      David Howells authored
      The OpenAFS server's RXAFS_InlineBulkStatus implementation has a bug
      whereby if an error occurs on one of the vnodes being queried, then the
      errorCode field is set correctly in the corresponding status, but the
      interfaceVersion field is left unset.
      
      Fix kAFS to deal with this by evaluating the AFSFetchStatus blob against
      the following cases when called from FS.InlineBulkStatus delivery:
      
       (1) If InterfaceVersion == 0 then:
      
           (a) If errorCode != 0 then it indicates the abort code for the
               corresponding vnode.
      
           (b) If errorCode == 0 then the status record is invalid.
      
       (2) If InterfaceVersion == 1 then:
      
           (a) If errorCode != 0 then it indicates the abort code for the
               corresponding vnode.
      
           (b) If errorCode == 0 then the status record is valid and can be
           	 parsed.
      
       (3) If InterfaceVersion is anything else then the status record is
           invalid.
      
      Fixes: dd9fbcb8 ("afs: Rearrange status mapping")
      Reported-by: default avatarJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      684b0f68
    • David Howells's avatar
      afs: Fix server rotation's handling of fileserver probe failure · ec5a3b4b
      David Howells authored
      The server rotation algorithm just gives up if it fails to probe a
      fileserver.  Fix this by rotating to the next fileserver instead.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      ec5a3b4b
    • David Howells's avatar
      afs: Fix refcounting in callback registration · d4a96bec
      David Howells authored
      The refcounting on afs_cb_interest struct objects in
      afs_register_server_cb_interest() is wrong as it uses the server list
      entry's call back interest pointer without regard for the fact that it
      might be replaced at any time and the object thrown away.
      
      Fix this by:
      
       (1) Put a lock on the afs_server_list struct that can be used to
           mediate access to the callback interest pointers in the servers array.
      
       (2) Keep a ref on the callback interest that we get from the entry.
      
       (3) Dropping the old reference held by vnode->cb_interest if we replace
           the pointer.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      d4a96bec
    • David Howells's avatar
      afs: Fix giving up callbacks on server destruction · f2686b09
      David Howells authored
      When a server record is destroyed, we want to send a message to the server
      telling it that we're giving up all the callbacks it has promised us.
      
      Apply two fixes to this:
      
       (1) Only send the FS.GiveUpAllCallBacks message if we actually got a
           callback from that server.  We assume this to be the case if we
           performed at least one successful FS operation on that server.
      
       (2) Send it to the address last used for that server rather than always
           picking the first address in the list (which might be unreachable).
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f2686b09
    • David Howells's avatar
      afs: Fix address list parsing · 01fd79e6
      David Howells authored
      The parsing of port specifiers in the address list obtained from the DNS
      resolution upcall doesn't work as in4_pton() and in6_pton() will fail on
      encountering an unexpected delimiter (in this case, the '+' marking the
      port number).  However, in*_pton() can't be given multiple specifiers.
      
      Fix this by finding the delimiter in advance and not relying on in*_pton()
      to find the end of the address for us.
      
      Fixes: 8b2a464c ("afs: Add an address list concept")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      01fd79e6
    • David Howells's avatar
      afs: Fix directory page locking · b61f7dcf
      David Howells authored
      The afs directory loading code (primarily afs_read_dir()) locks all the
      pages that hold a directory's content blob to defend against
      getdents/getdents races and getdents/lookup races where the competitors
      issue conflicting reads on the same data.  As the reads will complete
      consecutively, they may retrieve different versions of the data and
      one may overwrite the data that the other is busy parsing.
      
      Fix this by not locking the pages at all, but rather by turning the
      validation lock into an rwsem and getting an exclusive lock on it whilst
      reading the data or validating the attributes and a shared lock whilst
      parsing the data.  Sharing the attribute validation lock should be fine as
      the data fetch will retrieve the attributes also.
      
      The individual page locks aren't needed at all as the only place they're
      being used is to serialise data loading.
      
      Without this patch, the:
      
       	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
      		...
      	}
      
      part of afs_read_dir() may be skipped, leaving the pages unlocked when we
      hit the success: clause - in which case we try to unlock the not-locked
      pages, leading to the following oops:
      
        page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
        flags: 0xfffe000001004(referenced|private)
        raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
        raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
        page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
        page->mem_cgroup:ffff98156b27c000
        ------------[ cut here ]------------
        kernel BUG at mm/filemap.c:1205!
        ...
        RIP: 0010:unlock_page+0x43/0x50
        ...
        Call Trace:
         afs_dir_iterate+0x789/0x8f0 [kafs]
         ? _cond_resched+0x15/0x30
         ? kmem_cache_alloc_trace+0x166/0x1d0
         ? afs_do_lookup+0x69/0x490 [kafs]
         ? afs_do_lookup+0x101/0x490 [kafs]
         ? key_default_cmp+0x20/0x20
         ? request_key+0x3c/0x80
         ? afs_lookup+0xf1/0x340 [kafs]
         ? __lookup_slow+0x97/0x150
         ? lookup_slow+0x35/0x50
         ? walk_component+0x1bf/0x490
         ? path_lookupat.isra.52+0x75/0x200
         ? filename_lookup.part.66+0xa0/0x170
         ? afs_end_vnode_operation+0x41/0x60 [kafs]
         ? __check_object_size+0x9c/0x171
         ? strncpy_from_user+0x4a/0x170
         ? vfs_statx+0x73/0xe0
         ? __do_sys_newlstat+0x39/0x70
         ? __x64_sys_getdents+0xc9/0x140
         ? __x64_sys_getdents+0x140/0x140
         ? do_syscall_64+0x5b/0x160
         ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: f3ddee8d ("afs: Fix directory handling")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      b61f7dcf
  5. 13 May, 2018 1 commit