1. 01 Apr, 2019 19 commits
    • Ming Lei's avatar
      block: remove argument of 'request_queue' from __blk_bvec_map_sg · cae6c2e5
      Ming Lei authored
      The argument of 'request_queue' isn't used by __blk_bvec_map_sg(),
      so remove it.
      
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      cae6c2e5
    • Ming Lei's avatar
      block: enable multi-page bvec for passthrough IO · 489fbbcb
      Ming Lei authored
      Now block IO stack is basically ready for supporting multi-page bvec,
      however it isn't enabled on passthrough IO.
      
      One reason is that passthrough IO is dispatched to LLD directly and bio
      split is bypassed, so the bio has to be built correctly for dispatch to
      LLD from the beginning.
      
      Implement multi-page support for passthrough IO by limitting each bvec
      as block device's segment and applying all kinds of queue limit in
      blk_add_pc_page(). Then we don't need to calculate segments any more for
      passthrough IO any more, turns out code is simplified much.
      
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      489fbbcb
    • Ming Lei's avatar
      block: put the same page when adding it to bio · 19047087
      Ming Lei authored
      When the added page is merged to last same page in bio_add_pc_page(),
      the user may need to put this page for avoiding page leak.
      
      bio_map_user_iov() needs this kind of handling, and now it deals with
      it by itself in hack style.
      
      Moves the handling of put page into __bio_add_pc_page(), so
      bio_map_user_iov() may be simplified a bit, and maybe more users
      can benefit from this change.
      
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      19047087
    • Ming Lei's avatar
      block: check if page is mergeable in one helper · 5919482e
      Ming Lei authored
      Now the check for deciding if one page is mergeable to current bvec
      becomes a bit complicated, and we need to reuse the code before
      adding pc page.
      
      So move the check in one dedicated helper.
      
      No function change.
      
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5919482e
    • Ming Lei's avatar
      block: cleanup bio_add_pc_page · 5a8ce240
      Ming Lei authored
      REQ_PC is out of date, so replace it with passthrough IO.
      
      Also remove the local variable of 'prev' since we can reuse
      the top local variable of 'bvec'.
      
      No function change.
      
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5a8ce240
    • Ming Lei's avatar
      block: don't merge adjacent bvecs to one segment in bio blk_queue_split · fd7d8d42
      Ming Lei authored
      For normal filesystem IO, each page is added via blk_add_page(),
      in which bvec(page) merge has been handled already, and basically
      not possible to merge two adjacent bvecs in one bio.
      
      So not try to merge two adjacent bvecs in blk_queue_split().
      
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fd7d8d42
    • Ming Lei's avatar
      block: avoid to break XEN by multi-page bvec · db5ebd6e
      Ming Lei authored
      XEN has special page merge requirement, see xen_biovec_phys_mergeable().
      We can't merge pages into one bvec simply for XEN.
      
      So move XEN's specific check on page merge into __bio_try_merge_page(),
      then abvoid to break XEN by multi-page bvec.
      
      Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: xen-devel@lists.xenproject.org
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      db5ebd6e
    • Ming Lei's avatar
      block: pass page to xen_biovec_phys_mergeable · 0383ad43
      Ming Lei authored
      xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
      for checking if the two bvecs can be merged, so pass page to
      xen_biovec_phys_mergeable() directly.
      
      No function change.
      
      Cc: ris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: xen-devel@lists.xenproject.org
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0383ad43
    • Holger Hoffstätte's avatar
      loop: properly observe rotational flag of underlying device · 56a85fd8
      Holger Hoffstätte authored
      The loop driver always declares the rotational flag of its device as
      rotational, even when the device of the mapped file is nonrotational,
      as is the case with SSDs or on tmpfs. This can confuse filesystem tools
      which are SSD-aware; in my case I frequently forget to tell mkfs.btrfs
      that my loop device on tmpfs is nonrotational, and that I really don't
      need any automatic metadata redundancy.
      
      The attached patch fixes this by introspecting the rotational flag of the
      mapped file's underlying block device, if it exists. If the mapped file's
      filesystem has no associated block device - as is the case on e.g. tmpfs -
      we assume nonrotational storage. If there is a better way to identify such
      non-devices I'd love to hear them.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: holger@applied-asynchrony.com
      Signed-off-by: default avatarHolger Hoffstätte <holger.hoffstaette@googlemail.com>
      Signed-off-by: default avatarGwendal Grignou <gwendal@chromium.org>
      Signed-off-by: default avatarBenjamin Gordon <bmgordon@chromium.org>
      Reviewed-by: default avatarGuenter Roeck <groeck@chromium.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      56a85fd8
    • Paolo Valente's avatar
      doc, block, bfq: add information on bfq execution time · 4438cf50
      Paolo Valente authored
      The execution time of BFQ has been slightly lowered. Report the new
      execution time in BFQ documentation.
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4438cf50
    • Francesco Pollicino's avatar
      block, bfq: save & resume weight on a queue merge/split · fffca087
      Francesco Pollicino authored
      bfq saves the state of a queue each time a merge occurs, to be
      able to resume such a state when the queue is associated again
      with its original process, on a split.
      
      Unfortunately bfq does not save & restore also the weight of the
      queue. If the weight is not correctly resumed when the queue is
      recycled, then the weight of the recycled queue could differ
      from the weight of the original queue.
      
      This commit adds the missing save & resume of the weight.
      Tested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarFrancesco Pollicino <fra.fra.800@gmail.com>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fffca087
    • Francesco Pollicino's avatar
      block, bfq: print SHARED instead of pid for shared queues in logs · 1e66413c
      Francesco Pollicino authored
      The function "bfq_log_bfqq" prints the pid of the process
      associated with the queue passed as input.
      
      Unfortunately, if the queue is shared, then more than one process
      is associated with the queue. The pid that gets printed in this
      case is the pid of one of the associated processes.
      Which process gets printed depends on the exact sequence of merge
      events the queue underwent. So printing such a pid is rather
      useless and above all is often rather confusing because it
      reports a random pid between those of the associated processes.
      
      This commit addresses this issue by printing SHARED instead of a pid
      if the queue is shared.
      Tested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarFrancesco Pollicino <fra.fra.800@gmail.com>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      1e66413c
    • Paolo Valente's avatar
      block, bfq: always protect newly-created queues from existing active queues · 84a74689
      Paolo Valente authored
      If many bfq_queues belonging to the same group happen to be created
      shortly after each other, then the processes associated with these
      queues have typically a common goal. In particular, bursts of queue
      creations are usually caused by services or applications that spawn
      many parallel threads/processes. Examples are systemd during boot, or
      git grep. If there are no other active queues, then, to help these
      processes get their job done as soon as possible, the best thing to do
      is to reach a high throughput. To this goal, it is usually better to
      not grant either weight-raising or device idling to the queues
      associated with these processes. And this is exactly what BFQ
      currently does.
      
      There is however a drawback: if, in contrast, some other queues are
      already active, then the newly created queues must be protected from
      the I/O flowing through the already existing queues. In this case, the
      best thing to do is the opposite as in the other case: it is much
      better to grant weight-raising and device idling to the newly-created
      queues, if they deserve it. This commit addresses this issue by doing
      so if there are already other active queues.
      
      This change also helps eliminating false positives, which occur when
      the newly-created queues do not belong to an actual large burst of
      creations, but some background task (e.g., a service) happens to
      trigger the creation of new queues in the middle, i.e., very close to
      when the victim queues are created. These false positive may cause
      total loss of control on process latencies.
      Tested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      84a74689
    • Paolo Valente's avatar
      block, bfq: do not tag totally seeky queues as soft rt · 7074f076
      Paolo Valente authored
      Sync random I/O is likely to be confused with soft real-time I/O,
      because it is characterized by limited throughput and apparently
      isochronous arrival pattern. To avoid false positives, this commits
      prevents bfq_queues containing only random (seeky) I/O from being
      tagged as soft real-time.
      Tested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7074f076
    • Paolo Valente's avatar
      block, bfq: do not merge queues on flash storage with queueing · 8cacc5ab
      Paolo Valente authored
      To boost throughput with a set of processes doing interleaved I/O
      (i.e., a set of processes whose individual I/O is random, but whose
      merged cumulative I/O is sequential), BFQ merges the queues associated
      with these processes, i.e., redirects the I/O of these processes into a
      common, shared queue. In the shared queue, I/O requests are ordered by
      their position on the medium, thus sequential I/O gets dispatched to
      the device when the shared queue is served.
      
      Queue merging costs execution time, because, to detect which queues to
      merge, BFQ must maintain a list of the head I/O requests of active
      queues, ordered by request positions. Measurements showed that this
      costs about 10% of BFQ's total per-request processing time.
      
      Request processing time becomes more and more critical as the speed of
      the underlying storage device grows. Yet, fortunately, queue merging
      is basically useless on the very devices that are so fast to make
      request processing time critical. To reach a high throughput, these
      devices must have many requests queued at the same time. But, in this
      configuration, the internal scheduling algorithms of these devices do
      also the job of queue merging: they reorder requests so as to obtain
      as much as possible a sequential I/O pattern. As a consequence, with
      processes doing interleaved I/O, the throughput reached by one such
      device is likely to be the same, with and without queue merging.
      
      In view of this fact, this commit disables queue merging, and all
      related housekeeping, for non-rotational devices with internal
      queueing. The total, single-lock-protected, per-request processing
      time of BFQ drops to, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
      (time measured with simple code instrumentation, and using the
      throughput-sync.sh script of the S suite [1], in performance-profiling
      mode). To put this result into context, the total,
      single-lock-protected, per-request execution time of the lightest I/O
      scheduler available in blk-mq, mq-deadline, is 0.7 us (mq-deadline is
      ~800 LOC, against ~10500 LOC for BFQ).
      
      Disabling merging provides a further, remarkable benefit in terms of
      throughput. Merging tends to make many workloads artificially more
      uneven, mainly because of shared queues remaining non empty for
      incomparably more time than normal queues. So, if, e.g., one of the
      queues in a set of merged queues has a higher weight than a normal
      queue, then the shared queue may inherit such a high weight and, by
      staying almost always active, may force BFQ to perform I/O plugging
      most of the time. This evidently makes it harder for BFQ to let the
      device reach a high throughput.
      
      As a practical example of this problem, and of the benefits of this
      commit, we measured again the throughput in the nasty scenario
      considered in previous commit messages: dbench test (in the Phoronix
      suite), with 6 clients, on a filesystem with journaling, and with the
      journaling daemon enjoying a higher weight than normal processes. With
      this commit, the throughput grows from ~150 MB/s to ~200 MB/s on a
      PLEXTOR PX-256M5 SSD. This is the same peak throughput reached by any
      of the other I/O schedulers. As such, this is also likely to be the
      maximum possible throughput reachable with this workload on this
      device, because I/O is mostly random, and the other schedulers
      basically just pass I/O requests to the drive as fast as possible.
      
      [1] https://github.com/Algodev-github/STested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: default avatarFrancesco Pollicino <fra.fra.800@gmail.com>
      Signed-off-by: default avatarAlessio Masola <alessio.masola@gmail.com>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8cacc5ab
    • Paolo Valente's avatar
      block, bfq: tune service injection basing on request service times · 2341d662
      Paolo Valente authored
      The processes associated with a bfq_queue, say Q, may happen to
      generate their cumulative I/O at a lower rate than the rate at which
      the device could serve the same I/O. This is rather probable, e.g., if
      only one process is associated with Q and the device is an SSD. It
      results in Q becoming often empty while in service. If BFQ is not
      allowed to switch to another queue when Q becomes empty, then, during
      the service of Q, there will be frequent "service holes", i.e., time
      intervals during which Q gets empty and the device can only consume
      the I/O already queued in its hardware queues. This easily causes
      considerable losses of throughput.
      
      To counter this problem, BFQ implements a request injection mechanism,
      which tries to fill the above service holes with I/O requests taken
      from other bfq_queues. The hard part in this mechanism is finding the
      right amount of I/O to inject, so as to both boost throughput and not
      break Q's bandwidth and latency guarantees. To this goal, the current
      version of this mechanism measures the bandwidth enjoyed by Q while it
      is being served, and tries to inject the maximum possible amount of
      extra service that does not cause Q's bandwidth to decrease too
      much.
      
      This solution has an important shortcoming. For bandwidth measurements
      to be stable and reliable, Q must remain in service for a much longer
      time than that needed to serve a single I/O request. Unfortunately,
      this does not hold with many workloads. This commit addresses this
      issue by changing the way the amount of injection allowed is
      dynamically computed. It tunes injection as a function of the service
      times of single I/O requests of Q, instead of Q's
      bandwidth. Single-request service times are evidently meaningful even
      if Q gets very few I/O requests completed while it is in service.
      
      As a testbed for this new solution, we measured the throughput reached
      by BFQ for one of the nastiest workloads and configurations for this
      scheduler: the workload generated by the dbench test (in the Phoronix
      suite), with 6 clients, on a filesystem with journaling, and with the
      journaling daemon enjoying a higher weight than normal processes.
      With this commit, the throughput grows from ~100 MB/s to ~150 MB/s on
      a PLEXTOR PX-256M5.
      Tested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: default avatarFrancesco Pollicino <fra.fra.800@gmail.com>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2341d662
    • Paolo Valente's avatar
      block, bfq: do not idle for lowest-weight queues · fb53ac6c
      Paolo Valente authored
      In most cases, it is detrimental for throughput to plug I/O dispatch
      when the in-service bfq_queue becomes temporarily empty (plugging is
      performed to wait for the possible arrival, soon, of new I/O from the
      in-service queue). There is however a case where plugging is needed
      for service guarantees. If a bfq_queue, say Q, has a higher weight
      than some other active bfq_queue, and is sync, i.e., contains sync
      I/O, then, to guarantee that Q does receive a higher share of the
      throughput than other lower-weight queues, it is necessary to plug I/O
      dispatch when Q remains temporarily empty while being served.
      
      For this reason, BFQ performs I/O plugging when some active bfq_queue
      has a higher weight than some other active bfq_queue. But this is
      unnecessarily overkill. In fact, if the in-service bfq_queue actually
      has a weight lower than or equal to the other queues, then the queue
      *must not* be guaranteed a higher share of the throughput than the
      other queues. So, not plugging I/O cannot cause any harm to the
      queue. And can boost throughput.
      
      Taking advantage of this fact, this commit does not plug I/O for sync
      bfq_queues with a weight lower than or equal to the weights of the
      other queues. Here is an example of the resulting throughput boost
      with the dbench workload, which is particularly nasty for BFQ. With
      the dbench test in the Phoronix suite, BFQ reaches its lowest total
      throughput with 6 clients on a filesystem with journaling, in case the
      journaling daemon has a higher weight than normal processes. Before
      this commit, the total throughput was ~80 MB/sec on a PLEXTOR PX-256M5,
      after this commit it is ~100 MB/sec.
      Tested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fb53ac6c
    • Paolo Valente's avatar
      block, bfq: increase idling for weight-raised queues · 778c02a2
      Paolo Valente authored
      If a sync bfq_queue has a higher weight than some other queue, and
      remains temporarily empty while in service, then, to preserve the
      bandwidth share of the queue, it is necessary to plug I/O dispatching
      until a new request arrives for the queue. In addition, a timeout
      needs to be set, to avoid waiting for ever if the process associated
      with the queue has actually finished its I/O.
      
      Even with the above timeout, the device is however not fed with new
      I/O for a while, if the process has finished its I/O. If this happens
      often, then throughput drops and latencies grow. For this reason, the
      timeout is kept rather low: 8 ms is the current default.
      
      Unfortunately, such a low value may cause, on the opposite end, a
      violation of bandwidth guarantees for a process that happens to issue
      new I/O too late. The higher the system load, the higher the
      probability that this happens to some process. This is a problem in
      scenarios where service guarantees matter more than throughput. One
      important case are weight-raised queues, which need to be granted a
      very high fraction of the bandwidth.
      
      To address this issue, this commit lower-bounds the plugging timeout
      for weight-raised queues to 20 ms. This simple change provides
      relevant benefits. For example, on a PLEXTOR PX-256M5S, with which
      gnome-terminal starts in 0.6 seconds if there is no other I/O in
      progress, the same applications starts in
      - 0.8 seconds, instead of 1.2 seconds, if ten files are being read
        sequentially in parallel
      - 1 second, instead of 2 seconds, if, in parallel, five files are
        being read sequentially, and five more files are being written
        sequentially
      Tested-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      778c02a2
    • Konstantin Khlebnikov's avatar
      block/bfq: fix ifdef for CONFIG_BFQ_GROUP_IOSCHED=y · 42b1bd33
      Konstantin Khlebnikov authored
      Replace BFQ_GROUP_IOSCHED_ENABLED with CONFIG_BFQ_GROUP_IOSCHED.
      Code under these ifdefs never worked, something might be broken.
      
      Fixes: 0471559c ("block, bfq: add/remove entity weights correctly")
      Fixes: 73d58118 ("block, bfq: consider also ioprio classes in symmetry detection")
      Reviewed-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      42b1bd33
  2. 31 Mar, 2019 9 commits
    • Linus Torvalds's avatar
      Linux 5.1-rc3 · 79a3aaa7
      Linus Torvalds authored
      79a3aaa7
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 63fc9c23
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "A collection of x86 and ARM bugfixes, and some improvements to
        documentation.
      
        On top of this, a cleanup of kvm_para.h headers, which were exported
        by some architectures even though they not support KVM at all. This is
        responsible for all the Kbuild changes in the diffstat"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
        Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
        KVM: doc: Document the life cycle of a VM and its resources
        KVM: selftests: complete IO before migrating guest state
        KVM: selftests: disable stack protector for all KVM tests
        KVM: selftests: explicitly disable PIE for tests
        KVM: selftests: assert on exit reason in CR4/cpuid sync test
        KVM: x86: update %rip after emulating IO
        x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
        kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
        KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
        kvm: don't redefine flags as something else
        kvm: mmu: Used range based flushing in slot_handle_level_range
        KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
        KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
        kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
        KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
        KVM: Reject device ioctls from processes other than the VM's creator
        KVM: doc: Fix incorrect word ordering regarding supported use of APIs
        KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
        KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
        ...
      63fc9c23
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 915ee0da
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A pile of x86 updates:
      
         - Prevent exceeding he valid physical address space in the /dev/mem
           limit checks.
      
         - Move all header content inside the header guard to prevent compile
           failures.
      
         - Fix the bogus __percpu annotation in this_cpu_has() which makes
           sparse very noisy.
      
         - Disable switch jump tables completely when retpolines are enabled.
      
         - Prevent leaking the trampoline address"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/realmode: Make set_real_mode_mem() static inline
        x86/cpufeature: Fix __percpu annotation in this_cpu_has()
        x86/mm: Don't exceed the valid physical address space
        x86/retpolines: Disable switch jump tables when retpolines are enabled
        x86/realmode: Don't leak the trampoline kernel address
        x86/boot: Fix incorrect ifdeffery scope
        x86/resctrl: Remove unused variable
      915ee0da
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 590627f7
      Linus Torvalds authored
      Pull perf tooling fixes from Thomas Gleixner:
       "Core libraries:
         - Fix max perf_event_attr.precise_ip detection.
         - Fix parser error for uncore event alias
         - Fixup ordering of kernel maps after obtaining the main kernel map
           address.
      
        Intel PT:
         - Fix TSC slip where A TSC packet can slip past MTC packets so that
           the timestamp appears to go backwards.
         - Fixes for exported-sql-viewer GUI conversion to python3.
      
        ARM coresight:
         - Fix the build by adding a missing case value for enumeration value
           introduced in newer library, that now is the required one.
      
        tool headers:
         - Syncronize kernel headers with the kernel, getting new io_uring and
           pidfd_send_signal syscalls so that 'perf trace' can handle them"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf pmu: Fix parser error for uncore event alias
        perf scripts python: exported-sql-viewer.py: Fix python3 support
        perf scripts python: exported-sql-viewer.py: Fix never-ending loop
        perf machine: Update kernel map address and re-order properly
        tools headers uapi: Sync powerpc's asm/kvm.h copy with the kernel sources
        tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd
        tools headers uapi: Update drm/i915_drm.h
        tools arch x86: Sync asm/cpufeatures.h with the kernel sources
        tools headers uapi: Sync linux/fcntl.h to get the F_SEAL_FUTURE_WRITE addition
        tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h
        perf evsel: Fix max perf_event_attr.precise_ip detection
        perf intel-pt: Fix TSC slip
        perf cs-etm: Add missing case value
      590627f7
    • Linus Torvalds's avatar
      Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c29d8541
      Linus Torvalds authored
      Pull CPU hotplug fixes from Thomas Gleixner:
       "Two SMT/hotplug related fixes:
      
         - Prevent crash when HOTPLUG_CPU is disabled and the CPU bringup
           aborts. This is triggered with the 'nosmt' command line option, but
           can happen by any abort condition. As the real unplug code is not
           compiled in, prevent the fail by keeping the CPU in zombie state.
      
         - Enforce HOTPLUG_CPU for SMP on x86 to avoid the above situation
           completely. With 'nosmt' being a popular option it's required to
           unplug the half brought up sibling CPUs (due to the MCE wreckage)
           completely"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
        cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
      c29d8541
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 573efdc5
      Linus Torvalds authored
      Pull locking fixlet from Thomas Gleixner:
       "Trivial update to the maintainers file"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MAINTAINERS: Remove deleted file from futex file pattern
      573efdc5
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f78b5be2
      Linus Torvalds authored
      Pull core fixes from Thomas Gleixner:
       "A small set of core updates:
      
         - Make the watchdog respect the selected CPU mask again. That was
           broken by the rework of the watchdog thread management and caused
           inconsistent state and NMI watchdog being unstoppable.
      
         - Ensure that the objtool build can find the libelf location.
      
         - Remove dead kcore stub code"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        watchdog: Respect watchdog cpumask on CPU hotplug
        objtool: Query pkg-config for libelf location
        proc/kcore: Remove unused kclist_add_remap()
      f78b5be2
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 6536c5f2
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Three non-regression fixes.
      
         - Our optimised memcmp could read past the end of one of the buffers
           and potentially trigger a page fault leading to an oops.
      
         - Some of our code to read energy management data on PowerVM had an
           endian bug leading to bogus results.
      
         - When reporting a machine check exception we incorrectly reported
           TLB multihits as D-Cache multhits due to a missing entry in the
           array of causes.
      
        Thanks to: Chandan Rajendra, Gautham R. Shenoy, Mahesh Salgaonkar,
        Segher Boessenkool, Vaidyanathan Srinivasan"
      
      * tag 'powerpc-5.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/pseries/mce: Fix misleading print for TLB mutlihit
        powerpc/pseries/energy: Use OF accessor functions to read ibm,drc-indexes
        powerpc/64: Fix memcmp reading past the end of src/dest
      6536c5f2
    • Linus Torvalds's avatar
      Merge tag 'dmaengine-fix-5.1-rc3' of git://git.infradead.org/users/vkoul/slave-dma · c877b3df
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
      
       - Revert "dmaengine: stm32-mdma: Add a check on read_u32_array" as that
         caused regression
      
       - Fix MAINTAINER file uniphier-mdmac.c file path
      
      * tag 'dmaengine-fix-5.1-rc3' of git://git.infradead.org/users/vkoul/slave-dma:
        MAINTAINERS: Fix uniphier-mdmac.c file path
        dmaengine: stm32-mdma: Revert "dmaengine: stm32-mdma: Add a check on read_u32_array"
      c877b3df
  3. 30 Mar, 2019 10 commits
    • Linus Torvalds's avatar
      Merge tag 'led-fixes-for-5.1-rc3' of... · b5c8314f
      Linus Torvalds authored
      Merge tag 'led-fixes-for-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
      
      Pull LED fixes from Jacek Anaszewski:
      
       - fix refcnt leak on interface rename
      
       - use memcpy in device_name_store() to avoid including garbage from a
         previous, longer value in the device_name
      
       - fix a potential NULL pointer dereference in case of_match_device()
         cannot find a match
      
      * tag 'led-fixes-for-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
        leds: trigger: netdev: use memcpy in device_name_store
        leds: pca9532: fix a potential NULL pointer dereference
        leds: trigger: netdev: fix refcnt leak on interface rename
      b5c8314f
    • Linus Torvalds's avatar
      Merge tag 'gpio-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 3af9a525
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
       "As you can see [in the git history] I was away on leave and Bartosz
        kindly stepped in and collected a slew of fixes, I pulled them into my
        tree in two sets and merged some two more fixes (fixing my own caused
        bugs) on top.
      
        Summary:
      
         - Revert the extended use of gpio_set_config() and think about how we
           can do this properly.
      
         - Fix up the SPI CS GPIO handling so it now works properly on the SPI
           bus children, as intended.
      
         - Error paths and driver fixes"
      
      * tag 'gpio-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: mockup: use simple_read_from_buffer() in debugfs read callback
        gpio: of: Fix of_gpiochip_add() error path
        gpio: of: Check for "spi-cs-high" in child instead of parent node
        gpio: of: Check propname before applying "cs-gpios" quirks
        gpio: mockup: fix debugfs read
        Revert "gpio: use new gpio_set_config() helper in more places"
        gpio: aspeed: fix a potential NULL pointer dereference
        gpio: amd-fch: Fix bogus SPDX identifier
        gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input
        gpio: exar: add a check for the return value of ida_simple_get fails
      3af9a525
    • Rasmus Villemoes's avatar
      leds: trigger: netdev: use memcpy in device_name_store · 90934643
      Rasmus Villemoes authored
      If userspace doesn't end the input with a newline (which can easily
      happen if the write happens from a C program that does write(fd,
      iface, strlen(iface))), we may end up including garbage from a
      previous, longer value in the device_name. For example
      
      # cat device_name
      
      # printf 'eth12' > device_name
      # cat device_name
      eth12
      # printf 'eth3' > device_name
      # cat device_name
      eth32
      
      I highly doubt anybody is relying on this behaviour, so switch to
      simply copying the bytes (we've already checked that size is <
      IFNAMSIZ) and unconditionally zero-terminate it; of course, we also
      still have to strip a trailing newline.
      
      This is also preparation for future patches.
      
      Fixes: 06f502f5 ("leds: trigger: Introduce a NETDEV trigger")
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: default avatarPavel Machek <pavel@ucw.cz>
      Signed-off-by: default avatarJacek Anaszewski <jacek.anaszewski@gmail.com>
      90934643
    • Kangjie Lu's avatar
      leds: pca9532: fix a potential NULL pointer dereference · 0aab8e4d
      Kangjie Lu authored
      In case of_match_device cannot find a match, return -EINVAL to avoid
      NULL pointer dereference.
      
      Fixes: fa4191a6 ("leds: pca9532: Add device tree support")
      Signed-off-by: default avatarKangjie Lu <kjlu@umn.edu>
      Signed-off-by: default avatarJacek Anaszewski <jacek.anaszewski@gmail.com>
      0aab8e4d
    • Linus Torvalds's avatar
      Merge tag 'staging-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 32faca66
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are some small staging driver fixes for 5.1-rc3, and one driver
        removal.
      
        The biggest thing here is the removal of the mt7621-eth driver as a
        "real" network driver was merged in 5.1-rc1 for this hardware, so this
        old driver can now be removed.
      
        Other than that, there are just a number of small fixes, all resolving
        reported issues and some potential corner cases for error handling
        paths.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'staging-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: vt6655: Remove vif check from vnt_interrupt
        staging: erofs: keep corrupted fs from crashing kernel in erofs_readdir()
        staging: octeon-ethernet: fix incorrect PHY mode
        staging: vc04_services: Fix an error code in vchiq_probe()
        staging: erofs: fix error handling when failed to read compresssed data
        staging: vt6655: Fix interrupt race condition on device start up.
        staging: rtlwifi: Fix potential NULL pointer dereference of kzalloc
        staging: rtl8712: uninitialized memory in read_bbreg_hdl()
        staging: rtlwifi: rtl8822b: fix to avoid potential NULL pointer dereference
        staging: rtl8188eu: Fix potential NULL pointer dereference of kcalloc
        staging, mt7621-pci: fix build without pci support
        staging: speakup_soft: Fix alternate speech with other synths
        staging: axis-fifo: add CONFIG_OF dependency
        staging: olpc_dcon_xo_1: add missing 'const' qualifier
        staging: comedi: ni_mio_common: Fix divide-by-zero for DIO cmdtest
        staging: erofs: fix to handle error path of erofs_vmap()
        staging: mt7621-dts: update ethernet settings.
        staging: remove mt7621-eth
      32faca66
    • Linus Torvalds's avatar
      Merge tag 'tty-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 52afe190
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some small tty and serial driver fixes for 5.1-rc3.
      
        Nothing major here, just a number of potential problems fixes for
        error handling paths, as well as some other minor bugfixes for
        reported issues with 5.1-rc1.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'tty-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: fix NULL pointer issue when tty_port ops is not set
        Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc
        dt-bindings: serial: Add compatible for Mediatek MT8183
        tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
        tty/serial: atmel: Add is_half_duplex helper
        serial: sh-sci: Fix setting SCSCR_TIE while transferring data
        serial: ar933x_uart: Fix build failure with disabled console
        tty: serial: qcom_geni_serial: Initialize baud in qcom_geni_console_setup
        sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()
        tty: mxs-auart: fix a potential NULL pointer dereference
        tty: atmel_serial: fix a potential NULL pointer dereference
        serial: max310x: Fix to avoid potential NULL pointer dereference
        serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference
      52afe190
    • Linus Torvalds's avatar
      Merge tag 'usb-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 8d02a9a8
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 5.1-rc3.
      
        Nothing major at all here, just a small collection of fixes for
        reported issues, and potential problems with error handling paths.
        Also a few new device ids, as normal.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
        USB: serial: option: add Olicard 600
        USB: serial: cp210x: add new device id
        usb: u132-hcd: fix resource leak
        usb: cdc-acm: fix race during wakeup blocking TX traffic
        usb: mtu3: fix EXTCON dependency
        usb: usb251xb: fix to avoid potential NULL pointer dereference
        usb: core: Try generic PHY_MODE_USB_HOST if usb_phy_roothub_set_mode fails
        phy: sun4i-usb: Support set_mode to USB_HOST for non-OTG PHYs
        xhci: Don't let USB3 ports stuck in polling state prevent suspend
        usb: xhci: dbc: Don't free all memory with spinlock held
        xhci: Fix port resume done detection for SS ports with LPM enabled
        USB: serial: mos7720: fix mos_parport refcount imbalance on error path
        USB: gadget: f_hid: fix deadlock in f_hidg_write()
        usb: gadget: net2272: Fix net2272_dequeue()
        usb: gadget: net2280: Fix net2280_dequeue()
        usb: gadget: net2280: Fix overrun of OUT messages
        usb: dwc3: pci: add support for Comet Lake PCH ID
        usb: usb251xb: Remove unnecessary comparison of unsigned integer with >= 0
        usb: common: Consider only available nodes for dr_mode
        usb: typec: tcpm: Try PD-2.0 if sink does not respond to 3.0 source-caps
        ...
      8d02a9a8
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 782492a7
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "This corrects a previous attempt to make Linux use its own set of ACPI
        debug flags different from the upstream ACPICA's default (Erik
        Schmauss)"
      
      * tag 'acpi-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: use different default debug value than ACPICA
      782492a7
    • Linus Torvalds's avatar
      Merge tag 'pm-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 8e377a1c
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix CPU base frequency reporting in the intel_pstate driver and
        a use-after-free in the scpi-cpufreq driver.
      
        Specifics:
      
         - Fix the ACPI CPPC library to actually follow the specification when
           decoding the guaranteed performance register information and make
           the intel_pstate driver to fall back to the nominal frequency when
           reporting the base frequency if the guaranteed performance register
           information is not there (Srinivas Pandruvada).
      
         - Fix use-after-free in the exit callback of the scpi-cpufreq left
           after an update during the 5.0 development cycle (Vincent Stehlé)"
      
      * tag 'pm-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: scpi: Fix use after free
        cpufreq: intel_pstate: Also use CPPC nominal_perf for base_frequency
        ACPI / CPPC: Fix guaranteed performance handling
      8e377a1c
    • Linus Torvalds's avatar
      Merge branch 'fixes-v5.1-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 12195302
      Linus Torvalds authored
      Pull security layer fixes from James Morris:
       "Yama and LSM config fixes"
      
      * 'fixes-v5.1-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        LSM: Revive CONFIG_DEFAULT_SECURITY_* for "make oldconfig"
        Yama: mark local symbols as static
      12195302
  4. 29 Mar, 2019 2 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 922c010c
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "22 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
        fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
        fs: fs_parser: fix printk format warning
        checkpatch: add %pt as a valid vsprintf extension
        mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate
        drivers/block/zram/zram_drv.c: fix idle/writeback string compare
        mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate()
        mm/memory_hotplug.c: fix notification in offline error path
        ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
        fs/proc/kcore.c: make kcore_modules static
        include/linux/list.h: fix list_is_first() kernel-doc
        mm/debug.c: fix __dump_page when mapping->host is not set
        mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
        include/linux/hugetlb.h: convert to use vm_fault_t
        iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging
        mm: add support for kmem caches in DMA32 zone
        ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock
        mm/hotplug: fix offline undo_isolate_page_range()
        fs/open.c: allow opening only regular files during execve()
        mailmap: add Changbin Du
        mm/debug.c: add a cast to u64 for atomic64_read()
        ...
      922c010c
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · f9007cc6
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "Use memblock_alloc() instead of memblock_alloc_low() in
        request_standard_resources(), the latter being limited to the low 4G
        memory range on arm64"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: replace memblock_alloc_low with memblock_alloc
      f9007cc6