1. 09 Jul, 2019 19 commits
  2. 05 Jul, 2019 1 commit
    • blk-iolatency: fix STS_AGAIN handling · c9b3007f
      Dennis Zhou authored
      The iolatency controller is based on rq_qos. It increments on
      rq_qos_throttle() and decrements on either rq_qos_cleanup() or
      rq_qos_done_bio(). a3fb01ba fixes the double accounting issue where
      blk_mq_make_request() may call both rq_qos_cleanup() and
      rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the
      double decrement.
      
      The above works upstream as the only way we can get STS_AGAIN is from
      blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
      problem as bio_endio() skipping only happens on reserved tag allocation
      failures, which can only be caused by driver bugs and already trigger a
      WARN.
      
      However, the fix creates a not-so-great dependency on how STS_AGAIN can
      be propagated. Internally, we (Facebook) carry a patch that kills read
      ahead if a cgroup is io congested or a fatal signal is pending. This,
      combined with chained bios propagating their bi_status to the parent if
      it is not already set, can cause the parent bio to not clean up properly
      even though it was successful. This consequently leaks the inflight
      counter and can hang all IOs under that blkg.
      
      To nip the adverse interaction early, this removes the rq_qos_cleanup()
      callback in iolatency in favor of cleaning up always on the
      rq_qos_done_bio() path.
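
The accounting problem described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name below (toy_blkg, toy_bio, toy_throttle, and so on) is hypothetical and stands in for the kernel structures and rq_qos hooks; it is not the iolatency code itself. It shows how skipping the decrement on an "again" status leaks the inflight counter once a successful parent bio inherits that status from a chained child, and how always decrementing on the done path avoids the leak.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
enum toy_status { TOY_OK = 0, TOY_AGAIN = 1 };

struct toy_blkg {
    int inflight;               /* bios counted as in flight */
};

struct toy_bio {
    struct toy_blkg *blkg;
    enum toy_status status;
    bool tracked;               /* set once toy_throttle() counted this bio */
};

/* Stands in for the throttle hook: count the bio as in flight. */
static void toy_throttle(struct toy_blkg *g, struct toy_bio *bio)
{
    bio->blkg = g;
    bio->tracked = true;
    g->inflight++;
}

/*
 * Old scheme (assumed shape): the done path skips the decrement when the
 * status is TOY_AGAIN, trusting a separate cleanup hook to have done it.
 */
static void toy_done_bio_old(struct toy_bio *bio)
{
    if (!bio->tracked || bio->status == TOY_AGAIN)
        return;
    bio->tracked = false;
    bio->blkg->inflight--;
}

/* New scheme: the done path always drops the count for a tracked bio. */
static void toy_done_bio_new(struct toy_bio *bio)
{
    if (!bio->tracked)
        return;
    bio->tracked = false;
    assert(bio->blkg->inflight > 0);
    bio->blkg->inflight--;
}

/* A chained child hands its status to a parent that has none set yet. */
static void toy_propagate_status(struct toy_bio *parent,
                                 const struct toy_bio *child)
{
    if (parent->status == TOY_OK)
        parent->status = child->status;
}

/* Returns the inflight count left behind after the parent completes. */
int toy_leak_after(bool use_new_scheme)
{
    struct toy_blkg g = { 0 };
    struct toy_bio parent = { 0 }, child = { 0 };

    toy_throttle(&g, &parent);      /* parent was really submitted */
    child.status = TOY_AGAIN;       /* e.g. read ahead that was killed */
    toy_propagate_status(&parent, &child);

    if (use_new_scheme)
        toy_done_bio_new(&parent);
    else
        toy_done_bio_old(&parent);
    return g.inflight;
}
```

In this model, toy_leak_after(false) leaves one bio counted forever, while toy_leak_after(true) returns the counter to zero.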
      
      Fixes: a3fb01ba ("blk-iolatency: only account submitted bios")
      Debugged-by: Tejun Heo <tj@kernel.org>
      Debugged-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 03 Jul, 2019 3 commits
  4. 01 Jul, 2019 3 commits
    • sbitmap: Replace cmpxchg with xchg · 41723288
      Pavel Begunkov authored
      cmpxchg() with an immediate value could be replaced with the less
      expensive xchg(). The same is true if the new value doesn't _depend_ on
      the old one.
      
      In the second block, atomic_cmpxchg() return value isn't checked, so
      after atomic_cmpxchg() ->  atomic_xchg() conversion it could be replaced
      with atomic_set(). Comparison with atomic_read() in the second chunk was
      left as an optimisation (if that was the initial intention).
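
The equivalence argued above can be sketched in portable C11 atomics rather than the kernel's cmpxchg()/xchg() primitives. This is a minimal illustration, not the sbitmap code: the function names take_bits_cmpxchg and take_bits_xchg are hypothetical, and the "take every set bit and leave zero behind" operation is an assumed example of a new value that does not depend on the old one.

```c
#include <stdatomic.h>

/*
 * Compare-and-swap version: retry until the swap to 0 succeeds.
 * On failure, atomic_compare_exchange_weak() stores the value it
 * found into 'old', so the loop retries with fresh data.
 */
unsigned long take_bits_cmpxchg(_Atomic unsigned long *word)
{
    unsigned long old = atomic_load(word);

    while (!atomic_compare_exchange_weak(word, &old, 0UL))
        ;
    return old;
}

/*
 * Exchange version: the new value (0) is a constant, so there is
 * nothing to compare; one unconditional swap returns the old bits.
 */
unsigned long take_bits_xchg(_Atomic unsigned long *word)
{
    return atomic_exchange(word, 0UL);
}
```

Both functions return the previous contents and leave the word at zero; the exchange version simply has no retry loop.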
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix .bi_size overflow · 79d08f89
      Ming Lei authored
      'bio->bi_iter.bi_size' is 'unsigned int', which can hold at most
      4G - 1 bytes.
      
      Before 07173c3e ("block: enable multipage bvecs"), one bio could
      include only a limited number of pages, usually at most 256, so the fs
      bio size would not exceed 1M bytes most of the time.
      
      Since we support multi-page bvecs, in theory more than 1M pages can
      really be added to one fs bio, especially in the case of hugepages or
      big writeback with many dirty pages. There is then a chance that
      .bi_size overflows.
      
      Fix this issue by using bio_full() to check if the added segment may
      overflow .bi_size.
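
To make the overflow concrete: assuming 4 KiB pages and a 32-bit 'unsigned int', 1M pages times 4 KiB is exactly 4 GiB, one byte more than the largest value the size field can hold. The check described above can be sketched in userspace as follows. The struct and function names (toy_sized_bio, toy_sized_bio_full, toy_sized_bio_add) are hypothetical stand-ins, not the kernel's definitions; the point is the overflow-safe comparison, which tests the size against UINT_MAX - len instead of computing size + len.

```c
#include <limits.h>
#include <stdbool.h>

/* Toy stand-in for the fields that matter here; not the kernel struct. */
struct toy_sized_bio {
    unsigned int size;          /* bytes accumulated so far */
    unsigned short vcnt;        /* segments in use */
    unsigned short max_vecs;    /* segment capacity */
};

/* True if no segment of 'len' bytes can be added safely. */
bool toy_sized_bio_full(const struct toy_sized_bio *bio, unsigned int len)
{
    if (bio->vcnt >= bio->max_vecs)
        return true;
    /* size + len would wrap; compare without performing the addition. */
    if (bio->size > UINT_MAX - len)
        return true;
    return false;
}

/* Adds 'len' bytes as one segment, or refuses if that would overflow. */
bool toy_sized_bio_add(struct toy_sized_bio *bio, unsigned int len)
{
    if (toy_sized_bio_full(bio, len))
        return false;
    bio->size += len;
    bio->vcnt++;
    return true;
}
```

A caller that gets false back would be expected to submit what it has and start a new bio, rather than let the size wrap around.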
      
      Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com>
      Cc: kernel test robot <rong.a.chen@intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: linux-xfs@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3e ("block: enable multipage bvecs")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'v5.2-rc6' into for-5.3/block · 5be1f9d8
      Jens Axboe authored
      Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
      fix. Otherwise we end up having conflicts with future patches between
      for-5.3/block and master that touch this area. In particular, it makes
      the bio_full() fix hard to backport to stable.
      
      * tag 'v5.2-rc6': (482 commits)
        Linux 5.2-rc6
        Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
        Bluetooth: Fix regression with minimum encryption key size alignment
        tcp: refine memory limit test in tcp_fragment()
        x86/vdso: Prevent segfaults due to hoisted vclock reads
        SUNRPC: Fix a credential refcount leak
        Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
        net :sunrpc :clnt :Fix xps refcount imbalance on the error path
        NFS4: Only set creation opendata if O_CREAT
        ARM: 8867/1: vdso: pass --be8 to linker if necessary
        KVM: nVMX: reorganize initial steps of vmx_set_nested_state
        KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
        habanalabs: use u64_to_user_ptr() for reading user pointers
        nfsd: replace Jeff by Chuck as nfsd co-maintainer
        inet: clear num_timeout reqsk_alloc()
        PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
        net: mvpp2: debugfs: Add pmap to fs dump
        ipv6: Default fib6_type to RTN_UNICAST when not set
        net: hns3: Fix inconsistent indenting
        net/af_iucv: always register net_device notifier
        ...
  5. 29 Jun, 2019 14 commits