1. 22 Apr, 2016 19 commits
    • fs: Don't remove suid for CAP_FSETID in s_user_ns · a81b4456
      Seth Forshee authored
      Expand the check in should_remove_suid() to keep privileges for
      CAP_FSETID in s_user_ns rather than init_user_ns.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
    • fs: Allow superblock owner to change ownership of inodes with unmappable ids · a24b1241
      Seth Forshee authored
      In a userns mount some on-disk inodes may have ids which do not
      map into s_user_ns, in which case the in-kernel inodes are owned
      by invalid users. The superblock owner should be able to change
      attributes of these inodes but cannot. However it is unsafe to
      grant the superblock owner privileged access to all inodes in the
      superblock since proc, sysfs, etc. use DAC to protect files which
      may not belong to s_user_ns. The problem is restricted to only
      inodes where the owner or group is an invalid user.
      
      We can work around this by allowing users with CAP_CHOWN in
      s_user_ns to change an invalid owner or group id, so long as the
      other id is either invalid or mappable in s_user_ns. After
      changing ownership the user will be privileged towards the inode
      and thus able to change other attributes.
      
      As a precaution, checks for invalid ids are added to the proc
      and kernfs setattr interfaces. These filesystems are not expected
      to have inodes with invalid ids, but if it does happen any
      setattr operations will return -EPERM.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
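      The ownership rule in this commit message can be modelled in userspace
      roughly as follows. This is only a sketch of the stated rule, not the
      kernel code: `enum id_state`, `may_change_invalid_id()` and the boolean
      capability argument are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of an in-kernel inode id relative to s_user_ns. */
enum id_state {
    ID_INVALID,   /* on-disk id did not map into s_user_ns */
    ID_MAPPABLE,  /* valid id that maps into s_user_ns */
    ID_FOREIGN    /* valid id that does not map into s_user_ns */
};

/*
 * Sketch of the rule: a caller with CAP_CHOWN in s_user_ns may change an
 * invalid owner (or group) id, as long as the other id is either invalid
 * or mappable in s_user_ns. `target` is the id being changed and `other`
 * is the remaining id of the same inode.
 */
static bool may_change_invalid_id(bool cap_chown_in_s_user_ns,
                                  enum id_state target,
                                  enum id_state other)
{
    if (!cap_chown_in_s_user_ns)
        return false;
    if (target != ID_INVALID)
        return false;  /* the workaround covers only invalid ids */
    return other == ID_INVALID || other == ID_MAPPABLE;
}
```

      Under this model an inode whose other id is valid but outside s_user_ns
      (as may happen on proc or sysfs) is left alone, which matches the
      concern raised in the first paragraph of the message.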
    • fs: Update posix_acl support to handle user namespace mounts · 34fc14a1
      Seth Forshee authored
      ids in on-disk ACLs should be converted to s_user_ns instead of
      init_user_ns as is done now. This introduces the possibility for
      id mappings to fail, and when this happens syscalls will return
      EOVERFLOW.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
    • fs: Refuse uid/gid changes which don't map into s_user_ns · a8354211
      Seth Forshee authored
      Add checks to inode_change_ok to verify that uid and gid changes
      will map into the superblock's user namespace. If they do not,
      fail with -EOVERFLOW. This cannot be overridden with ATTR_FORCE.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
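      A minimal userspace sketch of the check described here, assuming a
      single-extent id map. The types `struct id_extent` and
      `struct toy_iattr`, the `TOY_ATTR_*` flags and `check_id_change()` are
      hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-extent id map: kernel ids [lower_first, lower_first + count)
 * appear inside the namespace as [first, first + count). */
struct id_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Hypothetical subset of a setattr request. */
#define TOY_ATTR_UID   (1u << 0)
#define TOY_ATTR_GID   (1u << 1)
#define TOY_ATTR_FORCE (1u << 2)

struct toy_iattr {
    unsigned int valid;  /* which fields are being changed */
    uint32_t kuid;       /* requested owner, as a kernel id */
    uint32_t kgid;       /* requested group, as a kernel id */
};

/* True if the kernel id has a mapping in the namespace described by map. */
static bool kid_has_mapping(const struct id_extent *map, uint32_t kid)
{
    return kid >= map->lower_first && kid - map->lower_first < map->count;
}

/* Returns 0 if the requested ids map into the superblock's namespace,
 * -EOVERFLOW otherwise. TOY_ATTR_FORCE is deliberately not consulted. */
static int check_id_change(const struct id_extent *uid_map,
                           const struct id_extent *gid_map,
                           const struct toy_iattr *attr)
{
    if ((attr->valid & TOY_ATTR_UID) && !kid_has_mapping(uid_map, attr->kuid))
        return -EOVERFLOW;
    if ((attr->valid & TOY_ATTR_GID) && !kid_has_mapping(gid_map, attr->kgid))
        return -EOVERFLOW;
    return 0;
}
```

      The mapping test runs before any force handling would, which is how
      the sketch reflects the statement that ATTR_FORCE cannot override it.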
    • cred: Reject inodes with invalid ids in set_create_file_as() · e2944f66
      Seth Forshee authored
      Using INVALID_[UG]ID for the LSM file creation context doesn't
      make sense, so return an error if the inode passed to
      set_create_file_as() has an invalid id.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
    • fs: Check for invalid i_uid in may_follow_link() · 340a0c02
      Seth Forshee authored
      Filesystem uids which don't map into a user namespace may result
      in inode->i_uid being INVALID_UID. A symlink and its parent
      with different owners in the filesystem can both get
      mapped to INVALID_UID, which may result in following a symlink
      when this would not have otherwise been permitted when protected
      symlinks are enabled.
      
      Add a new helper function, uid_valid_eq(), and use this to
      validate that the ids in may_follow_link() are both equal and
      valid. Also add an equivalent helper for gids, which is
      currently unused.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
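      To illustrate why a plain equality test is not enough, here is a
      userspace sketch of a helper in the spirit of uid_valid_eq(). The
      `kuid_t` stand-in and the helper bodies are assumptions made for the
      example; only the helper's name and purpose come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for a kernel uid; the all-ones value marks an id
 * that has no mapping. */
typedef struct { uint32_t val; } kuid_t;
#define INVALID_UID ((kuid_t){ UINT32_MAX })

static bool uid_valid(kuid_t uid)
{
    return uid.val != UINT32_MAX;
}

static bool uid_eq(kuid_t a, kuid_t b)
{
    return a.val == b.val;
}

/* True only if both ids are equal AND valid. Since equality implies that
 * validity of one side is validity of both, one uid_valid() call suffices. */
static bool uid_valid_eq(kuid_t a, kuid_t b)
{
    return uid_valid(a) && uid_eq(a, b);
}
```

      With plain uid_eq(), two unmapped owners compare equal, which is the
      case the commit guards against; uid_valid_eq() rejects that pair.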
    • Smack: Handle labels consistently in untrusted mounts · c294ecad
      Seth Forshee authored
      The SMACK64, SMACK64EXEC, and SMACK64MMAP labels are all handled
      differently in untrusted mounts. This is confusing and
      potentially problematic. Change this to handle them all the same
      way that SMACK64 is currently handled; that is, read the label
      from disk and check it at use time. For SMACK64 and SMACK64MMAP
      access is denied if the label does not match smk_root. To be
      consistent with suid, a SMACK64EXEC label which does not match
      smk_root will still allow execution of the file but will not run
      with the label supplied in the xattr.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
    • userns: Replace in_userns with current_in_userns · a8ec51e1
      Seth Forshee authored
      All current callers of in_userns pass current_user_ns as the
      first argument. Simplify by replacing in_userns with
      current_in_userns which checks whether current_user_ns is in the
      namespace supplied as an argument.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: James Morris <james.l.morris@oracle.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
    • selinux: Add support for unprivileged mounts from user namespaces · 4b4db9d8
      Seth Forshee authored
      Security labels from unprivileged mounts in user namespaces must
      be ignored. Force superblocks from user namespaces whose labeling
      behavior is to use xattrs to use mountpoint labeling instead.
      For the mountpoint label, default to converting the current task
      context into a form suitable for file objects, but also allow the
      policy writer to specify a different label through policy
      transition rules.
      
      Pieced together from code snippets provided by Stephen Smalley.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: James Morris <james.l.morris@oracle.com>
    • fs: Treat foreign mounts as nosuid · c9476050
      Andy Lutomirski authored
      If a process gets access to a mount from a different user
      namespace, that process should not be able to take advantage of
      setuid files or selinux entrypoints from that filesystem.  Prevent
      this by treating mounts from other mount namespaces and those not
      owned by current_user_ns() or an ancestor as nosuid.
      
      This will make it safer to allow more complex filesystems to be
      mounted in non-root user namespaces.
      
      This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
      setgid, and file capability bits can no longer be abused if code in
      a user namespace were to clear nosuid on an untrusted filesystem,
      but this patch, by itself, is insufficient to protect the system
      from abuse of files that, when execed, would increase MAC privilege.
      
      As a more concrete explanation, any task that can manipulate a
      vfsmount associated with a given user namespace already has
      capabilities in that namespace and all of its descendants.  If they
      can cause a malicious setuid, setgid, or file-caps executable to
      appear in that mount, then that executable will only allow them to
      elevate privileges in exactly the set of namespaces in which they
      are already privileged.
      
      On the other hand, if they can cause a malicious executable to
      appear with a dangerous MAC label, running it could change the
      caller's security context in a way that should not have been
      possible, even inside the namespace in which the task is confined.
      
      As a hardening measure, this would have made CVE-2014-5207 much
      more difficult to exploit.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: James Morris <james.l.morris@oracle.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
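      The nosuid decision described in the first paragraph of this message
      can be sketched in userspace as below. All types and names
      (`struct user_ns`, `struct toy_mount`, `struct toy_task`,
      `mount_may_suid()`) are hypothetical and exist only for this model;
      mount namespaces are reduced to integer ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user namespace: only the parent link matters here. */
struct user_ns {
    const struct user_ns *parent;
};

/* Hypothetical mount and task descriptions for the model. */
struct toy_mount {
    bool nosuid_flag;             /* explicit nosuid mount option */
    int mnt_ns_id;                /* mount namespace the mount lives in */
    const struct user_ns *owner;  /* user namespace owning the mount */
};

struct toy_task {
    int mnt_ns_id;                 /* task's mount namespace */
    const struct user_ns *user_ns; /* task's current user namespace */
};

/* True if candidate is ns itself or one of its ancestors. */
static bool is_self_or_ancestor(const struct user_ns *candidate,
                                const struct user_ns *ns)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns == candidate)
            return true;
    return false;
}

/* Setuid bits are honoured only for mounts in the task's own mount
 * namespace that are owned by the task's user namespace or an ancestor. */
static bool mount_may_suid(const struct toy_mount *m, const struct toy_task *t)
{
    return !m->nosuid_flag &&
           m->mnt_ns_id == t->mnt_ns_id &&
           is_self_or_ancestor(m->owner, t->user_ns);
}
```

      In this model a task in the initial namespace that reaches a mount
      owned by a child namespace sees it as nosuid, while a task inside the
      child namespace still honours setuid files on mounts owned by the
      initial namespace.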
    • block_dev: Check permissions towards block device inode when mounting · 6b4b3e9b
      Seth Forshee authored
      Unprivileged users should not be able to mount block devices when
      they lack sufficient privileges towards the block device inode.
      Update blkdev_get_by_path() to validate that the user has the
      required access to the inode at the specified path. The check
      will be skipped for CAP_SYS_ADMIN, so privileged mounts will
      continue working as before.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
    • block_dev: Support checking inode permissions in lookup_bdev() · 10a98749
      Seth Forshee authored
      When looking up a block device by path no permission check is
      done to verify that the user has access to the block device inode
      at the specified path. In some cases it may be necessary to
      check permissions towards the inode, such as allowing
      unprivileged users to mount block devices in user namespaces.
      
      Add an argument to lookup_bdev() to optionally perform this
      permission check. A value of 0 skips the permission check and
      behaves the same as before. A non-zero value specifies the mask
      of access rights required towards the inode at the specified
      path. The check is always skipped if the user has CAP_SYS_ADMIN.
      
      All callers of lookup_bdev() currently pass a mask of 0, so this
      patch results in no functional change. Subsequent patches will
      add permission checks where appropriate.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
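      The mask semantics described for lookup_bdev() can be modelled as
      follows. This is a sketch only: `struct toy_inode`, `bdev_perm_check()`
      and the `TOY_MAY_*` bits are hypothetical, and returning -EACCES on
      failure is an assumption, since the message does not name the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical access bits for the model. */
#define TOY_MAY_WRITE 0x2
#define TOY_MAY_READ  0x4

/* Hypothetical inode: the access bits the caller holds towards it. */
struct toy_inode {
    int granted;
};

/*
 * mask == 0 skips the check (the pre-existing behaviour); a non-zero mask
 * lists the access rights required towards the inode. The check is always
 * skipped for callers with CAP_SYS_ADMIN.
 */
static int bdev_perm_check(const struct toy_inode *inode, int mask,
                           bool has_cap_sys_admin)
{
    if (mask == 0 || has_cap_sys_admin)
        return 0;
    if ((inode->granted & mask) != mask)
        return -EACCES;  /* assumed error code */
    return 0;
}
```

      A mount path that needs read-write access would pass both bits, so a
      caller holding only read access is refused unless it has
      CAP_SYS_ADMIN.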
    • fs: Allow sysfs and cgroupfs to share super blocks between user namespaces · 2fc00781
      Seth Forshee authored
      Both of these filesystems already have use cases for mounting the
      same super block from multiple user namespaces. For sysfs this
      happens when using criu for snapshotting a container, where sysfs
      is mounted in the container's network ns but the host's user ns.
      The cgroup filesystem shares the same super block for all mounts
      of the same hierarchy regardless of the namespace.
      
      As a result, the restriction on mounting a super block from a
      single user namespace creates regressions for existing uses of
      these filesystems. For these specific filesystems this
      restriction isn't really necessary since the backing store is
      objects in kernel memory and thus the ids assigned from inodes
      are not subject to translation relative to s_user_ns.
      
      Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set
      causes sget_userns() to skip the check of s_user_ns. Set this
      flag for the sysfs and cgroup filesystems to fix the
      regressions.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
    • fs: Remove check of s_user_ns for existing mounts in fs_fully_visible() · 5c9f53a7
      Seth Forshee authored
      fs_fully_visible() ignores MNT_LOCK_NODEV when FS_USERNS_DEV_MOUNT
      is not set for the filesystem, but there is a bug in the logic
      that may cause mounting to fail. It is doing this only when the
      existing mount is not in init_user_ns but should check the new
      mount instead. But the new mount is always in a non-init
      namespace when fs_fully_visible() is called, so that condition
      can simply be removed.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
    • fs: fix a posible leak of allocated superblock · 0e2c763c
      Pavel Tikhomirov authored
      We probably need to fix superblock leak in patch (v4 "fs: Add user
      namesapace member to struct super_block"):
      
      Imagine a possible code path in sget_userns(): we iterate through
      type->fs_supers and do not find a suitable sb, so we drop sb_lock to
      allocate s and go to retry. After we drop sb_lock, some other
      task from a different userns takes sb_lock; it is already in the retry
      stage and has s allocated, so it puts its s in the type->fs_supers
      list. So in retry we will find this sb in the list, see that it has
      a different userns, and finally return without freeing s.
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Acked-by: Seth Forshee <seth.forshee@canonical.com>
    • Smack: Add support for unprivileged mounts from user namespaces · e823b4e3
      Seth Forshee authored
      Security labels from unprivileged mounts cannot be trusted.
      Ideally for these mounts we would assign the objects in the
      filesystem the same label as the inode for the backing device
      passed to mount. Unfortunately it's currently impossible to
      determine which inode this is from the LSM mount hooks, so we
      settle for the label of the process doing the mount.
      
      This label is assigned to s_root, and also to smk_default to
      ensure that new inodes receive this label. The transmute property
      is also set on s_root to make this behavior more explicit, even
      though it is technically not necessary.
      
      If a filesystem has existing security labels, access to inodes is
      permitted if the label is the same as smk_root, otherwise access
      is denied. The SMACK64EXEC xattr is completely ignored.
      
      Explicit setting of security labels continues to require
      CAP_MAC_ADMIN in init_user_ns.
      
      Altogether, this ensures that filesystem objects are not
      accessible to subjects which cannot already access the backing
      store, that MAC is not violated for any objects in the filesystem
      which are already labeled, and that a user cannot use an
      unprivileged mount to gain elevated MAC privileges.
      
      sysfs, tmpfs, and ramfs are already mountable from user
      namespaces and support security labels. We can't rule out the
      possibility that these filesystems may already be used in mounts
      from user namespaces with security labels set from the init
      namespace, so failing to trust labels in these filesystems may
      introduce regressions. It is safe to trust labels from these
      filesystems, since the unprivileged user does not control the
      backing store and thus cannot supply security labels, so an
      explicit exception is made to trust labels from these
      filesystems.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • fs: Limit file caps to the user namespace of the super block · c34ae98b
      Seth Forshee authored
      Capability sets attached to files must be ignored except in the
      user namespaces where the mounter is privileged, i.e. s_user_ns
      and its descendants. Otherwise a vector exists for gaining
      privileges in namespaces where a user is not already privileged.
      
      Add a new helper function, in_user_ns(), to test whether a user
      namespace is the same as or a descendant of another namespace.
      Use this helper to determine whether a file's capability set
      should be applied to the caps constructed during exec.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
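      A userspace sketch of a helper like in_user_ns(), assuming each
      namespace records a pointer to its parent. The struct layout, the
      argument order and the `file_caps_apply()` wrapper are hypothetical;
      only the helper's name and its same-or-descendant meaning come from
      the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user namespace with a link to its parent; the initial
 * namespace has parent == NULL. */
struct user_namespace {
    const struct user_namespace *parent;
};

/* True if ns is target_ns itself or a descendant of target_ns. */
static bool in_user_ns(const struct user_namespace *ns,
                       const struct user_namespace *target_ns)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns == target_ns)
            return true;
    return false;
}

/* File capabilities are honoured only when the exec'ing task's namespace
 * is s_user_ns or one of its descendants. */
static bool file_caps_apply(const struct user_namespace *task_ns,
                            const struct user_namespace *s_user_ns)
{
    return in_user_ns(task_ns, s_user_ns);
}
```

      A task in the initial namespace is not "in" a child namespace, so file
      capabilities on a filesystem mounted from that child are ignored for
      it.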
    • userns: Simpilify MNT_NODEV handling. · 57a3711a
      Eric W. Biederman authored
      - Consolidate the testing if a device node may be opened in a new
        function may_open_dev.
      
      - Move the check for allowing access to device nodes on filesystems
        not mounted in the initial user namespace from mount time to open
        time and include it in may_open_dev.
      
      This set of changes removes the implicit adding of MNT_NODEV which
      simplifies the logic in fs/namespace.c and removes a potentially
      problematic difference in how normal and unprivileged mount
      namespaces work.  This is a user visible change in behavior for
      remount in unprivileged mount namespaces but is unlikely to cause
      problems for existing software.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • fs: Add user namesapace member to struct super_block · dafbc5d7
      Seth Forshee authored
      Initially this will be used to eliminate the implicit MNT_NODEV
      flag for mounts from user namespaces. In the future it will also
      be used for translating ids and checking capabilities for
      filesystems mounted from user namespaces.
      
      s_user_ns is initialized in alloc_super() and is generally set to
      current_user_ns(). To avoid security and corruption issues, two
      additional mount checks are also added:
      
       - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
         in current_user_ns().
      
       - sget() will fail with EBUSY when the filesystem it's looking
         for is already mounted from another user namespace.
      
      proc requires some special handling. The user namespace of
      current isn't appropriate when forking as a result of clone(2)
      with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
      namespace of the parent and make proc unmountable in the new user
      namespace. Instead, the user namespace which owns the new pid
      namespace is used. sget_userns() is added to allow passing in
      a namespace other than that of current, and sget becomes a
      wrapper around sget_userns() which passes current_user_ns().
      
      Changes to original version of this patch
        * Documented @user_ns in sget_userns, alloc_super and fs.h
        * Kept a blank line in fs.h
        * Removed unnecessary include of user_namespace.h from fs.h
        * Tweaked the location of get_user_ns and put_user_ns so
          the security modules can (if they wish) depend on it.
        -- EWB
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  2. 18 Apr, 2016 1 commit
  3. 17 Apr, 2016 5 commits
  4. 16 Apr, 2016 7 commits
  5. 15 Apr, 2016 8 commits
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 2e572599
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes for the current series. This contains:
      
         - Two fixes for NVMe:
      
           One fixes a reset race that can be triggered by repeated
           insert/removal of the module.
      
           The other fixes an issue on some platforms, where we get probe
           timeouts since legacy interrupts aren't working.  This used not to
           be a problem since we had the worker thread poll for completions,
           but since that was killed off, it means those poor souls can't
           successfully probe their NVMe device.  Use a proper IRQ check and
           probe (msi-x -> msi -> legacy), like most other drivers to work
           around this.  Both from Keith.
      
         - A loop corruption issue with offset in iters, from Ming Lei.
      
         - A fix for not having the partition stat per cpu ref count
           initialized before sending out the KOBJ_ADD, which could cause user
           space to access the counter prior to initialization.  Also from
           Ming Lei.
      
         - A fix for using the wrong congestion state, from Kaixu Xia"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: loop: fix filesystem corruption in case of aio/dio
        NVMe: Always use MSI/MSI-x interrupts
        NVMe: Fix reset/remove race
        writeback: fix the wrong congested state variable definition
        block: partition: initialize percpuref before sending out KOBJ_ADD
    • Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · f3c9a1ab
      Linus Torvalds authored
      Pull libnvdimm fixes from Ross Zwisler:
       "Two fixes:
      
         - Fix memcpy_from_pmem() to fallback to memcpy() for architectures
           where CONFIG_ARCH_HAS_PMEM_API=n.
      
         - Add a comment explaining why we write data twice when clearing
           poison in pmem_do_bvec().
      
        This has passed a boot test on an X86_32 config, which was the
        architecture where issue #1 above was first noticed"
      
      Dan Williams adds:
       "We're giving this multi-maintainer setup a shot, so expect libnvdimm
        pull requests from either Ross or I going forward"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, pmem: clarify the write+clear_poison+write flow
        pmem: fix BUG() error in pmem.h:48 on X86_32
    • Merge tag 'for-linus-20160415' of git://git.infradead.org/linux-mtd · 29dde7c2
      Linus Torvalds authored
      Pull MTD fix from Brian Norris:
       "One MTD fix for v4.6-rc4:
      
        In the v4.4 cycle, we relaxed the requirement for assigning
        mtd->owner, but we didn't remove this error case.  It's hit only
        by drivers that are both:
      
         (a) using nand_scan() directly
        and
         (b) built as modules
      
        We haven't seen explicit complaints about this (most use cases don't
        fit one or both of the above), but we should definitely not be
        BUG()'ing here"
      
      * tag 'for-linus-20160415' of git://git.infradead.org/linux-mtd:
        mtd: nand: Drop mtd.owner requirement in nand_scan
    • Merge tag 'mmc-v4.6-rc3' of git://git.linaro.org/people/ulf.hansson/mmc · 2fffad12
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "Here are a couple of mmc fixes intended for v4.6 rc4.
      
        Regarding the fix for the regression about mmcblk device indexes.  The
        approach taken to solve the problem seems to be good enough.  There
        were some discussions around the solution, but it seems like people
        were happy about it in the end.
      
        MMC core:
         - Restore similar old behaviour when assigning mmcblk device indexes
      
        MMC host:
         - tegra: Disable UHS-I modes for Tegra124 to fix regression"
      
      * tag 'mmc-v4.6-rc3' of git://git.linaro.org/people/ulf.hansson/mmc:
        mmc: tegra: Disable UHS-I modes for Tegra124
        mmc: block: Use the mmc host device index as the mmcblk device index
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · ab5f9eba
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This contains fixes for exynos, amdgpu, radeon, i915 and qxl.
      
        It also contains some fixes to the core drm edid parser.
      
        qxl:
         - fix for a cursor hotspot issue
      
        radeon:
         - some MST fixes that I've been running locally and make my monitor a
           bit happier
      
        exynos:
         - fix some regressions and build fixes
      
        amdgpu:
         - a couple of small fixes
      
        i915:
         - two DP MST fixes and a couple of other regression fixes
      
        Nothing too out of the ordinary or surprising at this point"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/exynos: Use VIDEO_SAMSUNG_S5P_G2D=n as G2D Kconfig dependency
        drm/exynos: fix a warning message
        drm/exynos: mic: fix an error code
        drm/exynos: fimd: fix broken dp_clock control
        drm/exynos: build fbdev code conditionally
        drm/exynos: fix adjusted_mode pointer in exynos_plane_mode_set
        drm/exynos: fix error handling in exynos_drm_subdrv_open
        drm/amd/amdgpu: fix irq domain remove for tonga ih
        drm/i915: fix deadlock on lid open
        drm/radeon: use helper for mst connector dpms.
        drm/radeon/mst: port some MST setup code from DAL.
        drm/amdgpu: add invisible pin size statistic
        drm/edid: Fix DMT 1024x768@43Hz (interlaced) timings
        drm/i915: Exit cherryview_irq_handler() after one pass
        drm/i915: Call intel_dp_mst_resume() before resuming displays
        drm/i915: Fix race condition in intel_dp_destroy_mst_connector()
        drm/edid: Fix parsing of EDID 1.4 Established Timings III descriptor
        drm/edid: Fix EDID Established Timings I and II
        drm/qxl: fix cursor position with non-zero hotspot
    • Merge branch 'parisc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 60ea7bb0
      Linus Torvalds authored
      Pull parisc ftrace fixes from Helge Deller:
       "This is (most likely) the last pull request for v4.6 for the parisc
        architecture.
      
        It fixes the FTRACE feature for parisc, which has been horribly broken
        for quite some time and doesn't even compile.  This patch just fixes the
        bare minimum (it actually removes more lines than it adds), so that
        the function tracer works again on 32- and 64bit kernels.
      
        I've queued up additional patches on top of this patch which e.g. add
        the syscall tracer, but those have to wait for the merge window for
        v4.7."
      
      * 'parisc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix ftrace function tracer
    • libnvdimm, pmem: clarify the write+clear_poison+write flow · 0a370d26
      Dan Williams authored
      The ACPI specification does not specify the state of data after a clear
      poison operation.  Potential future libnvdimm bus implementations for
      other architectures also might not specify or disagree on the state of
      data after clear poison.  Clarify why we write twice.
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
    • block: loop: fix filesystem corruption in case of aio/dio · a7297a6a
      Ming Lei authored
      Starting from commit e36f6204(block: split bios to max possible length),
      block core starts to split bio in the middle of bvec.
      
      Unfortunately loop dio/aio doesn't consider this situation, and
      always treats 'iter.iov_offset' as zero. Then filesystem corruption
      is observed.
      
      This patch figures out the offset of the base bvec via
      'bio->bi_iter.bi_bvec_done' and fixes the issue by passing the offset
      to iov iterator.
      
      Fixes: e36f6204 (block: split bios to max possible length)
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org (4.5)
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>