1. 04 Nov, 2019 21 commits
  2. 01 Nov, 2019 1 commit
    • Darrick J. Wong's avatar
      loop: fix no-unmap write-zeroes request behavior · efcfec57
      Darrick J. Wong authored
      Currently, if the loop device receives a WRITE_ZEROES request, it asks
      the underlying filesystem to punch out the range.  This behavior is
      correct if unmapping is allowed.  However, a NOUNMAP request means that
      the caller doesn't want us to free the storage backing the range, so
      punching out the range is incorrect behavior.
      
      To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
      underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
      the fallocate documentation) required to ensure that the entire range is
      backed by real storage, which suffices for our purposes.
      
      Fixes: 19372e27 ("loop: implement REQ_OP_WRITE_ZEROES")
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      efcfec57
  3. 25 Oct, 2019 1 commit
  4. 24 Oct, 2019 5 commits
    • Jens Axboe's avatar
      Merge branch 'md-next' of... · 9c6694bd
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.5/drivers
      
      Pull MD changes from Song.
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md: no longer compare spare disk superblock events in super_load
        md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
        md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
        md/raid0: Fix an error message in raid0_make_request()
      9c6694bd
    • Yufen Yu's avatar
      md: no longer compare spare disk superblock events in super_load · 6a5cb53a
      Yufen Yu authored
      We have a test case as follow:
      
        mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] \
      	--assume-clean --bitmap=internal
        mdadm -S /dev/md1
        mdadm -A /dev/md1 /dev/sd[b-c] --run --force
      
        mdadm --zero /dev/sda
        mdadm /dev/md1 -a /dev/sda
      
        echo offline > /sys/block/sdc/device/state
        echo offline > /sys/block/sdb/device/state
        sleep 5
        mdadm -S /dev/md1
      
        echo running > /sys/block/sdb/device/state
        echo running > /sys/block/sdc/device/state
        mdadm -A /dev/md1 /dev/sd[a-c] --run --force
      
      When we readd /dev/sda to the array, it started to do recovery.
      After offline the other two disks in md1, the recovery have
      been interrupted and superblock update info cannot be written
      to the offline disks. While the spare disk (/dev/sda) can continue
      to update superblock info.
      
      After stopping the array and assemble it, we found the array
      run fail, with the follow kernel message:
      
      [  172.986064] md: kicking non-fresh sdb from array!
      [  173.004210] md: kicking non-fresh sdc from array!
      [  173.022383] md/raid1:md1: active with 0 out of 4 mirrors
      [  173.022406] md1: failed to create bitmap (-5)
      [  173.023466] md: md1 stopped.
      
      Since both sdb and sdc have the value of 'sb->events' smaller than
      that in sda, they have been kicked from the array. However, the only
      remained disk sda is in 'spare' state before stop and it cannot be
      added to conf->mirrors[] array. In the end, raid array assemble
      and run fail.
      
      In fact, we can use the older disk sdb or sdc to assemble the array.
      That means we should not choose the 'spare' disk as the fresh disk in
      analyze_sbs().
      
      To fix the problem, we do not compare superblock events when it is
      a spare disk, as same as validate_super.
      Signed-off-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      6a5cb53a
    • David Jeffery's avatar
      md: improve handling of bio with REQ_PREFLUSH in md_flush_request() · 775d7831
      David Jeffery authored
      If pers->make_request fails in md_flush_request(), the bio is lost. To
      fix this, pass back a bool to indicate if the original make_request call
      should continue to handle the I/O and instead of assuming the flush logic
      will push it to completion.
      
      Convert md_flush_request to return a bool and no longer calls the raid
      driver's make_request function.  If the return is true, then the md flush
      logic has or will complete the bio and the md make_request call is done.
      If false, then the md make_request function needs to keep processing like
      it is a normal bio. Let the original call to md_handle_request handle any
      need to retry sending the bio to the raid driver's make_request function
      should it be needed.
      
      Also mark md_flush_request and the make_request function pointer as
      __must_check to issue warnings should these critical return values be
      ignored.
      
      Fixes: 2bc13b83 ("md: batch flush requests.")
      Cc: stable@vger.kernel.org # # v4.19+
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: default avatarDavid Jeffery <djeffery@redhat.com>
      Reviewed-by: default avatarXiao Ni <xni@redhat.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      775d7831
    • Guoqing Jiang's avatar
      md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit · fadcbd29
      Guoqing Jiang authored
      We need to move "spin_lock_irq(&bitmap->counts.lock)" before unmap previous
      storage, otherwise panic like belows could happen as follows.
      
      [  902.353802] sdl: detected capacity change from 1077936128 to 3221225472
      [  902.616948] general protection fault: 0000 [#1] SMP
      [snip]
      [  902.618588] CPU: 12 PID: 33698 Comm: md0_raid1 Tainted: G           O    4.14.144-1-pserver #4.14.144-1.1~deb10
      [  902.618870] Hardware name: Supermicro SBA-7142G-T4/BHQGE, BIOS 3.00       10/24/2012
      [  902.619120] task: ffff9ae1860fc600 task.stack: ffffb52e4c704000
      [  902.619301] RIP: 0010:bitmap_file_clear_bit+0x90/0xd0 [md_mod]
      [  902.619464] RSP: 0018:ffffb52e4c707d28 EFLAGS: 00010087
      [  902.619626] RAX: ffe8008b0d061000 RBX: ffff9ad078c87300 RCX: 0000000000000000
      [  902.619792] RDX: ffff9ad986341868 RSI: 0000000000000803 RDI: ffff9ad078c87300
      [  902.619986] RBP: ffff9ad0ed7a8000 R08: 0000000000000000 R09: 0000000000000000
      [  902.620154] R10: ffffb52e4c707ec0 R11: ffff9ad987d1ed44 R12: ffff9ad0ed7a8360
      [  902.620320] R13: 0000000000000003 R14: 0000000000060000 R15: 0000000000000800
      [  902.620487] FS:  0000000000000000(0000) GS:ffff9ad987d00000(0000) knlGS:0000000000000000
      [  902.620738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  902.620901] CR2: 000055ff12aecec0 CR3: 0000001005207000 CR4: 00000000000406e0
      [  902.621068] Call Trace:
      [  902.621256]  bitmap_daemon_work+0x2dd/0x360 [md_mod]
      [  902.621429]  ? find_pers+0x70/0x70 [md_mod]
      [  902.621597]  md_check_recovery+0x51/0x540 [md_mod]
      [  902.621762]  raid1d+0x5c/0xeb0 [raid1]
      [  902.621939]  ? try_to_del_timer_sync+0x4d/0x80
      [  902.622102]  ? del_timer_sync+0x35/0x40
      [  902.622265]  ? schedule_timeout+0x177/0x360
      [  902.622453]  ? call_timer_fn+0x130/0x130
      [  902.622623]  ? find_pers+0x70/0x70 [md_mod]
      [  902.622794]  ? md_thread+0x94/0x150 [md_mod]
      [  902.622959]  md_thread+0x94/0x150 [md_mod]
      [  902.623121]  ? wait_woken+0x80/0x80
      [  902.623280]  kthread+0x119/0x130
      [  902.623437]  ? kthread_create_on_node+0x60/0x60
      [  902.623600]  ret_from_fork+0x22/0x40
      [  902.624225] RIP: bitmap_file_clear_bit+0x90/0xd0 [md_mod] RSP: ffffb52e4c707d28
      
      Because mdadm was running on another cpu to do resize, so bitmap_resize was
      called to replace bitmap as below shows.
      
      PID: 38801  TASK: ffff9ad074a90e00  CPU: 0   COMMAND: "mdadm"
         [exception RIP: queued_spin_lock_slowpath+56]
         [snip]
      -- <NMI exception stack> --
       #5 [ffffb52e60f17c58] queued_spin_lock_slowpath at ffffffff9c0b27b8
       #6 [ffffb52e60f17c58] bitmap_resize at ffffffffc0399877 [md_mod]
       #7 [ffffb52e60f17d30] raid1_resize at ffffffffc0285bf9 [raid1]
       #8 [ffffb52e60f17d50] update_size at ffffffffc038a31a [md_mod]
       #9 [ffffb52e60f17d70] md_ioctl at ffffffffc0395ca4 [md_mod]
      
      And the procedure to keep resize bitmap safe is allocate new storage
      space, then quiesce, copy bits, replace bitmap, and re-start.
      
      However the daemon (bitmap_daemon_work) could happen even the array is
      quiesced, which means when bitmap_file_clear_bit is triggered by raid1d,
      then it thinks it should be fine to access store->filemap since
      counts->lock is held, but resize could change the storage without the
      protection of the lock.
      
      Cc: Jack Wang <jinpu.wang@cloud.ionos.com>
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: default avatarGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      fadcbd29
    • Dan Carpenter's avatar
      md/raid0: Fix an error message in raid0_make_request() · e3fc3f3d
      Dan Carpenter authored
      The first argument to WARN() is supposed to be a condition.  The
      original code will just print the mdname() instead of the full warning
      message.
      
      Fixes: c84a1372 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      e3fc3f3d
  5. 18 Oct, 2019 1 commit
  6. 07 Oct, 2019 2 commits
  7. 06 Oct, 2019 4 commits
    • Linus Torvalds's avatar
      Linux 5.4-rc2 · da0c9ea1
      Linus Torvalds authored
      da0c9ea1
    • Linus Torvalds's avatar
      elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings · b212921b
      Linus Torvalds authored
      In commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") we
      changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
      executable mappings.
      
      Then, people reported that it broke some binaries that had overlapping
      segments from the same file, and commit ad55eac7 ("elf: enforce
      MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
      overlaying elf segment cases.  But only some - despite the summary line
      of that commit, it only did it when it also does a temporary brk vma for
      one obvious overlapping case.
      
      Now Russell King reports another overlapping case with old 32-bit x86
      binaries, which doesn't trigger that limited case.  End result: we had
      better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.
      
      Yes, it's a sign of old binaries generated with old tool-chains, but we
      do pride ourselves on not breaking existing setups.
      
      This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
      and the old load_elf_library() use-cases, because nobody has reported
      breakage for those. Yet.
      
      Note that in all the cases seen so far, the overlapping elf sections
      seem to be just re-mapping of the same executable with different section
      attributes.  We could possibly introduce a new MAP_FIXED_NOFILECHANGE
      flag or similar, which acts like NOREPLACE, but allows just remapping
      the same executable file using different protection flags.
      
      It's not clear that would make a huge difference to anything, but if
      people really hate that "elf remaps over previous maps" behavior, maybe
      at least a more limited form of remapping would alleviate some concerns.
      
      Alternatively, we should take a look at our elf_map() logic to see if we
      end up not mapping things properly the first time.
      
      In the meantime, this is the minimal "don't do that then" patch while
      people hopefully think about it more.
      Reported-by: default avatarRussell King <linux@armlinux.org.uk>
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Fixes: ad55eac7 ("elf: enforce  MAP_FIXED on overlaying elf segments")
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b212921b
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping · 7cdb85df
      Linus Torvalds authored
      Pull dma-mapping regression fix from Christoph Hellwig:
       "Revert an incorret hunk from a patch that caused problems on various
        arm boards (Andrey Smirnov)"
      
      * tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: fix false positive warnings in dma_common_free_remap()
      7cdb85df
    • Linus Torvalds's avatar
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 43b815c6
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A few fixes this time around:
      
         - Fixup of some clock specifications for DRA7 (device-tree fix)
      
         - Removal of some dead/legacy CPU OPP/PM code for OMAP that throws
           warnings at boot
      
         - A few more minor fixups for OMAPs, most around display
      
         - Enable STM32 QSPI as =y since their rootfs sometimes comes from
           there
      
         - Switch CONFIG_REMOTEPROC to =y since it went from tristate to bool
      
         - Fix of thermal zone definition for ux500 (5.4 regression)"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        ARM: multi_v7_defconfig: Fix SPI_STM32_QSPI support
        ARM: dts: ux500: Fix up the CPU thermal zone
        arm64/ARM: configs: Change CONFIG_REMOTEPROC from m to y
        ARM: dts: am4372: Set memory bandwidth limit for DISPC
        ARM: OMAP2+: Fix warnings with broken omap2_set_init_voltage()
        ARM: OMAP2+: Add missing LCDC midlemode for am335x
        ARM: OMAP2+: Fix missing reset done flag for am3 and am43
        ARM: dts: Fix gpio0 flags for am335x-icev2
        ARM: omap2plus_defconfig: Enable more droid4 devices as loadable modules
        ARM: omap2plus_defconfig: Enable DRM_TI_TFP410
        DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again
        ARM: dts: Fix wrong clocks for dra7 mcasp
        clk: ti: dra7: Fix mcasp8 clock bits
      43b815c6
  8. 05 Oct, 2019 5 commits
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v5.4' of... · 2d00aee2
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - remove unneeded ar-option and KBUILD_ARFLAGS
      
       - remove long-deprecated SUBDIRS
      
       - fix modpost to suppress false-positive warnings for UML builds
      
       - fix namespace.pl to handle relative paths to ${objtree}, ${srctree}
      
       - make setlocalversion work for /bin/sh
      
       - make header archive reproducible
      
       - fix some Makefiles and documents
      
      * tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kheaders: make headers archive reproducible
        kbuild: update compile-test header list for v5.4-rc2
        kbuild: two minor updates for Documentation/kbuild/modules.rst
        scripts/setlocalversion: clear local variable to make it work for sh
        namespace: fix namespace.pl script to support relative paths
        video/logo: do not generate unneeded logo C files
        video/logo: remove unneeded *.o pattern from clean-files
        integrity: remove pointless subdir-$(CONFIG_...)
        integrity: remove unneeded, broken attempt to add -fshort-wchar
        modpost: fix static EXPORT_SYMBOL warnings for UML build
        kbuild: correct formatting of header in kbuild module docs
        kbuild: remove SUBDIRS support
        kbuild: remove ar-option and KBUILD_ARFLAGS
      2d00aee2
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 126195c9
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Twelve patches mostly small but obvious fixes or cosmetic but small
        updates"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: Fix Nport ID display value
        scsi: qla2xxx: Fix N2N link up fail
        scsi: qla2xxx: Fix N2N link reset
        scsi: qla2xxx: Optimize NPIV tear down process
        scsi: qla2xxx: Fix stale mem access on driver unload
        scsi: qla2xxx: Fix unbound sleep in fcport delete path.
        scsi: qla2xxx: Silence fwdump template message
        scsi: hisi_sas: Make three functions static
        scsi: megaraid: disable device when probe failed after enabled device
        scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue
        scsi: qedf: Remove always false 'tmp_prio < 0' statement
        scsi: ufs: skip shutdown if hba is not powered
        scsi: bnx2fc: Handle scope bits when array returns BUSY or TSF
      126195c9
    • Linus Torvalds's avatar
      Merge branch 'readdir' (readdir speedup and sanity checking) · 4f11918a
      Linus Torvalds authored
      This makes getdents() and getdents64() do sanity checking on the
      pathname that it gives to user space.  And to mitigate the performance
      impact of that, it first cleans up the way it does the user copying, so
      that the code avoids doing the SMAP/PAN updates between each part of the
      dirent structure write.
      
      I really wanted to do this during the merge window, but didn't have
      time.  The conversion of filldir to unsafe_put_user() is something I've
      had around for years now in a private branch, but the extra pathname
      checking finally made me clean it up to the point where it is mergable.
      
      It's worth noting that the filename validity checking really should be a
      bit smarter: it would be much better to delay the error reporting until
      the end of the readdir, so that non-corrupted filenames are still
      returned.  But that involves bigger changes, so let's see if anybody
      actually hits the corrupt directory entry case before worrying about it
      further.
      
      * branch 'readdir':
        Make filldir[64]() verify the directory entry filename is valid
        Convert filldir[64]() from __put_user() to unsafe_put_user()
      4f11918a
    • Linus Torvalds's avatar
      Make filldir[64]() verify the directory entry filename is valid · 8a23eb80
      Linus Torvalds authored
      This has been discussed several times, and now filesystem people are
      talking about doing it individually at the filesystem layer, so head
      that off at the pass and just do it in getdents{64}().
      
      This is partially based on a patch by Jann Horn, but checks for NUL
      bytes as well, and somewhat simplified.
      
      There's also commentary about how it might be better if invalid names
      due to filesystem corruption don't cause an immediate failure, but only
      an error at the end of the readdir(), so that people can still see the
      filenames that are ok.
      
      There's also been discussion about just how much POSIX strictly speaking
      requires this since it's about filesystem corruption.  It's really more
      "protect user space from bad behavior" as pointed out by Jann.  But
      since Eric Biederman looked up the POSIX wording, here it is for context:
      
       "From readdir:
      
         The readdir() function shall return a pointer to a structure
         representing the directory entry at the current position in the
         directory stream specified by the argument dirp, and position the
         directory stream at the next entry. It shall return a null pointer
         upon reaching the end of the directory stream. The structure dirent
         defined in the <dirent.h> header describes a directory entry.
      
        From definitions:
      
         3.129 Directory Entry (or Link)
      
         An object that associates a filename with a file. Several directory
         entries can associate names with the same file.
      
        ...
      
         3.169 Filename
      
         A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
         characters composing the name may be selected from the set of all
         character values excluding the slash character and the null byte. The
         filenames dot and dot-dot have special meaning. A filename is
         sometimes referred to as a 'pathname component'."
      
      Note that I didn't bother adding the checks to any legacy interfaces
      that nobody uses.
      
      Also note that if this ends up being noticeable as a performance
      regression, we can fix that to do a much more optimized model that
      checks for both NUL and '/' at the same time one word at a time.
      
      We haven't really tended to optimize 'memchr()', and it only checks for
      one pattern at a time anyway, and we really _should_ check for NUL too
      (but see the comment about "soft errors" in the code about why it
      currently only checks for '/')
      
      See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
      lookup code looks for pathname terminating characters in parallel.
      
      Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jann Horn <jannh@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8a23eb80
    • Linus Torvalds's avatar
      Convert filldir[64]() from __put_user() to unsafe_put_user() · 9f79b78e
      Linus Torvalds authored
      We really should avoid the "__{get,put}_user()" functions entirely,
      because they can easily be mis-used and the original intent of being
      used for simple direct user accesses no longer holds in a post-SMAP/PAN
      world.
      
      Manually optimizing away the user access range check makes no sense any
      more, when the range check is generally much cheaper than the "enable
      user accesses" code that the __{get,put}_user() functions still need.
      
      So instead of __put_user(), use the unsafe_put_user() interface with
      user_access_{begin,end}() that really does generate better code these
      days, and which is generally a nicer interface.  Under some loads, the
      multiple user writes that filldir() does are actually quite noticeable.
      
      This also makes the dirent name copy use unsafe_put_user() with a couple
      of macros.  We do not want to make function calls with SMAP/PAN
      disabled, and the code this generates is quite good when the
      architecture uses "asm goto" for unsafe_put_user() like x86 does.
      
      Note that this doesn't bother with the legacy cases.  Nobody should use
      them anyway, so performance doesn't really matter there.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f79b78e