1. 28 Jun, 2018 2 commits
  2. 27 Jun, 2018 1 commit
    • dm thin: handle running out of data space vs concurrent discard · a685557f
      Mike Snitzer authored
      Discards issued to a DM thin device can complete to userspace (via
      fstrim) _before_ the metadata changes associated with the discards are
      reflected in the thinp superblock (e.g. free blocks).  As such, if a
      user constructs a test that loops repeatedly over these steps, block
      allocation can fail due to discards not having completed yet:
      1) fill thin device via filesystem file
      2) remove file
      3) fstrim
      
      From the initial report:
      https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html
      
      "The root cause of this issue is that dm-thin will first remove
      mapping and increase corresponding blocks' reference count to prevent
      them from being reused before DISCARD bios get processed by the
      underlying layers. However, increasing blocks' reference count could
      also increase the nr_allocated_this_transaction in struct sm_disk
      which makes smd->old_ll.nr_allocated +
      smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
      In this case, alloc_data_block() will never commit metadata to reset
      the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
      always returns an underflow value."
      
      While there is room for improvement to the space-map accounting that
      thinp makes use of, the reality is that this test is inherently racy:
      the previous iteration's fstrim discard(s) will be completing
      concurrently with block allocation, via dd, in the next iteration of
      the loop.
      
      No amount of space map accounting improvements will allow users to
      use a block before a discard of that block has completed.
      
      So the best we can really do is allow DM thinp to gracefully handle such
      aggressive use of all the pool's data by degrading the pool into
      out-of-data-space (OODS) mode.  We _should_ get that behaviour already
      (if space map accounting didn't falsely cause alloc_data_block() to
      believe free space was available), but short of that we handle the
      current reality that dm_pool_alloc_data_block() can return -ENOSPC.
      Reported-by: Dennis Yang <dennisyang@qnap.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  3. 22 Jun, 2018 5 commits
    • dm raid: don't use 'const' in function return · f2ccaa59
      Arnd Bergmann authored
      A newly introduced function has 'const int' as the return type,
      but as "make W=1" reports, that has no meaning:
      
      drivers/md/dm-raid.c:510:18: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
      
      This changes the return type to plain 'int'.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 33e53f06 ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Fixes: 552aa679 ("dm raid: use rs_is_raid*()")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm zoned: avoid triggering reclaim from inside dmz_map() · 2d0b2d64
      Bart Van Assche authored
      This patch prevents lockdep from reporting the following:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-rc1 #62 Not tainted
      ------------------------------------------------------
      kswapd0/84 is trying to acquire lock:
      00000000c313516d (&xfs_nondir_ilock_class){++++}, at: xfs_free_eofblocks+0xa2/0x1e0
      
      but task is already holding lock:
      00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (fs_reclaim){+.+.}:
        kmem_cache_alloc+0x2c/0x2b0
        radix_tree_node_alloc.constprop.19+0x3d/0xc0
        __radix_tree_create+0x161/0x1c0
        __radix_tree_insert+0x45/0x210
        dmz_map+0x245/0x2d0 [dm_zoned]
        __map_bio+0x40/0x260
        __split_and_process_non_flush+0x116/0x220
        __split_and_process_bio+0x81/0x180
        __dm_make_request.isra.32+0x5a/0x100
        generic_make_request+0x36e/0x690
        submit_bio+0x6c/0x140
        mpage_readpages+0x19e/0x1f0
        read_pages+0x6d/0x1b0
        __do_page_cache_readahead+0x21b/0x2d0
        force_page_cache_readahead+0xc4/0x100
        generic_file_read_iter+0x7c6/0xd20
        __vfs_read+0x102/0x180
        vfs_read+0x9b/0x140
        ksys_read+0x55/0xc0
        do_syscall_64+0x5a/0x1f0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      -> #1 (&dmz->chunk_lock){+.+.}:
        dmz_map+0x133/0x2d0 [dm_zoned]
        __map_bio+0x40/0x260
        __split_and_process_non_flush+0x116/0x220
        __split_and_process_bio+0x81/0x180
        __dm_make_request.isra.32+0x5a/0x100
        generic_make_request+0x36e/0x690
        submit_bio+0x6c/0x140
        _xfs_buf_ioapply+0x31c/0x590
        xfs_buf_submit_wait+0x73/0x520
        xfs_buf_read_map+0x134/0x2f0
        xfs_trans_read_buf_map+0xc3/0x580
        xfs_read_agf+0xa5/0x1e0
        xfs_alloc_read_agf+0x59/0x2b0
        xfs_alloc_pagf_init+0x27/0x60
        xfs_bmap_longest_free_extent+0x43/0xb0
        xfs_bmap_btalloc_nullfb+0x7f/0xf0
        xfs_bmap_btalloc+0x428/0x7c0
        xfs_bmapi_write+0x598/0xcc0
        xfs_iomap_write_allocate+0x15a/0x330
        xfs_map_blocks+0x1cf/0x3f0
        xfs_do_writepage+0x15f/0x7b0
        write_cache_pages+0x1ca/0x540
        xfs_vm_writepages+0x65/0xa0
        do_writepages+0x48/0xf0
        __writeback_single_inode+0x58/0x730
        writeback_sb_inodes+0x249/0x5c0
        wb_writeback+0x11e/0x550
        wb_workfn+0xa3/0x670
        process_one_work+0x228/0x670
        worker_thread+0x3c/0x390
        kthread+0x11c/0x140
        ret_from_fork+0x3a/0x50
      
      -> #0 (&xfs_nondir_ilock_class){++++}:
        down_read_nested+0x43/0x70
        xfs_free_eofblocks+0xa2/0x1e0
        xfs_fs_destroy_inode+0xac/0x270
        dispose_list+0x51/0x80
        prune_icache_sb+0x52/0x70
        super_cache_scan+0x127/0x1a0
        shrink_slab.part.47+0x1bd/0x590
        shrink_node+0x3b5/0x470
        balance_pgdat+0x158/0x3b0
        kswapd+0x1ba/0x600
        kthread+0x11c/0x140
        ret_from_fork+0x3a/0x50
      
      other info that might help us debug this:
      
      Chain exists of:
        &xfs_nondir_ilock_class --> &dmz->chunk_lock --> fs_reclaim
      
      Possible unsafe locking scenario:
      
           CPU0                    CPU1
           ----                    ----
      lock(fs_reclaim);
                                   lock(&dmz->chunk_lock);
                                   lock(fs_reclaim);
      lock(&xfs_nondir_ilock_class);
      
      *** DEADLOCK ***
      
      3 locks held by kswapd0/84:
       #0: 00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
       #1: 000000000f8208f5 (shrinker_rwsem){++++}, at: shrink_slab.part.47+0x3f/0x590
       #2: 00000000cacefa54 (&type->s_umount_key#43){.+.+}, at: trylock_super+0x16/0x50
      
      stack backtrace:
      CPU: 7 PID: 84 Comm: kswapd0 Not tainted 4.18.0-rc1 #62
      Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
      Call Trace:
       dump_stack+0x85/0xcb
       print_circular_bug.isra.36+0x1ce/0x1db
       __lock_acquire+0x124e/0x1310
       lock_acquire+0x9f/0x1f0
       down_read_nested+0x43/0x70
       xfs_free_eofblocks+0xa2/0x1e0
       xfs_fs_destroy_inode+0xac/0x270
       dispose_list+0x51/0x80
       prune_icache_sb+0x52/0x70
       super_cache_scan+0x127/0x1a0
       shrink_slab.part.47+0x1bd/0x590
       shrink_node+0x3b5/0x470
       balance_pgdat+0x158/0x3b0
       kswapd+0x1ba/0x600
       kthread+0x11c/0x140
       ret_from_fork+0x3a/0x50
      Reported-by: Masato Suzuki <masato.suzuki@wdc.com>
      Fixes: 4218a955 ("dm zoned: use GFP_NOIO in I/O path")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm writecache: use 2-factor allocator arguments · 50a7d3ba
      Kees Cook authored
      This adjusts the allocator calls to use the 2-factor argument style, as
      already done treewide for better defense against allocator overflows.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      [snitzer: tweaked code to leave assignment in a test alone]
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm thin metadata: remove needless work from __commit_transaction · 7ccdbf85
      Mike Snitzer authored
      Commit 5a32083d ("dm: take care to copy the space map roots before
      locking the superblock") properly removed the calls to dm_sm_root_size()
      from __write_initial_superblock().  But the dm_sm_root_size() calls were
      left dangling in __commit_transaction().
      
      Fixes: 5a32083d ("dm: take care to copy the space map roots before locking the superblock")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: use bio_split() when splitting out the already processed bio · f21c601a
      Mike Snitzer authored
      Use of bio_clone_bioset() is inefficient if there is no need to clone
      the original bio's bio_vec array.  Best to use the bio_clone_fast()
      variant.  Also, just using bio_advance() is only part of what is needed
      to properly set up the clone -- it doesn't account for the various
      bio_integrity()-related work that also needs to be performed (see
      bio_split()).
      
      Address both of these issues by switching from bio_clone_bioset() to
      bio_split().
      
      Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk")
      Cc: stable@vger.kernel.org # 4.15+, requires removal of '&' before md->queue->bio_split
      Reported-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. 16 Jun, 2018 8 commits
    • Linux 4.18-rc1 · ce397d21
      Linus Torvalds authored
    • Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block · 265c5596
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes that should go into -rc1. This contains:
      
         - bsg_open vs bsg_unregister race fix (Anatoliy)
      
         - NVMe pull request from Christoph, with fixes for regressions in
           this window, FC connect/reconnect path code unification, and a
           trace point addition.
      
         - timeout fix (Christoph)
      
         - remove a few unused functions (Christoph)
      
         - blk-mq tag_set reinit fix (Roman)"
      
      * tag 'for-linus-20180616' of git://git.kernel.dk/linux-block:
        bsg: fix race of bsg_open and bsg_unregister
        block: remov blk_queue_invalidate_tags
        nvme-fabrics: fix and refine state checks in __nvmf_check_ready
        nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
        nvme-fabrics: refactor queue ready check
        blk-mq: remove blk_mq_tagset_iter
        nvme: remove nvme_reinit_tagset
        nvme-fc: fix nulling of queue data on reconnect
        nvme-fc: remove reinit_request routine
        blk-mq: don't time out requests again that are in the timeout handler
        nvme-fc: change controllers first connect to use reconnect path
        nvme: don't rely on the changed namespace list log
        nvmet: free smart-log buffer after use
        nvme-rdma: fix error flow during mapping request data
        nvme: add bio remapping tracepoint
        nvme: fix NULL pointer dereference in nvme_init_subsystem
        blk-mq: reinit q->tag_set_list entry only after grace period
    • Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental · 5e7b9212
      Linus Torvalds authored
      Pull documentation fixes from Mauro Carvalho Chehab:
       "This solves a series of broken links for files under Documentation,
        and improves a script meant to detect such broken links (see
        scripts/documentation-file-ref-check).
      
        The changes on this series are:
      
         - can.rst: fix a footnote reference;
      
         - crypto_engine.rst: Fix two parsing warnings;
      
         - Fix a lot of broken references to Documentation/*;
      
         - improve the scripts/documentation-file-ref-check script, in order
           to help detecting/fixing broken references, preventing
           false-positives.
      
        After this patch series, only 33 broken references to doc files are
        detected by scripts/documentation-file-ref-check"
      
      * tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
        fix a series of Documentation/ broken file name references
        Documentation: rstFlatTable.py: fix a broken reference
        ABI: sysfs-devices-system-cpu: remove a broken reference
        devicetree: fix a series of wrong file references
        devicetree: fix name of pinctrl-bindings.txt
        devicetree: fix some bindings file names
        MAINTAINERS: fix location of DT npcm files
        MAINTAINERS: fix location of some display DT bindings
        kernel-parameters.txt: fix pointers to sound parameters
        bindings: nvmem/zii: Fix location of nvmem.txt
        docs: Fix more broken references
        scripts/documentation-file-ref-check: check tools/*/Documentation
        scripts/documentation-file-ref-check: get rid of false-positives
        scripts/documentation-file-ref-check: hint: dash or underline
        scripts/documentation-file-ref-check: add a fix logic for DT
        scripts/documentation-file-ref-check: accept more wildcards at filenames
        scripts/documentation-file-ref-check: fix help message
        media: max2175: fix location of driver's companion documentation
        media: v4l: fix broken video4linux docs locations
        media: dvb: point to the location of the old README.dvb-usb file
        ...
    • Merge tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · dbb2816f
      Linus Torvalds authored
      Pull fsnotify updates from Jan Kara:
       "fsnotify cleanups unifying handling of different watch types.
      
        This is the shortened fsnotify series from Amir with the last five
        patches pulled out. Amir has modified those patches to not change
        struct inode but obviously it's too late for those to go into this
        merge window"
      
      * tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fsnotify: add fsnotify_add_inode_mark() wrappers
        fanotify: generalize fanotify_should_send_event()
        fsnotify: generalize send_to_group()
        fsnotify: generalize iteration of marks by object type
        fsnotify: introduce marks iteration helpers
        fsnotify: remove redundant arguments to handle_event()
        fsnotify: use type id to identify connector object type
    • Merge tag 'fbdev-v4.18' of git://github.com/bzolnier/linux · 644f2639
      Linus Torvalds authored
      Pull fbdev updates from Bartlomiej Zolnierkiewicz:
       "There is nothing really major here: a few small fixes, some cleanups and
        dead drivers removal:
      
         - mark omapfb drivers as orphans in MAINTAINERS file (Tomi Valkeinen)
      
         - add missing module license tags to omap/omapfb driver (Arnd
           Bergmann)
      
         - add missing GPIOLIB dependency to omap2/omapfb driver (Arnd
           Bergmann)
      
         - convert savagefb, aty128fb & radeonfb drivers to use msleep & co.
           (Jia-Ju Bai)
      
         - allow COMPILE_TEST build for viafb driver (media part was reviewed
           by media subsystem Maintainer)
      
         - remove unused MERAM support from sh_mobile_lcdcfb and shmob-drm
           drivers (drm parts were acked by shmob-drm driver Maintainer)
      
         - remove unused auo_k190xfb drivers
      
         - misc cleanups (Souptick Joarder, Wolfram Sang, Markus Elfring, Andy
           Shevchenko, Colin Ian King)"
      
      * tag 'fbdev-v4.18' of git://github.com/bzolnier/linux: (26 commits)
        fb_omap2: add gpiolib dependency
        video/omap: add module license tags
        MAINTAINERS: make omapfb orphan
        video: fbdev: pxafb: match_string() conversion fixup
        video: fbdev: nvidia: fix spelling mistake: "scaleing" -> "scaling"
        video: fbdev: fix spelling mistake: "frambuffer" -> "framebuffer"
        video: fbdev: pxafb: Convert to use match_string() helper
        video: fbdev: via: allow COMPILE_TEST build
        video: fbdev: remove unused sh_mobile_meram driver
        drm: shmobile: remove unused MERAM support
        video: fbdev: sh_mobile_lcdcfb: remove unused MERAM support
        video: fbdev: remove unused auo_k190xfb drivers
        video: omap: Improve a size determination in omapfb_do_probe()
        video: sm501fb: Improve a size determination in sm501fb_probe()
        video: fbdev-MMP: Improve a size determination in path_init()
        video: fbdev-MMP: Delete an error message for a failed memory allocation in two functions
        video: auo_k190x: Delete an error message for a failed memory allocation in auok190x_common_probe()
        video: sh_mobile_lcdcfb: Delete an error message for a failed memory allocation in two functions
        video: sh_mobile_meram: Delete an error message for a failed memory allocation in sh_mobile_meram_probe()
        video: fbdev: sh_mobile_meram: Drop SUPERH platform dependency
        ...
    • Merge branch 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 35773c93
      Linus Torvalds authored
      Pull AFS updates from Al Viro:
       "Assorted AFS stuff - ended up in vfs.git since most of that consists
        of David's AFS-related followups to Christoph's procfs series"
      
      * 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        afs: Optimise callback breaking by not repeating volume lookup
        afs: Display manually added cells in dynamic root mount
        afs: Enable IPv6 DNS lookups
        afs: Show all of a server's addresses in /proc/fs/afs/servers
        afs: Handle CONFIG_PROC_FS=n
        proc: Make inline name size calculation automatic
        afs: Implement network namespacing
        afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
        afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
        proc: Add a way to make network proc files writable
        afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
        afs: Rearrange fs/afs/proc.c to move the show routines up
        afs: Rearrange fs/afs/proc.c by moving fops and open functions down
        afs: Move /proc management functions to the end of the file
    • Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 29d6849d
      Linus Torvalds authored
      Pull compat updates from Al Viro:
       "Some biarch patches - getting rid of assorted (mis)uses of
        compat_alloc_user_space().
      
        Not much in that area this cycle..."
      
      * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        orangefs: simplify compat ioctl handling
        signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
        vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
    • Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a5b729ea
      Linus Torvalds authored
      Pull aio fixes from Al Viro:
       "Assorted AIO followups and fixes"
      
      * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        eventpoll: switch to ->poll_mask
        aio: only return events requested in poll_mask() for IOCB_CMD_POLL
        eventfd: only return events requested in poll_mask()
        aio: mark __aio_sigset::sigmask const
  5. 15 Jun, 2018 24 commits