1. 11 Nov, 2019 1 commit
    • Guenter Roeck's avatar
      nvme: Add hardware monitoring support · 400b6a7b
      Guenter Roeck authored
      nvme devices report temperature information in the controller information
      (for limits) and in the smart log. Currently, the only means to retrieve
      this information is the nvme command line interface, which requires
      super-user privileges.
      
      At the same time, it would be desirable to be able to use NVMe temperature
      information for thermal control.
      
      This patch adds support to read NVMe temperatures from the kernel using the
      hwmon API and adds temperature zones for NVMe drives. The thermal subsystem
      can use this information to set thermal policies, and userspace can access
      it using libsensors and/or the "sensors" command.
      
      Example output from the "sensors" command:
      
      nvme0-pci-0100
      Adapter: PCI adapter
      Composite:    +39.0°C  (high = +85.0°C, crit = +85.0°C)
      Sensor 1:     +39.0°C
      Sensor 2:     +41.0°C
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      400b6a7b
  2. 04 Nov, 2019 27 commits
  3. 01 Nov, 2019 1 commit
    • Darrick J. Wong's avatar
      loop: fix no-unmap write-zeroes request behavior · efcfec57
      Darrick J. Wong authored
      Currently, if the loop device receives a WRITE_ZEROES request, it asks
      the underlying filesystem to punch out the range.  This behavior is
      correct if unmapping is allowed.  However, a NOUNMAP request means that
      the caller doesn't want us to free the storage backing the range, so
      punching out the range is incorrect behavior.
      
      To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
      underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
      the fallocate documentation) required to ensure that the entire range is
      backed by real storage, which suffices for our purposes.
      
      Fixes: 19372e27 ("loop: implement REQ_OP_WRITE_ZEROES")
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      efcfec57
  4. 25 Oct, 2019 1 commit
  5. 24 Oct, 2019 5 commits
    • Jens Axboe's avatar
      Merge branch 'md-next' of... · 9c6694bd
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.5/drivers
      
      Pull MD changes from Song.
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md: no longer compare spare disk superblock events in super_load
        md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
        md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
        md/raid0: Fix an error message in raid0_make_request()
      9c6694bd
    • Yufen Yu's avatar
      md: no longer compare spare disk superblock events in super_load · 6a5cb53a
      Yufen Yu authored
      We have a test case as follow:
      
        mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] \
      	--assume-clean --bitmap=internal
        mdadm -S /dev/md1
        mdadm -A /dev/md1 /dev/sd[b-c] --run --force
      
        mdadm --zero /dev/sda
        mdadm /dev/md1 -a /dev/sda
      
        echo offline > /sys/block/sdc/device/state
        echo offline > /sys/block/sdb/device/state
        sleep 5
        mdadm -S /dev/md1
      
        echo running > /sys/block/sdb/device/state
        echo running > /sys/block/sdc/device/state
        mdadm -A /dev/md1 /dev/sd[a-c] --run --force
      
      When we readd /dev/sda to the array, it started to do recovery.
      After offline the other two disks in md1, the recovery have
      been interrupted and superblock update info cannot be written
      to the offline disks. While the spare disk (/dev/sda) can continue
      to update superblock info.
      
      After stopping the array and assemble it, we found the array
      run fail, with the follow kernel message:
      
      [  172.986064] md: kicking non-fresh sdb from array!
      [  173.004210] md: kicking non-fresh sdc from array!
      [  173.022383] md/raid1:md1: active with 0 out of 4 mirrors
      [  173.022406] md1: failed to create bitmap (-5)
      [  173.023466] md: md1 stopped.
      
      Since both sdb and sdc have the value of 'sb->events' smaller than
      that in sda, they have been kicked from the array. However, the only
      remained disk sda is in 'spare' state before stop and it cannot be
      added to conf->mirrors[] array. In the end, raid array assemble
      and run fail.
      
      In fact, we can use the older disk sdb or sdc to assemble the array.
      That means we should not choose the 'spare' disk as the fresh disk in
      analyze_sbs().
      
      To fix the problem, we do not compare superblock events when it is
      a spare disk, as same as validate_super.
      Signed-off-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      6a5cb53a
    • David Jeffery's avatar
      md: improve handling of bio with REQ_PREFLUSH in md_flush_request() · 775d7831
      David Jeffery authored
      If pers->make_request fails in md_flush_request(), the bio is lost. To
      fix this, pass back a bool to indicate if the original make_request call
      should continue to handle the I/O and instead of assuming the flush logic
      will push it to completion.
      
      Convert md_flush_request to return a bool and no longer calls the raid
      driver's make_request function.  If the return is true, then the md flush
      logic has or will complete the bio and the md make_request call is done.
      If false, then the md make_request function needs to keep processing like
      it is a normal bio. Let the original call to md_handle_request handle any
      need to retry sending the bio to the raid driver's make_request function
      should it be needed.
      
      Also mark md_flush_request and the make_request function pointer as
      __must_check to issue warnings should these critical return values be
      ignored.
      
      Fixes: 2bc13b83 ("md: batch flush requests.")
      Cc: stable@vger.kernel.org # # v4.19+
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: default avatarDavid Jeffery <djeffery@redhat.com>
      Reviewed-by: default avatarXiao Ni <xni@redhat.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      775d7831
    • Guoqing Jiang's avatar
      md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit · fadcbd29
      Guoqing Jiang authored
      We need to move "spin_lock_irq(&bitmap->counts.lock)" before unmap previous
      storage, otherwise panic like belows could happen as follows.
      
      [  902.353802] sdl: detected capacity change from 1077936128 to 3221225472
      [  902.616948] general protection fault: 0000 [#1] SMP
      [snip]
      [  902.618588] CPU: 12 PID: 33698 Comm: md0_raid1 Tainted: G           O    4.14.144-1-pserver #4.14.144-1.1~deb10
      [  902.618870] Hardware name: Supermicro SBA-7142G-T4/BHQGE, BIOS 3.00       10/24/2012
      [  902.619120] task: ffff9ae1860fc600 task.stack: ffffb52e4c704000
      [  902.619301] RIP: 0010:bitmap_file_clear_bit+0x90/0xd0 [md_mod]
      [  902.619464] RSP: 0018:ffffb52e4c707d28 EFLAGS: 00010087
      [  902.619626] RAX: ffe8008b0d061000 RBX: ffff9ad078c87300 RCX: 0000000000000000
      [  902.619792] RDX: ffff9ad986341868 RSI: 0000000000000803 RDI: ffff9ad078c87300
      [  902.619986] RBP: ffff9ad0ed7a8000 R08: 0000000000000000 R09: 0000000000000000
      [  902.620154] R10: ffffb52e4c707ec0 R11: ffff9ad987d1ed44 R12: ffff9ad0ed7a8360
      [  902.620320] R13: 0000000000000003 R14: 0000000000060000 R15: 0000000000000800
      [  902.620487] FS:  0000000000000000(0000) GS:ffff9ad987d00000(0000) knlGS:0000000000000000
      [  902.620738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  902.620901] CR2: 000055ff12aecec0 CR3: 0000001005207000 CR4: 00000000000406e0
      [  902.621068] Call Trace:
      [  902.621256]  bitmap_daemon_work+0x2dd/0x360 [md_mod]
      [  902.621429]  ? find_pers+0x70/0x70 [md_mod]
      [  902.621597]  md_check_recovery+0x51/0x540 [md_mod]
      [  902.621762]  raid1d+0x5c/0xeb0 [raid1]
      [  902.621939]  ? try_to_del_timer_sync+0x4d/0x80
      [  902.622102]  ? del_timer_sync+0x35/0x40
      [  902.622265]  ? schedule_timeout+0x177/0x360
      [  902.622453]  ? call_timer_fn+0x130/0x130
      [  902.622623]  ? find_pers+0x70/0x70 [md_mod]
      [  902.622794]  ? md_thread+0x94/0x150 [md_mod]
      [  902.622959]  md_thread+0x94/0x150 [md_mod]
      [  902.623121]  ? wait_woken+0x80/0x80
      [  902.623280]  kthread+0x119/0x130
      [  902.623437]  ? kthread_create_on_node+0x60/0x60
      [  902.623600]  ret_from_fork+0x22/0x40
      [  902.624225] RIP: bitmap_file_clear_bit+0x90/0xd0 [md_mod] RSP: ffffb52e4c707d28
      
      Because mdadm was running on another cpu to do resize, so bitmap_resize was
      called to replace bitmap as below shows.
      
      PID: 38801  TASK: ffff9ad074a90e00  CPU: 0   COMMAND: "mdadm"
         [exception RIP: queued_spin_lock_slowpath+56]
         [snip]
      -- <NMI exception stack> --
       #5 [ffffb52e60f17c58] queued_spin_lock_slowpath at ffffffff9c0b27b8
       #6 [ffffb52e60f17c58] bitmap_resize at ffffffffc0399877 [md_mod]
       #7 [ffffb52e60f17d30] raid1_resize at ffffffffc0285bf9 [raid1]
       #8 [ffffb52e60f17d50] update_size at ffffffffc038a31a [md_mod]
       #9 [ffffb52e60f17d70] md_ioctl at ffffffffc0395ca4 [md_mod]
      
      And the procedure to keep resize bitmap safe is allocate new storage
      space, then quiesce, copy bits, replace bitmap, and re-start.
      
      However the daemon (bitmap_daemon_work) could happen even the array is
      quiesced, which means when bitmap_file_clear_bit is triggered by raid1d,
      then it thinks it should be fine to access store->filemap since
      counts->lock is held, but resize could change the storage without the
      protection of the lock.
      
      Cc: Jack Wang <jinpu.wang@cloud.ionos.com>
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: default avatarGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      fadcbd29
    • Dan Carpenter's avatar
      md/raid0: Fix an error message in raid0_make_request() · e3fc3f3d
      Dan Carpenter authored
      The first argument to WARN() is supposed to be a condition.  The
      original code will just print the mdname() instead of the full warning
      message.
      
      Fixes: c84a1372 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      e3fc3f3d
  6. 18 Oct, 2019 1 commit
  7. 07 Oct, 2019 2 commits
  8. 06 Oct, 2019 2 commits
    • Linus Torvalds's avatar
      Linux 5.4-rc2 · da0c9ea1
      Linus Torvalds authored
      da0c9ea1
    • Linus Torvalds's avatar
      elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings · b212921b
      Linus Torvalds authored
      In commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") we
      changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
      executable mappings.
      
      Then, people reported that it broke some binaries that had overlapping
      segments from the same file, and commit ad55eac7 ("elf: enforce
      MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
      overlaying elf segment cases.  But only some - despite the summary line
      of that commit, it only did it when it also does a temporary brk vma for
      one obvious overlapping case.
      
      Now Russell King reports another overlapping case with old 32-bit x86
      binaries, which doesn't trigger that limited case.  End result: we had
      better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.
      
      Yes, it's a sign of old binaries generated with old tool-chains, but we
      do pride ourselves on not breaking existing setups.
      
      This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
      and the old load_elf_library() use-cases, because nobody has reported
      breakage for those. Yet.
      
      Note that in all the cases seen so far, the overlapping elf sections
      seem to be just re-mapping of the same executable with different section
      attributes.  We could possibly introduce a new MAP_FIXED_NOFILECHANGE
      flag or similar, which acts like NOREPLACE, but allows just remapping
      the same executable file using different protection flags.
      
      It's not clear that would make a huge difference to anything, but if
      people really hate that "elf remaps over previous maps" behavior, maybe
      at least a more limited form of remapping would alleviate some concerns.
      
      Alternatively, we should take a look at our elf_map() logic to see if we
      end up not mapping things properly the first time.
      
      In the meantime, this is the minimal "don't do that then" patch while
      people hopefully think about it more.
      Reported-by: default avatarRussell King <linux@armlinux.org.uk>
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Fixes: ad55eac7 ("elf: enforce  MAP_FIXED on overlaying elf segments")
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b212921b