1. 21 Nov, 2019 2 commits
    • Akinobu Mita's avatar
      nvme: hwmon: provide temperature min and max values for each sensor · 52deba0f
      Akinobu Mita authored
      According to the NVMe specification, the over temperature threshold and
      under temperature threshold features shall be implemented for Composite
      Temperature if a non-zero WCTEMP field value is reported in the Identify
      Controller data structure.  The features are also implemented for all
      implemented temperature sensors (i.e., all Temperature Sensor fields that
      report a non-zero value).
      
      This provides the over temperature threshold and under temperature
      threshold for each sensor as temperature min and max values of hwmon
      sysfs attributes.
      
      The WCTEMP is already provided as a temperature max value for Composite
      Temperature, but this change isn't incompatible.  Because the default
      value of the over temperature threshold for Composite Temperature is
      the WCTEMP.
      
      Now the alarm attribute for Composite Temperature indicates one of the
      temperature is outside of a temperature threshold.  Because there is only
      a single bit in Critical Warning field that indicates a temperature is
      outside of a threshold.
      
      Example output from the "sensors" command:
      
      nvme-pci-0100
      Adapter: PCI adapter
      Composite:    +33.9°C  (low  = -273.1°C, high = +69.8°C)
                             (crit = +79.8°C)
      Sensor 1:     +34.9°C  (low  = -273.1°C, high = +65261.8°C)
      Sensor 2:     +31.9°C  (low  = -273.1°C, high = +65261.8°C)
      Sensor 5:     +47.9°C  (low  = -273.1°C, high = +65261.8°C)
      
      This also adds helper macros for kelvin from/to milli Celsius conversion,
      and replaces the repeated code in hwmon.c.
      
      Cc: Keith Busch <kbusch@kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Jean Delvare <jdelvare@suse.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      52deba0f
    • Christoph Hellwig's avatar
      nvmet: add another maintainer · 3aeb6a24
      Christoph Hellwig authored
      Sagi and I have been pretty busy lately, and Chaitanya has been
      helping a lot with target work and agreed to share the load.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      3aeb6a24
  2. 12 Nov, 2019 1 commit
  3. 11 Nov, 2019 1 commit
    • Guenter Roeck's avatar
      nvme: Add hardware monitoring support · 400b6a7b
      Guenter Roeck authored
      nvme devices report temperature information in the controller information
      (for limits) and in the smart log. Currently, the only means to retrieve
      this information is the nvme command line interface, which requires
      super-user privileges.
      
      At the same time, it would be desirable to be able to use NVMe temperature
      information for thermal control.
      
      This patch adds support to read NVMe temperatures from the kernel using the
      hwmon API and adds temperature zones for NVMe drives. The thermal subsystem
      can use this information to set thermal policies, and userspace can access
      it using libsensors and/or the "sensors" command.
      
      Example output from the "sensors" command:
      
      nvme0-pci-0100
      Adapter: PCI adapter
      Composite:    +39.0°C  (high = +85.0°C, crit = +85.0°C)
      Sensor 1:     +39.0°C
      Sensor 2:     +41.0°C
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      400b6a7b
  4. 04 Nov, 2019 27 commits
  5. 01 Nov, 2019 1 commit
    • Darrick J. Wong's avatar
      loop: fix no-unmap write-zeroes request behavior · efcfec57
      Darrick J. Wong authored
      Currently, if the loop device receives a WRITE_ZEROES request, it asks
      the underlying filesystem to punch out the range.  This behavior is
      correct if unmapping is allowed.  However, a NOUNMAP request means that
      the caller doesn't want us to free the storage backing the range, so
      punching out the range is incorrect behavior.
      
      To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
      underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
      the fallocate documentation) required to ensure that the entire range is
      backed by real storage, which suffices for our purposes.
      
      Fixes: 19372e27 ("loop: implement REQ_OP_WRITE_ZEROES")
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      efcfec57
  6. 25 Oct, 2019 1 commit
  7. 24 Oct, 2019 5 commits
    • Jens Axboe's avatar
      Merge branch 'md-next' of... · 9c6694bd
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.5/drivers
      
      Pull MD changes from Song.
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md: no longer compare spare disk superblock events in super_load
        md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
        md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
        md/raid0: Fix an error message in raid0_make_request()
      9c6694bd
    • Yufen Yu's avatar
      md: no longer compare spare disk superblock events in super_load · 6a5cb53a
      Yufen Yu authored
      We have a test case as follow:
      
        mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] \
      	--assume-clean --bitmap=internal
        mdadm -S /dev/md1
        mdadm -A /dev/md1 /dev/sd[b-c] --run --force
      
        mdadm --zero /dev/sda
        mdadm /dev/md1 -a /dev/sda
      
        echo offline > /sys/block/sdc/device/state
        echo offline > /sys/block/sdb/device/state
        sleep 5
        mdadm -S /dev/md1
      
        echo running > /sys/block/sdb/device/state
        echo running > /sys/block/sdc/device/state
        mdadm -A /dev/md1 /dev/sd[a-c] --run --force
      
      When we readd /dev/sda to the array, it started to do recovery.
      After offline the other two disks in md1, the recovery have
      been interrupted and superblock update info cannot be written
      to the offline disks. While the spare disk (/dev/sda) can continue
      to update superblock info.
      
      After stopping the array and assemble it, we found the array
      run fail, with the follow kernel message:
      
      [  172.986064] md: kicking non-fresh sdb from array!
      [  173.004210] md: kicking non-fresh sdc from array!
      [  173.022383] md/raid1:md1: active with 0 out of 4 mirrors
      [  173.022406] md1: failed to create bitmap (-5)
      [  173.023466] md: md1 stopped.
      
      Since both sdb and sdc have the value of 'sb->events' smaller than
      that in sda, they have been kicked from the array. However, the only
      remained disk sda is in 'spare' state before stop and it cannot be
      added to conf->mirrors[] array. In the end, raid array assemble
      and run fail.
      
      In fact, we can use the older disk sdb or sdc to assemble the array.
      That means we should not choose the 'spare' disk as the fresh disk in
      analyze_sbs().
      
      To fix the problem, we do not compare superblock events when it is
      a spare disk, as same as validate_super.
      Signed-off-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      6a5cb53a
    • David Jeffery's avatar
      md: improve handling of bio with REQ_PREFLUSH in md_flush_request() · 775d7831
      David Jeffery authored
      If pers->make_request fails in md_flush_request(), the bio is lost. To
      fix this, pass back a bool to indicate if the original make_request call
      should continue to handle the I/O and instead of assuming the flush logic
      will push it to completion.
      
      Convert md_flush_request to return a bool and no longer calls the raid
      driver's make_request function.  If the return is true, then the md flush
      logic has or will complete the bio and the md make_request call is done.
      If false, then the md make_request function needs to keep processing like
      it is a normal bio. Let the original call to md_handle_request handle any
      need to retry sending the bio to the raid driver's make_request function
      should it be needed.
      
      Also mark md_flush_request and the make_request function pointer as
      __must_check to issue warnings should these critical return values be
      ignored.
      
      Fixes: 2bc13b83 ("md: batch flush requests.")
      Cc: stable@vger.kernel.org # # v4.19+
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: default avatarDavid Jeffery <djeffery@redhat.com>
      Reviewed-by: default avatarXiao Ni <xni@redhat.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      775d7831
    • Guoqing Jiang's avatar
      md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit · fadcbd29
      Guoqing Jiang authored
      We need to move "spin_lock_irq(&bitmap->counts.lock)" before unmap previous
      storage, otherwise panic like belows could happen as follows.
      
      [  902.353802] sdl: detected capacity change from 1077936128 to 3221225472
      [  902.616948] general protection fault: 0000 [#1] SMP
      [snip]
      [  902.618588] CPU: 12 PID: 33698 Comm: md0_raid1 Tainted: G           O    4.14.144-1-pserver #4.14.144-1.1~deb10
      [  902.618870] Hardware name: Supermicro SBA-7142G-T4/BHQGE, BIOS 3.00       10/24/2012
      [  902.619120] task: ffff9ae1860fc600 task.stack: ffffb52e4c704000
      [  902.619301] RIP: 0010:bitmap_file_clear_bit+0x90/0xd0 [md_mod]
      [  902.619464] RSP: 0018:ffffb52e4c707d28 EFLAGS: 00010087
      [  902.619626] RAX: ffe8008b0d061000 RBX: ffff9ad078c87300 RCX: 0000000000000000
      [  902.619792] RDX: ffff9ad986341868 RSI: 0000000000000803 RDI: ffff9ad078c87300
      [  902.619986] RBP: ffff9ad0ed7a8000 R08: 0000000000000000 R09: 0000000000000000
      [  902.620154] R10: ffffb52e4c707ec0 R11: ffff9ad987d1ed44 R12: ffff9ad0ed7a8360
      [  902.620320] R13: 0000000000000003 R14: 0000000000060000 R15: 0000000000000800
      [  902.620487] FS:  0000000000000000(0000) GS:ffff9ad987d00000(0000) knlGS:0000000000000000
      [  902.620738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  902.620901] CR2: 000055ff12aecec0 CR3: 0000001005207000 CR4: 00000000000406e0
      [  902.621068] Call Trace:
      [  902.621256]  bitmap_daemon_work+0x2dd/0x360 [md_mod]
      [  902.621429]  ? find_pers+0x70/0x70 [md_mod]
      [  902.621597]  md_check_recovery+0x51/0x540 [md_mod]
      [  902.621762]  raid1d+0x5c/0xeb0 [raid1]
      [  902.621939]  ? try_to_del_timer_sync+0x4d/0x80
      [  902.622102]  ? del_timer_sync+0x35/0x40
      [  902.622265]  ? schedule_timeout+0x177/0x360
      [  902.622453]  ? call_timer_fn+0x130/0x130
      [  902.622623]  ? find_pers+0x70/0x70 [md_mod]
      [  902.622794]  ? md_thread+0x94/0x150 [md_mod]
      [  902.622959]  md_thread+0x94/0x150 [md_mod]
      [  902.623121]  ? wait_woken+0x80/0x80
      [  902.623280]  kthread+0x119/0x130
      [  902.623437]  ? kthread_create_on_node+0x60/0x60
      [  902.623600]  ret_from_fork+0x22/0x40
      [  902.624225] RIP: bitmap_file_clear_bit+0x90/0xd0 [md_mod] RSP: ffffb52e4c707d28
      
      Because mdadm was running on another cpu to do resize, so bitmap_resize was
      called to replace bitmap as below shows.
      
      PID: 38801  TASK: ffff9ad074a90e00  CPU: 0   COMMAND: "mdadm"
         [exception RIP: queued_spin_lock_slowpath+56]
         [snip]
      -- <NMI exception stack> --
       #5 [ffffb52e60f17c58] queued_spin_lock_slowpath at ffffffff9c0b27b8
       #6 [ffffb52e60f17c58] bitmap_resize at ffffffffc0399877 [md_mod]
       #7 [ffffb52e60f17d30] raid1_resize at ffffffffc0285bf9 [raid1]
       #8 [ffffb52e60f17d50] update_size at ffffffffc038a31a [md_mod]
       #9 [ffffb52e60f17d70] md_ioctl at ffffffffc0395ca4 [md_mod]
      
      And the procedure to keep resize bitmap safe is allocate new storage
      space, then quiesce, copy bits, replace bitmap, and re-start.
      
      However the daemon (bitmap_daemon_work) could happen even the array is
      quiesced, which means when bitmap_file_clear_bit is triggered by raid1d,
      then it thinks it should be fine to access store->filemap since
      counts->lock is held, but resize could change the storage without the
      protection of the lock.
      
      Cc: Jack Wang <jinpu.wang@cloud.ionos.com>
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: default avatarGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      fadcbd29
    • Dan Carpenter's avatar
      md/raid0: Fix an error message in raid0_make_request() · e3fc3f3d
      Dan Carpenter authored
      The first argument to WARN() is supposed to be a condition.  The
      original code will just print the mdname() instead of the full warning
      message.
      
      Fixes: c84a1372 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      e3fc3f3d
  8. 18 Oct, 2019 1 commit
  9. 07 Oct, 2019 1 commit