1. 07 Mar, 2018 1 commit
  2. 01 Mar, 2018 2 commits
    • nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors · 16ccfff2
      Ming Lei authored
      Commit 84676c1f ("genirq/affinity: assign vectors to all possible CPUs")
      switched to spreading irq vectors among all possible CPUs, so
      pass num_possible_cpus() as the max vectors to be assigned.
      
      For example, on an 8-core system with CPUs 0~3 online and 4~7 offline/not present,
      see 'lscpu':
      
              [ming@box]$lscpu
              Architecture:          x86_64
              CPU op-mode(s):        32-bit, 64-bit
              Byte Order:            Little Endian
              CPU(s):                4
              On-line CPU(s) list:   0-3
              Thread(s) per core:    1
              Core(s) per socket:    2
              Socket(s):             2
              NUMA node(s):          2
              ...
              NUMA node0 CPU(s):     0-3
              NUMA node1 CPU(s):
              ...
      
      1) before this patch, the allocated vectors and their affinity were:
      	irq 47, cpu list 0,4
      	irq 48, cpu list 1,6
      	irq 49, cpu list 2,5
      	irq 50, cpu list 3,7
      
      2) after this patch, the allocated vectors and their affinity are:
      	irq 43, cpu list 0
      	irq 44, cpu list 1
      	irq 45, cpu list 2
      	irq 46, cpu list 3
      	irq 47, cpu list 4
      	irq 48, cpu list 6
      	irq 49, cpu list 5
      	irq 50, cpu list 7
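
      The two listings differ in how many vectors the possible CPUs are
      divided among. Below is a minimal userspace sketch, assuming a plain
      round-robin spread. The kernel's real spreading is NUMA-aware, which is
      why the pairs listed above differ from this model. The names
      spread_possible_cpus() and cpus_sharing_vector() are hypothetical, not
      kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model: spread `ncpus` possible CPUs over `nvecs` vectors
 * round-robin. vec_of_cpu[cpu] receives the index of the vector serving
 * that CPU. This only illustrates the counting, not the kernel algorithm.
 */
static void spread_possible_cpus(unsigned int nvecs, unsigned int ncpus,
                                 unsigned int *vec_of_cpu)
{
    assert(nvecs > 0 && vec_of_cpu != NULL);
    for (unsigned int cpu = 0; cpu < ncpus; cpu++)
        vec_of_cpu[cpu] = cpu % nvecs;
}

/* Count how many CPUs share the vector that serves `cpu`. */
static unsigned int cpus_sharing_vector(const unsigned int *vec_of_cpu,
                                        unsigned int ncpus, unsigned int cpu)
{
    unsigned int n = 0;

    for (unsigned int i = 0; i < ncpus; i++)
        if (vec_of_cpu[i] == vec_of_cpu[cpu])
            n++;
    return n;
}
```

      In this model, 4 vectors over 8 possible CPUs leave every vector
      serving two CPUs, while 8 vectors give each CPU a vector of its own.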
      
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
    • nvme-pci: Fix EEH failure on ppc · 651438bb
      Wen Xiong authored
      Triggering PPC EEH detection and handling requires a memory mapped read
      failure. The NVMe driver removed the periodic health check MMIO, so
      there's no early detection mechanism to trigger the recovery. Instead,
      the detection now happens when the nvme driver handles an IO timeout
      event. This takes the pci channel offline, so we do not want the driver
      to proceed with escalating its own recovery efforts that may conflict
      with the EEH handler.
      
      This patch ensures the driver will observe the channel was set to offline
      after a failed MMIO read and resets the IO timer so the EEH handler has
      a chance to recover the device.
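
      The decision described above can be modelled in a few lines. This is
      only a sketch under stated assumptions: the enum values and struct are
      hypothetical stand-ins for the block layer's timeout return codes and
      for the PCI channel state, which the real driver queries from the PCI
      core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the timeout handler's possible outcomes. */
enum timeout_action {
    TIMEOUT_RESET_TIMER,  /* re-arm the IO timer, take no recovery action */
    TIMEOUT_ESCALATE      /* proceed with the driver's own recovery */
};

/* Hypothetical device state used only by this model. */
struct model_dev {
    bool channel_offline; /* set once a failed MMIO read triggers EEH */
};

/* If the channel went offline, step back and let the EEH handler recover. */
static enum timeout_action model_io_timeout(const struct model_dev *dev)
{
    assert(dev != NULL);
    if (dev->channel_offline)
        return TIMEOUT_RESET_TIMER;
    return TIMEOUT_ESCALATE;
}
```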
      Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
      [updated change log]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
  3. 28 Feb, 2018 3 commits
    • Merge branch 'for-jens' of git://git.infradead.org/nvme into for-linus · 468f0987
      Jens Axboe authored
      Pull NVMe fixes from Keith for 4.16-rc.
      
      * 'for-jens' of git://git.infradead.org/nvme:
        nvmet: fix PSDT field check in command format
        nvme-multipath: fix sysfs dangerously created links
        nvme-pci: Fix nvme queue cleanup if IRQ setup fails
        nvmet-loop: use blk_rq_payload_bytes for sgl selection
        nvme-rdma: use blk_rq_payload_bytes instead of blk_rq_bytes
        nvme-fabrics: don't check for non-NULL module in nvmf_register_transport
    • nvmet: fix PSDT field check in command format · bffd2b61
      Max Gurtovoy authored
      PSDT field section according to NVM_Express-1.3:
      "This field specifies whether PRPs or SGLs are used for any data
      transfer associated with the command. PRPs shall be used for all
      Admin commands for NVMe over PCIe. SGLs shall be used for all Admin
      and I/O commands for NVMe over Fabrics. This field shall be set to
      01b for NVMe over Fabrics 1.0 implementations."
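
      A check of this kind might look like the sketch below. It assumes the
      PSDT field sits in the top two bits of the command's flags byte (bits
      15:14 of Command Dword 0). The macro and function names are
      hypothetical and are not the identifiers used by nvmet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSDT_SHIFT   6     /* assumed: PSDT is bits 7:6 of the flags byte */
#define PSDT_MASK    0x3u
#define PSDT_FABRICS 0x1u  /* 01b, the value the quoted text requires */

/* Accept a command only if its PSDT field is exactly 01b. */
static bool psdt_ok_for_fabrics(uint8_t flags)
{
    return ((flags >> PSDT_SHIFT) & PSDT_MASK) == PSDT_FABRICS;
}
```

      Comparing the whole two-bit field against 01b rejects 00b (PRPs) as
      well as 10b and 11b, rather than merely testing that one bit is set.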
      Suggested-by: Idan Burstein <idanb@mellanox.com>
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
    • nvme-multipath: fix sysfs dangerously created links · 9bd82b1a
      Baegjae Sung authored
      If multipathing is enabled, each NVMe subsystem creates a head
      namespace (e.g., nvme0n1) and multiple private namespaces
      (e.g., nvme0c0n1 and nvme0c1n1) in sysfs. When creating links for
      private namespaces, links of head namespace are used, so the
      namespace creation order must be followed (e.g., nvme0n1 ->
      nvme0c1n1). If the order is not followed, the sysfs links will be
      incomplete or a kernel panic will occur.
      
      The kernel panic was:
        kernel BUG at fs/sysfs/symlink.c:27!
        Call Trace:
          nvme_mpath_add_disk_links+0x5d/0x80 [nvme_core]
          nvme_validate_ns+0x5c2/0x850 [nvme_core]
          nvme_scan_work+0x1af/0x2d0 [nvme_core]
      
      Correct order
      Context A     Context B
      nvme0n1
      nvme0c0n1     nvme0c1n1
      
      Incorrect order
      Context A     Context B
                    nvme0c1n1
      nvme0n1
      nvme0c0n1
      
      The nvme_mpath_add_disk (for creating head namespace) is called
      just before the nvme_mpath_add_disk_links (for creating private
      namespaces). In nvme_mpath_add_disk, the first context acquires
      the lock of subsystem and creates a head namespace, and other
      contexts do nothing by checking GENHD_FL_UP of a head namespace
      after waiting to acquire the lock. We verified the code with and
      without multipathing using dual-port NVMe SSDs from three vendors.
      Signed-off-by: Baegjae Sung <baegjae@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
  4. 27 Feb, 2018 3 commits
    • nbd: fix return value in error handling path · 0979962f
      Gustavo A. R. Silva authored
      It seems that the proper value to return in this particular case is the
      one held in the variable new_index instead of ret.
      
      Addresses-Coverity-ID: 1465148 ("Copy-paste error")
      Fixes: e46c7287 ("nbd: add a basic netlink interface")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix kcrashes with fio in RAID5 backend dev · 60eb34ec
      Tang Junhui authored
      The kernel crashed when running fio on a bcache device with a RAID5
      backend; the call trace is below:
      [  440.012034] kernel BUG at block/blk-ioc.c:146!
      [  440.012696] invalid opcode: 0000 [#1] SMP NOPTI
      [  440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted 4.15.0 #8
      [  440.027441] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
      [  440.028615] RIP: 0010:put_io_context+0x8b/0x90
      [  440.029246] RSP: 0018:ffffa8c882b43af8 EFLAGS: 00010246
      [  440.029990] RAX: 0000000000000000 RBX: ffffa8c88294fca0 RCX: 00000000000f4240
      [  440.031006] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffffa8c88294fca0
      [  440.032030] RBP: ffffa8c882b43b10 R08: 0000000000000003 R09: ffff949cb80c1700
      [  440.033206] R10: 0000000000000104 R11: 000000000000b71c R12: 0000000000001000
      [  440.034222] R13: 0000000000000000 R14: ffff949cad84db70 R15: ffff949cb11bd1e0
      [  440.035239] FS:  0000000000000000(0000) GS:ffff949cba280000(0000) knlGS:0000000000000000
      [  440.060190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  440.084967] CR2: 00007ff0493ef000 CR3: 00000002f1e0a002 CR4: 00000000001606e0
      [  440.110498] Call Trace:
      [  440.135443]  bio_disassociate_task+0x1b/0x60
      [  440.160355]  bio_free+0x1b/0x60
      [  440.184666]  bio_put+0x23/0x30
      [  440.208272]  search_free+0x23/0x40 [bcache]
      [  440.231448]  cached_dev_write_complete+0x31/0x70 [bcache]
      [  440.254468]  closure_put+0xb6/0xd0 [bcache]
      [  440.277087]  request_endio+0x30/0x40 [bcache]
      [  440.298703]  bio_endio+0xa1/0x120
      [  440.319644]  handle_stripe+0x418/0x2270 [raid456]
      [  440.340614]  ? load_balance+0x17b/0x9c0
      [  440.360506]  handle_active_stripes.isra.58+0x387/0x5a0 [raid456]
      [  440.380675]  ? __release_stripe+0x15/0x20 [raid456]
      [  440.400132]  raid5d+0x3ed/0x5d0 [raid456]
      [  440.419193]  ? schedule+0x36/0x80
      [  440.437932]  ? schedule_timeout+0x1d2/0x2f0
      [  440.456136]  md_thread+0x122/0x150
      [  440.473687]  ? wait_woken+0x80/0x80
      [  440.491411]  kthread+0x102/0x140
      [  440.508636]  ? find_pers+0x70/0x70
      [  440.524927]  ? kthread_associate_blkcg+0xa0/0xa0
      [  440.541791]  ret_from_fork+0x35/0x40
      [  440.558020] Code: c2 48 00 5b 41 5c 41 5d 5d c3 48 89 c6 4c 89 e7 e8 bb c2 48 00 48 8b 3d bc 36 4b 01 48 89 de e8 7c f7 e0 ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 00 0f 1f 44 00 00 55 48 8d 47 b8 48 89 e5 41 57 41
      [  440.610020] RIP: put_io_context+0x8b/0x90 RSP: ffffa8c882b43af8
      [  440.628575] ---[ end trace a1fd79d85643a73e ]---
      
      All the crashes happened when a bypass IO came in. In that scenario
      s->iop.bio points to s->orig_bio. search_free() finishes s->orig_bio
      by calling bio_complete(), after which s->iop.bio becomes invalid, so
      the kernel crashes when calling bio_put(). Maybe the upper layer is at
      fault, since a bio should not be freed before we call bio_put(), but
      we had better call bio_put() first, before calling bio_complete() to
      notify the upper layer that this bio has ended.
      
      This patch moves bio_complete() under bio_put() to avoid kernel crash.
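
      Why the ordering matters can be shown with a toy model. This is a
      sketch only: struct model_bio and the model_* functions are
      hypothetical and merely mimic the put/complete pattern; they are not
      the bcache or block layer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bio {
    int  refs;   /* references held on this bio */
    bool freed;  /* set once the owner has released the bio */
};

/* Drop a reference; returns false if the bio was already freed
 * (which would be a use-after-free in real code). */
static bool model_bio_put(struct model_bio *bio)
{
    assert(bio != NULL);
    if (bio->freed)
        return false;
    bio->refs--;
    return true;
}

/* Completing the original bio lets the upper layer free it. */
static void model_bio_complete(struct model_bio *orig)
{
    orig->freed = true;
}

/* put_first == true models the patched order, false the old order. */
static bool model_search_free(struct model_bio *iop_bio,
                              struct model_bio *orig_bio, bool put_first)
{
    bool ok;

    if (put_first) {
        ok = model_bio_put(iop_bio);
        model_bio_complete(orig_bio);
    } else {
        model_bio_complete(orig_bio);
        ok = model_bio_put(iop_bio);
    }
    return ok;
}
```

      In the bypass case both pointers refer to the same object, so only
      the put-first order touches it while it is still valid.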
      
      [mlyle: fixed commit subject for character limits]
      Reported-by: Matthias Ferdinand <bcache@mfedv.net>
      Tested-by: Matthias Ferdinand <bcache@mfedv.net>
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: correct flash only vols (check all uuids) · 02aa8a8b
      Coly Li authored
      Commit 2831231d ("bcache: reduce cache_set devices iteration by
      devices_max_used") adds c->devices_max_used to reduce iteration of
      c->uuids elements; this value is updated in bcache_device_attach().
      
      But for a flash only volume, when flash_devs_run() is called,
      bcache_device_attach() has not been called yet and c->devices_max_used
      is not updated. The unexpected result is that the flash only volume
      won't be run by flash_devs_run().
      
      This patch fixes the issue by iterating over all c->uuids elements in
      flash_devs_run(). c->devices_max_used will be updated properly when
      bcache_device_attach() gets called.
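
      The effect of the loop bound can be sketched as follows. The struct
      and function are hypothetical simplifications of the uuid table scan,
      assuming each slot only records whether it is in use and whether it
      describes a flash only volume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of one uuid table slot. */
struct model_uuid {
    bool in_use;      /* slot holds a device */
    bool flash_only;  /* slot describes a flash only volume */
};

/* Count the flash only volumes seen when scanning the first `limit` slots. */
static size_t count_flash_vols(const struct model_uuid *uuids, size_t limit)
{
    size_t n = 0;

    assert(uuids != NULL);
    for (size_t i = 0; i < limit; i++)
        if (uuids[i].in_use && uuids[i].flash_only)
            n++;
    return n;
}
```

      Bounding the scan by a counter that is still zero finds nothing,
      whereas scanning the whole table finds the volume.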
      
      [mlyle: commit subject edited for character limit]
      
      Fixes: 2831231d ("bcache: reduce cache_set devices iteration by devices_max_used")
      Reported-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 26 Feb, 2018 9 commits
    • blktrace_api.h: fix comment for struct blk_user_trace_setup · 9c722588
      Eric Biggers authored
      'struct blk_user_trace_setup' is passed to BLKTRACESETUP, not
      BLKTRACESTART.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blockdev: Avoid two active bdev inodes for one device · 560e7cb2
      Jan Kara authored
      When blkdev_open() races with device removal and creation it can happen
      that an unhashed bdev inode gets associated with a newly created gendisk
      like:
      
      CPU0					CPU1
      blkdev_open()
        bdev = bd_acquire()
      					del_gendisk()
      					  bdev_unhash_inode(bdev);
      					remove device
      					create new device with the same number
        __blkdev_get()
          disk = get_gendisk()
            - gets reference to gendisk of the new device
      
      Now another blkdev_open() will not find original 'bdev' as it got
      unhashed, create a new one and associate it with the same 'disk' at
      which point problems start as we have two independent page caches for
      one device.
      
      Fix the problem by verifying that the bdev inode didn't get unhashed
      before we acquired gendisk reference. That way we make sure gendisk can
      get associated only with visible bdev inodes.
      Tested-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Fix BUG in blkdev_open() · 56c0908c
      Jan Kara authored
      When two blkdev_open() calls for a partition race with device removal
      and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
      blkdev_open(). The race can happen as follows:
      
      CPU0				CPU1			CPU2
      							del_gendisk()
      							  bdev_unhash_inode(part1);
      
      blkdev_open(part1, O_EXCL)	blkdev_open(part1, O_EXCL)
        bdev = bd_acquire()		  bdev = bd_acquire()
        blkdev_get(bdev)
          bd_start_claiming(bdev)
            - finds old inode 'whole'
            bd_prepare_to_claim() -> 0
      							  bdev_unhash_inode(whole);
      							<device removed>
      							<new device under same
      							 number created>
      				  blkdev_get(bdev);
      				    bd_start_claiming(bdev)
      				      - finds new inode 'whole'
      				      bd_prepare_to_claim()
      					- this also succeeds as we have
      					  different 'whole' here...
      					- bad things happen now as we
      					  have two exclusive openers of
      					  the same bdev
      
      The problem here is that block device opens can see various intermediate
      states while gendisk is shutting down and then being recreated.
      
      We fix the problem by introducing new lookup_sem in gendisk that
      synchronizes gendisk deletion with get_gendisk() and furthermore by
      making sure that get_gendisk() does not return gendisk that is being (or
      has been) deleted. This makes sure that once we ever manage to look up
      newly created bdev inode, we are also guaranteed that following
      get_gendisk() will either return failure (and we fail open) or it
      returns gendisk for the new device and following bdget_disk() will
      return new bdev inode (i.e., blkdev_open() follows the path as if it is
      completely run after new device is created).
      Reported-and-analyzed-by: Hou Tao <houtao1@huawei.com>
      Tested-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Fix use after free in __blkdev_get() · 89736653
      Jan Kara authored
      When two blkdev_open() calls race with device removal and recreation,
      __blkdev_get() can use looked up gendisk after it is freed:
      
      CPU0				CPU1			CPU2
      							del_gendisk(disk);
      							  bdev_unhash_inode(inode);
      blkdev_open()			blkdev_open()
        bdev = bd_acquire(inode);
          - creates and returns new inode
      				  bdev = bd_acquire(inode);
      				    - returns the same inode
        __blkdev_get(devt)		  __blkdev_get(devt)
          disk = get_gendisk(devt);
            - got structure of device going away
      							<finish device removal>
      							<new device gets
      							 created under the same
      							 device number>
      				  disk = get_gendisk(devt);
      				    - got new device structure
      				  if (!bdev->bd_openers) {
      				    does the first open
      				  }
          if (!bdev->bd_openers)
            - false
          } else {
            put_disk_and_module(disk)
              - remember this was old device - this was last ref and disk is
                now freed
          }
          disk_unblock_events(disk); -> oops
      
      Fix the problem by making sure we drop reference to disk in
      __blkdev_get() only after we are really done with it.
      Reported-by: Hou Tao <houtao1@huawei.com>
      Tested-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Add helper put_disk_and_module() · 9df6c299
      Jan Kara authored
      Add a proper counterpart to get_disk_and_module() -
      put_disk_and_module(). Currently it is opencoded in several places.
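
      The pairing can be illustrated with a small reference-count model.
      This is a sketch, not the kernel implementation: struct model_disk,
      struct model_module and the model_* functions are hypothetical, and
      the counters stand in for the real disk and module references.

```c
#include <assert.h>
#include <stddef.h>

struct model_module { int refs; };

struct model_disk {
    int refs;
    struct model_module *owner;  /* module providing the disk's driver */
};

/* Takes both a module reference and a disk reference. */
static struct model_disk *model_get_disk_and_module(struct model_disk *disk)
{
    assert(disk != NULL && disk->owner != NULL);
    disk->owner->refs++;
    disk->refs++;
    return disk;
}

/* Drops only the disk reference. */
static void model_put_disk(struct model_disk *disk)
{
    disk->refs--;
}

/* The proper counterpart: drops the disk and the module reference. */
static void model_put_disk_and_module(struct model_disk *disk)
{
    /* Read the owner first; the disk may go away with its last reference. */
    struct model_module *owner = disk->owner;

    model_put_disk(disk);
    owner->refs--;
}
```

      Pairing the get with the plain put leaves the module count raised,
      which is the kind of leak a dedicated helper is meant to prevent.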
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Rename get_disk() to get_disk_and_module() · 3079c22e
      Jan Kara authored
      Rename get_disk() to get_disk_and_module() to make clear what the
      function does. It's not a great name but at least it is now clear that
      put_disk() is not its counterpart.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Fix leaked module reference for NVME devices · d52987b5
      Jan Kara authored
      Commit 8ddcd653 "block: introduce GENHD_FL_HIDDEN" added handling of
      hidden devices to get_gendisk() but forgot to drop module reference
      which is also acquired by get_disk(). Drop the reference as necessary.
      
      Arguably the function naming here is misleading as put_disk() is *not*
      the counterpart of get_disk() but let's fix that in the follow up
      commit since that will be more intrusive.
      
      Fixes: 8ddcd653
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • direct-io: Fix sleep in atomic due to sync AIO · d9c10e5b
      Jan Kara authored
      Commit e864f395 "fs: add RWF_DSYNC aand RWF_SYNC" added additional
      way for direct IO to become synchronous and thus trigger fsync from the
      IO completion handler. Then commit 9830f4be "fs: Use RWF_* flags for
      AIO operations" allowed these flags to be set for AIO as well. However
      that commit forgot to update the condition checking whether the IO
      completion handling should be deferred to a workqueue and thus AIO DIO
      with RWF_[D]SYNC set will call fsync() from IRQ context resulting in
      sleep in atomic.
      
      Fix the problem by checking iocb flags directly (the same way as it is
      done in dio_complete()) instead of checking all conditions that could
      lead to IO being synchronous.
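
      The idea of keying the decision on the iocb flags can be sketched as
      below. This is a simplified model: MODEL_IOCB_DSYNC, struct model_iocb
      and must_defer_completion() are hypothetical, and the real condition
      in the direct IO code covers more cases than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit standing in for the kernel's "data sync" iocb flag. */
#define MODEL_IOCB_DSYNC 0x1u

struct model_iocb {
    unsigned int ki_flags;
};

/*
 * An async write that must sync on completion cannot finish in IRQ
 * context, because syncing may sleep; it has to be deferred to a workqueue.
 */
static bool must_defer_completion(const struct model_iocb *iocb,
                                  bool is_async, bool is_write)
{
    assert(iocb != NULL);
    return is_async && is_write &&
           (iocb->ki_flags & MODEL_IOCB_DSYNC) != 0;
}
```

      Testing the flag itself means any path that sets it is covered,
      without enumerating every way an IO can become synchronous.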
      
      CC: Christoph Hellwig <hch@lst.de>
      CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
      CC: stable@vger.kernel.org
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: 9830f4be
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-pci: Fix nvme queue cleanup if IRQ setup fails · f25a2dfc
      Jianchao Wang authored
      This patch fixes nvme queue cleanup if requesting an IRQ handler for
      the queue's vector fails. It does this by resetting the cq_vector to
      the uninitialized value of -1 so it is ignored for a controller reset.
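
      A toy model of the sentinel handling, under stated assumptions:
      struct model_queue and the model_* functions are hypothetical, and a
      boolean parameter stands in for the IRQ request failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CQ_VECTOR_UNSET (-1)  /* the "uninitialized" value from the text */

struct model_queue {
    int  cq_vector;
    bool irq_registered;
};

/* Returns 0 on success, a negative value if the IRQ request failed. */
static int model_create_queue(struct model_queue *q, int vector,
                              bool irq_request_fails)
{
    assert(q != NULL);
    q->cq_vector = vector;
    if (irq_request_fails) {
        /* Reset the vector so later cleanup ignores this queue. */
        q->cq_vector = CQ_VECTOR_UNSET;
        q->irq_registered = false;
        return -1;
    }
    q->irq_registered = true;
    return 0;
}

/* Returns true if there was an IRQ to release for this queue. */
static bool model_suspend_queue(struct model_queue *q)
{
    if (q->cq_vector == CQ_VECTOR_UNSET)
        return false;
    q->cq_vector = CQ_VECTOR_UNSET;
    q->irq_registered = false;
    return true;
}
```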
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      [changelog updates, removed misc whitespace changes]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
  6. 24 Feb, 2018 2 commits
  7. 23 Feb, 2018 4 commits
  8. 22 Feb, 2018 13 commits
    • fs/signalfd: fix build error for BUS_MCEERR_AR · 9026e820
      Randy Dunlap authored
      Fix build error in fs/signalfd.c by using same method that is used in
      kernel/signal.c: separate blocks for different signal si_code values.
      
      ./fs/signalfd.c: error: 'BUS_MCEERR_AR' undeclared (first use in this function)
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • Merge tag 'usb-4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · a638af00
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a number of USB fixes for 4.16-rc3
      
        Nothing major, but a number of different fixes all over the place in
        the USB stack for reported issues. Mostly gadget driver fixes,
        although the typical set of xhci bugfixes are there, along with some
        new quirks additions as well.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'usb-4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (39 commits)
        Revert "usb: musb: host: don't start next rx urb if current one failed"
        usb: musb: fix enumeration after resume
        usb: cdc_acm: prevent race at write to acm while system resumes
        Add delay-init quirk for Corsair K70 RGB keyboards
        usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks()
        usb: host: ehci: always enable interrupt for qtd completion at test mode
        usb: ldusb: add PIDs for new CASSY devices supported by this driver
        usb: renesas_usbhs: missed the "running" flag in usb_dmac with rx path
        usb: host: ehci: use correct device pointer for dma ops
        usbip: keep usbip_device sockfd state in sync with tcp_socket
        ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()
        USB: serial: option: Add support for Quectel EP06
        xhci: fix xhci debugfs errors in xhci_stop
        xhci: xhci debugfs device nodes weren't removed after device plugged out
        xhci: Fix xhci debugfs devices node disappearance after hibernation
        xhci: Fix NULL pointer in xhci debugfs
        xhci: Don't print a warning when setting link state for disabled ports
        xhci: workaround for AMD Promontory disabled ports wakeup
        usb: dwc3: core: Fix ULPI PHYs and prevent phy_get/ulpi_init during suspend/resume
        USB: gadget: udc: Add missing platform_device_put() on error in bdc_pci_probe()
        ...
    • Merge tag 'staging-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 77f892eb
      Linus Torvalds authored
      Pull staging/IIO fixes from Greg KH:
       "Here are a small number of staging and iio driver fixes for 4.16-rc2.
      
        The IIO fixes are all for reported things, and the android driver
        fixes also resolve some reported problems. The remaining fsl-mc
        Kconfig change resolves a build testing error that Arnd reported.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'staging-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: buffer: check if a buffer has been set up when poll is called
        iio: adis_lib: Initialize trigger before requesting interrupt
        staging: android: ion: Zero CMA allocated memory
        staging: android: ashmem: Fix a race condition in pin ioctls
        staging: fsl-mc: fix build testing on x86
        iio: srf08: fix link error "devm_iio_triggered_buffer_setup" undefined
        staging: iio: ad5933: switch buffer mode to software
        iio: adc: stm32: fix stm32h7_adc_enable error handling
        staging: iio: adc: ad7192: fix external frequency setting
        iio: adc: aspeed: Fix error handling path
    • Merge tag 'char-misc-4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · bb17186a
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are a handful of char/misc driver fixes for 4.16-rc3.
      
        There are some binder driver fixes to resolve reported issues in
        stress testing the recent binder changes, some extcon driver fixes,
        and a few mei driver fixes and new device ids.
      
        All of these, with the exception of the mei driver id additions, have
        been in linux-next for a while. I forgot to push out the mei driver id
        additions to kernel.org until today, but all build tests pass with
        them enabled"
      
      * tag 'char-misc-4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        mei: me: add cannon point device ids for 4th device
        mei: me: add cannon point device ids
        mei: set device client to the disconnected state upon suspend.
        ANDROID: binder: synchronize_rcu() when using POLLFREE.
        binder: replace "%p" with "%pK"
        ANDROID: binder: remove WARN() for redundant txn error
        binder: check for binder_thread allocation failure in binder_poll()
        extcon: int3496: process id-pin first so that we start with the right status
        Revert "extcon: axp288: Redo charger type detection a couple of seconds after probe()"
        extcon: axp288: Constify the axp288_pwr_up_down_info array
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 004e390d
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "Nothing in this is overly interesting, it's mostly your garden variety
        fixes.
      
        There was some work in this merge cycle around the new ioctl kABI, so
        there are fixes in here related to that (probably with more to come).
      
        We've also recently added new netlink support with a goal of moving
        the primary means of configuring the entire subsystem to netlink
        (eventually, this is a long term project), so there are fixes for
        that.
      
        Then a few bnxt_re driver fixes, and a few minor WARN_ON removals, and
        that covers this pull request. There are already a few more fixes on
        the list as of this morning, so there will certainly be more to come
        in this rc cycle ;-)
      
        Summary:
      
         - Lots of fixes for the new IOCTL interface and general uverbs flow.
           Found through testing and syzkaller
      
         - Bugfixes for the new resource track netlink reporting
      
         - Remove some unneeded WARN_ONs that were triggering for some users
           in IPoIB
      
         - Various fixes for the bnxt_re driver"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (27 commits)
        RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type
        RDMA/bnxt_re: Avoid system hang during device un-reg
        RDMA/bnxt_re: Fix system crash during load/unload
        RDMA/bnxt_re: Synchronize destroy_qp with poll_cq
        RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails
        RDMA/bnxt_re: Disable atomic capability on bnxt_re adapters
        RDMA/restrack: don't use uaccess_kernel()
        RDMA/verbs: Check existence of function prior to accessing it
        RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file
        RDMA/uverbs: Sanitize user entered port numbers prior to access it
        RDMA/uverbs: Fix circular locking dependency
        RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcd
        RDMA/restrack: Increment CQ restrack object before committing
        RDMA/uverbs: Protect from command mask overflow
        IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy
        IB/uverbs: Improve lockdep_check
        RDMA/uverbs: Protect from races between lookup and destroy of uobjects
        IB/uverbs: Hold the uobj write lock after allocate
        IB/uverbs: Fix possible oops with duplicate ioctl attributes
        IB/uverbs: Add ioctl support for 32bit processes
        ...
    • Merge tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of... · 24180a60
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V cleanups from Palmer Dabbelt:
       "This contains a handful of small cleanups.
      
        The only functional change is that IRQs are now enabled during
        exception handling, which was found when some warnings triggered with
        `CONFIG_DEBUG_ATOMIC_SLEEP=y`.
      
        The remaining fixes should have no functional change: `sbi_save()` has
        been renamed to `parse_dtb()` to reflect what it actually does, and a
        handful of unused Kconfig entries have been removed"
      
      * tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        Rename sbi_save to parse_dtb to improve code readability
        RISC-V: Enable IRQ during exception handling
        riscv: Remove ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select
        riscv: kconfig: Remove RISCV_IRQ_INTC select
        riscv: Remove ARCH_WANT_OPTIONAL_GPIOLIB select
    • Merge branch 'akpm' (patches from Andrew) · 238ca357
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: don't defer struct page initialization for Xen pv guests
        lib/Kconfig.debug: enable RUNTIME_TESTING_MENU
        vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems
        selftests/memfd: add run_fuse_test.sh to TEST_FILES
        bug.h: work around GCC PR82365 in BUG()
        mm/swap.c: make functions and their kernel-doc agree (again)
        mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc
        ida: do zeroing in ida_pre_get()
        mm, swap, frontswap: fix THP swap if frontswap enabled
        certs/blacklist_nohashes.c: fix const confusion in certs blacklist
        kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
        mm, mlock, vmscan: no more skipping pagevecs
        mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
        Kbuild: always define endianess in kconfig.h
        include/linux/sched/mm.h: re-inline mmdrop()
        tools: fix cross-compile var clobbering
    • efivarfs: Limit the rate for non-root to read files · bef3efbe
      Luck, Tony authored
      Each read from a file in efivarfs results in two calls to EFI
      (one to get the file size, another to get the actual data).
      
      On X86 these EFI calls result in broadcast system management
      interrupts (SMI) which affect performance of the whole system.
      A malicious user can loop performing reads from efivarfs bringing
      the system to its knees.
      
      Linus suggested a per-user rate limit to solve this.
      
      So we add a ratelimit structure to "user_struct" and initialize
      it with no limit for the root user. When allocating user_struct for
      other users we set the limit to 100 per second. This could be used
      in other places that want to limit the rate of some detrimental
      user action.
      
      In efivarfs if the limit is exceeded when reading, we take an
      interruptible nap for 50ms and check the rate limit again.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bef3efbe
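      The rate limit described in the efivarfs commit above can be sketched in
      userspace as follows. This is a minimal model, not the kernel code: the
      `demo_*` names, the fixed-window bookkeeping, and the explicit `now_ms`
      clock are all assumptions made for illustration. Only the numbers (100
      events per second for non-root, no limit for root, a 50ms nap before
      re-checking) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a per-user, fixed-window rate limit. */
struct demo_ratelimit {
    uint64_t interval_ms;     /* window length; 0 means "no limit" */
    unsigned int burst;       /* events allowed per window */
    uint64_t window_start_ms; /* start of the current window */
    unsigned int used;        /* events consumed in the current window */
};

/* Root gets no limit; every other user gets 100 events per second. */
static void demo_ratelimit_init(struct demo_ratelimit *rl, bool is_root)
{
    assert(rl != NULL);
    rl->interval_ms = is_root ? 0 : 1000;
    rl->burst = 100;
    rl->window_start_ms = 0;
    rl->used = 0;
}

/* Returns true if one more event is allowed at time now_ms. */
static bool demo_ratelimit_allow(struct demo_ratelimit *rl, uint64_t now_ms)
{
    assert(rl != NULL);
    if (rl->interval_ms == 0)
        return true;
    if (now_ms - rl->window_start_ms >= rl->interval_ms) {
        /* The old window has expired: open a new one. */
        rl->window_start_ms = now_ms;
        rl->used = 0;
    }
    if (rl->used < rl->burst) {
        rl->used++;
        return true;
    }
    return false;
}

/*
 * Models the read path: while the limit is exceeded, "nap" for 50ms
 * (simulated here by advancing the clock) and check again. Returns the
 * simulated time at which the read may proceed.
 */
static uint64_t demo_wait_for_read(struct demo_ratelimit *rl, uint64_t now_ms)
{
    while (!demo_ratelimit_allow(rl, now_ms))
        now_ms += 50;
    return now_ms;
}
```

      Passing the clock in explicitly keeps the model deterministic; in the
      real read path the nap is interruptible, which this sketch does not
      model.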
    • Kees Cook's avatar
      kconfig.h: Include compiler types to avoid missed struct attributes · 28128c61
      Kees Cook authored
      The header files for some structures could get included in such a way
      that struct attributes (specifically __randomize_layout from path.h) would
      be parsed as variable names instead of attributes. This could lead to
      some instances of a structure being unrandomized, causing nasty GPFs, etc.
      
      This patch makes sure the compiler_types.h header is included in
      kconfig.h so that we've always got types and struct attributes defined,
      since kconfig.h is included from the compiler command line.
      Reported-by: Patrick McLean <chutzpah@gentoo.org>
      Root-caused-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
      Fixes: 3859a271 ("randstruct: Mark various structs for randomization")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      28128c61
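      The parsing hazard described in the kconfig.h commit above can be
      reproduced in a few lines. This is a sketch under stated assumptions:
      `demo_randomize_layout` is a hypothetical stand-in for
      `__randomize_layout`, the alignment attribute stands in for the real
      randomization attribute, and the two structs get different names only
      because both variants have to live in one translation unit. It assumes
      GCC or Clang.

```c
#include <assert.h>

/*
 * The attribute macro is not defined yet, so the trailing identifier is
 * parsed as a declarator: this line defines a global VARIABLE named
 * demo_randomize_layout, and the struct gets no attribute at all.
 */
struct demo_path_early { int mnt; int dentry; } demo_randomize_layout;

/* Proof that the identifier above really is a variable. */
static int demo_early_is_variable(void)
{
    demo_randomize_layout.mnt = 42;
    return demo_randomize_layout.mnt;
}

/* Stand-in for the header that defines the attribute macro. */
#define demo_randomize_layout __attribute__((aligned(8)))

/*
 * With the macro visible, the same spelling is now a type attribute and
 * no variable is declared.
 */
struct demo_path_late { int mnt; int dentry; } demo_randomize_layout;
```

      Both declarations compile without error, which is why the mistake can
      go unnoticed: whether a struct gets its attribute depends purely on
      include order.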
    • H.J. Lu's avatar
      x86: Treat R_X86_64_PLT32 as R_X86_64_PC32 · b21ebf2f
      H.J. Lu authored
      On i386, there are 2 types of PLTs, PIC and non-PIC.  PIE and shared
      objects must use PIC PLT.  To use PIC PLT, you need to load
      _GLOBAL_OFFSET_TABLE_ into EBX first.  There is no need for that on
      x86-64 since x86-64 uses PC-relative PLT.
      
      On x86-64, for 32-bit PC-relative branches, we can generate a PLT32
      relocation instead of a PC32 relocation, which can also be used as
      a marker for 32-bit PC-relative branches.  The linker can always reduce
      a PLT32 relocation to PC32 if the function is defined locally.  Local
      functions should use the PC32 relocation.  As far as the Linux kernel is
      concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32,
      since the Linux kernel doesn't use a PLT.
      
      R_X86_64_PLT32 for 32-bit PC-relative branches has been enabled in
      binutils master branch which will become binutils 2.31.
      
      [ hjl is working on having better documentation on this all, but a few
        more notes from him:
      
         "PLT32 relocation is used as marker for PC-relative branches. Because
          of EBX, it looks odd to generate PLT32 relocation on i386 when EBX
          doesn't have GOT.
      
          As for symbol resolution, PLT32 and PC32 relocations are almost
          interchangeable. But when linker sees PLT32 relocation against a
          protected symbol, it can be resolved locally at link-time since it is
          used on a branch instruction. Linker can't do that for PC32
          relocation"
      
        but for the kernel use, the two are basically the same, and this
        commit gets things building and working with the current binutils
        master   - Linus ]
      Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b21ebf2f
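      What "treat R_X86_64_PLT32 as R_X86_64_PC32" means for a relocation
      processor can be sketched as below. The relocation type numbers are
      taken from the x86-64 psABI; `demo_apply_rela` and its parameters are
      hypothetical and do not mirror any kernel function. The value written
      is the usual PC-relative form S + A - P, truncated to 32 bits with an
      overflow check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers as assigned by the x86-64 psABI. */
#define R_X86_64_PC32  2
#define R_X86_64_PLT32 4

/*
 * Hypothetical helper: patch the 32-bit field at loc, which will live at
 * address loc_addr (P), so that it refers to sym_addr (S) plus addend (A).
 * Returns 0 on success, -1 on overflow or an unhandled relocation type.
 */
static int demo_apply_rela(uint8_t *loc, uint64_t loc_addr, unsigned int type,
                           uint64_t sym_addr, int64_t addend)
{
    int64_t val;
    int32_t val32;

    assert(loc != NULL);

    switch (type) {
    case R_X86_64_PC32:
    case R_X86_64_PLT32: /* no PLT in use, so handle it exactly like PC32 */
        val = (int64_t)(sym_addr + (uint64_t)addend - loc_addr);
        if (val != (int64_t)(int32_t)val)
            return -1; /* does not fit in a signed 32-bit field */
        val32 = (int32_t)val;
        memcpy(loc, &val32, sizeof(val32));
        return 0;
    default:
        return -1;
    }
}
```

      Sharing one `case` body is the whole point: both types produce
      identical bytes, so an object file emitted by a newer assembler is
      patched the same way as an older one.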
    • Christoph Hellwig's avatar
      nvmet-loop: use blk_rq_payload_bytes for sgl selection · 796b0b8d
      Christoph Hellwig authored
      blk_rq_bytes does the wrong thing for special payloads like discards and
      might cause the driver to not set up a SGL.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      796b0b8d
    • Christoph Hellwig's avatar
      nvme-rdma: use blk_rq_payload_bytes instead of blk_rq_bytes · 0d309923
      Christoph Hellwig authored
      blk_rq_bytes does the wrong thing for special payloads like discards and
      might cause the driver to not set up a SGL.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      0d309923
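      The difference the two blk_rq_payload_bytes commits above rely on can
      be modelled in userspace. The struct and `demo_*` functions here are
      hypothetical simplifications, loosely modelled on the idea that a
      request with a special payload carries a driver-built buffer whose
      length is unrelated to the logical size of the operation. The concrete
      sizes in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a block request. */
struct demo_request {
    unsigned int data_len;    /* logical size of the operation */
    bool special_payload;     /* true if the driver attached its own buffer */
    unsigned int special_len; /* length of that driver-built buffer */
};

/* Logical size: for a discard this is the range being discarded. */
static unsigned int demo_rq_bytes(const struct demo_request *rq)
{
    assert(rq != NULL);
    return rq->data_len;
}

/* Bytes that actually have to be mapped and sent to the device. */
static unsigned int demo_rq_payload_bytes(const struct demo_request *rq)
{
    assert(rq != NULL);
    if (rq->special_payload)
        return rq->special_len;
    return rq->data_len;
}
```

      For an ordinary read or write the two helpers agree. For a request such
      as `{ .data_len = 1 << 20, .special_payload = true, .special_len = 16 }`
      they do not, and only the payload size describes what a scatter-gather
      list has to cover.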
    • Christoph Hellwig's avatar
      nvme-fabrics: don't check for non-NULL module in nvmf_register_transport · 5a1e5953
      Christoph Hellwig authored
      THIS_MODULE evaluates to NULL when used from code built into the kernel,
      thus breaking built-in transport modules.  Remove the bogus check.
      
      Fixes: 0de5cd36 ("nvme-fabrics: protect against module unload during create_ctrl")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      5a1e5953
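      The failure mode fixed by the nvme-fabrics commit above can be
      illustrated with a small model. Everything named `demo_*` is
      hypothetical, and the exact shape of the removed check is an assumption
      based on the commit message; a NULL `module` pointer stands in for
      THIS_MODULE as seen from built-in code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a module handle and a transport descriptor. */
struct demo_module { const char *name; };

struct demo_transport_ops {
    const char *name;
    const struct demo_module *module; /* NULL models built-in code */
    int (*create_ctrl)(void);
};

static int demo_create_ctrl(void)
{
    return 0;
}

/* Old behaviour (assumed shape): a NULL module is treated as an error. */
static int demo_register_old(const struct demo_transport_ops *ops)
{
    assert(ops != NULL);
    if (!ops->create_ctrl || !ops->module)
        return -EINVAL;
    return 0;
}

/* New behaviour: only the callback is required. */
static int demo_register_new(const struct demo_transport_ops *ops)
{
    assert(ops != NULL);
    if (!ops->create_ctrl)
        return -EINVAL;
    return 0;
}
```

      A transport described as `{ "loop", NULL, demo_create_ctrl }` is
      rejected by the old variant and accepted by the new one, while a
      transport with a non-NULL module is accepted by both.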
  9. 21 Feb, 2018 3 commits
    • Juergen Gross's avatar
      mm: don't defer struct page initialization for Xen pv guests · 895f7b8e
      Juergen Gross authored
      Commit f7f99100 ("mm: stop zeroing memory during allocation in
      vmemmap") broke Xen pv domains in some configurations, as the "Pinned"
      information in struct page of early page tables could get lost.
      
      This will lead to the kernel trying to write directly into the page
      tables instead of asking the hypervisor to do so.  The result is a crash
      like the following:
      
        BUG: unable to handle kernel paging request at ffff8801ead19008
        IP: xen_set_pud+0x4e/0xd0
        PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE 80100001ead19065
        Oops: 0003 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271
        Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014
        task: ffffffff81c10480 task.stack: ffffffff81c00000
        RIP: e030:xen_set_pud+0x4e/0xd0
        Call Trace:
         __pmd_alloc+0x128/0x140
         ioremap_page_range+0x3f4/0x410
         __ioremap_caller+0x1c3/0x2e0
         acpi_os_map_iomem+0x175/0x1b0
         acpi_tb_acquire_table+0x39/0x66
         acpi_tb_validate_table+0x44/0x7c
         acpi_tb_verify_temp_table+0x45/0x304
         acpi_reallocate_root_table+0x12d/0x141
         acpi_early_init+0x4d/0x10a
         start_kernel+0x3eb/0x4a1
         xen_start_kernel+0x528/0x532
        Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3
        RIP: xen_set_pud+0x4e/0xd0 RSP: ffffffff81c03cd8
        CR2: ffff8801ead19008
        ---[ end trace 38eca2e56f1b642e ]---
      
      Avoid this problem by not deferring struct page initialization when
      running as Xen pv guest.
      
      Pavel said:
      
      : This is unique for Xen, so this particular issue won't affect other
      : configurations.  I am going to investigate if there is a way to
      : re-enable deferred page initialization on xen guests.
      
      [akpm@linux-foundation.org: explicitly include xen.h]
      Link: http://lkml.kernel.org/r/20180216154101.22865-1-jgross@suse.com
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: <stable@vger.kernel.org>	[4.15.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      895f7b8e
    • Anders Roxell's avatar
      lib/Kconfig.debug: enable RUNTIME_TESTING_MENU · 908009e8
      Anders Roxell authored
      Commit d3deafaa ("lib/: make RUNTIME_TESTS a menuconfig to ease
      disabling it all") causes a regression when using runtime tests because
      it leaves RUNTIME_TESTING_MENU unset by default.
      
      Link: http://lkml.kernel.org/r/20180214133015.10090-1-anders.roxell@linaro.org
      Fixes: d3deafaa ("lib/: make RUNTIME_TESTS a menuconfig to ease disabling it all")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Cc: Vincent Legoll <vincent.legoll@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      908009e8
    • Michal Hocko's avatar
      vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems · 698d0831
      Michal Hocko authored
      Kai Heng Feng has noticed that BUG_ON(PageHighMem(pg)) triggers in
      drivers/media/common/saa7146/saa7146_core.c since 19809c2d ("mm,
      vmalloc: use __GFP_HIGHMEM implicitly").
      
      saa7146_vmalloc_build_pgtable uses vmalloc_32 and it is reasonable to
      expect that the resulting page is not in highmem.  The above commit
      aimed to add __GFP_HIGHMEM only for those requests which do not specify
      any zone modifier gfp flag.  vmalloc_32 relies on GFP_VMALLOC32 which
      should do the right thing.  Except it has been missed that GFP_VMALLOC32
      is an alias for GFP_KERNEL on 32b architectures.  Thanks to Matthew for
      noticing this.
      
      Fix the problem by unconditionally setting GFP_DMA32 in GFP_VMALLOC32
      for !64b arches (as a bailout).  This should do the right thing and use
      ZONE_NORMAL which should be always below 4G on 32b systems.
      
      Debugged by Matthew Wilcox.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20180212095019.GX21609@dhcp22.suse.cz
      Fixes: 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Kai Heng Feng <kai.heng.feng@canonical.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      698d0831
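      The interaction described in the vmalloc commit above can be modelled
      with plain bit flags. This is a sketch, not the kernel code: the
      `DEMO_*` bit values and `demo_*` functions are hypothetical, and the
      64-bit case is simplified to always use the DMA32 zone. It shows why an
      alias for GFP_KERNEL silently picks up the highmem flag, and why adding
      a zone modifier prevents that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real gfp flags differ. */
#define DEMO_GFP_KERNEL  0x01u
#define DEMO_GFP_DMA     0x02u
#define DEMO_GFP_DMA32   0x04u
#define DEMO_GFP_HIGHMEM 0x08u

/*
 * Flags used for 32-bit-addressable vmalloc allocations. Before the fix,
 * non-64-bit builds used plain DEMO_GFP_KERNEL; after it, they always
 * carry the DMA32 zone modifier.
 */
static unsigned int demo_vmalloc32_flags(bool is_64bit, bool with_fix)
{
    if (is_64bit || with_fix)
        return DEMO_GFP_DMA32 | DEMO_GFP_KERNEL;
    return DEMO_GFP_KERNEL;
}

/*
 * Models the implicit-highmem rule: highmem is added only when the
 * caller did not specify any zone modifier.
 */
static unsigned int demo_effective_flags(unsigned int gfp)
{
    if (gfp & (DEMO_GFP_DMA | DEMO_GFP_DMA32))
        return gfp;
    return gfp | DEMO_GFP_HIGHMEM;
}

/* True if an allocation with these flags may come from highmem. */
static bool demo_may_use_highmem(unsigned int gfp)
{
    return (demo_effective_flags(gfp) & DEMO_GFP_HIGHMEM) != 0;
}
```

      In this model `demo_may_use_highmem(demo_vmalloc32_flags(false, false))`
      is true, which corresponds to the reported BUG_ON, while the same call
      with `with_fix` set to true is false.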