1. 23 Sep, 2016 6 commits
    • svcrdma: support Remote Invalidation · 25d55296
      Chuck Lever authored
      Support Remote Invalidation. A private message is exchanged with
      the client upon RDMA transport connect that indicates whether
      Send With Invalidation may be used by the server to send RPC
      replies. The invalidate_rkey is arbitrarily chosen from among
      rkeys present in the RPC-over-RDMA header's chunk lists.
      
      Send With Invalidate improves performance only when clients can
      recognize, while processing an RPC reply, that an rkey has already
      been invalidated. That has been submitted as a separate change.
      
      In the future, the RPC-over-RDMA protocol might support Remote
      Invalidation properly. The protocol needs to enable signaling
      between peers to indicate when Remote Invalidation can be used
      for each individual RPC.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
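
The rkey selection described in the Remote Invalidation commit can be sketched as a toy Python model. This is not the kernel code: the function name `pick_invalidate_rkey`, the `snd_w_inv` parameter, and the representation of each chunk list as a sequence of integer rkeys are all hypothetical. The commit only says the choice is arbitrary; taking the first rkey found is one possible policy.

```python
from typing import Optional, Sequence


def pick_invalidate_rkey(snd_w_inv: bool,
                         read_list: Sequence[int],
                         write_list: Sequence[int],
                         reply_chunk: Sequence[int]) -> Optional[int]:
    """Return an rkey to invalidate remotely, or None to use a plain Send.

    snd_w_inv models the result of the connect-time private message:
    True only if the client said Send With Invalidate is acceptable.
    """
    if not snd_w_inv:
        return None
    # Arbitrary policy (an assumption): first rkey in the first
    # non-empty chunk list of the RPC-over-RDMA header.
    for chunk_list in (read_list, write_list, reply_chunk):
        if chunk_list:
            return chunk_list[0]
    return None
```

A reply to a request that carried no chunks has no rkey to invalidate, so the model falls back to a plain Send in that case as well.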
    • svcrdma: Server-side support for rpcrdma_connect_private · cc9d8340
      Chuck Lever authored
      Prepare to receive an RDMA-CM private message when handling a new
      connection attempt, and send a similar message as part of connection
      acceptance.
      
      Both sides can communicate their various implementation limits.
      Implementations that don't support this sideband protocol ignore it.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • rpcrdma: RDMA/CM private message data structure · 5d487096
      Chuck Lever authored
      Introduce data structure used by both client and server to exchange
      implementation details during RDMA/CM connection establishment.
      
      This is an experimental out-of-band exchange between Linux
      RPC-over-RDMA Version One implementations, replacing the deprecated
      CCP (see RFC 5666bis). The purpose of this extension is to enable
      prototyping of features that might be introduced in a subsequent
      version of RPC-over-RDMA.
      
      Suggested by Christoph Hellwig and Devesh Sharma.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
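
To make the connect-time exchange concrete, here is a hedged Python sketch of packing and parsing such a private message. The field layout (a 32-bit magic, then one byte each for version, flags, send size, and receive size), the magic value, the flag bit, and the 1 KB size encoding are assumptions chosen for illustration, not taken from the commit text. A peer that does not recognise the message is modelled by `parse_private` returning `None`, matching the statement that implementations without support ignore it.

```python
import struct
from typing import NamedTuple, Optional

# Assumed wire layout: big-endian u32 magic, then four u8 fields.
CMP_FORMAT = ">IBBBB"
CMP_MAGIC = 0xF6AB0E18        # assumed magic value
CMP_VERSION = 1               # assumed version number
CMP_F_SND_W_INV_OK = 0x01     # assumed flag: Send With Invalidate is OK


class ConnectPrivate(NamedTuple):
    flags: int
    send_size: int   # bytes
    recv_size: int   # bytes


def encode_size(nbytes: int) -> int:
    """Squeeze a buffer size into one byte (assumed 1 KB granularity)."""
    return (nbytes >> 10) - 1


def decode_size(value: int) -> int:
    return (value + 1) << 10


def pack_private(msg: ConnectPrivate) -> bytes:
    return struct.pack(CMP_FORMAT, CMP_MAGIC, CMP_VERSION, msg.flags,
                       encode_size(msg.send_size),
                       encode_size(msg.recv_size))


def parse_private(data: bytes) -> Optional[ConnectPrivate]:
    """Return the peer's limits, or None if the message is not recognised."""
    if len(data) < struct.calcsize(CMP_FORMAT):
        return None
    magic, version, flags, send, recv = struct.unpack_from(CMP_FORMAT, data)
    if magic != CMP_MAGIC or version != CMP_VERSION:
        return None
    return ConnectPrivate(flags, decode_size(send), decode_size(recv))
```

Checking a magic value and version before trusting any field is what lets an experimental sideband like this coexist with peers that send unrelated private data or none at all.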
    • svcrdma: Skip put_page() when send_reply() fails · 9995237b
      Chuck Lever authored
      Message from syslogd@klimt at Aug 18 17:00:37 ...
       kernel:page:ffffea0020639b00 count:0 mapcount:0 mapping:          (null) index:0x0
      Aug 18 17:00:37 klimt kernel: flags: 0x2fffff80000000()
      Aug 18 17:00:37 klimt kernel: page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      
      Aug 18 17:00:37 klimt kernel: kernel BUG at /home/cel/src/linux/linux-2.6/include/linux/mm.h:445!
      Aug 18 17:00:37 klimt kernel: RIP: 0010:[<ffffffffa05c21c1>] svc_rdma_sendto+0x641/0x820 [rpcrdma]
      
      send_reply() assigns its page argument as the first page of ctxt. On
      error, send_reply() already invokes svc_rdma_put_context(ctxt, 1);
      which does a put_page() on that very page. No need to do that again
      as svc_rdma_sendto exits.
      
      Fixes: 3e1eeb98 ("svcrdma: Close connection when a send error occurs")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
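
The double release behind the VM_BUG_ON above can be reproduced in a toy reference-counting model. The Python below is only a sketch: `Page`, `put_page`, `send_reply`, and `sendto` are simplified stand-ins for the kernel functions of similar names, and the `fixed` switch is a hypothetical parameter used to show both behaviours side by side.

```python
class Page:
    def __init__(self) -> None:
        self.refcount = 1


def put_page(page: Page) -> None:
    # Stand-in for VM_BUG_ON_PAGE(page_ref_count(page) == 0).
    if page.refcount == 0:
        raise RuntimeError("put_page() on a page with refcount 0")
    page.refcount -= 1


def send_reply(page: Page, fail: bool) -> int:
    ctxt_pages = [page]          # the page becomes the ctxt's first page
    if fail:
        # Models svc_rdma_put_context(ctxt, 1): releases every ctxt page.
        for p in ctxt_pages:
            put_page(p)
        return -1
    return 0


def sendto(page: Page, fail: bool, fixed: bool) -> int:
    ret = send_reply(page, fail)
    if ret < 0 and not fixed:
        put_page(page)           # the redundant second put (the bug)
    return ret
```

With `fixed=True` the error path leaves the page released exactly once; with `fixed=False` the second `put_page()` trips the refcount check.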
    • svcrdma: Tail iovec leaves an orphaned DMA mapping · cace564f
      Chuck Lever authored
      The ctxt's count field is overloaded to mean the number of pages in
      the ctxt->page array and the number of SGEs in the ctxt->sge array.
      Typically these two numbers are the same.
      
      However, when an inline RPC reply is constructed from an xdr_buf
      with a tail iovec, the head and tail often occupy the same page,
      but each is DMA mapped independently. In that case, ->count equals
      the number of pages, but it does not equal the number of SGEs.
      There's one more SGE, for the tail iovec. Hence there is one more
      DMA mapping than there are pages in the ctxt->page array.
      
      This isn't a real problem until the server's iommu is enabled. Then
      each RPC reply that has content in that iovec orphans a DMA mapping
      that consists of real resources.
      
      krb5i and krb5p always populate that tail iovec. After a couple
      million sent krb5i/p RPC replies, the NFS server starts behaving
      erratically. Reboot is needed to clear the problem.
      
      Fixes: 9d11b51c ("svcrdma: Fix send_reply() scatter/gather set-up")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
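
The leak described in this commit comes from using one counter for two different things. The following Python sketch models it under stated assumptions: `Iommu` is a hypothetical stand-in that merely tracks live mappings, and `release` takes a hypothetical `use_sge_count` switch to contrast unmapping by page count (the bug) with unmapping by SGE count (one possible fix).

```python
from typing import List, Set, Tuple


class Iommu:
    """Toy IOMMU that only tracks which mappings are still live."""

    def __init__(self) -> None:
        self.live: Set[int] = set()
        self._next = 0

    def map(self) -> int:
        self._next += 1
        self.live.add(self._next)
        return self._next

    def unmap(self, handle: int) -> None:
        self.live.remove(handle)


def build_inline_reply(iommu: Iommu, has_tail: bool) -> Tuple[List[str], List[int]]:
    """Head and tail share one page but are DMA mapped separately."""
    pages = ["page0"]
    sges = [iommu.map()]              # head iovec
    if has_tail:
        sges.append(iommu.map())      # tail iovec: same page, extra SGE
    return pages, sges


def release(iommu: Iommu, pages: List[str], sges: List[int],
            use_sge_count: bool) -> None:
    # Bug: an overloaded count that really holds the number of pages.
    count = len(sges) if use_sge_count else len(pages)
    for handle in sges[:count]:
        iommu.unmap(handle)
```

Each reply with a tail iovec leaves one mapping behind when the page count is used, which matches the slow resource exhaustion seen with krb5i and krb5p.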
    • nfsd: fix dprintk in nfsd4_encode_getdeviceinfo · bec782b4
      Jeff Layton authored
      nfserr is big-endian, so we should convert it to host-endian before
      printing it.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
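
A short Python illustration of why the conversion matters. The status value 10008 (NFS4ERR_DELAY) is only an example, and the helper names are hypothetical; the little-endian read stands in for printing the raw big-endian value on a little-endian host without a byte swap.

```python
import struct

NFS4ERR_DELAY = 10008  # example status code


def to_wire(status: int) -> bytes:
    """Store the status the way nfserr is held: big-endian."""
    return struct.pack(">I", status)


def read_without_conversion(wire: bytes) -> int:
    """What a little-endian host sees if it prints the raw value."""
    return struct.unpack("<I", wire)[0]


def read_with_conversion(wire: bytes) -> int:
    """Equivalent of converting to host order before printing."""
    return struct.unpack(">I", wire)[0]
```

Only the converted value matches the status code a reader would look up; the unconverted one is a large, meaningless number in the debug log.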
  2. 16 Sep, 2016 2 commits
    • nfsd: eliminate cb_minorversion field · 89dfdc96
      Jeff Layton authored
      We already have that info in the client pointer. No need to pass around
      a copy.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: don't set a FL_LAYOUT lease for flexfiles layouts · 1983a66f
      Jeff Layton authored
      We currently can hit a deadlock (of sorts) when trying to use flexfiles
      layouts with XFS. XFS will call break_layout when something wants to
      write to the file. In the case of the (super-simple) flexfiles layout
      driver in knfsd, the MDS and DS are the same machine.
      
      The client can get a layout and then issue a v3 write to do its I/O. XFS
      will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to
      be issued to the client. The client however can't return the layout
      until the v3 WRITE completes, but XFS won't allow the write to proceed
      until the layout is returned.
      
      Christoph says:
      
          XFS only cares about block-like layouts where the client has direct
          access to the file blocks.  I'd need to look how to propagate the
          flag into break_layout, but in principle we don't need to do any
          recalls on truncate ever for file and flexfile layouts.
      
      If we're never going to recall the layout, then we don't even need to
      set the lease at all. Just skip doing so on flexfiles layouts by
      adding a new flag to struct nfsd4_layout_ops and skipping the lease
      setting and removal when that flag is true.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
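
The approach in this commit, a per-layout-type flag that suppresses lease handling, can be sketched in Python as follows. The names `LayoutOps`, `disable_recalls`, `LayoutStateid`, `setup_layout`, and `teardown_layout` are hypothetical stand-ins; the real flag lives in struct nfsd4_layout_ops and its name is not given in the commit text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutOps:
    name: str
    disable_recalls: bool = False   # hypothetical flag name


class LayoutStateid:
    def __init__(self, ops: LayoutOps) -> None:
        self.ops = ops
        self.has_lease = False


def setup_layout(ops: LayoutOps) -> LayoutStateid:
    ls = LayoutStateid(ops)
    if not ops.disable_recalls:
        ls.has_lease = True         # models setting the FL_LAYOUT lease
    return ls


def teardown_layout(ls: LayoutStateid) -> None:
    if not ls.ops.disable_recalls:
        ls.has_lease = False        # models removing the lease
```

Without a lease there is nothing for break_layout to act on, so a write to the file never triggers the CB_LAYOUTRECALL that the client cannot satisfy while its own WRITE is in flight.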
  3. 12 Sep, 2016 2 commits
    • svcauth_gss: Revert 64c59a37 ("Remove unnecessary allocation") · bf2c4b6f
      Chuck Lever authored
      rsc_lookup steals the passed-in memory to avoid doing an allocation of
      its own, so we can't just pass in a pointer to memory that someone else
      is using.
      
      If we really want to avoid allocation there then maybe we should
      preallocate somewhere, or reference count these handles.
      
      For now we should revert.
      
      On occasion I see this on my server:
      
      kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
      kernel: invalid opcode: 0000 [#1] SMP
      kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
      kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
      kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
      kernel: Workqueue: events do_cache_clean [sunrpc]
      kernel: task: ffff8808541d8000 task.stack: ffff880854344000
      kernel: RIP: 0010:[<ffffffff811e7075>]  [<ffffffff811e7075>] kfree+0x155/0x180
      kernel: RSP: 0018:ffff880854347d70  EFLAGS: 00010246
      kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
      kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
      kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
      kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
      kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
      kernel: FS:  0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
      kernel: Stack:
      kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
      kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
      kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
      kernel: Call Trace:
      kernel: [<ffffffffa013cf76>] rsc_free+0x16/0x90 [auth_rpcgss]
      kernel: [<ffffffffa013d006>] rsc_put+0x16/0x30 [auth_rpcgss]
      kernel: [<ffffffffa0406f60>] cache_clean+0x2e0/0x300 [sunrpc]
      kernel: [<ffffffffa04073ee>] do_cache_clean+0xe/0x70 [sunrpc]
      kernel: [<ffffffff8109a70f>] process_one_work+0x1ff/0x3b0
      kernel: [<ffffffff8109b15c>] worker_thread+0x2bc/0x4a0
      kernel: [<ffffffff8109aea0>] ? rescuer_thread+0x3a0/0x3a0
      kernel: [<ffffffff810a0ba4>] kthread+0xe4/0xf0
      kernel: [<ffffffff8169c47f>] ret_from_fork+0x1f/0x40
      kernel: [<ffffffff810a0ac0>] ? kthread_stop+0x110/0x110
      kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff <0f> 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
      kernel: RIP  [<ffffffff811e7075>] kfree+0x155/0x180
      kernel: RSP <ffff880854347d70>
      kernel: ---[ end trace 3fdec044969def26 ]---
      
      It seems to be most common after a server reboot where a client has been
      using a Kerberos mount, and reconnects to continue its workload.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • Linux 4.8-rc6 · 9395452b
      Linus Torvalds authored
  4. 11 Sep, 2016 3 commits
    • nvme: make NVME_RDMA depend on BLOCK · bd0b841f
      Linus Torvalds authored
      Commit aa719874 ("nvme: fabrics drivers don't need the nvme-pci
      driver") removed the dependency on BLK_DEV_NVME, but the code does
      depend on the block layer (which used to be an implicit dependency
      through BLK_DEV_NVME).
      
      Otherwise you get various errors from the kbuild test robot random
      config testing when that happens to hit a configuration with BLOCK
      device support disabled.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jay Freyensee <james_p_freyensee@linux.intel.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 2afe669a
      Linus Torvalds authored
      Pull IIO fixes from Greg KH:
       "Here are a few small IIO fixes for 4.8-rc6.
      
        Nothing major, full details are in the shortlog, all of these have
        been in linux-next with no reported issues"
      
      * tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio:core: fix IIO_VAL_FRACTIONAL sign handling
        iio: ensure ret is initialized to zero before entering do loop
        iio: accel: kxsd9: Fix scaling bug
        iio: accel: bmc150: reset chip at init time
        iio: fix pressure data output unit in hid-sensor-attributes
        tools:iio:iio_generic_buffer: fix trigger-less mode
    • Merge tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 61c3dae6
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6.
      
        All of these resolve minor issues that have been reported, and all
        have been in linux-next with no reported issues"
      
      * tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
        xhci: fix null pointer dereference in stop command timeout function
        usb: dwc3: pci: fix build warning on !PM_SLEEP
        usb: gadget: prevent potenial null pointer dereference on skb->len
        usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
        usb: phy: phy-generic: Check clk_prepare_enable() error
        usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON
        Revert "usb: dwc3: gadget: always decrement by 1"
  5. 10 Sep, 2016 10 commits
    • Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 98ac9a60
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "nvdimm fixes for v4.8, two of them are tagged for -stable:
      
         - Fix devm_memremap_pages() to use track_pfn_insert().  Otherwise,
           DAX pmd mappings end up with an uncached pgprot, and unusable
           performance for the device-dax interface.  The device-dax interface
           appeared in 4.7 so this is tagged for -stable.
      
         - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
           understand DAX pmd entries.  This fix is tagged for -stable.
      
         - Fix a mis-merge of the nfit machine-check handler to flip the
           polarity of an if() to match the final version of the patch that
           Vishal sent for 4.8-rc1.  Without this the nfit machine check
           handler never detects / inserts new 'badblocks' entries which
           applications use to identify lost portions of files.
      
         - For test purposes, fix the nvdimm_clear_poison() path to operate on
           legacy / simulated nvdimm memory ranges.  Without this fix a test
           can set badblocks, but never clear them on these ranges.
      
         - Fix the range checking done by dax_dev_pmd_fault().  This is not
           tagged for -stable since this problem is mitigated by specifying
           aligned resources at device-dax setup time.
      
        These patches have appeared in a next release over the past week.  The
        recent rebase you can see in the timestamps was to drop an invalid fix
        as identified by the updated device-dax unit tests [1].  The -mm
        touches have an ack from Andrew"
      
      [1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
         https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm: allow legacy (e820) pmem region to clear bad blocks
        nfit, mce: Fix SPA matching logic in MCE handler
        mm: fix cache mode of dax pmd mappings
        mm: fix show_smap() for zone_device-pmd ranges
        dax: fix mapping size check
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · b8db3714
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Mostly driver bugfixes, but also a few cleanups which are nice to have
        out of the way"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: rk3x: Restore clock settings at resume time
        i2c: Spelling s/acknowedge/acknowledge/
        i2c: designware: save the preset value of DW_IC_SDA_HOLD
        Documentation: i2c: slave-interface: add note for driver development
        i2c: mux: demux-pinctrl: run properly with multiple instances
        i2c: bcm-kona: fix inconsistent indenting
        i2c: rcar: use proper device with dma_mapping_error
        i2c: sh_mobile: use proper device with dma_mapping_error
        i2c: mux: demux-pinctrl: invalidate properly when switching fails
    • Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 6905732c
      Linus Torvalds authored
      Pull fscrypto fixes from Ted Ts'o:
       "Fix some brown-paper-bag bugs for fscrypto, including one which
        allows a malicious user to set an encryption policy on an empty
        directory which they do not own"
      
      * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        fscrypto: require write access to mount to set encryption policy
        fscrypto: only allow setting encryption policy on directories
        fscrypto: add authorization check for setting encryption policy
    • fscrypto: require write access to mount to set encryption policy · ba63f23d
      Eric Biggers authored
      Since setting an encryption policy requires writing metadata to the
      filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
      Otherwise, a user could cause a write to a frozen or readonly
      filesystem.  This was handled correctly by f2fs but not by ext4.  Make
      fscrypt_process_policy() handle it rather than relying on the filesystem
      to get it right.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • fscrypto: only allow setting encryption policy on directories · 002ced4b
      Eric Biggers authored
      The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
      policy on nondirectory files.  This was unintentional, and in the case
      of nonempty regular files did not behave as expected because existing
      data was not actually encrypted by the ioctl.
      
      In the case of ext4, the user could also trigger filesystem errors in
      ->empty_dir(), e.g. due to mismatched "directory" checksums when the
      kernel incorrectly tried to interpret a regular file as a directory.
      
      This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
      kernels v4.6 and later.  It appears that older kernels only permitted
      directories and that the check was accidentally lost during the
      refactoring to share the file encryption code between ext4 and f2fs.
      
      This patch restores the !S_ISDIR() check that was present in older
      kernels.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • fscrypto: add authorization check for setting encryption policy · 163ae1c6
      Eric Biggers authored
      On an ext4 or f2fs filesystem with file encryption supported, a user
      could set an encryption policy on any empty directory(*) to which they
      had readonly access.  This is obviously problematic, since such a
      directory might be owned by another user and the new encryption policy
      would prevent that other user from creating files in their own directory
      (for example).
      
      Fix this by requiring inode_owner_or_capable() permission to set an
      encryption policy.  This means that either the caller must own the file,
      or the caller must have the capability CAP_FOWNER.
      
      (*) Or also on any regular file, for f2fs v4.6 and later and ext4
          v4.8-rc1 and later; a separate bug fix is coming for that.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
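
Taken together, the three fscrypto fixes listed above add three preconditions to setting an encryption policy: the caller must own the inode or hold CAP_FOWNER, the mount must be writable, and the target must be a directory. The Python sketch below models those checks alongside the existing empty-directory rule. The `Inode` and `Caller` types, the order of the checks, and the specific errno values are assumptions made for illustration, not the kernel's implementation.

```python
import errno
import stat
from dataclasses import dataclass


@dataclass
class Inode:
    mode: int
    owner_uid: int
    empty: bool = True


@dataclass
class Caller:
    uid: int
    cap_fowner: bool = False


def set_policy_check(inode: Inode, caller: Caller, mount_writable: bool) -> int:
    """Return 0 if a policy may be set, else a negative errno (illustrative)."""
    # Models inode_owner_or_capable(): owner, or CAP_FOWNER.
    if caller.uid != inode.owner_uid and not caller.cap_fowner:
        return -errno.EACCES
    # Models mnt_want_write() failing on a read-only or frozen mount.
    if not mount_writable:
        return -errno.EROFS
    # Models the restored !S_ISDIR() check.
    if not stat.S_ISDIR(inode.mode):
        return -errno.EINVAL
    # Pre-existing rule: the directory must be empty.
    if not inode.empty:
        return -errno.ENOTEMPTY
    return 0
```

Placing these checks in the shared policy code rather than in each filesystem mirrors the reasoning in the write-access commit: ext4 and f2fs then cannot diverge on them.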
    • libnvdimm: allow legacy (e820) pmem region to clear bad blocks · 1e8b8d96
      Dave Jiang authored
      Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation
      where legacy pmem is being used or a pmem region created by using memmap
      kernel parameter, the injected bad blocks are not cleared due to
      nvdimm_clear_poison() failing from lack of ndctl function pointer. In
      this case we need to just return as handled and allow the bad blocks to
      be cleared rather than fail.
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
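
The behaviour change in this commit can be sketched in a few lines of Python. The function `clear_poison` and its callback parameter are hypothetical; the sketch assumes that reporting the full length as cleared is how "return as handled" is expressed.

```python
from typing import Callable, Optional


def clear_poison(ndctl: Optional[Callable[[int, int], int]],
                 offset: int, length: int) -> int:
    """Return the number of bytes reported as cleared."""
    if ndctl is None:
        # Legacy (e820) or memmap= region: there is no ndctl function
        # pointer to call, so report the whole range as handled
        # instead of failing.
        return length
    return ndctl(offset, length)
```

Because the caller sees the range as cleared, the injected badblocks entries can then be removed on these regions.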
    • nfit, mce: Fix SPA matching logic in MCE handler · 2e21807d
      Vishal Verma authored
      The check for a 'pmem' type SPA in the MCE handler was inverted due to a
      merge/rebase error.
      
      Fixes: 6839a6d9 ("nfit: do an ARS scrub on hitting a latent media error")
      Cc: linux-acpi@vger.kernel.org
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • mm: fix cache mode of dax pmd mappings · 9049771f
      Dan Williams authored
      track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
      uncacheable rendering them impractical for application usage.  DAX-pte
      mappings are cached and the goal of establishing DAX-pmd mappings is to
      attain more performance, not dramatically less (3 orders of magnitude).
      
      track_pfn_insert() relies on a previous call to reserve_memtype() to
      establish the expected page_cache_mode for the range.  While memremap()
      arranges for reserve_memtype() to be called, devm_memremap_pages() does
      not.  So, teach track_pfn_insert() and untrack_pfn() how to handle
      tracking without a vma, and arrange for devm_memremap_pages() to
      establish the write-back-cache reservation in the memtype tree.
      
      Cc: <stable@vger.kernel.org>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Toshi Kani <toshi.kani@hpe.com>
      Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • mm: fix show_smap() for zone_device-pmd ranges · ca120cf6
      Dan Williams authored
      Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
      currently results in the following VM_BUG_ONs:
      
       kernel BUG at mm/huge_memory.c:1105!
       task: ffff88045f16b140 task.stack: ffff88045be14000
       RIP: 0010:[<ffffffff81268f9b>]  [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340
       [..]
       Call Trace:
        [<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0
        [<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0
        [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
        [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
        [<ffffffff81307656>] show_smap+0xa6/0x2b0
      
       kernel BUG at fs/proc/task_mmu.c:585!
       RIP: 0010:[<ffffffff81306469>]  [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0
       Call Trace:
        [<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0
        [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
        [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
        [<ffffffff81307696>] show_smap+0xa6/0x2b0
      
      These locations are sanity checking page flags that must be set for an
      anonymous transparent huge page, but are not set for the zone_device
      pages associated with dax mappings.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  6. 09 Sep, 2016 17 commits