1. 11 Dec, 2019 3 commits
    • Michael S. Tsirkin's avatar
      virtio_balloon: divide/multiply instead of shifts · 63b9b80e
      Michael S. Tsirkin authored
      We managed to get confused about the shift direction at least once.
      Let's switch to division/multiplcation instead. Add a number of pages
      macro for this purpose.  We still keep the order macro around too since
      this is what alloc/free pages want.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: default avatarWei Wang <wei.w.wang@intel.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      63b9b80e
    • Michael S. Tsirkin's avatar
      virtio_balloon: name cleanups · 2a946fa1
      Michael S. Tsirkin authored
      free_page_order is a confusing name. It's not a page order
      actually, it's the order of the block of memory we are hinting.
      Rename to hint_block_order. Also, rename SIZE to BYTES
      to make it clear it's the block size in bytes.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: default avatarWei Wang <wei.w.wang@intel.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      2a946fa1
    • David Hildenbrand's avatar
      virtio-balloon: fix managed page counts when migrating pages between zones · 63341ab0
      David Hildenbrand authored
      In case we have to migrate a ballon page to a newpage of another zone, the
      managed page count of both zones is wrong. Paired with memory offlining
      (which will adjust the managed page count), we can trigger kernel crashes
      and all kinds of different symptoms.
      
      One way to reproduce:
      1. Start a QEMU guest with 4GB, no NUMA
      2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL
      3. Inflate the balloon to 1GB
      4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it)
      5. Observe /proc/zoneinfo
        Node 0, zone   Normal
          pages free     16810
                min      24848885473806
                low      18471592959183339
                high     36918337032892872
                spanned  262144
                present  262144
                managed  18446744073709533486
      6. Do anything that requires some memory (e.g., inflate the balloon some
      more). The OOM goes crazy and the system crashes
        [  238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00
        [  238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
        [  238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G      D W         5.4.0-next-20191204+ #75
        [  238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
        [  238.341121] Call Trace:
        [  238.341337]  dump_stack+0x8f/0xd0
        [  238.341630]  dump_header+0x61/0x5ea
        [  238.341942]  oom_kill_process.cold+0xb/0x10
        [  238.342299]  out_of_memory+0x24d/0x5a0
        [  238.342625]  __alloc_pages_slowpath+0xd12/0x1020
        [  238.343024]  __alloc_pages_nodemask+0x391/0x410
        [  238.343407]  pagecache_get_page+0xc3/0x3a0
        [  238.343757]  filemap_fault+0x804/0xc30
        [  238.344083]  ? ext4_filemap_fault+0x28/0x42
        [  238.344444]  ext4_filemap_fault+0x30/0x42
        [  238.344789]  __do_fault+0x37/0x1a0
        [  238.345087]  __handle_mm_fault+0x104d/0x1ab0
        [  238.345450]  handle_mm_fault+0x169/0x360
        [  238.345790]  do_user_addr_fault+0x20d/0x490
        [  238.346154]  do_page_fault+0x31/0x210
        [  238.346468]  async_page_fault+0x43/0x50
        [  238.346797] RIP: 0033:0x7f47eba4197e
        [  238.347110] Code: Bad RIP value.
        [  238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293
        [  238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e
        [  238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004
        [  238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033
        [  238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001
        [  238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0
        [  238.350878] Mem-Info:
        [  238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0
        [  238.351085]  active_file:12 inactive_file:7 isolated_file:0
        [  238.351085]  unevictable:0 dirty:0 writeback:0 unstable:0
        [  238.351085]  slab_reclaimable:5565 slab_unreclaimable:10170
        [  238.351085]  mapped:3 shmem:111 pagetables:155 bounce:0
        [  238.351085]  free:720717 free_pcp:2 free_cma:0
        [  238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss
        [  238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB
        [  238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884
        [  238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B
        [  238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865
        [  238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB
        [  238.364765] lowmem_reserve[]: 0 0 0 0 0
        [  238.365101] Node 0 DMA: 7*4kB (U) 5*8kB (UE) 6*16kB (UME) 2*32kB (UM) 1*64kB (U) 2*128kB (UE) 3*256kB (UME) 2*512B
        [  238.366379] Node 0 DMA32: 0*4kB 1*8kB (U) 2*16kB (UM) 2*32kB (UM) 2*64kB (UM) 1*128kB (U) 1*256kB (U) 1*512kB (U)B
        [  238.367654] Node 0 Normal: 1985*4kB (UME) 1321*8kB (UME) 844*16kB (UME) 524*32kB (UME) 300*64kB (UME) 138*128kB (B
        [  238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
        [  238.369915] 130 total pagecache pages
        [  238.370241] 0 pages in swap cache
        [  238.370533] Swap cache stats: add 0, delete 0, find 0/0
        [  238.370981] Free swap  = 0kB
        [  238.371239] Total swap = 0kB
        [  238.371488] 1048445 pages RAM
        [  238.371756] 0 pages HighMem/MovableOnly
        [  238.372090] 306992 pages reserved
        [  238.372376] 0 pages cma reserved
        [  238.372661] 0 pages hwpoisoned
      
      In another instance (older kernel), I was able to observe this
      (negative page count :/):
        [  180.896971] Offlined Pages 32768
        [  182.667462] Offlined Pages 32768
        [  184.408117] Offlined Pages 32768
        [  186.026321] Offlined Pages 32768
        [  187.684861] Offlined Pages 32768
        [  189.227013] Offlined Pages 32768
        [  190.830303] Offlined Pages 32768
        [  190.833071] Built 1 zonelists, mobility grouping on.  Total pages: -36920272750453009
      
      In another instance (older kernel), I was no longer able to start any
      process:
        [root@vm ~]# [  214.348068] Offlined Pages 32768
        [  215.973009] Offlined Pages 32768
        cat /proc/meminfo
        -bash: fork: Cannot allocate memory
        [root@vm ~]# cat /proc/meminfo
        -bash: fork: Cannot allocate memory
      
      Fix it by properly adjusting the managed page count when migrating if
      the zone changed. The managed page count of the zones now looks after
      unplug of the DIMM (and after deflating the balloon) just like before
      inflating the balloon (and plugging+onlining the DIMM).
      
      We'll temporarily modify the totalram page count. If this ever becomes a
      problem, we can fine tune by providing helpers that don't touch
      the totalram pages (e.g., adjust_zone_managed_page_count()).
      
      Please note that fixing up the managed page count is only necessary when
      we adjusted the managed page count when inflating - only if we
      don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
      managed page count is not touched when inflating/deflating.
      Reported-by: default avatarYumei Huang <yuhuang@redhat.com>
      Fixes: 3dcc0571 ("mm: correctly update zone->managed_pages")
      Cc: <stable@vger.kernel.org> # v3.11+
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      63341ab0
  2. 08 Dec, 2019 12 commits
    • Linus Torvalds's avatar
      Linux 5.5-rc1 · e42617b8
      Linus Torvalds authored
      e42617b8
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 95e6ba51
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) More jumbo frame fixes in r8169, from Heiner Kallweit.
      
       2) Fix bpf build in minimal configuration, from Alexei Starovoitov.
      
       3) Use after free in slcan driver, from Jouni Hogander.
      
       4) Flower classifier port ranges don't work properly in the HW offload
          case, from Yoshiki Komachi.
      
       5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.
      
       6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.
      
       7) Fix flow dissection in dsa TX path, from Alexander Lobakin.
      
       8) Stale syncookie timestampe fixes from Guillaume Nault.
      
      [ Did an evil merge to silence a warning introduced by this pull - Linus ]
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
        r8169: fix rtl_hw_jumbo_disable for RTL8168evl
        net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
        r8169: add missing RX enabling for WoL on RTL8125
        vhost/vsock: accept only packets with the right dst_cid
        net: phy: dp83867: fix hfs boot in rgmii mode
        net: ethernet: ti: cpsw: fix extra rx interrupt
        inet: protect against too small mtu values.
        gre: refetch erspan header from skb->data after pskb_may_pull()
        pppoe: remove redundant BUG_ON() check in pppoe_pernet
        tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
        tcp: tighten acceptance of ACKs not matching a child socket
        tcp: fix rejected syncookies due to stale timestamps
        lpc_eth: kernel BUG on remove
        tcp: md5: fix potential overestimation of TCP option space
        net: sched: allow indirect blocks to bind to clsact in TC
        net: core: rename indirect block ingress cb function
        net-sysfs: Call dev_hold always in netdev_queue_add_kobject
        net: dsa: fix flow dissection on Tx path
        net/tls: Fix return values to avoid ENOTSUPP
        net: avoid an indirect call in ____sys_recvmsg()
        ...
      95e6ba51
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 138f371d
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "Eleven patches, all in drivers (no core changes) that are either minor
        cleanups or small fixes.
      
        They were late arriving, but still safe for -rc1"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: MAINTAINERS: Add the linux-scsi mailing list to the ISCSI entry
        scsi: megaraid_sas: Make poll_aen_lock static
        scsi: sd_zbc: Improve report zones error printout
        scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI
        scsi: qla2xxx: unregister ports after GPN_FT failure
        scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan
        scsi: pm80xx: Remove unused include of linux/version.h
        scsi: pm80xx: fix logic to break out of loop when register value is 2 or 3
        scsi: scsi_transport_sas: Fix memory leak when removing devices
        scsi: lpfc: size cpu map by last cpu id set
        scsi: ibmvscsi_tgt: Remove unneeded variable rc
      138f371d
    • Linus Torvalds's avatar
      Merge tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · a78f7cdd
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Nine cifs/smb3 fixes:
      
         - one fix for stable (oops during oplock break)
      
         - two timestamp fixes including important one for updating mtime at
           close to avoid stale metadata caching issue on dirty files (also
           improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the
           wire)
      
         - two fixes for "modefromsid" mount option for file create (now
           allows mode bits to be set more atomically and accurately on create
           by adding "sd_context" on create when modefromsid specified on
           mount)
      
         - two fixes for multichannel found in testing this week against
           different servers
      
         - two small cleanup patches"
      
      * tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: improve check for when we send the security descriptor context on create
        smb3: fix mode passed in on create for modetosid mount option
        cifs: fix possible uninitialized access and race on iface_list
        cifs: Fix lookup of SMB connections on multichannel
        smb3: query attributes on file close
        smb3: remove unused flag passed into close functions
        cifs: remove redundant assignment to pointer pneg_ctxt
        fs: cifs: Fix atime update check vs mtime
        CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
      a78f7cdd
    • Linus Torvalds's avatar
      Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 5bf9a06a
      Linus Torvalds authored
      Pull misc vfs cleanups from Al Viro:
       "No common topic, just three cleanups".
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        make __d_alloc() static
        fs/namespace: add __user to open_tree and move_mount syscalls
        fs/fnctl: fix missing __user in fcntl_rw_hint()
      5bf9a06a
    • Linus Torvalds's avatar
      Merge tag 'ntb-5.5' of git://github.com/jonmason/ntb · 9455d25f
      Linus Torvalds authored
      Pull NTB update from Jon Mason:
       "Just a simple patch to add a new Hygon Device ID to the AMD NTB device
        driver"
      
      * tag 'ntb-5.5' of git://github.com/jonmason/ntb:
        NTB: Add Hygon Device ID
      9455d25f
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 73721451
      Linus Torvalds authored
      Pull more input updates from Dmitry Torokhov:
      
       - fixups for Synaptics RMI4 driver
      
       - a quirk for Goodinx touchscreen on Teclast tablet
      
       - a new keycode definition for activating privacy screen feature found
         on a few "enterprise" laptops
      
       - updates to snvs_pwrkey driver
      
       - polling uinput device for writing (which is always allowed) now works
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers
        Input: synaptics-rmi4 - re-enable IRQs in f34v7_do_reflash
        Input: goodix - add upside-down quirk for Teclast X89 tablet
        Input: add privacy screen toggle keycode
        Input: uinput - fix returning EPOLLOUT from uinput_poll
        Input: snvs_pwrkey - remove gratuitous NULL initializers
        Input: snvs_pwrkey - send key events for i.MX6 S, DL and Q
      73721451
    • Linus Torvalds's avatar
      Merge tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 95207d55
      Linus Torvalds authored
      Pull iomap fixes from Darrick Wong:
       "Fix a race condition and a use-after-free error:
      
         - Fix a UAF when reporting writeback errors
      
         - Fix a race condition when handling page uptodate on fragmented file
           with blocksize < pagesize"
      
      * tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: stop using ioend after it's been freed in iomap_finish_ioend()
        iomap: fix sub-page uptodate handling
      95207d55
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 50caca9d
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Fix a couple of resource management errors and a hang:
      
         - fix a crash in the log setup code when log mounting fails
      
         - fix a hang when allocating space on the realtime device
      
         - fix a block leak when freeing space on the realtime device"
      
      * tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix mount failure crash on invalid iclog memory access
        xfs: don't check for AG deadlock for realtime files in bunmapi
        xfs: fix realtime file data space leak
      50caca9d
    • Linus Torvalds's avatar
      Merge tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 316933cf
      Linus Torvalds authored
      Pull orangefs update from Mike Marshall:
       "orangefs: posix open permission checking...
      
        Orangefs has no open, and orangefs checks file permissions on each
        file access. Posix requires that file permissions be checked on open
        and nowhere else. Orangefs-through-the-kernel needs to seem posix
        compliant.
      
        The VFS opens files, even if the filesystem provides no method. We can
        see if a file was successfully opened for read and or for write by
        looking at file->f_mode.
      
        When writes are flowing from the page cache, file is no longer
        available. We can trust the VFS to have checked file->f_mode before
        writing to the page cache.
      
        The mode of a file might change between when it is opened and IO
        commences, or it might be created with an arbitrary mode.
      
        We'll make sure we don't hit EACCES during the IO stage by using
        UID 0"
      
      [ This is "posixish", but not a great solution in the long run, since a
        proper secure network server shouldn't really trust the client like this.
        But proper and secure POSIX behavior requires an open method and a
        resulting cookie for IO of some kind, or similar.    - Linus ]
      
      * tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        orangefs: posix open permission checking...
      316933cf
    • Linus Torvalds's avatar
      Merge tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux · 911d137a
      Linus Torvalds authored
      Pull nfsd updates from Bruce Fields:
       "This is a relatively quiet cycle for nfsd, mainly various bugfixes.
      
        Possibly most interesting is Trond's fixes for some callback races
        that were due to my incomplete understanding of rpc client shutdown.
        Unfortunately at the last minute I've started noticing a new
        intermittent failure to send callbacks. As the logic seems basically
        correct, I'm leaving Trond's patches in for now, and hope to find a
        fix in the next week so I don't have to revert those patches"
      
      * tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
        nfsd: depend on CRYPTO_MD5 for legacy client tracking
        NFSD fixing possible null pointer derefering in copy offload
        nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
        nfsd: Ensure CLONE persists data and metadata changes to the target file
        SUNRPC: Fix backchannel latency metrics
        nfsd: restore NFSv3 ACL support
        nfsd: v4 support requires CRYPTO_SHA256
        nfsd: Fix cld_net->cn_tfm initialization
        lockd: remove __KERNEL__ ifdefs
        sunrpc: remove __KERNEL__ ifdefs
        race in exportfs_decode_fh()
        nfsd: Drop LIST_HEAD where the variable it declares is never used.
        nfsd: document callback_wq serialization of callback code
        nfsd: mark cb path down on unknown errors
        nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
        nfsd: minor 4.1 callback cleanup
        SUNRPC: Fix svcauth_gss_proxy_init()
        SUNRPC: Trace gssproxy upcall results
        sunrpc: fix crash when cache_head become valid before update
        nfsd: remove private bin2hex implementation
        ...
      911d137a
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · fb9bf40c
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Highlights include:
      
        Features:
      
         - NFSv4.2 now supports cross device offloaded copy (i.e. offloaded
           copy of a file from one source server to a different target
           server).
      
         - New RDMA tracepoints for debugging congestion control and Local
           Invalidate WRs.
      
        Bugfixes and cleanups
      
         - Drop the NFSv4.1 session slot if nfs4_delegreturn_prepare waits for
           layoutreturn
      
         - Handle bad/dead sessions correctly in nfs41_sequence_process()
      
         - Various bugfixes to the delegation return operation.
      
         - Various bugfixes pertaining to delegations that have been revoked.
      
         - Cleanups to the NFS timespec code to avoid unnecessary conversions
           between timespec and timespec64.
      
         - Fix unstable RDMA connections after a reconnect
      
         - Close race between waking an RDMA sender and posting a receive
      
         - Wake pending RDMA tasks if connection fails
      
         - Fix MR list corruption, and clean up MR usage
      
         - Fix another RPCSEC_GSS issue with MIC buffer space"
      
      * tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
        SUNRPC: Capture completion of all RPC tasks
        SUNRPC: Fix another issue with MIC buffer space
        NFS4: Trace lock reclaims
        NFS4: Trace state recovery operation
        NFSv4.2 fix memory leak in nfs42_ssc_open
        NFSv4.2 fix kfree in __nfs42_copy_file_range
        NFS: remove duplicated include from nfs4file.c
        NFSv4: Make _nfs42_proc_copy_notify() static
        NFS: Fallocate should use the nfs4_fattr_bitmap
        NFS: Return -ETXTBSY when attempting to write to a swapfile
        fs: nfs: sysfs: Remove NULL check before kfree
        NFS: remove unneeded semicolon
        NFSv4: add declaration of current_stateid
        NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn
        NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()
        nfsv4: Move NFSPROC4_CLNT_COPY_NOTIFY to end of list
        SUNRPC: Avoid RPC delays when exiting suspend
        NFS: Add a tracepoint in nfs_fh_to_dentry()
        NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done()
        NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn
        ...
      fb9bf40c
  3. 07 Dec, 2019 25 commits