1. 05 Apr, 2019 40 commits
    • PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove() · 9546c366
      Rafael J. Wysocki authored
      [ Upstream commit 95c80bc6 ]
      
      Dongdong reported a deadlock triggered by a hotplug event during a sysfs
      "remove" operation:
      
        pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
        # echo 1 > 0000:00:0c.0/remove
      
        PME and hotplug share an MSI/MSI-X vector.  The sysfs "remove" side is:
      
          remove_store
            pci_stop_and_remove_bus_device_locked
              pci_lock_rescan_remove
              pci_stop_and_remove_bus_device
                ...
                pcie_pme_remove
                  pcie_pme_suspend
                    synchronize_irq        # wait for hotplug IRQ handler
              pci_unlock_rescan_remove
      
        The hotplug side is:
      
          pciehp_ist
            pciehp_handle_presence_or_link_change
              pciehp_configure_device
                pci_lock_rescan_remove     # wait for pci_unlock_rescan_remove()
      
        INFO: task bash:10913 blocked for more than 120 seconds.
      
        # ps -ax |grep D
         PID TTY      STAT   TIME COMMAND
        10913 ttyAMA0  Ds+    0:00 -bash
        14022 ?        D      0:00 [irq/745-pciehp]
      
        # cat /proc/14022/stack
        __switch_to+0x94/0xd8
        pci_lock_rescan_remove+0x20/0x28
        pciehp_configure_device+0x30/0x140
        pciehp_handle_presence_or_link_change+0x324/0x458
        pciehp_ist+0x1dc/0x1e0
      
        # cat /proc/10913/stack
        __switch_to+0x94/0xd8
        synchronize_irq+0x8c/0xc0
        pcie_pme_suspend+0xa4/0x118
        pcie_pme_remove+0x20/0x40
        pcie_port_remove_service+0x3c/0x58
        ...
        pcie_port_device_remove+0x2c/0x48
        pcie_portdrv_remove+0x68/0x78
        pci_device_remove+0x48/0x120
        ...
        pci_stop_bus_device+0x84/0xc0
        pci_stop_and_remove_bus_device_locked+0x24/0x40
        remove_store+0xa4/0xb8
        dev_attr_store+0x44/0x60
        sysfs_kf_write+0x58/0x80
      
      It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two
      reasons.
      
      First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the
      native hotplug interrupt handler as well as for the PME one, because they
      share one IRQ (as per the spec).  That may deadlock if hotplug is signaled
      while pcie_pme_remove() is running: the remove path already holds the
      lock taken by pci_lock_rescan_remove(), and the hotplug handler that
      synchronize_irq() waits for is itself blocked in pci_lock_rescan_remove().
      
      Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled
      for the port, it will return without disabling the interrupt, contrary to
      what pcie_pme_remove() expects; this was overlooked by commit c7b5a4e6
      ("PCI / PM: Fix native PME handling during system suspend/resume").
      
      To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear
      its status and prevent the PME worker function from re-enabling it before
      calling free_irq() on it, which should be sufficient.
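      The remove ordering described above can be modelled in userspace as
      follows. This is a sketch under stated assumptions: struct pme_port,
      pme_worker_sketch() and pme_remove_sketch() are hypothetical names, and
      the locking the kernel performs around these updates is left out. It
      shows how a "no re-enable" flag, set while the interrupt is being
      disabled, keeps a late-running worker from turning the interrupt back on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PCIe port's PME state (not kernel code).
 * The spinlock the kernel takes around these updates is omitted. */
struct pme_port {
	bool irq_enabled;	/* PME interrupt enable bit */
	bool status_pending;	/* PME status bit */
	bool noirq;		/* set on removal: worker must not re-enable */
};

/* Worker: services a pending PME, then re-enables the interrupt
 * unless removal has already started. */
static void pme_worker_sketch(struct pme_port *p)
{
	p->status_pending = false;
	if (!p->noirq)
		p->irq_enabled = true;
}

/* Remove: disable the interrupt, clear its status and stop the worker
 * from re-enabling it. Only after this would the kernel call free_irq();
 * no suspend helper (and so no synchronize_irq()) is called here. */
static void pme_remove_sketch(struct pme_port *p)
{
	p->irq_enabled = false;
	p->status_pending = false;
	p->noirq = true;
}
```

      A worker that runs after pme_remove_sketch() clears the status but leaves
      the interrupt disabled. Without the flag, the same worker would re-enable it.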
      
      Fixes: c7b5a4e6 ("PCI / PM: Fix native PME handling during system suspend/resume")
      Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@huawei.com
      Reported-by: Dongdong Liu <liudongdong3@huawei.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      [bhelgaas: add URL and deadlock details from Dongdong]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tools lib traceevent: Fix buffer overflow in arg_eval · 224c996e
      Tony Jones authored
      [ Upstream commit 7c5b019e ]
      
      Fix buffer overflow observed when running perf test.
      
      The overflow occurs when evaluating "1ULL << (64 - 1)", which results
      in -9223372036854775808. Together with its terminating NUL, that string
      overflows the 20-character buffer.
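      The arithmetic can be checked in userspace. The sketch below assumes that
      1ULL << 63 read back as a signed 64-bit value equals LLONG_MIN, which holds
      on two's-complement targets. It asks snprintf() how many characters the
      value needs, and shows a bounded formatter (a hypothetical helper, not the
      libtraceevent fix) that reports truncation instead of overflowing.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Characters needed to print v with "%lld", excluding the NUL. */
static int lld_width(long long v)
{
	return snprintf(NULL, 0, "%lld", v);
}

/* Bounded formatting: never writes past buf; returns 0 if the whole
 * result (including the NUL) fit, -1 if it was truncated or failed. */
static int format_lld(char *buf, size_t size, long long v)
{
	int n = snprintf(buf, size, "%lld", v);

	return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

      LLONG_MIN needs 20 characters plus the NUL, so a 20-byte buffer is one
      byte short and a 21-byte buffer is the minimum.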
      
      It is possible this bug has been reported before, but I still don't see
      any fix checked in:
      
      See: https://www.spinics.net/lists/linux-perf-users/msg07714.html
      Reported-by: Michael Sartain <mikesart@fastmail.com>
      Reported-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Tony Jones <tonyj@suse.de>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Fixes: f7d82350 ("tools/events: Add files to create libtraceevent.a")
      Link: http://lkml.kernel.org/r/20190228015532.8941-1-tonyj@suse.de
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • fs: fix guard_bio_eod to check for real EOD errors · 83c39533
      Carlos Maiolino authored
      [ Upstream commit dce30ca9 ]
      
      guard_bio_eod() can truncate a segment in bio to allow it to do IO on
      odd last sectors of a device.
      
      It already checks whether the IO starts past EOD, but it does not consider
      the possibility that an IO request starting within device boundaries can
      contain more than one segment past EOD.
      
      In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
      underflow bvec->bv_len.
      
      Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
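      The check can be sketched as a pure function over byte counts. Everything
      here is hypothetical: struct seg_model and trim_last_segment() are
      illustrative names, not the block layer's API. One assumption differs
      from the wording above: the sketch compares truncated_bytes against the
      last segment's length (which never exceeds PAGE_SIZE) rather than against
      PAGE_SIZE itself. That is the bound that guarantees bv_len cannot
      underflow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the last bio segment; lengths are in bytes. */
struct seg_model {
	unsigned int bv_len;
};

/*
 * Trim the last segment so the I/O ends at the end of the device.
 * bio_size is the total I/O size; bytes_to_eod is how much of it lies
 * before EOD.  Returns false (nothing trimmed) when the I/O does not
 * straddle EOD, or when more than the last segment lies past EOD:
 * subtracting then would underflow bv_len, so the request is left for
 * the I/O layer to fail.
 */
static bool trim_last_segment(struct seg_model *last, unsigned int bio_size,
			      unsigned int bytes_to_eod)
{
	unsigned int truncated_bytes;

	if (bio_size <= bytes_to_eod)
		return false;			/* does not straddle EOD */
	truncated_bytes = bio_size - bytes_to_eod;
	if (truncated_bytes > last->bv_len)
		return false;			/* spans several segments */
	last->bv_len -= truncated_bytes;
	return true;
}
```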
      
      This situation has been found on filesystems such as isofs and vfat,
      which don't check the device size before mount. If the device is
      smaller than the filesystem itself, a readahead on such a filesystem
      that spans EOD can trigger this situation, leading to a call to
      zero_user() with a wrong size, possibly corrupting memory.
      
      I didn't see any crash, nor did I let the system run long enough to
      check whether memory corruption would be hit somewhere, but adding
      instrumentation to guard_bio_eod() to check the truncated_bytes size
      was enough to see the error.
      
      The following script can trigger the error.
      
      MNT=/mnt
      IMG=./DISK.img
      DEV=/dev/loop0
      
      mkfs.vfat $IMG
      mount $IMG $MNT
      cp -R /etc $MNT &> /dev/null
      umount $MNT
      
      losetup -D
      
      losetup --find --show --sizelimit 16247280 $IMG
      mount $DEV $MNT
      
      find $MNT -type f -exec cat {} + >/dev/null
      
      Kudos to Eric Sandeen for coming up with the reproducer above.
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • jbd2: fix invalid descriptor block checksum · 6a817a7a
      luojiajun authored
      [ Upstream commit 6e876c3d ]
      
      In jbd2_journal_commit_transaction(), if we are in abort mode,
      we may flush the buffer without setting the descriptor block checksum
      by jumping to start_journal_io. Then, when the fs is mounted,
      jbd2_descriptor_block_csum_verify() fails.
      
      [  271.379811] EXT4-fs (vdd): shut down requested (2)
      [  271.381827] Aborting journal on device vdd-8.
      [  271.597136] JBD2: Invalid checksum recovering block 22199 in log
      [  271.598023] JBD2: recovery failed
      [  271.598484] EXT4-fs (vdd): error loading journal
      
      Fix this problem by always setting the descriptor block checksum if the
      descriptor buffer is not NULL.
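      The control flow of the fix can be modelled as follows. This is only a
      sketch: struct desc_block, toy_csum(), commit_sketch() and verify_sketch()
      are hypothetical, and toy_csum() is a simple stand-in for the journal's
      real checksum. The point is that the checksum is computed at the label
      both paths reach, so the abort path's jump no longer skips it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor block: payload plus a trailing checksum. */
struct desc_block {
	uint8_t data[16];
	uint32_t csum;
};

/* Toy checksum standing in for the journal's real one. */
static uint32_t toy_csum(const struct desc_block *d)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < sizeof(d->data); i++)
		sum = sum * 31u + d->data[i];
	return sum;
}

/* Commit path: the abort case jumps ahead, but the checksum is set at the
 * shared label whenever a descriptor buffer exists. */
static void commit_sketch(struct desc_block *descriptor, bool aborted)
{
	if (aborted)
		goto start_journal_io;
	/* ... the normal path would add more tags to the descriptor here ... */
start_journal_io:
	if (descriptor)
		descriptor->csum = toy_csum(descriptor);
	/* ... the block would be submitted for I/O here ... */
}

/* Recovery-side check corresponding to the verify step. */
static bool verify_sketch(const struct desc_block *d)
{
	return d->csum == toy_csum(d);
}
```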
      
      This checksum problem can be reproduced by xfstests generic/388.
      Signed-off-by: luojiajun <luojiajun3@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: conntrack: tcp: only close if RST matches exact sequence · ca66f667
      Florian Westphal authored
      [ Upstream commit be0502a3 ]
      
      TCP resets cause instant transition from established to closed state
      provided the reset is in-window.  Endpoints that implement RFC 5961
      require resets to match the next expected sequence number.
      RST segments that are in-window (but that do not match RCV.NXT) are
      ignored, and a "challenge ACK" is sent back.
      
      The main problem for conntrack is that it's a middlebox, i.e. whereas an
      end host might have ACK'd SEQ (and would thus accept an RST with this
      sequence number), conntrack might not have seen this ACK (yet).
      
      Therefore we can't simply flag RSTs with non-exact match as invalid.
      
      This updates RST processing as follows:
      
      1. If the connection is in a state other than ESTABLISHED, nothing is
         changed, RST is subject to normal in-window check.
      
      2. If the RST's sequence number exactly matches RCV.NXT, the
         connection state moves to CLOSE.
      
      3. The same applies if the RST sequence number aligns with a previous
         packet in the same direction.
      
      In all other cases, the connection remains in ESTABLISHED state.
      If the normal in-window check passes, the timeout will be lowered
      to that of CLOSE.
      
      If the peer sends a challenge ACK, the connection timeout will be reset.
      
      If the challenge ACK triggers another RST (RST was valid after all),
      this 2nd RST will match expected sequence and conntrack state changes to
      CLOSE.
      
      If no challenge ACK is received, the connection will time out after
      CLOSE seconds (10 seconds by default), just like without this patch.
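      The RST rules above can be condensed into one decision function. It is a
      simplified sketch, not the conntrack implementation: struct rst_ctx,
      classify_rst() and the verdict names are hypothetical. The inputs assume
      the normal in-window check has already been computed, and the function
      ignores per-direction bookkeeping and timeout values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum rst_verdict {
	RST_INVALID,		/* fails the normal in-window check */
	RST_CLOSE,		/* move the connection to CLOSE */
	RST_KEEP_SHORT_TIMEOUT,	/* stay ESTABLISHED with the CLOSE timeout */
};

/* Hypothetical inputs; real conntrack tracks these per direction. */
struct rst_ctx {
	bool established;		/* connection is in ESTABLISHED */
	bool in_window;			/* result of the normal window check */
	uint32_t seq;			/* sequence number of the RST */
	uint32_t rcv_nxt;		/* next expected sequence number */
	uint32_t last_seq_same_dir;	/* seq of an earlier packet, same direction */
};

static enum rst_verdict classify_rst(const struct rst_ctx *c)
{
	if (!c->in_window)
		return RST_INVALID;
	if (!c->established)
		return RST_CLOSE;		/* rule 1: behaviour unchanged */
	if (c->seq == c->rcv_nxt)
		return RST_CLOSE;		/* rule 2: exact match */
	if (c->seq == c->last_seq_same_dir)
		return RST_CLOSE;		/* rule 3: aligns with earlier packet */
	return RST_KEEP_SHORT_TIMEOUT;		/* wait for a challenge ACK */
}
```

      With the packetdrill values below, the in-window RST at sequence 1010
      (RCV.NXT being 1001) keeps the connection ESTABLISHED with a short
      timeout. An RST at exactly 1001 closes it.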
      
      Packetdrill test case:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      
      // Receive a segment.
      0.210 < P. 1:1001(1000) ack 1 win 46
      0.210 > . 1:1(0) ack 1001
      
      // Application writes 1000 bytes.
      0.250 write(4, ..., 1000) = 1000
      0.250 > P. 1:1001(1000) ack 1001
      
      // First reset, old sequence. Conntrack (correctly) considers this
      // invalid due to failed window validation (regardless of this patch).
      0.260 < R  2:2(0) ack 1001 win 260
      
      // 2nd reset, but too far ahead sequence.  Same: correctly handled
      // as invalid.
      0.270 < R 99990001:99990001(0) ack 1001 win 260
      
      // in-window, but not exact sequence.
      // Current Linux kernels might reply with a challenge ack, and do not
      // remove connection.
      // Without this patch, conntrack state moves to CLOSE.
      // With patch, timeout is lowered like CLOSE, but connection stays
      // in ESTABLISHED state.
      0.280 < R 1010:1010(0) ack 1001 win 260
      
      // Expect challenge ACK
      0.281 > . 1001:1001(0) ack 1001 win 501
      
      // With or without this patch, RST will cause connection
      // to move to CLOSE (sequence number matches)
      // 0.282 < R 1001:1001(0) ack 1001 win 260
      
      // ACK
      0.300 < . 1001:1001(0) ack 1001 win 257
      
      // more data could be exchanged here, connection
      // is still established
      
      // Client closes the connection.
      0.610 < F. 1001:1001(0) ack 1001 win 260
      0.650 > . 1001:1001(0) ack 1002
      
      // Close the connection without reading outstanding data
      0.700 close(4) = 0
      
      // so one more reset.  Will be deemed acceptable with patch as well:
      // connection is already closing.
      0.701 > R. 1001:1001(0) ack 1002 win 501
      // End packetdrill test case.
      
      With patch, this generates following conntrack events:
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      
      Without the patch, the first RST moves the connection to CLOSE, whereas
      the socket state does not change until the FIN is received.
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: nf_tables: check the result of dereferencing base_chain->stats · 709aaa09
      Li RongQing authored
      [ Upstream commit a9f5e78c ]
      
      Check the result of dereferencing base_chain->stats against NULL,
      instead of checking the result of this_cpu_ptr().
      
      base_chain->stats may be changed to NULL when a chain is updated and a
      new NULL counter is attached.
      
      We do not need to check the return value of this_cpu_ptr(): if
      base_chain->stats is non-NULL, it comes from the percpu allocator, so
      this_cpu_ptr() returns a valid value.
      
      Also fix two sparse errors by replacing rcu_access_pointer() and
      rcu_dereference() with READ_ONCE() under rcu_read_lock().
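      The read-once-then-check pattern can be illustrated in userspace. The
      sketch substitutes a C11 relaxed atomic load for READ_ONCE(). The percpu
      indirection and RCU are omitted, and struct base_chain_model and
      update_stats() are hypothetical names. The pointer is loaded once, and
      that single loaded value is both NULL-checked and used, so a concurrent
      swap to NULL cannot slip in between the check and the use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct chain_stats {
	unsigned long pkts;
	unsigned long bytes;
};

/* Hypothetical base chain whose stats pointer may be swapped to NULL. */
struct base_chain_model {
	_Atomic(struct chain_stats *) stats;
};

/* Load the pointer once (a READ_ONCE() analogue), then test and use
 * that same value rather than a pointer derived from it. */
static bool update_stats(struct base_chain_model *chain, unsigned long len)
{
	struct chain_stats *s =
		atomic_load_explicit(&chain->stats, memory_order_relaxed);

	if (!s)
		return false;		/* counters detached: nothing to update */
	s->pkts++;
	s->bytes += len;
	return true;
}
```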
      
      Thanks to Eric for his help in finishing this patch.
      
      Fixes: 00924094 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • cifs: Fix NULL pointer dereference of devname · 36a3219e
      Yao Liu authored
      [ Upstream commit 68e2672f ]
      
      There is a NULL pointer dereference of devname in strspn().
      
      The oops looks something like:
      
        CIFS: Attempting to mount (null)
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:strspn+0x0/0x50
        ...
        Call Trace:
         ? cifs_parse_mount_options+0x222/0x1710 [cifs]
         ? cifs_get_volume_info+0x2f/0x80 [cifs]
         cifs_setup_volume_info+0x20/0x190 [cifs]
         cifs_get_volume_info+0x50/0x80 [cifs]
         cifs_smb3_do_mount+0x59/0x630 [cifs]
         ? ida_alloc_range+0x34b/0x3d0
         cifs_do_mount+0x11/0x20 [cifs]
         mount_fs+0x52/0x170
         vfs_kern_mount+0x6b/0x170
         do_mount+0x216/0xdc0
         ksys_mount+0x83/0xd0
         __x64_sys_mount+0x25/0x30
         do_syscall_64+0x65/0x220
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix this by adding a NULL check on devname in cifs_parse_devname().
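      The guard can be sketched with a hypothetical helper,
      count_leading_delims(), that rejects a NULL name before strspn() is
      reached. The helper is not the cifs function. It assumes, as in a UNC
      name such as "//server/share", that the leading separators are '/' or
      '\\'.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count leading path separators of a UNC device
 * name, rejecting NULL before strspn() can dereference it. */
static int count_leading_delims(const char *devname, size_t *out)
{
	if (!devname)
		return -EINVAL;		/* e.g. mount called with no source */
	*out = strspn(devname, "/\\");
	return 0;
}
```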
      Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED · d579b4ea
      Namjae Jeon authored
      [ Upstream commit 969ae8e8 ]
      
      Old Windows versions and NetApp SMB servers return
      NT_STATUS_NOT_SUPPORTED since they do not allow or implement
      FSCTL_VALIDATE_NEGOTIATE_INFO. The client should accept the response
      provided it's properly signed.
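      The acceptance rule above can be written as a small decision function. It
      is a hypothetical sketch: the enum and accept_validate_negotiate() are
      illustrative, and the comparison of returned dialect and capabilities on
      the success path is not modelled. The "not supported" reply is accepted
      only when the response was properly signed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values for this sketch. */
enum vneg_status { VNEG_OK, VNEG_NOT_SUPPORTED, VNEG_OTHER_ERROR };

/* May the session proceed after FSCTL_VALIDATE_NEGOTIATE_INFO? */
static bool accept_validate_negotiate(enum vneg_status st, bool response_signed)
{
	if (st == VNEG_OK)
		return true;		/* payload checks happen elsewhere */
	if (st == VNEG_NOT_SUPPORTED)
		return response_signed;	/* older servers: accept if signed */
	return false;			/* any other failure aborts */
}
```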
      
      See
      https://blogs.msdn.microsoft.com/openspecification/2012/06/28/smb3-secure-dialect-negotiation/
      
      and
      
      MS-SMB2 validate negotiate response processing:
      https://msdn.microsoft.com/en-us/library/hh880630.aspx
      
      The Samba client already handles this:
      https://bugzilla.samba.org/attachment.cgi?id=13285&action=edit
      Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • f2fs: fix to check inline_xattr_size boundary correctly · 4ab78f4d
      Chao Yu authored
      [ Upstream commit 500e0b28 ]
      
      We use the condition below to check the inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There are three problems in that check:
      - we should allow inline_xattr_size to equal the min size of the inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size units: the former is in 4-byte units, the latter in
      1-byte units.
      - DEF_MIN_INLINE_SIZE only indicates the min size of the inline data
      area; however, we need to consider the min size of the inline dentry
      area as well. A minimal inline dentry must contain at least two
      entries, '.' and '..', so the min inline_dentry size is 40 bytes:
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
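      The 40-byte figure, and a boundary check that admits equality, can be
      worked through in C. The macro names below are hypothetical labels for
      the rows of the table. bytes_to_words() only shows that converting 40
      bytes to 4-byte units gives 10, so that both sides of a comparison share
      one unit. The sketch does not reproduce the exact f2fs formula.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field sizes from the table above, in bytes (names are illustrative). */
#define DENTRY_BITMAP_BYTES	1
#define DENTRY_RESERVED_BYTES	1
#define DENTRY_SLOT_BYTES	11	/* one directory entry */
#define DENTRY_NAME_BYTES	8	/* one filename slot */
#define MIN_INLINE_DENTRIES	2	/* "." and ".." */

/* Smallest inline dentry area that can hold "." and "..". */
static unsigned int min_inline_dentry_bytes(void)
{
	return DENTRY_BITMAP_BYTES + DENTRY_RESERVED_BYTES +
	       MIN_INLINE_DENTRIES * (DENTRY_SLOT_BYTES + DENTRY_NAME_BYTES);
}

/* Express a byte count in 4-byte units before comparing it with a
 * quantity kept in those units. */
static unsigned int bytes_to_words(unsigned int bytes)
{
	return bytes / sizeof(uint32_t);
}

/* Boundary check that rejects 0 and allows the size to equal the max. */
static bool inline_xattr_size_ok(unsigned int size, unsigned int max_size)
{
	return size > 0 && size <= max_size;
}
```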
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dm thin: add sanity checks to thin-pool and external snapshot creation · 8c81fcd3
      Jason Cai (Xiang Feng) authored
      [ Upstream commit 70de2cbd ]
      
      Invoking dm_get_device() twice on the same device path with different
      modes is dangerous, because in that case upgrade_mode() will allocate a
      new 'dm_dev' and free the old one, which may still be referenced by a
      previous caller.  Dereferencing the dangling pointer will trigger a
      kernel NULL pointer dereference.
      
      The following two cases can reproduce this issue.  Actually, they are
      invalid setups that must be disallowed, e.g.:
      
      1. Creating a thin-pool with read_only mode, and the same device as
      both metadata and data.
      
      dmsetup create thinp --table \
          "0 41943040 thin-pool /dev/vdb /dev/vdb 128 0 1 read_only"
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
      ...
      Call Trace:
       new_read+0xfb/0x110 [dm_bufio]
       dm_bm_read_lock+0x43/0x190 [dm_persistent_data]
       ? kmem_cache_alloc_trace+0x15c/0x1e0
       __create_persistent_data_objects+0x65/0x3e0 [dm_thin_pool]
       dm_pool_metadata_open+0x8c/0xf0 [dm_thin_pool]
       pool_ctr.cold.79+0x213/0x913 [dm_thin_pool]
       ? realloc_argv+0x50/0x70 [dm_mod]
       dm_table_add_target+0x14e/0x330 [dm_mod]
       table_load+0x122/0x2e0 [dm_mod]
       ? dev_status+0x40/0x40 [dm_mod]
       ctl_ioctl+0x1aa/0x3e0 [dm_mod]
       dm_ctl_ioctl+0xa/0x10 [dm_mod]
       do_vfs_ioctl+0xa2/0x600
       ? handle_mm_fault+0xda/0x200
       ? __do_page_fault+0x26c/0x4f0
       ksys_ioctl+0x60/0x90
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x55/0x150
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      2. Creating an external snapshot using the same thin-pool device.
      
      dmsetup create thinp --table \
          "0 41943040 thin-pool /dev/vdc /dev/vdb 128 0 2 ignore_discard"
      dmsetup message /dev/mapper/thinp 0 "create_thin 0"
      dmsetup create snap --table \
                  "0 204800 thin /dev/mapper/thinp 0 /dev/mapper/thinp"
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      ...
      Call Trace:
      ? __alloc_pages_nodemask+0x13c/0x2e0
      retrieve_status+0xa5/0x1f0 [dm_mod]
      ? dm_get_live_or_inactive_table.isra.7+0x20/0x20 [dm_mod]
       table_status+0x61/0xa0 [dm_mod]
       ctl_ioctl+0x1aa/0x3e0 [dm_mod]
       dm_ctl_ioctl+0xa/0x10 [dm_mod]
       do_vfs_ioctl+0xa2/0x600
       ksys_ioctl+0x60/0x90
       ? ksys_write+0x4f/0xb0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x55/0x150
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
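      Both invalid setups reduce to "the same device used twice". The sketch
      below expresses that with two hypothetical predicates that identify
      devices by dev_t. The actual patch performs its checks inside the
      device-mapper constructors, and their exact form is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Case 1: a thin-pool must not use one device as both metadata and data. */
static bool pool_devices_ok(dev_t metadata_dev, dev_t data_dev)
{
	return metadata_dev != data_dev;
}

/* Case 2: a thin device's external origin must not be its own pool. */
static bool external_origin_ok(dev_t origin_dev, dev_t pool_dev)
{
	return origin_dev != pool_dev;
}
```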
      Signed-off-by: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • cifs: use correct format characters · 626d98bb
      Louis Taylor authored
      [ Upstream commit 259594be ]
      
      When compiling with -Wformat, clang emits the following warnings:
      
      fs/cifs/smb1ops.c:312:20: warning: format specifies type 'unsigned
      short' but the argument has type 'unsigned int' [-Wformat]
                               tgt_total_cnt, total_in_tgt);
                                              ^~~~~~~~~~~~
      
      fs/cifs/cifs_dfs_ref.c:289:4: warning: format specifies type 'short'
      but the argument has type 'int' [-Wformat]
                       ref->flags, ref->server_type);
                       ^~~~~~~~~~
      
      fs/cifs/cifs_dfs_ref.c:289:16: warning: format specifies type 'short'
      but the argument has type 'int' [-Wformat]
                       ref->flags, ref->server_type);
                                   ^~~~~~~~~~~~~~~~
      
      fs/cifs/cifs_dfs_ref.c:291:4: warning: format specifies type 'short'
      but the argument has type 'int' [-Wformat]
                       ref->ref_flag, ref->path_consumed);
                       ^~~~~~~~~~~~~
      
      fs/cifs/cifs_dfs_ref.c:291:19: warning: format specifies type 'short'
      but the argument has type 'int' [-Wformat]
                       ref->ref_flag, ref->path_consumed);
                                      ^~~~~~~~~~~~~~~~~~
      
      The types of these arguments are unconditionally defined, so this patch
      updates the format characters to the correct ones for ints and unsigned
      ints.
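      The rule behind the fix is that the conversion specifier must match the
      argument type: %d for int and %u for unsigned int, not %hd/%hu. A minimal
      sketch with a hypothetical format_ref() helper (not the cifs code) is
      below. A value above SHRT_MAX, such as 70000, prints intact with these
      specifiers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical formatter: int arguments use %d, unsigned int use %u. */
static int format_ref(char *buf, size_t size, int flags, int server_type,
		      unsigned int tgt_total_cnt, unsigned int total_in_tgt)
{
	return snprintf(buf, size, "flags: %d server_type: %d total: %u/%u",
			flags, server_type, tgt_total_cnt, total_in_tgt);
}
```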
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/378
      Signed-off-by: Louis Taylor <louis@kragniz.eu>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • page_poison: play nicely with KASAN · a6c56bf6
      Qian Cai authored
      [ Upstream commit 4117992d ]
      
      KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING).
      It triggers false positives in the allocation path:
      
        BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330
        Read of size 8 at addr ffff88881f800000 by task swapper/0
        CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54
        Call Trace:
         dump_stack+0xe0/0x19a
         print_address_description.cold.2+0x9/0x28b
         kasan_report.cold.3+0x7a/0xb5
         __asan_report_load8_noabort+0x19/0x20
         memchr_inv+0x2ea/0x330
         kernel_poison_pages+0x103/0x3d5
         get_page_from_freelist+0x15e7/0x4d90
      
      because KASAN has not yet unpoisoned the shadow of the page being
      allocated when memchr_inv() reads it to check for the stale poison
      pattern.
      
      There are also false positives in the free path:
      
        BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5
        Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1
        CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55
        Call Trace:
         dump_stack+0xe0/0x19a
         print_address_description.cold.2+0x9/0x28b
         kasan_report.cold.3+0x7a/0xb5
         check_memory_region+0x22d/0x250
         memset+0x28/0x40
         kernel_poison_pages+0x29e/0x3d5
         __free_pages_ok+0x75f/0x13e0
      
      because KASAN adds poisoned redzones around slab objects, but page
      poisoning needs to poison the whole page.
      
      Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • fs/file.c: initialize init_files.resize_wait · d609ecd8
      Shuriyc Chu authored
      [ Upstream commit 5704a068 ]
      
      (Taken from https://bugzilla.kernel.org/show_bug.cgi?id=200647)
      
      Calling get_unused_fd_flags() in a kthread causes a kernel crash.  It
      works fine on 4.1, but crashes after getting 64 fds.  It also crashes on
      Ubuntu 14.04/16.04/18.04 and CentOS 7.5, and the crash messages are
      almost the same.
      
      The crash message on CentOS 7.5 is shown below:
      
        start fd 61
        start fd 62
        start fd 63
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: __wake_up_common+0x2e/0x90
        PGD 0
        Oops: 0000 [#1] SMP
        Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core
         virtio floppy dm_mirror dm_region_hash dm_log dm_mod
        CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.3.3.el7.x86_64 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
        task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000
        RIP: 0010:__wake_up_common+0x2e/0x90
        RSP: 0018:ffff8e94247a2d18  EFLAGS: 00010086
        RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0
        RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8
        R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8
        R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
        FS:  0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0
        Call Trace:
          __wake_up+0x39/0x50
          expand_files+0x131/0x250
          __alloc_fd+0x47/0x170
          get_unused_fd_flags+0x30/0x40
          test_fd+0x12a/0x1c0 [test]
          kthread+0xd1/0xe0
          ret_from_fork_nospec_begin+0x21/0x21
        Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 <48> 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef
        RIP   __wake_up_common+0x2e/0x90
         RSP <ffff8e94247a2d18>
        CR2: 0000000000000000
      
      This issue exists since CentOS 7.5 (3.10.0-862); CentOS 7.4
      (3.10.0-693.21.1) is OK.  Root cause: the 'resize_wait' item of
      init_files is not initialized before being used.
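      The failure mode can be reproduced in miniature. In this sketch a
      circular list modelled on the kernel's list_head stands in for the wait
      queue, and files_model and count_waiters() are hypothetical names. An
      empty head must point to itself. An all-zero head has next == NULL, so
      the first step of a wake-up walk dereferences NULL, which matches the
      oops above. Statically initializing the head, as the fix does for
      init_files.resize_wait, makes the walk terminate at once.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list head points to itself; all-zero is not a valid empty list. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

struct files_model {
	int max_fds;
	struct list_head resize_wait;	/* stand-in for wait_queue_head_t */
};

/* Statically initialized, mirroring the fix for init_files.resize_wait. */
static struct files_model init_files_model = {
	.max_fds = 64,
	.resize_wait = LIST_HEAD_INIT(init_files_model.resize_wait),
};

/* Walk the waiters as a wake-up would.  With a zeroed head, p starts as
 * NULL and p->next is a NULL pointer dereference. */
static int count_waiters(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```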
      Reported-by: Richard Zhang <zhang.zijian@h3c.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • f2fs: do not use mutex lock in atomic context · 9b4f2766
      Sahitya Tummala authored
      [ Upstream commit 9083977d ]
      
      Fix the warning below, which is caused by using a mutex lock in atomic context.
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
      in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
      Preemption disabled at: __radix_tree_preload+0x28/0x130
      Call trace:
       dump_backtrace+0x0/0x2b4
       show_stack+0x20/0x28
       dump_stack+0xa8/0xe0
       ___might_sleep+0x144/0x194
       __might_sleep+0x58/0x8c
       mutex_lock+0x2c/0x48
       f2fs_trace_pid+0x88/0x14c
       f2fs_set_node_page_dirty+0xd0/0x184
      
      Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
      spin_lock() acquired.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ocfs2: fix a panic problem caused by o2cb_ctl · 20141feb
      Jia Guo authored
      [ Upstream commit cc725ef3 ]
      
      While creating a node, a NULL pointer dereference occurs in the kernel
      if o2cb_ctl fails in the interval (mkdir, o2cb_set_node_attribute(node_num)]
      in o2cb_add_node().
      
      The node number is initialized to 0 in o2nm_node_group_make_item(), so
      o2nm_node_group_drop_item() will mistake node number 0 for a valid node
      number when we delete the node before the node number is set correctly.
      If the local node number of the current host happens to be 0,
      cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
      o2hb_thread is still running.  The panic stack is generated as follows:
      
        o2hb_thread
            \-o2hb_do_disk_heartbeat
                \-o2hb_check_own_slot
                    |-slot = &reg->hr_slots[o2nm_this_node()];
                    //o2nm_this_node() return O2NM_INVALID_NODE_NUM
      
      We need to check whether the node number is set when we delete the node.
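      The check can be sketched as follows. The sketch assumes a "number was
      set" flag on the node, and node_model, cluster_model, drop_node() and
      INVALID_NODE_NUM are hypothetical stand-ins (the real constant is
      O2NM_INVALID_NODE_NUM, whose value is not assumed here). Because 0 is
      both the default and a legal node number, dropping a node clears the
      cluster's local node only if that node's number was actually configured.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_NODE_NUM 255	/* hypothetical stand-in value */

struct node_model {
	unsigned int num;	/* defaults to 0 when the item is created */
	bool num_set;		/* hypothetical: true once num was configured */
};

struct cluster_model {
	unsigned int local_node;
};

/* Dropping a node must not invalidate the local node unless this node's
 * number was really configured; a default 0 is ambiguous. */
static void drop_node(struct cluster_model *cl, const struct node_model *nd)
{
	if (nd->num_set && cl->local_node == nd->num)
		cl->local_node = INVALID_NODE_NUM;
}
```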
      
      Link: http://lkml.kernel.org/r/133d8045-72cc-863e-8eae-5013f9f6bc51@huawei.comSigned-off-by: default avatarJia Guo <guojia12@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Acked-by: default avatarJun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      20141feb
    • Qian Cai
      mm/slab.c: kmemleak no scan alien caches · f09c424c
      Qian Cai authored
      [ Upstream commit 92d1d07d ]
      
      Kmemleak throws endless warnings during boot because of this code in
      __alloc_alien_cache():
      
          alc = kmalloc_node(memsize, gfp, node);
          init_arraycache(&alc->ac, entries, batch);
          kmemleak_no_scan(ac);
      
      Kmemleak does not track the array cache (alc->ac) but the alien cache
      (alc) instead, so let it track the latter by lifting kmemleak_no_scan()
      out of init_arraycache().
      
      There is another place that calls init_arraycache(), but
      alloc_kmem_cache_cpus() uses a percpu allocation, which will never be
      considered a leak.
      
        kmemleak: Found object by alias at 0xffff8007b9aa7e38
        CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
        Call trace:
         dump_backtrace+0x0/0x168
         show_stack+0x24/0x30
         dump_stack+0x88/0xb0
         lookup_object+0x84/0xac
         find_and_get_object+0x84/0xe4
         kmemleak_no_scan+0x74/0xf4
         setup_kmem_cache_node+0x2b4/0x35c
         __do_tune_cpucache+0x250/0x2d4
         do_tune_cpucache+0x4c/0xe4
         enable_cpucache+0xc8/0x110
         setup_cpu_cache+0x40/0x1b8
         __kmem_cache_create+0x240/0x358
         create_cache+0xc0/0x198
         kmem_cache_create_usercopy+0x158/0x20c
         kmem_cache_create+0x50/0x64
         fsnotify_init+0x58/0x6c
         do_one_initcall+0x194/0x388
         kernel_init_freeable+0x668/0x688
         kernel_init+0x18/0x124
         ret_from_fork+0x10/0x18
        kmemleak: Object 0xffff8007b9aa7e00 (size 256):
        kmemleak:   comm "swapper/0", pid 1, jiffies 4294697137
        kmemleak:   min_count = 1
        kmemleak:   count = 0
        kmemleak:   flags = 0x1
        kmemleak:   checksum = 0
        kmemleak:   backtrace:
             kmemleak_alloc+0x84/0xb8
             kmem_cache_alloc_node_trace+0x31c/0x3a0
             __kmalloc_node+0x58/0x78
             setup_kmem_cache_node+0x26c/0x35c
             __do_tune_cpucache+0x250/0x2d4
             do_tune_cpucache+0x4c/0xe4
             enable_cpucache+0xc8/0x110
             setup_cpu_cache+0x40/0x1b8
             __kmem_cache_create+0x240/0x358
             create_cache+0xc0/0x198
             kmem_cache_create_usercopy+0x158/0x20c
             kmem_cache_create+0x50/0x64
             fsnotify_init+0x58/0x6c
             do_one_initcall+0x194/0x388
             kernel_init_freeable+0x668/0x688
             kernel_init+0x18/0x124
        kmemleak: Not scanning unknown object at 0xffff8007b9aa7e38
        CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
        Call trace:
         dump_backtrace+0x0/0x168
         show_stack+0x24/0x30
         dump_stack+0x88/0xb0
         kmemleak_no_scan+0x90/0xf4
         setup_kmem_cache_node+0x2b4/0x35c
         __do_tune_cpucache+0x250/0x2d4
         do_tune_cpucache+0x4c/0xe4
         enable_cpucache+0xc8/0x110
         setup_cpu_cache+0x40/0x1b8
         __kmem_cache_create+0x240/0x358
         create_cache+0xc0/0x198
         kmem_cache_create_usercopy+0x158/0x20c
         kmem_cache_create+0x50/0x64
         fsnotify_init+0x58/0x6c
         do_one_initcall+0x194/0x388
         kernel_init_freeable+0x668/0x688
         kernel_init+0x18/0x124
         ret_from_fork+0x10/0x18
      
      Link: http://lkml.kernel.org/r/20190129184518.39808-1-cai@lca.pw
      Fixes: 1fe00d50 ("slab: factor out initialization of array cache")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f09c424c
    • Uladzislau Rezki (Sony)
      mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! · 8a0fc62e
      Uladzislau Rezki (Sony) authored
      [ Upstream commit afd07389 ]
      
      One of the vmalloc stress test cases triggers a kernel BUG():
      
        <snip>
        [60.562151] ------------[ cut here ]------------
        [60.562154] kernel BUG at mm/vmalloc.c:512!
        [60.562206] invalid opcode: 0000 [#1] PREEMPT SMP PTI
        [60.562247] CPU: 0 PID: 430 Comm: vmalloc_test/0 Not tainted 4.20.0+ #161
        [60.562293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        [60.562351] RIP: 0010:alloc_vmap_area+0x36f/0x390
        <snip>
      
      This can happen when a large alignment request makes the calculated
      address overflow, i.e.  it wraps around to 0 after ALIGN()'s fixup.
      
      Fix it by checking whether the calculated address is within the
      vstart/vend range.
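      The overflow and the range check can be illustrated with a small
      userspace sketch. `ALIGN_UP()` uses the same rounding formula as the
      kernel's ALIGN() for power-of-two alignments. `fit_aligned()` is a
      hypothetical helper and is not the function changed by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same rounding as ALIGN() for a power-of-two alignment a. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Hypothetical helper: align a candidate start address and accept it only
 * if [addr, addr + size) still fits inside [vstart, vend). Comparing the
 * aligned value against vstart catches the case where ALIGN_UP() wrapped
 * past the top of the address space to 0.
 */
bool fit_aligned(uintptr_t candidate, uintptr_t size, uintptr_t align,
		 uintptr_t vstart, uintptr_t vend, uintptr_t *out)
{
	uintptr_t addr = ALIGN_UP(candidate, align);

	/* vend - addr is computed only after addr <= vend, so no underflow. */
	if (addr < vstart || addr > vend || vend - addr < size)
		return false;
	*out = addr;
	return true;
}
```

      With the top bit as the alignment, any candidate just above it rounds
      up to 2^N, which is 0 in N-bit arithmetic. Without the `addr < vstart`
      test, that 0 would be handed back as a valid address.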
      
      Link: http://lkml.kernel.org/r/20190124115648.9433-2-urezki@gmail.com
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8a0fc62e
    • Vlastimil Babka
      mm, mempolicy: fix uninit memory access · 67abbb9c
      Vlastimil Babka authored
      [ Upstream commit 2e25644e ]
      
      Syzbot with KMSAN reports (excerpt):
      
      ==================================================================
      BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:353 [inline]
      BUG: KMSAN: uninit-value in mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
      CPU: 1 PID: 17420 Comm: syz-executor4 Not tainted 4.20.0-rc7+ #15
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x173/0x1d0 lib/dump_stack.c:113
        kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
        __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:295
        mpol_rebind_policy mm/mempolicy.c:353 [inline]
        mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
        update_tasks_nodemask+0x608/0xca0 kernel/cgroup/cpuset.c:1120
        update_nodemasks_hier kernel/cgroup/cpuset.c:1185 [inline]
        update_nodemask kernel/cgroup/cpuset.c:1253 [inline]
        cpuset_write_resmask+0x2a98/0x34b0 kernel/cgroup/cpuset.c:1728
      
      ...
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
        kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
        kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
        kmem_cache_alloc+0x572/0xb90 mm/slub.c:2777
        mpol_new mm/mempolicy.c:276 [inline]
        do_mbind mm/mempolicy.c:1180 [inline]
        kernel_mbind+0x8a7/0x31a0 mm/mempolicy.c:1347
        __do_sys_mbind mm/mempolicy.c:1354 [inline]
      
      As it's difficult to report where exactly the uninit value resides in
      the mempolicy object, we have to guess a bit.  mm/mempolicy.c:353
      contains this part of mpol_rebind_policy():
      
              if (!mpol_store_user_nodemask(pol) &&
                  nodes_equal(pol->w.cpuset_mems_allowed, *newmask))
      
      "mpol_store_user_nodemask(pol)" is testing pol->flags, which I couldn't
      ever see being uninitialized after leaving mpol_new().  So I'll guess
      it's actually about accessing pol->w.cpuset_mems_allowed on line 354,
      which is still part of the statement starting on line 353.
      
      For w.cpuset_mems_allowed to be uninitialized while the nodes_equal()
      check is still reachable for a mempolicy on which mpol_set_nodemask() is
      called in do_mbind(), the only possibility seems to be an MPOL_PREFERRED
      policy with an empty set of nodes, i.e.  the MPOL_LOCAL equivalent, with
      the MPOL_F_LOCAL flag.  Let's exclude such policies from the
      nodes_equal() check.  Note that the uninit access should be benign
      anyway, as rebinding this kind of policy is always a no-op.  Therefore
      there is no actual need for stable inclusion.
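      Excluding such policies works by testing the flag before the nodemask
      comparison, so that C's short-circuit evaluation never reads the
      uninitialized field. The userspace model below is a sketch. Several
      names are hypothetical: `struct policy_model`, the flag values, and the
      `reads` counter, which stands in for KMSAN noticing the access.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model only. */
#define F_STATIC_NODES 0x1u /* stands in for "store user nodemask" flags */
#define F_LOCAL        0x2u /* stands in for MPOL_F_LOCAL */

struct policy_model {
	unsigned int flags;
	unsigned long cpuset_mems_allowed; /* never set for F_LOCAL policies */
};

/* Read the possibly-uninitialized field, counting each access. */
static unsigned long read_mems(const struct policy_model *p, int *reads)
{
	(*reads)++;
	return p->cpuset_mems_allowed;
}

/*
 * Return true when the rebind can return early. The F_LOCAL test comes
 * before the nodemask comparison, so && short-circuits and the field is
 * never read for a local policy; for such a policy this returns false,
 * and the rebind that follows is a no-op anyway.
 */
bool can_return_early(const struct policy_model *p, unsigned long newmask,
		      int *reads)
{
	return !(p->flags & F_STATIC_NODES) &&
	       !(p->flags & F_LOCAL) &&
	       read_mems(p, reads) == newmask;
}
```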
      
      Link: http://lkml.kernel.org/r/a71997c3-e8ae-a787-d5ce-3db05768b27c@suse.cz
      Link: http://lkml.kernel.org/r/73da3e9c-cc84-509e-17d9-0c434bb9967d@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: syzbot+b19c2dc2c990ea657a71@syzkaller.appspotmail.com
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yisheng Xie <xieyisheng1@huawei.com>
      Cc: zhong jiang <zhongjiang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      67abbb9c
    • Tetsuo Handa
      memcg: killed threads should not invoke memcg OOM killer · 9d785b92
      Tetsuo Handa authored
      [ Upstream commit 7775face ]
      
      If a memory cgroup contains a single process with many threads
      (including different process groups sharing the mm), it is possible to
      trigger a race in which the oom killer complains in the log that there
      are no oom-eligible tasks.  This is both annoying and confusing because
      there is no actual problem.  The race looks as follows:
      
      P1				oom_reaper		P2
      try_charge						try_charge
        mem_cgroup_out_of_memory
          mutex_lock(oom_lock)
            out_of_memory
              oom_kill_process(P1,P2)
               wake_oom_reaper
          mutex_unlock(oom_lock)
          				oom_reap_task
      							  mutex_lock(oom_lock)
      							    select_bad_process # no victim
      
      The problem is more visible with many threads.
      
      Fix this by checking for fatal_signal_pending from
      mem_cgroup_out_of_memory when the oom_lock is already held.
      
      The oom bypass is safe because we already do the same early in the
      try_charge path.  The situation might have changed in the meantime.  It
      should be safe to check for fatal_signal_pending and tsk_is_oom_victim,
      but for better code readability, abstract the current charge bypass
      condition into should_force_charge and reuse it from that path.
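      The shape of that change can be modeled in userspace as below. This is
      a single-threaded sketch. `struct task_model`, `select_victim()` and the
      `victims_searched` counter are hypothetical. The real code performs the
      second check while holding oom_lock, which this model only marks with
      comments. Treating an exiting task as a bypass condition is an
      assumption of the sketch, beyond the two checks named above.

```c
#include <stdbool.h>

/* Hypothetical per-task state relevant to the charge bypass. */
struct task_model {
	bool oom_victim;    /* stands in for tsk_is_oom_victim() */
	bool fatal_signal;  /* stands in for fatal_signal_pending() */
	bool exiting;       /* assumed extra bypass condition */
};

static int victims_searched; /* counts calls into the victim search */

/* Shared bypass predicate, reused by both call sites. */
bool should_force_charge(const struct task_model *t)
{
	return t->oom_victim || t->fatal_signal || t->exiting;
}

/* Hypothetical victim search; returns true if a victim was found. */
static bool select_victim(void)
{
	victims_searched++;
	return false; /* e.g. every eligible task is already being reaped */
}

/*
 * Model of the memcg OOM path. A task that was killed while it waited for
 * the lock bails out here instead of searching for, and complaining
 * about, a victim.
 */
bool memcg_out_of_memory(const struct task_model *t)
{
	bool ret;

	/* mutex_lock(&oom_lock) would be taken here */
	ret = should_force_charge(t) || select_victim();
	/* mutex_unlock(&oom_lock) would be released here */
	return ret;
}
```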
      
      Link: http://lkml.kernel.org/r/01370f70-e1f6-ebe4-b95e-0df21a0bc15e@i-love.sakura.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9d785b92
    • Tetsuo Handa
      mm,oom: don't kill global init via memory.oom.group · eed3ca0a
      Tetsuo Handa authored
      [ Upstream commit d342a0b3 ]
      
      Since moving the global init process into a memory cgroup is
      technically possible, oom_kill_memcg_member() must check for it.
      
        Tasks in /test1 are going to be killed due to memory.oom.group set
        Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
        oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
      
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      
      int main(int argc, char *argv[])
      {
      	static char buffer[10485760];
      	static int pipe_fd[2] = { EOF, EOF };
      	unsigned int i;
      	int fd;
      	char buf[64] = { };
      	if (pipe(pipe_fd))
      		return 1;
      	if (chdir("/sys/fs/cgroup/"))
      		return 1;
      	fd = open("cgroup.subtree_control", O_WRONLY);
      	write(fd, "+memory", 7);
      	close(fd);
      	mkdir("test1", 0755);
      	fd = open("test1/memory.oom.group", O_WRONLY);
      	write(fd, "1", 1);
      	close(fd);
      	fd = open("test1/cgroup.procs", O_WRONLY);
      	write(fd, "1", 1);
      	snprintf(buf, sizeof(buf) - 1, "%d", getpid());
      	write(fd, buf, strlen(buf));
      	close(fd);
      	snprintf(buf, sizeof(buf) - 1, "%zu", sizeof(buffer) * 5);
      	fd = open("test1/memory.max", O_WRONLY);
      	write(fd, buf, strlen(buf));
      	close(fd);
      	for (i = 0; i < 10; i++)
      		if (fork() == 0) {
      			char c;
      			close(pipe_fd[1]);
      			read(pipe_fd[0], &c, 1);
      			memset(buffer, 0, sizeof(buffer));
      			sleep(3);
      			_exit(0);
      		}
      	close(pipe_fd[0]);
      	close(pipe_fd[1]);
      	sleep(3);
      	return 0;
      }
      
      [   37.052923][ T9185] a.out invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
      [   37.056169][ T9185] CPU: 4 PID: 9185 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
      [   37.059205][ T9185] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
      [   37.062954][ T9185] Call Trace:
      [   37.063976][ T9185]  dump_stack+0x67/0x95
      [   37.065263][ T9185]  dump_header+0x51/0x570
      [   37.066619][ T9185]  ? trace_hardirqs_on+0x3f/0x110
      [   37.068171][ T9185]  ? _raw_spin_unlock_irqrestore+0x3d/0x70
      [   37.069967][ T9185]  oom_kill_process+0x18d/0x210
      [   37.071515][ T9185]  out_of_memory+0x11b/0x380
      [   37.072936][ T9185]  mem_cgroup_out_of_memory+0xb6/0xd0
      [   37.074601][ T9185]  try_charge+0x790/0x820
      [   37.076021][ T9185]  mem_cgroup_try_charge+0x42/0x1d0
      [   37.077629][ T9185]  mem_cgroup_try_charge_delay+0x11/0x30
      [   37.079370][ T9185]  do_anonymous_page+0x105/0x5e0
      [   37.080939][ T9185]  __handle_mm_fault+0x9cb/0x1070
      [   37.082485][ T9185]  handle_mm_fault+0x1b2/0x3a0
      [   37.083819][ T9185]  ? handle_mm_fault+0x47/0x3a0
      [   37.085181][ T9185]  __do_page_fault+0x255/0x4c0
      [   37.086529][ T9185]  do_page_fault+0x28/0x260
      [   37.087788][ T9185]  ? page_fault+0x8/0x30
      [   37.088978][ T9185]  page_fault+0x1e/0x30
      [   37.090142][ T9185] RIP: 0033:0x7f8b183aefe0
      [   37.091433][ T9185] Code: 20 f3 44 0f 7f 44 17 d0 f3 44 0f 7f 47 30 f3 44 0f 7f 44 17 c0 48 01 fa 48 83 e2 c0 48 39 d1 74 a3 66 0f 1f 84 00 00 00 00 00 <66> 44 0f 7f 01 66 44 0f 7f 41 10 66 44 0f 7f 41 20 66 44 0f 7f 41
      [   37.096917][ T9185] RSP: 002b:00007fffc5d329e8 EFLAGS: 00010206
      [   37.098615][ T9185] RAX: 00000000006010e0 RBX: 0000000000000008 RCX: 0000000000c30000
      [   37.100905][ T9185] RDX: 00000000010010c0 RSI: 0000000000000000 RDI: 00000000006010e0
      [   37.103349][ T9185] RBP: 0000000000000000 R08: 00007f8b188f4740 R09: 0000000000000000
      [   37.105797][ T9185] R10: 00007fffc5d32420 R11: 00007f8b183aef40 R12: 0000000000000005
      [   37.108228][ T9185] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
      [   37.110840][ T9185] memory: usage 51200kB, limit 51200kB, failcnt 125
      [   37.113045][ T9185] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
      [   37.115808][ T9185] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
      [   37.117660][ T9185] Memory cgroup stats for /test1: cache:0KB rss:49484KB rss_huge:30720KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:49700KB inactive_file:0KB active_file:0KB unevictable:0KB
      [   37.123371][ T9185] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=9188,uid=0
      [   37.128158][ T9185] Memory cgroup out of memory: Killed process 9188 (a.out) total-vm:14456kB, anon-rss:10324kB, file-rss:504kB, shmem-rss:0kB
      [   37.132710][ T9185] Tasks in /test1 are going to be killed due to memory.oom.group set
      [   37.132833][   T54] oom_reaper: reaped process 9188 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [   37.135498][ T9185] Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
      [   37.143434][ T9185] Memory cgroup out of memory: Killed process 9182 (a.out) total-vm:14456kB, anon-rss:76kB, file-rss:588kB, shmem-rss:0kB
      [   37.144328][   T54] oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [   37.147585][ T9185] Memory cgroup out of memory: Killed process 9183 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
      [   37.157222][ T9185] Memory cgroup out of memory: Killed process 9184 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:508kB, shmem-rss:0kB
      [   37.157259][ T9185] Memory cgroup out of memory: Killed process 9185 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
      [   37.157291][ T9185] Memory cgroup out of memory: Killed process 9186 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:508kB, shmem-rss:0kB
      [   37.157306][   T54] oom_reaper: reaped process 9183 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [   37.157328][ T9185] Memory cgroup out of memory: Killed process 9187 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
      [   37.157452][ T9185] Memory cgroup out of memory: Killed process 9189 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
      [   37.158733][ T9185] Memory cgroup out of memory: Killed process 9190 (a.out) total-vm:14456kB, anon-rss:552kB, file-rss:512kB, shmem-rss:0kB
      [   37.160083][   T54] oom_reaper: reaped process 9186 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [   37.160187][   T54] oom_reaper: reaped process 9189 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [   37.206941][   T54] oom_reaper: reaped process 9185 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [   37.212300][ T9185] Memory cgroup out of memory: Killed process 9191 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
      [   37.212317][   T54] oom_reaper: reaped process 9190 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [   37.218860][ T9185] Memory cgroup out of memory: Killed process 9192 (a.out) total-vm:14456kB, anon-rss:1080kB, file-rss:512kB, shmem-rss:0kB
      [   37.227667][   T54] oom_reaper: reaped process 9192 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [   37.292323][ T9193] abrt-hook-ccpp (9193) used greatest stack depth: 10480 bytes left
      [   37.351843][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
      [   37.354833][    T1] CPU: 7 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
      [   37.357876][    T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
      [   37.361685][    T1] Call Trace:
      [   37.363239][    T1]  dump_stack+0x67/0x95
      [   37.365010][    T1]  panic+0xfc/0x2b0
      [   37.366853][    T1]  do_exit+0xd55/0xd60
      [   37.368595][    T1]  do_group_exit+0x47/0xc0
      [   37.370415][    T1]  get_signal+0x32a/0x920
      [   37.372449][    T1]  ? _raw_spin_unlock_irqrestore+0x3d/0x70
      [   37.374596][    T1]  do_signal+0x32/0x6e0
      [   37.376430][    T1]  ? exit_to_usermode_loop+0x26/0x9b
      [   37.378418][    T1]  ? prepare_exit_to_usermode+0xa8/0xd0
      [   37.380571][    T1]  exit_to_usermode_loop+0x3e/0x9b
      [   37.382588][    T1]  prepare_exit_to_usermode+0xa8/0xd0
      [   37.384594][    T1]  ? page_fault+0x8/0x30
      [   37.386453][    T1]  retint_user+0x8/0x18
      [   37.388160][    T1] RIP: 0033:0x7f42c06974a8
      [   37.389922][    T1] Code: Bad RIP value.
      [   37.391788][    T1] RSP: 002b:00007ffc3effd388 EFLAGS: 00010213
      [   37.394075][    T1] RAX: 000000000000000e RBX: 00007ffc3effd390 RCX: 0000000000000000
      [   37.396963][    T1] RDX: 000000000000002a RSI: 00007ffc3effd390 RDI: 0000000000000004
      [   37.399550][    T1] RBP: 00007ffc3effd680 R08: 0000000000000000 R09: 0000000000000000
      [   37.402334][    T1] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000001
      [   37.404890][    T1] R13: ffffffffffffffff R14: 0000000000000884 R15: 000056460b1ac3b0
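      A minimal sketch of the filter that oom_kill_memcg_member() needs is
      shown below. The task array, `struct task_model` and `kill_group()` are
      hypothetical. The model treats PID 1 as the global init. It also skips
      tasks whose oom_score_adj is -1000 (OOM_SCORE_ADJ_MIN); that second
      check is an assumption about the surrounding code, not something
      stated in this commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define OOM_ADJ_MIN (-1000) /* mirrors OOM_SCORE_ADJ_MIN */

/* Hypothetical member of the OOM group. */
struct task_model {
	int pid;
	int oom_score_adj;
	bool killed;
};

/* Stand-in for is_global_init(): PID 1 in the initial namespace. */
static bool is_global_init_model(const struct task_model *t)
{
	return t->pid == 1;
}

/* Kill every group member except unkillable tasks and global init. */
size_t kill_group(struct task_model *tasks, size_t n)
{
	size_t killed = 0;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].oom_score_adj == OOM_ADJ_MIN ||
		    is_global_init_model(&tasks[i]))
			continue;
		tasks[i].killed = true;
		killed++;
	}
	return killed;
}
```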
      
      Link: http://lkml.kernel.org/r/201902010336.x113a4EO027170@www262.sakura.ne.jp
      Fixes: 3d8b38eb ("mm, oom: introduce memory.oom.group")
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      eed3ca0a
    • Daniel Jordan
      mm, swap: bounds check swap_info array accesses to avoid NULL derefs · ed3345a6
      Daniel Jordan authored
      [ Upstream commit c10d38cc ]
      
      Dan Carpenter reports a potential NULL dereference in
      get_swap_page_of_type:
      
        Smatch complains that the NULL checks on "si" aren't consistent.  This
        seems like a real bug because we have not ensured that the type is
        valid and so "si" can be NULL.
      
      Add the missing check for NULL, taking care to use a read barrier to
      ensure CPU1 observes CPU0's updates in the correct order:
      
           CPU0                           CPU1
           alloc_swap_info()              if (type >= nr_swapfiles)
             swap_info[type] = p              /* handle invalid entry */
             smp_wmb()                    smp_rmb()
             ++nr_swapfiles               p = swap_info[type]
      
      Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
      CPU0's write to swap_info[type] and read NULL from swap_info[type].
      
      Ying Huang noticed other places in swapfile.c don't order these reads
      properly.  Introduce swap_type_to_swap_info to encourage correct usage.
      
      Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
      (see tools/memory-model/Documentation/explanation.txt).
      
      This ordering need not be enforced in places where swap_lock is held
      (e.g.  si_swapinfo) because swap_lock serializes updates to nr_swapfiles
      and the swap_info array.
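      The publish/lookup pairing can be sketched in userspace with C11
      atomics. This is a model, not the kernel code. `swap_table`,
      `nr_entries`, `publish_entry()` and `lookup_entry()` are hypothetical
      names, and C11 release/acquire fences stand in for smp_wmb() and
      smp_rmb(). The bounds check on the count comes first. The acquire
      fence then ensures that a reader which observed the new count also
      observes the slot written before it.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_ENTRIES 32

struct entry { int id; };

/* Hypothetical table: a slot is filled before the count is bumped. */
static struct entry *_Atomic swap_table[MAX_ENTRIES];
static _Atomic unsigned int nr_entries;

/* Writer side; writers are assumed to be serialized, like swap_lock. */
int publish_entry(struct entry *e)
{
	unsigned int n = atomic_load_explicit(&nr_entries, memory_order_relaxed);

	if (n >= MAX_ENTRIES)
		return -1;
	atomic_store_explicit(&swap_table[n], e, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* like smp_wmb() */
	atomic_store_explicit(&nr_entries, n + 1, memory_order_relaxed);
	return (int)n;
}

/* Reader side: bounds-check first, then order the slot load after it. */
struct entry *lookup_entry(unsigned int type)
{
	if (type >= atomic_load_explicit(&nr_entries, memory_order_relaxed))
		return NULL;
	atomic_thread_fence(memory_order_acquire); /* like smp_rmb() */
	return atomic_load_explicit(&swap_table[type], memory_order_relaxed);
}
```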
      
      Link: http://lkml.kernel.org/r/20190131024410.29859-1-daniel.m.jordan@oracle.com
      Fixes: ec8acf20 ("swap: add per-partition lock for swapfile")
      Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Suggested-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ed3345a6
    • Qian Cai
      mm/page_ext.c: fix an imbalance with kmemleak · 4c6d7dc7
      Qian Cai authored
      [ Upstream commit 0c815854 ]
      
      After a memory block is offlined, a kmemleak scan will trigger a crash
      because it encounters a page ext address that was already freed during
      memory offlining.  alloc_page_ext() calls kmemleak_alloc(), but
      free_page_ext() never calls kmemleak_free().
      
          BUG: unable to handle kernel paging request at ffff888453d00000
          PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
          Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
          CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
          Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
          RIP: 0010:scan_block+0xb5/0x290
          Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
          RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
          RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
          RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
          RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
          R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
          R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
          FS:  00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
          Call Trace:
           scan_gray_list+0x269/0x430
           kmemleak_scan+0x5a8/0x10f0
           kmemleak_write+0x541/0x6ca
           full_proxy_write+0xf8/0x190
           __vfs_write+0xeb/0x980
           vfs_write+0x15a/0x4f0
           ksys_write+0xd2/0x1b0
           __x64_sys_write+0x73/0xb0
           do_syscall_64+0xeb/0xaaa
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f6c0dad73b8
          Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
          RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
          RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
          RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
          RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
          R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
          R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
          Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
          CR2: ffff888453d00000
          ---[ end trace ccf646c7456717c5 ]---
          Kernel panic - not syncing: Fatal exception
          Shutting down cpus with NMI
          Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
          0xffffffff80000000-0xffffffffbfffffff)
          ---[ end Kernel panic - not syncing: Fatal exception ]---
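      The imbalance breaks a general rule: every object registered with a
      leak tracker must be unregistered before it is freed, or a later scan
      walks freed memory. A minimal userspace model follows. Its names are
      hypothetical stand-ins. `tracker_register()` and
      `tracker_unregister()`, with their fixed-size registry, stand in for
      kmemleak_alloc() and kmemleak_free(). `alloc_ext()` and `free_ext()`
      stand in for alloc_page_ext() and free_page_ext().

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 16

/* Hypothetical registry of objects a scanner would walk. */
static void *tracked[MAX_TRACKED];

/* Record p in the first free slot (silently dropped if full). */
static void tracker_register(void *p)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (!tracked[i]) {
			tracked[i] = p;
			return;
		}
	}
}

static void tracker_unregister(void *p)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i] == p) {
			tracked[i] = NULL;
			return;
		}
	}
}

size_t tracked_count(void)
{
	size_t n = 0;

	for (size_t i = 0; i < MAX_TRACKED; i++)
		n += tracked[i] != NULL;
	return n;
}

void *alloc_ext(size_t size)
{
	void *p = malloc(size);

	if (p)
		tracker_register(p);  /* like kmemleak_alloc() */
	return p;
}

void free_ext(void *p)
{
	if (!p)
		return;
	tracker_unregister(p);        /* the call that was missing */
	free(p);
}
```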
      
      Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4c6d7dc7
    • Peng Fan
      mm/cma.c: cma_declare_contiguous: correct err handling · f555b008
      Peng Fan authored
      [ Upstream commit 0d3bd18a ]
      
      If cma_init_reserved_mem() fails, the memblock allocated by
      memblock_reserve() or memblock_alloc_range() needs to be freed.
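      The fix follows the usual unwind-on-error pattern. Below is a minimal
      sketch with hypothetical stand-ins. `reserve_region()` and
      `release_region()` play the roles of memblock_reserve() or
      memblock_alloc_range() and memblock_free(). `init_area()` plays
      cma_init_reserved_mem(). A failure injected through `fail_init` shows
      that nothing stays reserved.

```c
#include <stdbool.h>

static int reserved; /* number of currently reserved regions */

static int reserve_region(void)
{
	reserved++;
	return 0;
}

static void release_region(void)
{
	reserved--;
}

/* Hypothetical initializer that can fail after the region is reserved. */
static int init_area(bool fail)
{
	return fail ? -1 : 0;
}

int declare_area(bool fail_init)
{
	int ret = reserve_region();

	if (ret)
		return ret;

	ret = init_area(fail_init);
	if (ret)
		goto free_mem;

	return 0;

free_mem:
	release_region(); /* without this, the reservation would leak */
	return ret;
}
```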
      
      Quote Catalin's comments:
        https://lkml.org/lkml/2019/2/26/482
      
      Kmemleak is supposed to work with the memblock_{alloc,free} pair and it
      ignores the memblock_reserve() as a memblock_alloc() implementation
      detail. It is, however, tolerant to memblock_free() being called on
      a sub-range or just a different range from a previous memblock_alloc().
      So the original patch looks fine to me. FWIW:
      
      Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com
      Signed-off-by: Peng Fan <peng.fan@nxp.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f555b008
    • Qian Cai
      mm/sparse: fix a bad comparison · 7b287c47
      Qian Cai authored
      [ Upstream commit d778015a ]
      
      The only out-of-range value next_present_section_nr() can return is -1,
      which becomes a large number once stored in the unsigned loop variable.
      So check for -1 explicitly; the compiler converts -1 to unsigned where
      needed.
      
        mm/sparse.c: In function 'sparse_init_nid':
        mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
               ((section_nr >= 0) &&    \
                            ^~
        mm/sparse.c:478:2: note: in expansion of macro
        'for_each_present_section_nr'
          for_each_present_section_nr(pnum_begin, pnum) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
        mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
               ((section_nr >= 0) &&    \
                            ^~
        mm/sparse.c:497:2: note: in expansion of macro
        'for_each_present_section_nr'
          for_each_present_section_nr(pnum_begin, pnum) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
        mm/sparse.c: In function 'sparse_init':
        mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
               ((section_nr >= 0) &&    \
                            ^~
        mm/sparse.c:520:2: note: in expansion of macro
        'for_each_present_section_nr'
          for_each_present_section_nr(pnum_begin + 1, pnum_end) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
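
      A minimal userspace sketch of the pattern follows. The section bitmap,
      NR_SECTIONS and count_present_from() are hypothetical stand-ins, not the
      kernel's mm/sparse.c code; the point is only that an unsigned "not
      found" sentinel of -1 must be tested with an equality check, because
      ">= 0" on an unsigned value is always true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the present-section bitmap. */
#define NR_SECTIONS 8
static const bool present[NR_SECTIONS] = {
	false, true, false, true, true, false, false, true
};

/* Return the next present section after 'nr', or -1 (which becomes
 * ULONG_MAX as an unsigned long) when there is none. Passing
 * nr = (unsigned long)-1 starts the search at section 0. */
static unsigned long next_present_section_nr(unsigned long nr)
{
	for (unsigned long i = nr + 1; i < NR_SECTIONS; i++)
		if (present[i])
			return i;
	return -1;	/* implicitly converted to ULONG_MAX */
}

/* "nr >= 0" would always be true here; compare against the
 * unsigned form of -1 instead. */
#define for_each_present_section_nr(start, nr)			\
	for ((nr) = next_present_section_nr((start) - 1);	\
	     (nr) != (unsigned long)-1;				\
	     (nr) = next_present_section_nr(nr))

static size_t count_present_from(unsigned long start)
{
	unsigned long nr;
	size_t count = 0;

	for_each_present_section_nr(start, nr)
		count++;
	return count;
}
```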
      
      Link: http://lkml.kernel.org/r/20190228181839.86504-1-cai@lca.pw
      Fixes: c4e1be9e ("mm, sparsemem: break out of loops early")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Jiri Olsa
      perf c2c: Fix c2c report for empty numa node · aea8c971
      Jiri Olsa authored
      [ Upstream commit e34c9402 ]
      
      Ravi Bangoria reported that we fail with an empty NUMA node with the
      following message:
      
        $ lscpu
        NUMA node0 CPU(s):
        NUMA node1 CPU(s):   0-4
      
        $ sudo ./perf c2c report
        node/cpu topology bugFailed setup nodes
      
      Fix this by detecting the empty node and keeping its CPU set empty.

      Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190305152536.21035-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Kairui Song
      x86/hyperv: Fix kernel panic when kexec on HyperV · c3f28d59
      Kairui Song authored
      [ Upstream commit 179fb36a ]
      
      After commit 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments"),
      kexec fails with a kernel panic:
      
      kexec_core: Starting new kernel
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
      RIP: 0010:0xffffc9000001d000
      
      Call Trace:
       ? __send_ipi_mask+0x1c6/0x2d0
       ? hv_send_ipi_mask_allbutself+0x6d/0xb0
       ? mp_save_irq+0x70/0x70
       ? __ioapic_read_entry+0x32/0x50
       ? ioapic_read_entry+0x39/0x50
       ? clear_IO_APIC_pin+0xb8/0x110
       ? native_stop_other_cpus+0x6e/0x170
       ? native_machine_shutdown+0x22/0x40
       ? kernel_kexec+0x136/0x156
      
      This happens when hypercall-based IPIs are used: the hypercall page is
      reset very early during a kexec reboot, but kexec then sends IPIs to stop
      the other CPUs, which invokes the hypercall and dereferences the unusable
      page.
      
      To fix this, reset hv_hypercall_pg to NULL before the page is reset so
      that it cannot be misused; IPI sending then falls back to the
      non-hypercall-based method. This only happens on kexec / kdump, so just
      setting the pointer to NULL is good enough.
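
      The idea can be sketched in userspace C as below. hypercall_pg,
      send_ipi() and prepare_for_kexec() are hypothetical stand-ins for
      hv_hypercall_pg and the real IPI and kexec paths; the sketch only shows
      why clearing the pointer before the page goes away makes every later
      caller take the fallback path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for hv_hypercall_pg and the two IPI paths. */
static void *hypercall_pg;
static int hypercall_ipis_sent;
static int fallback_ipis_sent;

static void send_ipi(void)
{
	if (hypercall_pg) {
		/* Only safe while the hypercall page is still valid. */
		hypercall_ipis_sent++;
		return;
	}
	/* No usable hypercall page: use the non-hypercall method. */
	fallback_ipis_sent++;
}

/* Model of the kexec/kdump path: clear the pointer *before* the page
 * is reset, so IPIs sent afterwards cannot dereference it. */
static void prepare_for_kexec(void)
{
	hypercall_pg = NULL;
	/* ...the hypercall page would be reset here... */
}
```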
      
      Fixes: 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments")
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: devel@linuxdriverproject.org
      Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Linus Torvalds
      iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver · 3e8d6221
      Linus Torvalds authored
      [ Upstream commit e0f0ae83 ]
      
      The pm8xxx_get_channel() implementation is unclear, and causes gcc to
      suddenly generate odd warnings.  The trigger for the warning (at least
      for me) was the entirely unrelated commit 79a4e91d ("device.h: Add
      __cold to dev_<level> logging functions"), which apparently changes gcc
      code generation in the caller function enough to cause this:
      
        drivers/iio/adc/qcom-pm8xxx-xoadc.c: In function ‘pm8xxx_xoadc_probe’:
        drivers/iio/adc/qcom-pm8xxx-xoadc.c:633:8: warning: ‘ch’ may be used uninitialized in this function [-Wmaybe-uninitialized]
          ret = pm8xxx_read_channel_rsv(adc, ch, AMUX_RSV4,
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                   &read_nomux_rsv4, true);
                   ~~~~~~~~~~~~~~~~~~~~~~~
        drivers/iio/adc/qcom-pm8xxx-xoadc.c:426:27: note: ‘ch’ was declared here
          struct pm8xxx_chan_info *ch;
                                   ^~
      
      because gcc for some reason then isn't able to see that the termination
      condition for the "for( )" loop in that function is also the condition
      for returning NULL.
      
      So it's not _actually_ uninitialized, but the function is admittedly
      just unnecessarily oddly written.
      
      Simplify and clarify the function, making gcc also see that it always
      returns a valid initialized value.
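
      A sketch of the simplified lookup shape, assuming hypothetical
      structures (struct chan_info, struct xoadc, get_channel()) rather than
      the driver's real ones: returning from inside the loop on a match and
      returning NULL explicitly afterwards leaves no path on which the result
      is unset, so the compiler has nothing to second-guess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified channel structures. */
struct chan_info {
	unsigned char amux_channel;
};

struct xoadc {
	struct chan_info *chans;
	unsigned int nchans;
};

/* Return the matching channel, or NULL. Every path returns an explicit
 * value, so the compiler does not need to relate the loop's exit
 * condition to the "not found" case. */
static struct chan_info *get_channel(struct xoadc *adc, unsigned char chan)
{
	for (unsigned int i = 0; i < adc->nchans; i++) {
		struct chan_info *ch = &adc->chans[i];

		if (ch->amux_channel == chan)
			return ch;
	}
	return NULL;
}
```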
      
      Cc: Joe Perches <joe@perches.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Andy Gross <andy.gross@linaro.org>
      Cc: David Brown <david.brown@linaro.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Hartmut Knaack <knaack.h@gmx.de>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Xiang Chen
      scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO · e27cced3
      Xiang Chen authored
      [ Upstream commit 47905957 ]
      
      Internal IO and SMP IO each have a timeout timer. The timer handler
      checks whether the IO is done according to the flag
      task->task_state_lock.
      
      There is an issue which may hang the system: an internal IO or SMP IO is
      sent, but because of a hardware exception (such as an injected 2-bit ECC
      error) it has neither completed nor timed out yet. Meanwhile, a SAS
      controller reset occurs to recover the system. The reset releases the
      resources and sets the IO status to SAS_TASK_STATE_DONE, so when the IO
      later times out, the timer handler never completes the IO's completion,
      and the waiter waits forever:
      
      [  729.123632] Call trace:
      [  729.126791] [<ffff00000808655c>] __switch_to+0x94/0xa8
      [  729.133106] [<ffff000008d96e98>] __schedule+0x1e8/0x7fc
      [  729.138975] [<ffff000008d974e0>] schedule+0x34/0x8c
      [  729.144401] [<ffff000008d9b000>] schedule_timeout+0x1d8/0x3cc
      [  729.150690] [<ffff000008d98218>] wait_for_common+0xdc/0x1a0
      [  729.157101] [<ffff000008d98304>] wait_for_completion+0x28/0x34
      [  729.165973] [<ffff000000dcefb4>] hisi_sas_internal_task_abort+0x2a0/0x424 [hisi_sas_test_main]
      [  729.176447] [<ffff000000dd18f4>] hisi_sas_abort_task+0x244/0x2d8 [hisi_sas_test_main]
      [  729.185258] [<ffff000008971714>] sas_eh_handle_sas_errors+0x1c8/0x7b8
      [  729.192391] [<ffff000008972774>] sas_scsi_recover_host+0x130/0x398
      [  729.199237] [<ffff00000894d8a8>] scsi_error_handler+0x148/0x5c0
      [  729.206009] [<ffff0000080f4118>] kthread+0x10c/0x138
      [  729.211563] [<ffff0000080855dc>] ret_from_fork+0x10/0x18
      
      To solve the issue, call the task_done callback of those IOs on SAS
      controller reset.
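
      A single-threaded model of the race and the fix, as a sketch only:
      struct io_task, io_timeout() and the two reset functions are
      hypothetical, with state_done standing in for SAS_TASK_STATE_DONE and
      completed for the completion that the error handler waits on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an internal/SMP IO. */
struct io_task {
	bool state_done;	/* stands in for SAS_TASK_STATE_DONE */
	bool completed;		/* stands in for the waited-on completion */
	void (*task_done)(struct io_task *t);
};

static void internal_task_done(struct io_task *t)
{
	t->completed = true;	/* would wake the waiter in a driver */
}

/* Timer handler: only completes the IO if nobody marked it done. */
static void io_timeout(struct io_task *t)
{
	if (!t->state_done) {
		t->state_done = true;
		t->completed = true;
	}
}

/* Before the fix: the reset marks the IO done but never completes it,
 * so the timer handler later does nothing and the waiter hangs. */
static void controller_reset_old(struct io_task *t)
{
	t->state_done = true;
}

/* After the fix: the reset also calls task_done(). */
static void controller_reset(struct io_task *t)
{
	t->state_done = true;
	t->task_done(t);
}
```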

      Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
      Signed-off-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • John Garry
      scsi: hisi_sas: Set PHY linkrate when disconnected · fce6aeaf
      John Garry authored
      [ Upstream commit efdcad62 ]
      
      When the PHY comes down, we currently do not set the negotiated linkrate:
      
      root@(none)$ pwd
      /sys/class/sas_phy/phy-0:0
      root@(none)$ more enable
      1
      root@(none)$ more negotiated_linkrate
      12.0 Gbit
      root@(none)$ echo 0 > enable
      root@(none)$ more negotiated_linkrate
      12.0 Gbit
      root@(none)$
      
      This patch fixes the driver code to set it properly when the PHY comes
      down.
      
      If the PHY had been enabled, then set unknown; otherwise, flag as disabled.
      
      The logical place to set the negotiated linkrate for this scenario is the
      PHY down routine, which is called from the PHY down ISR.
      
      However, it is not possible to know if the PHY comes down due to PHY
      disable or loss of link, as sas_phy.enabled member is not set until after
      the transport disable routine is complete, which races with the PHY down
      ISR.
      
      As an imperfect solution, use sas_phy_data.enable as the flag to know if
      the PHY is down due to disable. It's imperfect, as sas_phy_data is internal
      to libsas.
      
      I can't see another way without adding a new field to hisi_sas_phy and
      managing it, or changing SCSI SAS transport.
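
      A sketch of the rule above (unknown if the PHY was enabled, disabled
      otherwise). The enum and function are hypothetical, loosely modelled on
      the SAS transport's "unknown" and "PHY disabled" link rates, and
      phy_enabled stands in for sas_phy_data.enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of link-rate states. */
enum linkrate {
	LINKRATE_UNKNOWN,
	LINKRATE_PHY_DISABLED,
	LINKRATE_12_0_GBPS,
};

/* Called from the PHY-down path. 'phy_enabled' is assumed to already
 * reflect a pending disable request, which is the imperfect part the
 * commit message describes. */
static enum linkrate linkrate_on_phy_down(bool phy_enabled)
{
	/* Link loss on an enabled PHY: rate unknown until it comes back.
	 * Otherwise the PHY went down because it was disabled. */
	return phy_enabled ? LINKRATE_UNKNOWN : LINKRATE_PHY_DISABLED;
}
```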

      Signed-off-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Stanislav Fomichev
      libbpf: force fixdep compilation at the start of the build · e21f655c
      Stanislav Fomichev authored
      [ Upstream commit 8e268887 ]
      
      libbpf targets don't explicitly depend on the fixdep target, so when
      we run 'make -j$(nproc)', there is a high probability that some
      objects will be built before the fixdep binary is available.
      
      Fix this by running sub-make; this makes sure that fixdep dependency
      is properly accounted for.
      
      For the same issue in perf, see commit abb26210 ("perf tools: Force
      fixdep compilation at the start of the build").
      
      Before:
      
      $ rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld -C tools/lib/bpf/
      
      Auto-detecting system features:
      ...                        libelf: [ on  ]
      ...                           bpf: [ on  ]
      
        HOSTCC   /tmp/bld/fixdep.o
        CC       /tmp/bld/libbpf.o
        CC       /tmp/bld/bpf.o
        CC       /tmp/bld/btf.o
        CC       /tmp/bld/nlattr.o
        CC       /tmp/bld/libbpf_errno.o
        CC       /tmp/bld/str_error.o
        CC       /tmp/bld/netlink.o
        CC       /tmp/bld/bpf_prog_linfo.o
        CC       /tmp/bld/libbpf_probes.o
        CC       /tmp/bld/xsk.o
        HOSTLD   /tmp/bld/fixdep-in.o
        LINK     /tmp/bld/fixdep
        LD       /tmp/bld/libbpf-in.o
        LINK     /tmp/bld/libbpf.a
        LINK     /tmp/bld/libbpf.so
        LINK     /tmp/bld/test_libbpf
      
      $ head /tmp/bld/.libbpf.o.cmd
       # cannot find fixdep (/usr/local/google/home/sdf/src/linux/xxx//fixdep)
       # using basic dep data
      
      /tmp/bld/libbpf.o: libbpf.c /usr/include/stdc-predef.h \
       /usr/include/stdlib.h /usr/include/features.h \
       /usr/include/x86_64-linux-gnu/sys/cdefs.h \
       /usr/include/x86_64-linux-gnu/bits/wordsize.h \
       /usr/include/x86_64-linux-gnu/gnu/stubs.h \
       /usr/include/x86_64-linux-gnu/gnu/stubs-64.h \
       /usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h \
      
      After:
      
      $ rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld -C tools/lib/bpf/
      
      Auto-detecting system features:
      ...                        libelf: [ on  ]
      ...                           bpf: [ on  ]
      
        HOSTCC   /tmp/bld/fixdep.o
        HOSTLD   /tmp/bld/fixdep-in.o
        LINK     /tmp/bld/fixdep
        CC       /tmp/bld/libbpf.o
        CC       /tmp/bld/bpf.o
        CC       /tmp/bld/nlattr.o
        CC       /tmp/bld/btf.o
        CC       /tmp/bld/libbpf_errno.o
        CC       /tmp/bld/str_error.o
        CC       /tmp/bld/netlink.o
        CC       /tmp/bld/bpf_prog_linfo.o
        CC       /tmp/bld/libbpf_probes.o
        CC       /tmp/bld/xsk.o
        LD       /tmp/bld/libbpf-in.o
        LINK     /tmp/bld/libbpf.a
        LINK     /tmp/bld/libbpf.so
        LINK     /tmp/bld/test_libbpf
      
      $ head /tmp/bld/.libbpf.o.cmd
      cmd_/tmp/bld/libbpf.o := gcc -Wp,-MD,/tmp/bld/.libbpf.o.d -Wp,-MT,/tmp/bld/libbpf.o -g -Wall -DHAVE_LIBELF_MMAP_SUPPORT -DCOMPAT_NEED_REALLOCARRAY -Wbad-function-cast -Wdeclaration-after-statement -Wformat-security -Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked -Wredundant-decls -Wshadow -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat -Wstrict-aliasing=3 -Werror -Wall -fPIC -I. -I/usr/local/google/home/sdf/src/linux/tools/include -I/usr/local/google/home/sdf/src/linux/tools/arch/x86/include/uapi -I/usr/local/google/home/sdf/src/linux/tools/include/uapi -fvisibility=hidden -D"BUILD_STR(s)=$(pound)s" -c -o /tmp/bld/libbpf.o libbpf.c
      
      source_/tmp/bld/libbpf.o := libbpf.c
      
      deps_/tmp/bld/libbpf.o := \
        /usr/include/stdc-predef.h \
        /usr/include/stdlib.h \
        /usr/include/features.h \
        /usr/include/x86_64-linux-gnu/sys/cdefs.h \
        /usr/include/x86_64-linux-gnu/bits/wordsize.h \
      
      Fixes: 7c422f55 ("tools build: Build fixdep helper from perf and basic libs")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Arnd Bergmann
      enic: fix build warning without CONFIG_CPUMASK_OFFSTACK · 60483306
      Arnd Bergmann authored
      [ Upstream commit 43d28166 ]
      
      The enic driver relies on the CONFIG_CPUMASK_OFFSTACK feature to
      dynamically allocate a struct member, but this is normally intended for
      local variables.
      
      Building with clang, I get a warning for a few locations that check the
      address of the cpumask_var_t:
      
      drivers/net/ethernet/cisco/enic/enic_main.c:122:22: error: address of array 'enic->msix[i].affinity_mask' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
      
      As far as I can tell, the code is still correct, as the truth value of
      the pointer is what we need in this configuration. To get rid of
      the warning, use cpumask_available() instead of checking the
      pointer directly.
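
      A userspace imitation of why the warning appears and why
      cpumask_available() avoids it, assuming a simplified copy of the two
      cpumask_var_t layouts: an array by default, a pointer when the
      hypothetical CPUMASK_OFFSTACK macro (standing in for
      CONFIG_CPUMASK_OFFSTACK) is defined. struct msix_entry_info and
      has_affinity_mask() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cpumask { unsigned long bits[1]; };

#ifdef CPUMASK_OFFSTACK
/* Pointer layout: the mask may genuinely be unallocated. */
typedef struct cpumask *cpumask_var_t;
static inline bool cpumask_available(cpumask_var_t mask)
{
	return mask != NULL;
}
#else
/* Array layout: "if (mask)" is always true, which clang flags with
 * -Wpointer-bool-conversion. */
typedef struct cpumask cpumask_var_t[1];
static inline bool cpumask_available(cpumask_var_t mask)
{
	(void)mask;
	return true;
}
#endif

struct msix_entry_info {
	cpumask_var_t affinity_mask;
};

/* Warning-free check that is correct for both layouts. */
static bool has_affinity_mask(struct msix_entry_info *e)
{
	return cpumask_available(e->affinity_mask);
}
```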
      
      Fixes: 322cf7e3 ("enic: assign affinity hint to interrupts")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Nathan Chancellor
      net: stmmac: Avoid sometimes uninitialized Clang warnings · 9ec4860d
      Nathan Chancellor authored
      [ Upstream commit df103170 ]
      
      When building with -Wsometimes-uninitialized, Clang warns:
      
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:495:3: warning: variable 'ns' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:495:3: warning: variable 'ns' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:532:3: warning: variable 'ns' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:532:3: warning: variable 'ns' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:741:3: warning: variable 'sec_inc' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:741:3: warning: variable 'sec_inc' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
      
      Clang is concerned with the use of stmmac_do_void_callback (which
      stmmac_get_timestamp and stmmac_config_sub_second_increment wrap),
      as it may fail to initialize these values if the if condition was ever
      false (meaning the callbacks don't exist). It's not wrong because the
      callbacks (get_timestamp and config_sub_second_increment respectively)
      are the ones that initialize the variables. While it's unlikely that the
      callbacks are ever going to disappear and make that condition false, we
      can easily avoid this warning by zero-initializing the variables.
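
      A sketch of the pattern, assuming a hypothetical ops table and a
      call_void_op() macro written in the spirit of stmmac_do_void_callback
      (the real macro differs): when the callback may be absent,
      zero-initializing the output variable gives it a defined value on every
      path, which is exactly what -Wsometimes-uninitialized asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table whose callback may be missing. */
struct hw_ops {
	void (*get_timestamp)(const void *desc, uint64_t *ns);
};

/* Call the op only if the table and the op exist. */
#define call_void_op(ops, op, ...)			\
	do {						\
		if ((ops) && (ops)->op)			\
			(ops)->op(__VA_ARGS__);		\
	} while (0)

static void fake_get_timestamp(const void *desc, uint64_t *ns)
{
	(void)desc;
	*ns = 123456789u;
}

static uint64_t read_timestamp(const struct hw_ops *ops)
{
	/* Without "= 0", 'ns' would be read uninitialized whenever the
	 * op is missing. */
	uint64_t ns = 0;

	call_void_op(ops, get_timestamp, NULL, &ns);
	return ns;
}
```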
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/384
      Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Christian Brauner
      sysctl: handle overflow for file-max · b227f157
      Christian Brauner authored
      [ Upstream commit 32a5ad9c ]
      
      Currently, when writing
      
        echo 18446744073709551616 > /proc/sys/fs/file-max
      
      /proc/sys/fs/file-max will overflow and be set to 0.  That quickly
      crashes the system.
      
      This commit sets the max and min value for file-max.  The max value is
      set to long int.  Any higher value cannot currently be used as the
      percpu counters are long ints and not unsigned integers.
      
      Note that the file-max value is ultimately parsed via
      __do_proc_doulongvec_minmax().  This function does not report an error
      when min or max is exceeded, which means that if a value larger than
      long int is written, userspace will not receive an error; instead, the
      old value will be kept.  There is an argument to be made that this
      should be changed and __do_proc_doulongvec_minmax() should return an
      error when a dedicated min or max value is exceeded.  However, this has
      the potential to break userspace, so let's defer this to an RFC patch.
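
      A userspace model of the write semantics described above, as a sketch:
      file_max, its initial value and write_file_max() are hypothetical, and
      strtoul() stands in for the kernel's own parsing. Out-of-range input is
      silently ignored and the old value kept, mirroring the behaviour the
      commit describes.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical current value of file-max. */
static unsigned long file_max = 8192;
/* Upper bound: the percpu counters are signed long. The lower bound is
 * 0, so an unsigned value needs no minimum check. */
static const unsigned long file_max_max = LONG_MAX;

static void write_file_max(const char *buf)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (end == buf || errno == ERANGE)
		return;		/* nothing parsed, or overflowed: keep old value */
	if (val > file_max_max)
		return;		/* above the maximum: keep old value, no error */
	file_max = val;
}
```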
      
      Link: http://lkml.kernel.org/r/20190107222700.15954-3-christian@brauner.io
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      [christian@brauner.io: v4]
        Link: http://lkml.kernel.org/r/20190210203943.8227-3-christian@brauner.io
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Luc Van Oostenryck
      include/linux/relay.h: fix percpu annotation in struct rchan · d6ad08aa
      Luc Van Oostenryck authored
      [ Upstream commit 62461ac2 ]
      
      The percpu member of this structure is declared as:
      	struct ... ** __percpu member;
      So its type is:
      	__percpu pointer to pointer to struct ...
      
      But looking at how it's used, its type should be:
      	pointer to __percpu pointer to struct ...
      and it should thus be declared as:
      	struct ... * __percpu *member;
      
      So fix the placement of '__percpu' in the definition of this
      structure.
      
      This silences a few Sparse warnings like:
      	warning: incorrect type in initializer (different address spaces)
      	  expected void const [noderef] <asn:3> *__vpp_verify
      	  got struct sched_domain **
      
      Link: http://lkml.kernel.org/r/20190118144902.79065-1-luc.vanoostenryck@gmail.com
      Fixes: 017c59c0 ("relay: Use per CPU constructs for the relay channel buffer pointers")
      Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Jens Axboe <axboe@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Russell King
      gpio: gpio-omap: fix level interrupt idling · 4c96500e
      Russell King authored
      [ Upstream commit d01849f7 ]
      
      Tony notes that the GPIO module does not idle when level interrupts are
      in use, as the wakeup appears to get stuck.
      
      After extensive investigation, it appears that the wakeup will only be
      cleared if the interrupt status register is cleared while the interrupt
      is enabled. However, we are currently clearing it with the interrupt
      disabled for level-based interrupts.
      
      It is acknowledged that this observed behaviour conflicts with a
      statement in the TRM:
      
      CAUTION
        After servicing the interrupt, the status bit in the interrupt status
        register (GPIOi.GPIO_IRQSTATUS_0 or GPIOi.GPIO_IRQSTATUS_1) must be
        reset and the interrupt line released (by setting the corresponding
        bit of the interrupt status register to 1) before enabling an
        interrupt for the GPIO channel in the interrupt-enable register
        (GPIOi.GPIO_IRQSTATUS_SET_0 or GPIOi.GPIO_IRQSTATUS_SET_1) to prevent
        the occurrence of unexpected interrupts when enabling an interrupt
        for the GPIO channel.
      
      However, this does not appear to be a practical problem.
      
      Further, as reported by Grygorii Strashko <grygorii.strashko@ti.com>,
      the TI Android kernel tree has an earlier similar patch as "GPIO: OMAP:
      Fix the sequence to clear the IRQ status" saying:
      
       if the status is cleared after disabling the IRQ then sWAKEUP will not
       be cleared and gates the module transition
      
      When we unmask the level interrupt after the interrupt has been handled,
      enable the interrupt and only then clear the interrupt. If the interrupt
      is still pending, the hardware will re-assert the interrupt status.
      
      Should the caution note in the TRM prove to be a problem, we could
      use a clear-enable-clear sequence instead.
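
      A toy register model of the ordering, as a sketch only: struct
      gpio_line and its fields are hypothetical, and the rule that the wakeup
      clears only when the status is cleared with the interrupt enabled is
      taken from the observations above, not from the TRM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one GPIO line. */
struct gpio_line {
	bool irq_enabled;
	bool irq_status;
	bool wakeup_pending;
	bool level_still_asserted;	/* input still at the active level */
};

static void clear_status(struct gpio_line *l)
{
	l->irq_status = false;
	/* Observed behaviour: the wakeup only clears if the interrupt is
	 * enabled at the time the status is cleared. */
	if (l->irq_enabled)
		l->wakeup_pending = false;
	/* A level interrupt that is still active re-asserts the status. */
	if (l->level_still_asserted)
		l->irq_status = true;
}

/* Unmask sequence for a level interrupt: enable first, then clear. */
static void unmask_level_irq(struct gpio_line *l)
{
	l->irq_enabled = true;
	clear_status(l);
}
```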
      
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Cc: Keerthy <j-keerthy@ti.com>
      Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      [tony@atomide.com: updated comments based on an earlier TI patch]
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Acked-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Tonghao Zhang
      net/mlx5: Avoid panic when setting vport mac, getting vport config · 8c50ab86
      Tonghao Zhang authored
      [ Upstream commit 6e77c413 ]
      
      If we try to set a VF's MAC address on a VF (not PF) net device,
      the kernel will crash. The commands are shown below:
      
      $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
      $ ip link set $MLX_VF0 vf 0 mac 00:11:22:33:44:00
      
      [exception RIP: mlx5_eswitch_set_vport_mac+41]
      [ffffb8b7079e3688] do_setlink at ffffffff8f67f85b
      [ffffb8b7079e37a8] __rtnl_newlink at ffffffff8f683778
      [ffffb8b7079e3b68] rtnl_newlink at ffffffff8f683a63
      [ffffb8b7079e3b90] rtnetlink_rcv_msg at ffffffff8f67d812
      [ffffb8b7079e3c10] netlink_rcv_skb at ffffffff8f6b88ab
      [ffffb8b7079e3c60] netlink_unicast at ffffffff8f6b808f
      [ffffb8b7079e3ca0] netlink_sendmsg at ffffffff8f6b8412
      [ffffb8b7079e3d18] sock_sendmsg at ffffffff8f6452f6
      [ffffb8b7079e3d30] ___sys_sendmsg at ffffffff8f645860
      [ffffb8b7079e3eb0] __sys_sendmsg at ffffffff8f647a38
      [ffffb8b7079e3f38] do_syscall_64 at ffffffff8f00401b
      [ffffb8b7079e3f50] entry_SYSCALL_64_after_hwframe at ffffffff8f80008c
      
      and
      
      [exception RIP: mlx5_eswitch_get_vport_config+12]
      [ffffa70607e57678] mlx5e_get_vf_config at ffffffffc03c7f8f [mlx5_core]
      [ffffa70607e57688] do_setlink at ffffffffbc67fa59
      [ffffa70607e577a8] __rtnl_newlink at ffffffffbc683778
      [ffffa70607e57b68] rtnl_newlink at ffffffffbc683a63
      [ffffa70607e57b90] rtnetlink_rcv_msg at ffffffffbc67d812
      [ffffa70607e57c10] netlink_rcv_skb at ffffffffbc6b88ab
      [ffffa70607e57c60] netlink_unicast at ffffffffbc6b808f
      [ffffa70607e57ca0] netlink_sendmsg at ffffffffbc6b8412
      [ffffa70607e57d18] sock_sendmsg at ffffffffbc6452f6
      [ffffa70607e57d30] ___sys_sendmsg at ffffffffbc645860
      [ffffa70607e57eb0] __sys_sendmsg at ffffffffbc647a38
      [ffffa70607e57f38] do_syscall_64 at ffffffffbc00401b
      [ffffa70607e57f50] entry_SYSCALL_64_after_hwframe at ffffffffbc80008c
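
      The commit text shows the crashes but not the change itself. Assuming
      the fix is a guard that rejects the request when there is no e-switch,
      or when this function is not the e-switch manager (as the Fixes: tag
      hints), the pattern can be sketched like this; struct eswitch,
      esw_allowed() and set_vport_mac() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified e-switch. On a VF net device there
 * is no e-switch to manage, so the pointer may be NULL. */
struct eswitch {
	bool is_manager;
	int num_vports;
	unsigned char vport_mac[4][6];
};

/* Refuse instead of dereferencing a missing or non-manager e-switch. */
static bool esw_allowed(const struct eswitch *esw)
{
	return esw != NULL && esw->is_manager;
}

static int set_vport_mac(struct eswitch *esw, int vport,
			 const unsigned char mac[6])
{
	if (!esw_allowed(esw))
		return -EPERM;
	if (vport < 0 || vport >= esw->num_vports)
		return -EINVAL;
	for (int i = 0; i < 6; i++)
		esw->vport_mac[vport][i] = mac[i];
	return 0;
}
```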
      
      Fixes: a8d70a05 ("net/mlx5: E-Switch, Disallow vlan/spoofcheck setup if not being esw manager")
      Cc: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Tonghao Zhang
      net/mlx5: Avoid panic when setting vport rate · 3bddc614
      Tonghao Zhang authored
      [ Upstream commit 24319258 ]
      
      If we try to set a VF's rate on a VF (not PF) net device, the kernel
      will crash. The commands are shown below:
      
      $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
      $ ip link set $MLX_VF0 vf 0 max_tx_rate 2 min_tx_rate 1
      
      If the first patch ("net/mlx5: Avoid panic when setting vport mac,
      getting vport config") has not been applied, the command:
      
      $ ip link set $MLX_VF0 vf 0 rate 100
      
      can also crash the kernel.
      
      [ 1650.006388] RIP: 0010:mlx5_eswitch_set_vport_rate+0x1f/0x260 [mlx5_core]
      [ 1650.007092]  do_setlink+0x982/0xd20
      [ 1650.007129]  __rtnl_newlink+0x528/0x7d0
      [ 1650.007374]  rtnl_newlink+0x43/0x60
      [ 1650.007407]  rtnetlink_rcv_msg+0x2a2/0x320
      [ 1650.007484]  netlink_rcv_skb+0xcb/0x100
      [ 1650.007519]  netlink_unicast+0x17f/0x230
      [ 1650.007554]  netlink_sendmsg+0x2d2/0x3d0
      [ 1650.007592]  sock_sendmsg+0x36/0x50
      [ 1650.007625]  ___sys_sendmsg+0x280/0x2a0
      [ 1650.007963]  __sys_sendmsg+0x58/0xa0
      [ 1650.007998]  do_syscall_64+0x5b/0x180
      [ 1650.009438]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: c9497c98 ("net/mlx5: Add support for setting VF min rate")
      Cc: Mohamad Haj Yahia <mohamad@mellanox.com>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Douglas Anderson
      tracing: kdb: Fix ftdump to not sleep · b73c7d02
      Douglas Anderson authored
      [ Upstream commit 31b265b3 ]
      
      As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
      BUG for "sleeping function called from invalid context".
      
      kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
      atomic context.  A very simple solution for this is to add allocation
      flags to ring_buffer_read_prepare() so kdb can call it without
      triggering the allocation error.  This patch does that.
      
      Note that in the original email thread about this, it was suggested
      that perhaps the solution for kdb was to either preallocate the buffer
      ahead of time or create our own iterator.  I'm hoping that this
      alternative of adding allocation flags to ring_buffer_read_prepare()
      can be considered since it means I don't need to duplicate more of the
      core trace code into "trace_kdb.c" (for either creating my own
      iterator or re-preparing a ring allocator whose memory was already
      allocated).
      
      NOTE: another option for kdb is to actually figure out how to make it
      reuse the existing ftrace_dump() function and totally eliminate the
      duplication.  This sounds very appealing and actually works (the "sr
      z" command can be seen to properly dump the ftrace buffer).  The
      downside here is that ftrace_dump() fully consumes the trace buffer.
      Unless that is changed I'd rather not use it because it means "ftdump
      | grep xyz" won't be very useful to search the ftrace buffer since it
      will throw away the whole trace on the first grep.  A future patch to
      dump only the last few lines of the buffer will also be hard to
      implement.
      
      [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com
      
      Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org
      Reported-by: Brian Norris <briannorris@chromium.org>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Chao Yu
      f2fs: fix to avoid deadlock in f2fs_read_inline_dir() · d7391962
      Chao Yu authored
      [ Upstream commit aadcef64 ]
      
      As Jiqun Li reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202883
      
      Sometimes a deadlock occurs when the SYS_getdents64 system call is made
      while fsync() is called by another process.
      
      Seen while running monkey on Android 9.0:
      
      1. task 9785 held sbi->cp_rwsem and was waiting for lock_page()
      2. task 10349 held mm_sem and was waiting for sbi->cp_rwsem
      3. task 9709 held lock_page() and was waiting for mm_sem
      
      so this is a deadlock scenario.
      
      The task stacks, as shown by the crash tool, are as follows:
      
      crash_arm64> bt ffffffc03c354080
      PID: 9785   TASK: ffffffc03c354080  CPU: 1   COMMAND: "RxIoScheduler-3"
      >> #7 [ffffffc01b50fac0] __lock_page at ffffff80081b11e8
      
      crash-arm64> bt 10349
      PID: 10349  TASK: ffffffc018b83080  CPU: 1   COMMAND: "BUGLY_ASYNC_UPL"
      >> #3 [ffffffc01f8cfa40] rwsem_down_read_failed at ffffff8008a93afc
           PC: 00000033  LR: 00000000  SP: 00000000  PSTATE: ffffffffffffffff
      
      crash-arm64> bt 9709
      PID: 9709   TASK: ffffffc03e7f3080  CPU: 1   COMMAND: "IntentService[A"
      >> #3 [ffffffc001e67850] rwsem_down_read_failed at ffffff8008a93afc
      >> #8 [ffffffc001e67b80] el1_ia at ffffff8008084fc4
           PC: ffffff8008274114  [compat_filldir64+120]
           LR: ffffff80083584d4  [f2fs_fill_dentries+448]
           SP: ffffffc001e67b80  PSTATE: 80400145
          X29: ffffffc001e67b80  X28: 0000000000000000  X27: 000000000000001a
          X26: 00000000000093d7  X25: ffffffc070d52480  X24: 0000000000000008
          X23: 0000000000000028  X22: 00000000d43dfd60  X21: ffffffc001e67e90
          X20: 0000000000000011  X19: ffffff80093a4000  X18: 0000000000000000
          X17: 0000000000000000  X16: 0000000000000000  X15: 0000000000000000
          X14: ffffffffffffffff  X13: 0000000000000008  X12: 0101010101010101
          X11: 7f7f7f7f7f7f7f7f  X10: 6a6a6a6a6a6a6a6a   X9: 7f7f7f7f7f7f7f7f
           X8: 0000000080808000   X7: ffffff800827409c   X6: 0000000080808000
           X5: 0000000000000008   X4: 00000000000093d7   X3: 000000000000001a
           X2: 0000000000000011   X1: ffffffc070d52480   X0: 0000000000800238
      >> #9 [ffffffc001e67be0] f2fs_fill_dentries at ffffff80083584d0
           PC: 0000003c  LR: 00000000  SP: 00000000  PSTATE: 000000d9
          X12: f48a02ff X11: d4678960 X10: d43dfc00  X9: d4678ae4
           X8: 00000058  X7: d4678994  X6: d43de800  X5: 000000d9
           X4: d43dfc0c  X3: d43dfc10  X2: d46799c8  X1: 00000000
           X0: 00001068
      
      The following potential deadlock can occur among three threads:
      Thread A		Thread B		Thread C
      - f2fs_do_sync_file
       - f2fs_write_checkpoint
        - down_write(&sbi->node_change) -- 1)
      			- do_page_fault
      			 - down_write(&mm->mmap_sem) -- 2)
      			  - do_wp_page
      			   - f2fs_vm_page_mkwrite
      						- getdents64
      						 - f2fs_read_inline_dir
      						  - lock_page -- 3)
        - f2fs_sync_node_pages
         - lock_page -- 3)
      			    - __do_map_lock
      			     - down_read(&sbi->node_change) -- 1)
      						  - f2fs_fill_dentries
      						   - dir_emit
      						    - compat_filldir64
      						     - do_page_fault
      						      - down_read(&mm->mmap_sem) -- 2)
      
      Since f2fs_readdir() is protected by inode.i_rwsem, there should not be
      any updates to the inode page, so it is safe to look up dentries in the
      inode page without holding its lock. Drop the page lock to improve
      readdir concurrency and avoid the potential deadlock.
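
The locking argument above can be illustrated with a user-space analogy built on POSIX rwlocks. This is a sketch, not f2fs code. The readdir path holds only the directory's shared lock, which stands in for inode.i_rwsem, while it emits entries. Writers change the dentries only under the exclusive lock. Because of this, no per-page lock has to be held across the emit callback, which in the kernel may fault and take mm->mmap_sem. `struct dir_inode`, `dir_init()`, `dir_add()`, `dir_readdir()` and `count_emit()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_DENTS 4
#define NAME_LEN  16

/* Hypothetical stand-in for a directory inode with inline dentries. */
struct dir_inode {
    pthread_rwlock_t i_rwsem;          /* analog of inode.i_rwsem */
    char names[MAX_DENTS][NAME_LEN];   /* analog of the inline dentry area */
    size_t nr;
};

typedef void (*emit_fn)(void *ctx, const char *name);

static void dir_init(struct dir_inode *dir)
{
    pthread_rwlock_init(&dir->i_rwsem, NULL);
    dir->nr = 0;
}

/* readdir analog: shared lock only; no per-entry lock is held while emitting. */
static size_t dir_readdir(struct dir_inode *dir, emit_fn emit, void *ctx)
{
    size_t i;

    pthread_rwlock_rdlock(&dir->i_rwsem);
    for (i = 0; i < dir->nr; i++)
        emit(ctx, dir->names[i]);      /* in the kernel, dir_emit() may fault here */
    pthread_rwlock_unlock(&dir->i_rwsem);
    return i;
}

/* create analog: dentries change only while the lock is held exclusively. */
static int dir_add(struct dir_inode *dir, const char *name)
{
    int ret = -1;

    pthread_rwlock_wrlock(&dir->i_rwsem);
    if (dir->nr < MAX_DENTS && strlen(name) < NAME_LEN) {
        strcpy(dir->names[dir->nr++], name);
        ret = 0;
    }
    pthread_rwlock_unlock(&dir->i_rwsem);
    return ret;
}

/* Example emit callback: counts the entries it is given. */
static void count_emit(void *ctx, const char *name)
{
    (void)name;
    (*(size_t *)ctx)++;
}
```

Compile with `-pthread`. Because readers and writers are already serialized by the directory-level lock, a second lock around the entries adds nothing except another edge in the lock graph.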
      
      Reported-by: Jiqun Li <jiqun.li@unisoc.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • f2fs: fix to adapt small inline xattr space in __find_inline_xattr() · 198c9985
      Chao Yu authored
      [ Upstream commit 2c28aba8 ]
      
      With the test case below, we fail to find an existing xattr entry:
      
      1. mkfs.f2fs -O extra_attr -O flexible_inline_xattr /dev/zram0
      2. mount -t f2fs -o inline_xattr_size=1 /dev/zram0 /mnt/f2fs/
      3. touch /mnt/f2fs/file
      4. setfattr -n "user.name" -v 0 /mnt/f2fs/file
      5. getfattr -n "user.name" /mnt/f2fs/file
      
      /mnt/f2fs/file: user.name: No such attribute
      
      The reason is that, for an inode with a very small inline xattr size,
      __find_inline_xattr() fails to traverse any entry because the first
      entry may not have been loaded from the xattr node yet; later,
      __find_xattr() may skip checking the entire xattr data, which results
      in the wrong "No such attribute" error.
      
      This patch adds a condition to detect this case and avoid the issue.
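
The check described above can be sketched in C. The sketch assumes a hypothetical packed layout of `[name_len][value_len][name][value]` entries followed by a terminating zero byte. This is not the on-disk f2fs format. A scan limited to the inline region returns a distinct "incomplete" result when an entry crosses the end of that region. The caller then rescans the combined inline and node data instead of reporting "No such attribute". `find_xattr()`, `lookup_xattr()` and `enum find_result` are illustrative names, not the f2fs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packed layout: [name_len][value_len][name][value] ... [0] */
enum find_result { XA_FOUND, XA_NOT_FOUND, XA_INCOMPLETE };

/* Scan at most `size` bytes of `buf` for an entry named `name`. */
static enum find_result find_xattr(const unsigned char *buf, size_t size,
                                   const char *name, size_t *off)
{
    size_t pos = 0, nlen = strlen(name);

    for (;;) {
        if (pos + 1 > size)
            return XA_INCOMPLETE;    /* header byte lies past this region */
        if (buf[pos] == 0)
            return XA_NOT_FOUND;     /* terminator seen: the list really ends here */
        if (pos + 2 > size)
            return XA_INCOMPLETE;
        size_t elen = 2 + (size_t)buf[pos] + buf[pos + 1];
        if (pos + elen > size)
            return XA_INCOMPLETE;    /* entry spills past this region */
        if (buf[pos] == nlen && memcmp(buf + pos + 2, name, nlen) == 0) {
            *off = pos;
            return XA_FOUND;
        }
        pos += elen;
    }
}

/* Try the inline region first; an incomplete scan is NOT treated as "not found". */
static enum find_result lookup_xattr(const unsigned char *all, size_t inline_size,
                                     size_t total_size, const char *name, size_t *off)
{
    enum find_result r = find_xattr(all, inline_size, name, off);

    if (r == XA_INCOMPLETE)
        r = find_xattr(all, total_size, name, off);
    return r;
}
```

With a 1-byte inline region, as in the `inline_xattr_size=1` reproducer, the inline scan reports `XA_INCOMPLETE` and the full scan finds `user.name`. Treating that first result as "not found" reproduces the bug.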
      
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>