1. 29 Mar, 2020 2 commits
    • KEYS: Avoid false positive ENOMEM error on key read · 4f088249
      Waiman Long authored
      By allocating a kernel buffer with a user-supplied buffer length, it
      is possible that a false positive ENOMEM error may be returned because
      the user-supplied length is just too large even if the system does have
      enough memory to hold the actual key data.
      
      Moreover, if the buffer length is larger than the maximum amount of
      memory that can be returned by kmalloc() (2^(MAX_ORDER-1) number of
      pages), a warning message will also be printed.
      
      To reduce this possibility, we set a threshold (PAGE_SIZE) over which we
      do check the actual key length first before allocating a buffer of the
      right size to hold it. The threshold is arbitrary; it is just used to
      trigger a buffer length check. It does not limit the actual key length
      as long as there is enough memory to satisfy the memory request.
      
      To further avoid large buffer allocation failure due to page
      fragmentation, kvmalloc() is used to allocate the buffer so that vmapped
      pages can be used when there is not a large enough contiguous set of
      pages available for allocation.
      
      In the extremely unlikely scenario that the key keeps on being changed
      and made longer (still <= buflen) in between 2 __keyctl_read_key()
      calls, the __keyctl_read_key() calling loop in keyctl_read_key() may
      have to be iterated a large number of times, but definitely not infinitely.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
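The allocation strategy described in "KEYS: Avoid false positive ENOMEM error on key read" above can be approximated in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: `demo_key`, `demo_read()`, `demo_read_key()` and `READ_THRESHOLD` are hypothetical names, `malloc()` stands in for `kvmalloc()`, and 4096 stands in for `PAGE_SIZE`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define READ_THRESHOLD 4096 /* assumption: stands in for PAGE_SIZE */

/* Hypothetical key object holding a payload of len bytes. */
struct demo_key {
    const char *data;
    size_t len;
};

/*
 * Stand-in for a key type's read method: copies the payload only when
 * the buffer is large enough, and always returns the payload length.
 */
long demo_read(const struct demo_key *key, char *buf, size_t buflen)
{
    if (buf && buflen >= key->len)
        memcpy(buf, key->data, key->len);
    return (long)key->len;
}

/*
 * Returns the payload length, or -1 if allocation fails. When the payload
 * fits in the caller's buffer length, *out points to a copy that the
 * caller must free; otherwise *out is NULL.
 */
long demo_read_key(const struct demo_key *key, size_t user_buflen, char **out)
{
    /* Above the threshold, start with no buffer and ask for the length. */
    size_t alloc_len = (user_buflen <= READ_THRESHOLD) ? user_buflen : 0;
    char *buf = NULL;
    long ret;

    *out = NULL;
    for (;;) {
        if (alloc_len) {
            buf = malloc(alloc_len); /* the commit uses kvmalloc() here */
            if (!buf)
                return -1;
        }
        ret = demo_read(key, buf, alloc_len);
        if (ret <= 0 || (size_t)ret > user_buflen)
            break; /* empty key, or the caller's buffer is too small */
        if ((size_t)ret > alloc_len) {
            /* Key is longer than our buffer but fits the caller's: retry. */
            free(buf);
            buf = NULL;
            alloc_len = (size_t)ret;
            continue;
        }
        *out = buf;
        return ret;
    }
    free(buf);
    return ret;
}
```

With a 5-byte payload and a 1 GiB caller-supplied length, this sketch allocates only 5 bytes on the second pass instead of attempting a 1 GiB allocation up front.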
    • KEYS: Don't write out to userspace while holding key semaphore · d3ec10aa
      Waiman Long authored
      A lockdep circular locking dependency report was seen when running a
      keyutils test:
      
      [12537.027242] ======================================================
      [12537.059309] WARNING: possible circular locking dependency detected
      [12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE    --------- -  -
      [12537.125253] ------------------------------------------------------
      [12537.153189] keyctl/25598 is trying to acquire lock:
      [12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
      [12537.208365]
      [12537.208365] but task is already holding lock:
      [12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
      [12537.270476]
      [12537.270476] which lock already depends on the new lock.
      [12537.270476]
      [12537.307209]
      [12537.307209] the existing dependency chain (in reverse order) is:
      [12537.340754]
      [12537.340754] -> #3 (&type->lock_class){++++}:
      [12537.367434]        down_write+0x4d/0x110
      [12537.385202]        __key_link_begin+0x87/0x280
      [12537.405232]        request_key_and_link+0x483/0xf70
      [12537.427221]        request_key+0x3c/0x80
      [12537.444839]        dns_query+0x1db/0x5a5 [dns_resolver]
      [12537.468445]        dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
      [12537.496731]        cifs_reconnect+0xe04/0x2500 [cifs]
      [12537.519418]        cifs_readv_from_socket+0x461/0x690 [cifs]
      [12537.546263]        cifs_read_from_socket+0xa0/0xe0 [cifs]
      [12537.573551]        cifs_demultiplex_thread+0x311/0x2db0 [cifs]
      [12537.601045]        kthread+0x30c/0x3d0
      [12537.617906]        ret_from_fork+0x3a/0x50
      [12537.636225]
      [12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
      [12537.664525]        __mutex_lock+0x105/0x11f0
      [12537.683734]        request_key_and_link+0x35a/0xf70
      [12537.705640]        request_key+0x3c/0x80
      [12537.723304]        dns_query+0x1db/0x5a5 [dns_resolver]
      [12537.746773]        dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
      [12537.775607]        cifs_reconnect+0xe04/0x2500 [cifs]
      [12537.798322]        cifs_readv_from_socket+0x461/0x690 [cifs]
      [12537.823369]        cifs_read_from_socket+0xa0/0xe0 [cifs]
      [12537.847262]        cifs_demultiplex_thread+0x311/0x2db0 [cifs]
      [12537.873477]        kthread+0x30c/0x3d0
      [12537.890281]        ret_from_fork+0x3a/0x50
      [12537.908649]
      [12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
      [12537.935225]        __mutex_lock+0x105/0x11f0
      [12537.954450]        cifs_call_async+0x102/0x7f0 [cifs]
      [12537.977250]        smb2_async_readv+0x6c3/0xc90 [cifs]
      [12538.000659]        cifs_readpages+0x120a/0x1e50 [cifs]
      [12538.023920]        read_pages+0xf5/0x560
      [12538.041583]        __do_page_cache_readahead+0x41d/0x4b0
      [12538.067047]        ondemand_readahead+0x44c/0xc10
      [12538.092069]        filemap_fault+0xec1/0x1830
      [12538.111637]        __do_fault+0x82/0x260
      [12538.129216]        do_fault+0x419/0xfb0
      [12538.146390]        __handle_mm_fault+0x862/0xdf0
      [12538.167408]        handle_mm_fault+0x154/0x550
      [12538.187401]        __do_page_fault+0x42f/0xa60
      [12538.207395]        do_page_fault+0x38/0x5e0
      [12538.225777]        page_fault+0x1e/0x30
      [12538.243010]
      [12538.243010] -> #0 (&mm->mmap_sem){++++}:
      [12538.267875]        lock_acquire+0x14c/0x420
      [12538.286848]        __might_fault+0x119/0x1b0
      [12538.306006]        keyring_read_iterator+0x7e/0x170
      [12538.327936]        assoc_array_subtree_iterate+0x97/0x280
      [12538.352154]        keyring_read+0xe9/0x110
      [12538.370558]        keyctl_read_key+0x1b9/0x220
      [12538.391470]        do_syscall_64+0xa5/0x4b0
      [12538.410511]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
      [12538.435535]
      [12538.435535] other info that might help us debug this:
      [12538.435535]
      [12538.472829] Chain exists of:
      [12538.472829]   &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
      [12538.472829]
      [12538.524820]  Possible unsafe locking scenario:
      [12538.524820]
      [12538.551431]        CPU0                    CPU1
      [12538.572654]        ----                    ----
      [12538.595865]   lock(&type->lock_class);
      [12538.613737]                                lock(root_key_user.cons_lock);
      [12538.644234]                                lock(&type->lock_class);
      [12538.672410]   lock(&mm->mmap_sem);
      [12538.687758]
      [12538.687758]  *** DEADLOCK ***
      [12538.687758]
      [12538.714455] 1 lock held by keyctl/25598:
      [12538.732097]  #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
      [12538.770573]
      [12538.770573] stack backtrace:
      [12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
      [12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
      [12538.881963] Call Trace:
      [12538.892897]  dump_stack+0x9a/0xf0
      [12538.907908]  print_circular_bug.isra.25.cold.50+0x1bc/0x279
      [12538.932891]  ? save_trace+0xd6/0x250
      [12538.948979]  check_prev_add.constprop.32+0xc36/0x14f0
      [12538.971643]  ? keyring_compare_object+0x104/0x190
      [12538.992738]  ? check_usage+0x550/0x550
      [12539.009845]  ? sched_clock+0x5/0x10
      [12539.025484]  ? sched_clock_cpu+0x18/0x1e0
      [12539.043555]  __lock_acquire+0x1f12/0x38d0
      [12539.061551]  ? trace_hardirqs_on+0x10/0x10
      [12539.080554]  lock_acquire+0x14c/0x420
      [12539.100330]  ? __might_fault+0xc4/0x1b0
      [12539.119079]  __might_fault+0x119/0x1b0
      [12539.135869]  ? __might_fault+0xc4/0x1b0
      [12539.153234]  keyring_read_iterator+0x7e/0x170
      [12539.172787]  ? keyring_read+0x110/0x110
      [12539.190059]  assoc_array_subtree_iterate+0x97/0x280
      [12539.211526]  keyring_read+0xe9/0x110
      [12539.227561]  ? keyring_gc_check_iterator+0xc0/0xc0
      [12539.249076]  keyctl_read_key+0x1b9/0x220
      [12539.266660]  do_syscall_64+0xa5/0x4b0
      [12539.283091]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
      
      One way to prevent this deadlock scenario from happening is to not
      allow writing to userspace while holding the key semaphore. Instead,
      an internal buffer is allocated for getting the keys out from the
      read method first before copying them out to userspace without holding
      the lock.
      
      That requires taking out the __user modifier from all the relevant
      read methods as well as additional changes to not use any userspace
      write helpers. That is,
      
        1) The put_user() call is replaced by a direct copy.
        2) The copy_to_user() call is replaced by memcpy().
        3) All the fault handling code is removed.
      
      Compiling on an x86-64 system, the size of the rxrpc_read() function is
      reduced from 3795 bytes to 2384 bytes with this patch.
      
      Fixes: ^1da177e4 ("Linux-2.6.12-rc2")
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
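The fix described in "KEYS: Don't write out to userspace while holding key semaphore" above follows a common pattern: fill an internal buffer while the lock is held, then copy to the caller only after the lock is released. The following is a userspace sketch of that pattern, not the kernel code. The names `demo_keyring`, `demo_copy_out()` and `demo_keyring_read()` are hypothetical, the `locked` flag merely models the key semaphore, and `demo_copy_out()` models `copy_to_user()` by refusing to run while the flag is set.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyring; `locked` stands in for the key semaphore. */
struct demo_keyring {
    int locked;
    const char *payload;
    size_t len;
};

/*
 * Stand-in for copy_to_user(). In the kernel this may fault and take
 * mmap_sem, so this model refuses to run while the key lock is held.
 */
int demo_copy_out(const struct demo_keyring *kr, char *dst,
                  const char *src, size_t n)
{
    if (kr->locked)
        return -1;
    memcpy(dst, src, n);
    return 0;
}

/* Returns the payload length, or -1 on failure. */
long demo_keyring_read(struct demo_keyring *kr, char *user_buf, size_t user_len)
{
    size_t len;
    char *tmp = malloc(user_len ? user_len : 1); /* internal buffer */

    if (!tmp)
        return -1;

    kr->locked = 1;                    /* take the key lock */
    len = kr->len;
    if (len <= user_len)
        memcpy(tmp, kr->payload, len); /* read method: plain memcpy() */
    kr->locked = 0;                    /* drop the lock before copy-out */

    if (len <= user_len && demo_copy_out(kr, user_buf, tmp, len) != 0) {
        free(tmp);
        return -1;
    }
    free(tmp);
    return (long)len;
}
```

Because the copy-out happens only after the lock is dropped, the lock ordering between the key lock and whatever the copy-out may acquire never arises in this model.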
  2. 25 Mar, 2020 12 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1b649e0b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix deadlock in bpf_send_signal() from Yonghong Song.
      
       2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan.
      
       3) Add missing locking in iwlwifi mvm code, from Avraham Stern.
      
       4) Fix MSG_WAITALL handling in rxrpc, from David Howells.
      
       5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong
          Wang.
      
       6) Fix producer race condition in AF_PACKET, from Willem de Bruijn.
      
       7) cls_route removes the wrong filter during change operations, from
          Cong Wang.
      
       8) Reject unrecognized request flags in ethtool netlink code, from
          Michal Kubecek.
      
       9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from
          Doug Berger.
      
      10) Don't leak ct zone template in act_ct during replace, from Paul
          Blakey.
      
      11) Fix flushing of offloaded netfilter flowtable flows, also from Paul
          Blakey.
      
      12) Fix throughput drop during tx backpressure in cxgb4, from Rahul
          Lakkireddy.
      
      13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric
          Dumazet.
      
      14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well,
          also from Eric Dumazet.
      
      15) Restrict macsec to ethernet devices, from Willem de Bruijn.
      
      16) Fix reference leak in some ethtool *_SET handlers, from Michal
          Kubecek.
      
      17) Fix accidental disabling of MSI for some r8169 chips, from Heiner
          Kallweit.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
        net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
        net: ena: Add PCI shutdown handler to allow safe kexec
        selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED
        selftests/net: add missing tests to Makefile
        r8169: re-enable MSI on RTL8168c
        net: phy: mdio-bcm-unimac: Fix clock handling
        cxgb4/ptp: pass the sign of offset delta in FW CMD
        net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
        net: cbs: Fix software cbs to consider packet sending time
        net/mlx5e: Do not recover from a non-fatal syndrome
        net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
        net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
        net/mlx5e: Enhance ICOSQ WQE info fields
        net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure
        selftests: netfilter: add nfqueue test case
        netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
        netfilter: nft_fwd_netdev: validate family and chain type
        netfilter: nft_set_rbtree: Detect partial overlaps on insertion
        netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
        netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion
        ...
    • Merge tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 1dfb642b
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
      
       - One core quirk by myself to fix the .irq_disable() semantics when the
         gpiolib core takes over this callback.
      
       - The rest is an elaborate series of four patches fixing Intel laptop
         ACPI wakeup quirks.
      
      * tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
        gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
        gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
        gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
        gpiolib: Fix irq_disable() semantics
    • Merge tag 'wireless-drivers-2020-03-25' of... · 2910594f
      David S. Miller authored
      Merge tag 'wireless-drivers-2020-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.6
      
      Fourth, and last, set of fixes for v5.6. Just two important fixes to
      iwlwifi regressions.
      
      iwlwifi
      
      * fix GEO_TX_POWER_LIMIT command on certain devices which caused
        firmware to crash during initialisation
      
      * add back device ids for three devices which were accidentally
        removed
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build · 2c64605b
      Pablo Neira Ayuso authored
      net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
          net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
            pkt->skb->tc_redirected = 1;
                    ^~
          net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
            pkt->skb->tc_from_ingress = 1;
                    ^~
      
      To avoid a direct dependency with tc actions from netfilter, wrap the
      redirect bits around CONFIG_NET_REDIRECT and move helpers to
      include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
      only existing client of these bits in the tree.
      
      This patch adds skb_set_redirected(), which sets the redirected bit
      on the skbuff, records whether the packet was redirected from ingress,
      and resets the timestamp (the timestamp reset was originally missing in
      the netfilter bugfix).
      
      Fixes: bcfabee1 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
      Reported-by: noreply@ellerman.id.au
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
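The helper described in the commit above can be modelled in a few lines of userspace C. This is a sketch, not the code in include/linux/skbuff.h: `demo_skb` is a hypothetical stand-in for `struct sk_buff`, its field names are assumptions, and resetting the timestamp only for packets redirected from ingress is likewise an assumption made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct demo_skb {
    unsigned int redirected : 1;   /* packet has been redirected */
    unsigned int from_ingress : 1; /* redirect originated at ingress */
    int64_t tstamp;                /* packet timestamp */
};

/* Mark the packet as redirected and record where it came from. */
void demo_skb_set_redirected(struct demo_skb *skb, bool from_ingress)
{
    skb->redirected = 1;
    skb->from_ingress = from_ingress;
    /* Assumption: only ingress redirects clear the timestamp. */
    if (skb->from_ingress)
        skb->tstamp = 0;
}
```

Keeping all three updates in one helper means a caller cannot set the redirect bits and forget the timestamp reset, which is the omission the commit message mentions.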
    • net: ena: Add PCI shutdown handler to allow safe kexec · 428c4913
      Guilherme G. Piccoli authored
      Currently ENA only provides the PCI remove() handler, used during rmmod
      for example. This is not called on the shutdown/kexec path; we are
      potentially creating a failure scenario on kexec:
      
      (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
      instead pci_device_shutdown() clears the master bit of the PCI device,
      stopping all DMA transactions;
      
      (b) Kexec reboot happens and the device gets enabled again, likely having
      its FW with that DMA transaction buffered; then it may trigger the (now
      invalid) memory operation in the new kernel, corrupting kernel memory area.
      
      This patch aims to prevent this, by implementing a shutdown() handler
      quite similar to the remove() one - the difference being the handling
      of the netdev, which is unregistered on remove(), but following the
      convention observed in other drivers, it's only detached on shutdown().
      
      This prevents an odd issue in AWS Nitro instances, in which after the 2nd
      kexec the next one will fail with an initrd corruption, caused by a wild
      DMA write to invalid kernel memory. The lspci output for the adapter
      present in my instance is:
      
      00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
      Adapter (ENA) [1d0f:ec20]
      Suggested-by: Gavin Shan <gshan@redhat.com>
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
      Acked-by: Sameeh Jubran <sameehj@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED · c085dbfb
      Hangbin Liu authored
      The lib files should not be defined as TEST_PROGS, or we will run them
      in run_kselftest.sh.
      
      Also remove ethtool_lib.sh exec permission.
      
      Fixes: 81573b18 ("selftests/net/forwarding: add Makefile to install tests")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net: add missing tests to Makefile · 919a23e9
      Hangbin Liu authored
      Some tests were found to be missing from the Makefile by running:
      for file in $(ls *.sh); do grep -q $file Makefile || echo $file; done
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · e2cf67f6
      Linus Torvalds authored
      Pull zonefs fix from Damien Le Moal:
       "A single fix from me to correctly handle the size of read-only zone
        files"
      
      * tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonfs: Fix handling of read-only zones
    • zonfs: Fix handling of read-only zones · ccf4ad7d
      Damien Le Moal authored
      The write pointer of zones in the read-only condition is defined as
      invalid by the SCSI ZBC and ATA ZAC specifications. It is thus not
      possible to determine the correct size of a read-only zone file on
      mount. Fix this by handling read-only zones in the same manner as
      offline zones by disabling all accesses to the zone (read and write)
      and initializing the inode size of the read-only zone to 0.
      
      For zones found to be in the read-only condition at runtime, only
      disable write access to the zone and keep the size of the zone file to
      its last updated value to allow the user to recover previously written
      data.
      
      Also fix zonefs documentation file to reflect this change.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
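The two cases distinguished in the commit above (a zone found read-only at mount versus one that becomes read-only at runtime) can be summarised as a small decision function. This is an illustrative userspace sketch only: `demo_zone_cond`, `demo_zone_file` and `demo_apply_zone_cond()` are hypothetical names and do not correspond to zonefs internals.

```c
#include <stdbool.h>

/* Hypothetical zone conditions relevant to the commit message. */
enum demo_zone_cond { DEMO_ZONE_OK, DEMO_ZONE_READONLY, DEMO_ZONE_OFFLINE };

/* Hypothetical per-zone file state. */
struct demo_zone_file {
    long long size; /* inode size in bytes */
    bool can_read;
    bool can_write;
};

/* Apply the access policy for a zone condition seen at mount or at runtime. */
void demo_apply_zone_cond(struct demo_zone_file *f, enum demo_zone_cond cond,
                          bool at_mount)
{
    switch (cond) {
    case DEMO_ZONE_OFFLINE:
        /* Offline: no access at all, size 0. */
        f->size = 0;
        f->can_read = false;
        f->can_write = false;
        break;
    case DEMO_ZONE_READONLY:
        f->can_write = false;
        if (at_mount) {
            /* Write pointer is invalid, so the size is unknown: treat as offline. */
            f->size = 0;
            f->can_read = false;
        }
        /* At runtime the last known size is kept so data stays readable. */
        break;
    case DEMO_ZONE_OK:
        break;
    }
}
```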
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 6f000f98
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) A new selftest for nf_queue, from Florian Westphal. This test
         covers two recent fixes: 07f8e4d0 ("tcp: also NULL skb->dev
         when copy was needed") and b738a185 ("tcp: ensure skb->dev is
         NULL before leaving TCP stack").
      
      2) The fwd action breaks with ifb. For safety in future extensions,
         make sure the fwd action only runs from ingress until it is extended
         to be used from a different hook.
      
      3) The pipapo set type now reports EEXIST in case of subrange overlaps.
         Update the rbtree set to validate range overlaps; so far this
         validation has only been done from userspace. From Stefano Brivio.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2020-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 7e566df6
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2020-03-24
      
      This series introduces some fixes to mlx5 driver.
      
      From Aya, Fixes to the RX error recovery flows
      From Leon, Fix IB capability mask
      
      Please pull and let me know if there is any problem.
      
      For -stable v5.5
       ('net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure')
      
      For -stable v5.4
       ('net/mlx5e: Fix ICOSQ recovery flow with Striding RQ')
       ('net/mlx5e: Do not recover from a non-fatal syndrome')
       ('net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset')
       ('net/mlx5e: Enhance ICOSQ WQE info fields')
      
      The above patch ('net/mlx5e: Enhance ICOSQ WQE info fields')
      will fail to apply cleanly on v5.4 due to a trivial contextual conflict,
      but it is an important fix, do I need to do something about it or just
      assume Greg will know how to handle this?
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: re-enable MSI on RTL8168c · f13bc681
      Heiner Kallweit authored
      The original change fixed an issue on RTL8168b by mimicking the vendor
      driver behavior to disable MSI on chip versions before RTL8168d.
      This however now caused an issue on a system with RTL8168c, see [0].
      Therefore leave MSI disabled on RTL8168b, but re-enable it on RTL8168c.
      
      [0] https://bugzilla.redhat.com/show_bug.cgi?id=1792839
      
      Fixes: 003bd5b4 ("r8169: don't use MSI before RTL8168d")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
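The change in MSI policy described in the r8169 commit above amounts to moving a version threshold by one chip generation. A minimal sketch, assuming a hypothetical `demo_chip` enum ordered by chip generation (the real driver uses its own chip version identifiers, which are not reproduced here):

```c
#include <stdbool.h>

/* Hypothetical chip generations, ordered oldest to newest. */
enum demo_chip { DEMO_RTL8168B, DEMO_RTL8168C, DEMO_RTL8168D };

/* Policy of the original change: no MSI before RTL8168d. */
bool demo_msi_allowed_before(enum demo_chip chip)
{
    return chip >= DEMO_RTL8168D;
}

/* Policy after this commit: MSI stays off on RTL8168b only. */
bool demo_msi_allowed_after(enum demo_chip chip)
{
    return chip >= DEMO_RTL8168C;
}
```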
  3. 24 Mar, 2020 26 commits