1. 26 Apr, 2021 1 commit
    • bnxt_en: Fix RX consumer index logic in the error path. · bbd6f0a9
      Michael Chan authored
      In bnxt_rx_pkt(), the RX buffers are expected to complete in order.
      If the RX consumer index indicates an out-of-order buffer completion,
      it means we are hitting a hardware bug and the driver will abort all
      remaining RX packets and reset the RX ring.  The RX consumer index
      that we pass to bnxt_discard_rx() is not correct.  We should be
      passing the current index (tmp_raw_cons) instead of the old index
      (raw_cons).  This bug can cause us to be at the wrong index when
      trying to abort the next RX packet.  It can crash like this:
      
       #0 [ffff9bbcdf5c39a8] machine_kexec at ffffffff9b05e007
       #1 [ffff9bbcdf5c3a00] __crash_kexec at ffffffff9b111232
       #2 [ffff9bbcdf5c3ad0] panic at ffffffff9b07d61e
       #3 [ffff9bbcdf5c3b50] oops_end at ffffffff9b030978
       #4 [ffff9bbcdf5c3b78] no_context at ffffffff9b06aaf0
       #5 [ffff9bbcdf5c3bd8] __bad_area_nosemaphore at ffffffff9b06ae2e
       #6 [ffff9bbcdf5c3c28] bad_area_nosemaphore at ffffffff9b06af24
       #7 [ffff9bbcdf5c3c38] __do_page_fault at ffffffff9b06b67e
       #8 [ffff9bbcdf5c3cb0] do_page_fault at ffffffff9b06bb12
       #9 [ffff9bbcdf5c3ce0] page_fault at ffffffff9bc015c5
          [exception RIP: bnxt_rx_pkt+237]
          RIP: ffffffffc0259cdd  RSP: ffff9bbcdf5c3d98  RFLAGS: 00010213
          RAX: 000000005dd8097f  RBX: ffff9ba4cb11b7e0  RCX: ffffa923cf6e9000
          RDX: 0000000000000fff  RSI: 0000000000000627  RDI: 0000000000001000
          RBP: ffff9bbcdf5c3e60   R8: 0000000000420003   R9: 000000000000020d
          R10: ffffa923cf6ec138  R11: ffff9bbcdf5c3e83  R12: ffff9ba4d6f928c0
          R13: ffff9ba4cac28080  R14: ffff9ba4cb11b7f0  R15: ffff9ba4d5a30000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      
      Fixes: a1b0e4e6 ("bnxt_en: Improve RX consumer index validity check.")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Apr, 2021 3 commits
    • mptcp: Retransmit DATA_FIN · 6477dd39
      Mat Martineau authored
      With this change, the MPTCP-level retransmission timer is used to resend
      DATA_FIN. The retransmit timer is not stopped while waiting for an
      MPTCP-level ACK of DATA_FIN, and retransmitted DATA_FINs are sent on all
      subflows. The retry interval starts at TCP_RTO_MIN and then doubles on
      each attempt, up to TCP_RTO_MAX.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/146
      Fixes: 43b54c6e ("mptcp: Use full MPTCP-level disconnect state machine")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb · d13f048d
      Phillip Potter authored
      Modify the header size check in geneve6_xmit_skb and geneve_xmit_skb
      to use pskb_inet_may_pull rather than pskb_network_may_pull. This fixes
      two kernel selftest failures introduced by the commit introducing the
      checks:
      IPv4 over geneve6: PMTU exceptions
      IPv4 over geneve6: PMTU exceptions - nexthop objects
      
      It does this by correctly accounting for the fact that IPv4 packets may
      transit over geneve IPv6 tunnels (and vice versa), and still fixes the
      uninit-value bug fixed by the original commit.
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Fixes: 6628ddfe ("net: geneve: check skb is large enough for IPv4/IPv6 header")
      Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: meter: remove rate from the bucket size calculation · 7d742b50
      Ilya Maximets authored
      The meter implementation is supposed to be a classic token bucket with
      two typical parameters: rate and burst size.
      
      Burst size in this scheme is the maximum number of bytes/packets that
      could pass without being rate limited.
      
      Recent changes to the userspace datapath brought its meter implementation
      in line with the kernel one, and this uncovered several issues.
      
      The main problem is that the maximum bucket size, for no clear reason,
      accounts not only for the burst size but also for the numerical value of
      the rate. This creates a lot of confusion around the behavior of meters.
      
      For example, if the rate is configured as 1000 pps and the burst size is
      set to 1, this should mean that the meter will tolerate bursts of 1 packet
      at most, i.e. not a single packet above the rate should pass the meter.
      However, the current implementation calculates the maximum bucket size as
      (rate + burst size), so the effective bucket size will be 1001.  This
      means that the first 1000 packets will not be rate limited and the average
      rate might be twice as high as the configured rate.  This also makes
      it practically impossible to configure a meter with a burst size
      lower than the rate, which might be a desirable configuration if the
      rate is high.
      
      The inability to configure low burst sizes, and the general inability
      for a user to predict the maximum and average rate from the configured
      parameters of a meter without reading the OVS and kernel code, might
      also be classified as a security issue, because drop meters are
      frequently used as a means of protection against DoS attacks.
      
      This change removes the rate from the bucket size calculation, bringing
      it in line with the classic token bucket algorithm and making the rate
      and burst tolerance predictable from a user's
      perspective.
      
      The same change has been proposed for the userspace implementation.
      
      Fixes: 96fbc13d ("openvswitch: Add meter infrastructure")
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Apr, 2021 6 commits
  4. 21 Apr, 2021 15 commits
    • neighbour: Prevent race condition in neighbour subsystem · eefb45ee
      Chinmay Agarwal authored
      The following race condition was detected:
      
      <CPU A, t0>: Executing __netif_receive_skb() -> __netif_receive_skb_core()
      -> arp_rcv() -> arp_process(). arp_process() calls __neigh_lookup(), which
      takes a reference on neighbour entry 'n'.
      Moving further along, arp_process() calls neigh_update() ->
      __neigh_update(). The neighbour entry is unlocked just before the call to
      neigh_update_gc_list().
      
      This unlocking paves the way for another thread to take a reference on
      the same entry, mark it dead and remove it from gc_list.
      
      <CPU B, t1>: neigh_flush_dev() is under execution and calls
      neigh_mark_dead(n), marking the neighbour entry 'n' as dead. 'n' is also
      removed from gc_list.
      Moving further along, neigh_flush_dev() calls
      neigh_cleanup_and_release(n), but since CPU A still holds the reference
      taken at t0, 'n' cannot be destroyed.
      
      <CPU A, t3>: The code reaches neigh_update_gc_list() with neighbour entry
      'n' already marked dead.
      
      <CPU A, t4>: arp_process() finally calls neigh_release(n), destroying
      the neighbour entry, and we are left with a destroyed entry still on
      gc_list.
      
      Fixes: eb4e8fac ("neighbour: Prevent a dead entry from updating gc_list")
      Signed-off-by: Chinmay Agarwal <chinagar@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state machine · 83d686a6
      jinyiting authored
      The bond works in mode 4 (802.3ad), and down/up operations are performed
      on a bond that has negotiated normally. There is a chance that
      bond->slave_arr ends up NULL.
      
      Test commands:
         ifconfig bond1 down
         ifconfig bond1 up
      
      The conflict occurs in the following process:
      
      __dev_open (CPU A)
      --bond_open
        --queue_delayed_work(bond->wq,&bond->ad_work,0);
        --bond_update_slave_arr
          --bond_3ad_get_active_agg_info
      
      ad_work(CPU B)
      --bond_3ad_state_machine_handler
        --ad_agg_selection_logic
      
      ad_work runs on CPU B. In ad_agg_selection_logic(), agg->is_active is
      cleared for all aggregators. If bond_3ad_get_active_agg_info() runs on
      CPU A before the new active aggregator is selected on CPU B, it fails and
      bond->slave_arr is set to NULL, even though the best aggregator in
      ad_agg_selection_logic() has not changed and there is no need to update
      the slave array.
      
      The conflict occurs because ad_agg_selection_logic() clears
      agg->is_active under mode_lock, but bond_open() -> bond_update_slave_arr()
      inspects agg->is_active outside the lock.
      
      Also, it is normal for bond_update_slave_arr() to sleep when allocating
      memory, so replace the WARN_ON with a call to might_sleep().
      Signed-off-by: jinyiting <jinyiting@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: Avoid potential use after free in MHI send · 47a017f3
      Bjorn Andersson authored
      It is possible that the MHI ul_callback will be invoked immediately
      following the queueing of the skb for transmission, leading to the
      callback decrementing the refcount of the associated sk and freeing the
      skb.
      
      As such, the dereference of skb and the increment of the sk refcount must
      happen before the skb is queued, to avoid using the skb after it has been
      freed and to avoid the sk potentially dropping its last reference.
      
      Fixes: 6e728f32 ("net: qrtr: Add MHI transport layer")
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: intel-xway: enable integrated led functions · 357a07c2
      Martin Schiller authored
      The Intel xway PHYs offer the possibility to deactivate the integrated
      LED function and to control the LEDs manually.
      If this was set by the bootloader, it must be ensured that the
      integrated LED function is enabled for all LEDs when loading the driver.
      
      Before commit 6e2d85ec ("net: phy: Stop with excessive soft reset")
      the LEDs were enabled by a soft-reset of the PHY (using
      genphy_soft_reset). Initialize the XWAY_MDIO_LED with its default
      value (which is applied during a soft reset) instead of adding back
      the soft reset. This brings back the default LED configuration while
      still preventing an excessive amount of soft resets.
      
      Fixes: 6e2d85ec ("net: phy: Stop with excessive soft reset")
      Signed-off-by: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: renesas: ravb: Fix a stuck issue when a lot of frames are received · 5718458b
      Yoshihiro Shimoda authored
      When many frames were received in a short time, the driver could stall
      reception until a new frame arrived. For example, the following command,
      run from another device, could trigger this issue.
      
          $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>
      
      The previous code always cleared the RX interrupt flag but checked the
      interrupt flags in ravb_poll(). So, if ravb_rx() returned true,
      ravb_poll() could not call ravb_rx() again until a new RX frame was
      received. To fix the issue, always call ravb_rx() regardless of the
      interrupt flags.
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: fix TSO and TBS feature enabling during driver open · 5e6038b8
      Ong Boon Leong authored
      TSO and TBS cannot co-exist, and the current implementation needs two
      fixes:
      
       1) stmmac_open() does not need to call stmmac_enable_tbs() because
          the MAC is reset in stmmac_init_dma_engine() anyway.
       2) Inside stmmac_hw_setup(), we should call stmmac_enable_tso() for
          TX Q that is _not_ configured for TBS.
      
      Fixes: 579a25a8 ("net: stmmac: Initial support for TBS")
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: devlink: initialize the devlink port attribute "lanes" · 90b669d6
      Yinjun Zhang authored
      The number of lanes of a devlink port should be correctly initialized
      when registering the port, so that the input check when running
      "devlink port split <port> count <N>" can pass.
      
      Fixes: a21cf0a8 ("devlink: Add a new devlink port lanes attribute and pass to netlink")
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-2021-04-21' of... · 542c4095
      David S. Miller authored
      Merge tag 'wireless-drivers-2021-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.12
      
      As there was an -rc8 release, one more important fix for v5.12.
      
      iwlwifi
      
      * fix spinlock warning in gen2 devices
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: davinci_emac: Fix incorrect masking of tx and rx error channel · d83b8aa5
      Colin Ian King authored
      The bit-masks used for the TXERRCH and RXERRCH (tx and rx error channels)
      are incorrect and always lead to a zero result. The mask values are
      currently the already right-shifted field values; fix this by setting
      them to the correct values.
      
      (I double checked these against the TMS320TCI6482 data sheet, section
      5.30, page 127 to ensure I had the correct mask values for the TXERRCH
      and RXERRCH fields in the MACSTATUS register).
      
      Addresses-Coverity: ("Operands don't affect result")
      Fixes: a6286ee6 ("net: Add TI DaVinci EMAC driver")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: marvell: prestera: fix port event handling on init · 33398048
      Vadym Kochan authored
      For some reason there might be a crash during port creation if port
      events are handled at the same time, because the firmware may send an
      initial port event with the down state.
      
      The crash points to cancel_delayed_work(), which is called when the port
      goes down. I have not yet found the real cause of the issue, so this fixes
      it by cancelling the port stats work only if the port's previous state
      was up and running.
      
      The following is the crash which can be triggered:
      
      [   28.311104] Unable to handle kernel paging request at virtual address 000071775f776600
      [   28.319097] Mem abort info:
      [   28.321914]   ESR = 0x96000004
      [   28.324996]   EC = 0x25: DABT (current EL), IL = 32 bits
      [   28.330350]   SET = 0, FnV = 0
      [   28.333430]   EA = 0, S1PTW = 0
      [   28.336597] Data abort info:
      [   28.339499]   ISV = 0, ISS = 0x00000004
      [   28.343362]   CM = 0, WnR = 0
      [   28.346354] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100bf7000
      [   28.352842] [000071775f776600] pgd=0000000000000000, p4d=0000000000000000
      [   28.359695] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [   28.365310] Modules linked in: prestera_pci(+) prestera uio_pdrv_genirq
      [   28.372005] CPU: 0 PID: 1291 Comm: kworker/0:1H Not tainted 5.11.0-rc4 #1
      [   28.378846] Hardware name: DNI AmazonGo1 A7040 board (DT)
      [   28.384283] Workqueue: prestera_fw_wq prestera_fw_evt_work_fn [prestera_pci]
      [   28.391413] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
      [   28.397468] pc : get_work_pool+0x48/0x60
      [   28.401442] lr : try_to_grab_pending+0x6c/0x1b0
      [   28.406018] sp : ffff80001391bc60
      [   28.409358] x29: ffff80001391bc60 x28: 0000000000000000
      [   28.414725] x27: ffff000104fc8b40 x26: ffff80001127de88
      [   28.420089] x25: 0000000000000000 x24: ffff000106119760
      [   28.425452] x23: ffff00010775dd60 x22: ffff00010567e000
      [   28.430814] x21: 0000000000000000 x20: ffff80001391bcb0
      [   28.436175] x19: ffff00010775deb8 x18: 00000000000000c0
      [   28.441537] x17: 0000000000000000 x16: 000000008d9b0e88
      [   28.446898] x15: 0000000000000001 x14: 00000000000002ba
      [   28.452261] x13: 80a3002c00000002 x12: 00000000000005f4
      [   28.457622] x11: 0000000000000030 x10: 000000000000000c
      [   28.462985] x9 : 000000000000000c x8 : 0000000000000030
      [   28.468346] x7 : ffff800014400000 x6 : ffff000106119758
      [   28.473708] x5 : 0000000000000003 x4 : ffff00010775dc60
      [   28.479068] x3 : 0000000000000000 x2 : 0000000000000060
      [   28.484429] x1 : 000071775f776600 x0 : ffff00010775deb8
      [   28.489791] Call trace:
      [   28.492259]  get_work_pool+0x48/0x60
      [   28.495874]  cancel_delayed_work+0x38/0xb0
      [   28.500011]  prestera_port_handle_event+0x90/0xa0 [prestera]
      [   28.505743]  prestera_evt_recv+0x98/0xe0 [prestera]
      [   28.510683]  prestera_fw_evt_work_fn+0x180/0x228 [prestera_pci]
      [   28.516660]  process_one_work+0x1e8/0x360
      [   28.520710]  worker_thread+0x44/0x480
      [   28.524412]  kthread+0x154/0x160
      [   28.527670]  ret_from_fork+0x10/0x38
      [   28.531290] Code: a8c17bfd d50323bf d65f03c0 9278dc21 (f9400020)
      [   28.537429] ---[ end trace 5eced933df3a080b ]---
      
      Fixes: 501ef306 ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
      Signed-off-by: Vadym Kochan <vkochan@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock/virtio: free queued packets when closing socket · 8432b811
      Stefano Garzarella authored
      As reported by syzbot [1], there is a memory leak while closing the
      socket. We partially solved this issue with commit ac03046e
      ("vsock/virtio: free packets during the socket release"), but we
      forgot to drain the RX queue when the socket is finally closed by
      the scheduled work.
      
      To avoid future issues, let's use the new virtio_transport_remove_sock()
      to drain the RX queue before removing the socket from the af_vsock lists
      by calling vsock_remove_sock().
      
      [1] https://syzkaller.appspot.com/bug?extid=24452624fc4c571eedd9
      
      Fixes: ac03046e ("vsock/virtio: free packets during the socket release")
      Reported-and-tested-by: syzbot+24452624fc4c571eedd9@syzkaller.appspotmail.com
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sfc-txq-lookups' · eeddfd8e
      David S. Miller authored
      Edward Cree says:
      
      ====================
      sfc: fix TXQ lookups
      
      The TXQ handling changes in 12804793 ("sfc: decouple TXQ type from label")
       which were made as part of the support for encap offloads on EF10 caused some
       breakage on Siena (5000- and 6000-series) NICs, which caused null-dereference
       kernel panics.
      This series fixes those issues, and also a similarly incorrect code-path on
       EF10 which worked by chance.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: ef10: fix TX queue lookup in TX event handling · 172e269e
      Edward Cree authored
      We're starting from a TXQ label, not a TXQ type, so
       efx_channel_get_tx_queue() is inappropriate.  This worked by chance,
       because labels and types currently match on EF10, but we shouldn't
       rely on that.
      
      Fixes: 12804793 ("sfc: decouple TXQ type from label")
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: farch: fix TX queue lookup in TX event handling · 83b09a18
      Edward Cree authored
      We're starting from a TXQ label, not a TXQ type, so
       efx_channel_get_tx_queue() is inappropriate (and could return NULL,
       leading to panics).
      
      Fixes: 12804793 ("sfc: decouple TXQ type from label")
      Cc: stable@vger.kernel.org
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: farch: fix TX queue lookup in TX flush done handling · 5b1faa92
      Edward Cree authored
      We're starting from a TXQ instance number ('qid'), not a TXQ type, so
       efx_get_tx_queue() is inappropriate (and could return NULL, leading
       to panics).
      
      Fixes: 12804793 ("sfc: decouple TXQ type from label")
      Reported-by: Trevor Hemsley <themsley@voiceflex.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 19 Apr, 2021 7 commits
  6. 17 Apr, 2021 6 commits
    • Merge tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 88a5af94
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.12-rc8, including fixes from netfilter, and
        bpf. BPF verifier changes stand out, otherwise things have slowed
        down.
      
        Current release - regressions:
      
         - gro: ensure frag0 meets IP header alignment
      
         - Revert "net: stmmac: re-init rx buffers when mac resume back"
      
         - ethernet: macb: fix the restore of cmp registers
      
        Previous releases - regressions:
      
         - ixgbe: Fix NULL pointer dereference in ethtool loopback test
      
         - ixgbe: fix unbalanced device enable/disable in suspend/resume
      
         - phy: marvell: fix detection of PHY on Topaz switches
      
         - make tcp_allowed_congestion_control readonly in non-init netns
      
         - xen-netback: Check for hotplug-status existence before watching
      
        Previous releases - always broken:
      
         - bpf: mitigate a speculative oob read of up to map value size by
           tightening the masking window
      
         - sctp: fix race condition in sctp_destroy_sock
      
         - sit, ip6_tunnel: Unregister catch-all devices
      
         - netfilter: nftables: clone set element expression template
      
         - netfilter: flowtable: fix NAT IPv6 offload mangling
      
         - net: geneve: check skb is large enough for IPv4/IPv6 header
      
         - netlink: don't call ->netlink_bind with table lock held"
      
      * tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
        netlink: don't call ->netlink_bind with table lock held
        MAINTAINERS: update my email
        bpf: Update selftests to reflect new error states
        bpf: Tighten speculative pointer arithmetic mask
        bpf: Move sanitize_val_alu out of op switch
        bpf: Refactor and streamline bounds check into helper
        bpf: Improve verifier error messages for users
        bpf: Rework ptr_limit into alu_limit and add common error path
        bpf: Ensure off_reg has no mixed signed bounds for all types
        bpf: Move off_reg into sanitize_ptr_alu
        bpf: Use correct permission flag for mixed signed bounds arithmetic
        ch_ktls: do not send snd_una update to TCB in middle
        ch_ktls: tcb close causes tls connection failure
        ch_ktls: fix device connection close
        ch_ktls: Fix kernel panic
        i40e: fix the panic when running bpf in xdpdrv mode
        net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta
        net/mlx5e: Fix setting of RS FEC mode
        net/mlx5: Fix setting of devlink traps in switchdev mode
        Revert "net: stmmac: re-init rx buffers when mac resume back"
        ...
    • Merge tag 'libnvdimm-fixes-for-5.12-rc8' of... · bdfd99e6
      Linus Torvalds authored
      Merge tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
      
      Pull libnvdimm fixes from Dan Williams:
       "The largest change is for a regression that landed during -rc1 for
        block-device read-only handling. Vaibhav found a new use for the
        ability (originally introduced by virtio_pmem) to call back to the
        platform to flush data, but also found an original bug in that
        implementation. Lastly, Arnd cleans up some compile warnings in dax.
      
        This has all appeared in -next with no reported issues.
      
        Summary:
      
         - Fix a regression of read-only handling in the pmem driver
      
         - Fix a compile warning
      
         - Fix support for platform cache flush commands on powerpc/papr"
      
      * tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC
        libnvdimm: Notify disk drivers to revalidate region read-only
        dax: avoid -Wempty-body warnings
    • Merge tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 7c226774
      Linus Torvalds authored
      Pull CXL memory class fixes from Dan Williams:
       "A collection of fixes for the CXL memory class driver introduced in
        this release cycle.
      
        The driver was primarily developed on a work-in-progress QEMU
        emulation of the interface and we have since found a couple places
        where it hid spec compliance bugs in the driver, or had a spec
        implementation bug itself.
      
        The biggest change here is replacing a percpu_ref with an rwsem to
        clean up a couple of bugs in the error unwind path during ioctl device
        init. Lastly there were some minor cleanups to not export the
        power-management sysfs-ABI for the ioctl device, use the proper sysfs
        helper for emitting values, and prevent subtle bugs as new
        administration commands are added to the supported list.
      
        The bulk of it has appeared in -next save for the top commit which was
        found today and validated on a fixed-up QEMU model.
      
        Summary:
      
         - Fix support for CXL memory devices with registers offset from the
           BAR base.
      
         - Fix the reporting of device capacity.
      
         - Fix the driver commands list definition to be disconnected from the
           UAPI command list.
      
         - Replace percpu_ref with rwsem to fix initialization error path.
      
         - Fix leaks in the driver initialization error path.
      
         - Drop the power/ directory from CXL device sysfs.
      
         - Use the recommended sysfs helper for attribute 'show'
           implementations"
      
      * tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/mem: Fix memory device capacity probing
        cxl/mem: Fix register block offset calculation
        cxl/mem: Force array size of mem_commands[] to CXL_MEM_COMMAND_ID_MAX
        cxl/mem: Disable cxl device power management
        cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
        cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
        cxl/mem: Use sysfs_emit() for attribute show routines
    • Merge branch 'akpm' (patches from Andrew) · fdb5d6ca
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "12 patches.
      
        Subsystems affected by this patch series: mm (documentation, kasan,
        and pagemap), csky, ia64, gcov, and lib"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib: remove "expecting prototype" kernel-doc warnings
        gcov: clang: fix clang-11+ build
        mm: ptdump: fix build failure
        mm/mapping_dirty_helpers: guard hugepage pud's usage
        ia64: tools: remove duplicate definition of ia64_mf() on ia64
        ia64: tools: remove inclusion of ia64-specific version of errno.h header
        ia64: fix discontig.c section mismatches
        ia64: remove duplicate entries in generic_defconfig
        csky: change a Kconfig symbol name to fix e1000 build error
        kasan: remove redundant config option
        kasan: fix hwasan build for gcc
        mm: eliminate "expecting prototype" kernel-doc warnings
    • cxl/mem: Fix memory device capacity probing · fae8817a
      Dan Williams authored
      The CXL Identify Memory Device output payload emits capacity in 256MB
      units. The driver is treating the capacity field as bytes. This was
      missed because QEMU reports bytes when it should report bytes / 256MB.
      
      Fixes: 8adaf747 ("cxl/mem: Find device capabilities")
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Ben Widawsky <ben.widawsky@intel.com>
      Link: https://lore.kernel.org/r/161862021044.3259705.7008520073059739760.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • netlink: don't call ->netlink_bind with table lock held · f2764bd4
      Florian Westphal authored
      When I added support to allow generic netlink multicast groups to be
      restricted to subscribers with CAP_NET_ADMIN I was unaware that a
      genl_bind implementation already existed in the past.
      
      It was reverted due to ABBA deadlock:
      
      1. ->netlink_bind gets called with the table lock held.
      2. genetlink bind callback is invoked, it grabs the genl lock.
      
      But when a new genl subsystem is (un)registered, these two locks are
      taken in reverse order.
      
      One solution would be to revert again and add a comment in genl
      referring to commit 1e82a62f ("genetlink: remove genl_bind").
      
      This would need a second change in mptcp to not expose the raw token
      value anymore, e.g. by hashing the token with a secret key so userspace
      can still associate subflow events with the correct mptcp connection.
      
      However, Paolo Abeni reminded me to double-check why the netlink table is
      locked in the first place.
      
      I can't find a reason.  netlink_bind() is already called without this lock
      when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt.
      Same holds for the netlink_unbind operation.
      
      Digging through the history, commit f7736080
      ("netlink: access nlk groups safely in netlink bind and getname")
      expanded the lock scope.
      
      Later, commit 3a20773b ("net: netlink: cap max groups which will be considered in netlink_bind()")
      removed the nlk->ngroups access that the lock scope
      extension was all about.
      
      Reduce the lock scope again and always call ->netlink_bind without
      the table lock.
      
      The Fixes tag should be vs. the patch mentioned in the link below,
      but that one got squash-merged into the patch that came earlier in the
      series.
      
      Fixes: 4d54cc32 ("mptcp: avoid lock_fast usage in accept path")
      Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@linux.intel.com/T/#u
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Sean Tranchetti <stranche@codeaurora.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 16 Apr, 2021 2 commits
    • Linus Torvalds
      Merge tag 'io_uring-5.12-2021-04-16' of git://git.kernel.dk/linux-block · 9cdbf646
      Linus Torvalds authored
      Pull io_uring fix from Jens Axboe:
       "Fix for a potential hang at exit with SQPOLL from Pavel"
      
      * tag 'io_uring-5.12-2021-04-16' of git://git.kernel.dk/linux-block:
        io_uring: fix early sqd_list removal sqpoll hangs
    • Randy Dunlap
      lib: remove "expecting prototype" kernel-doc warnings · c95c2d32
      Randy Dunlap authored
      Fix various kernel-doc warnings in lib/ due to missing or erroneous
      function names.
      
      Add kernel-doc for some function parameters that were missing.  Use
      kernel-doc "Return:" notation in earlycpio.c.
      
      Quietens the following warnings:
      
        lib/earlycpio.c:61: warning: expecting prototype for cpio_data find_cpio_data(). Prototype was for find_cpio_data() instead
      
        lib/lru_cache.c:640: warning: expecting prototype for lc_dump(). Prototype was for lc_seq_dump_details() instead
        lru_cache.c:90: warning: Function parameter or member 'cache' not described in 'lc_create'
      
        lib/parman.c:368: warning: expecting prototype for parman_item_del(). Prototype was for parman_item_remove() instead
        parman.c:309: warning: Excess function parameter 'prority' description in 'parman_prio_init'
      
        lib/radix-tree.c:703: warning: expecting prototype for __radix_tree_insert(). Prototype was for radix_tree_insert() instead
        radix-tree.c:180: warning: Excess function parameter 'addr' description in 'radix_tree_find_next_bit'
        radix-tree.c:180: warning: Excess function parameter 'size' description in 'radix_tree_find_next_bit'
        radix-tree.c:931: warning: Function parameter or member 'iter' not described in 'radix_tree_iter_replace'
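      For reference, a minimal sketch of a kernel-doc comment that avoids the warnings above: the name on the first line matches the prototype, every parameter has an `@name:` line, and the return value goes in a `Return:` section. The function `sum_bytes()` is hypothetical and exists only to carry the comment.

```c
#include <stddef.h>

/**
 * sum_bytes() - add up the bytes in a buffer
 * @buf: pointer to the data
 * @len: number of bytes in @buf
 *
 * The name above matches the prototype below and each parameter is
 * documented, so kernel-doc should have nothing to warn about.
 *
 * Return: the sum of all bytes in @buf, or 0 if @len is 0.
 */
unsigned long sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

Renaming the function without updating the first comment line would reproduce the "expecting prototype for ...()" warning.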
      
      Link: https://lkml.kernel.org/r/20210411221756.15461-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jiri Pirko <jiri@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>