1. 26 Nov, 2021 40 commits
    • Merge tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c5c17547
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from netfilter.
      
        Current release - regressions:
      
         - r8169: fix incorrect mac address assignment
      
         - vlan: fix underflow for the real_dev refcnt when vlan creation
           fails
      
         - smc: avoid warning of possible recursive locking
      
        Current release - new code bugs:
      
         - vsock/virtio: suppress used length validation
      
         - neigh: fix crash in v6 module initialization error path
      
        Previous releases - regressions:
      
         - af_unix: fix change in behavior in read after shutdown
      
         - igb: fix netpoll exit with traffic, avoid warning
      
         - tls: fix splice_read() when starting mid-record
      
         - lan743x: fix deadlock in lan743x_phy_link_status_change()
      
         - marvell: prestera: fix bridge port operation
      
        Previous releases - always broken:
      
         - tcp_cubic: fix spurious Hystart ACK train detections for
           not-cwnd-limited flows
      
         - nexthop: fix refcount issues when replacing IPv6 groups
      
         - nexthop: fix null pointer dereference when IPv6 is not enabled
      
         - phylink: force link down and retrigger resolve on interface change
      
         - mptcp: fix delack timer length calculation and incorrect early
           clearing
      
         - ieee802154: handle iftypes as u32, prevent shift-out-of-bounds
      
         - nfc: virtual_ncidev: change default device permissions
      
         - netfilter: ctnetlink: fix error codes and flags used for kernel
           side filtering of dumps
      
         - netfilter: flowtable: fix IPv6 tunnel addr match
      
         - ncsi: align payload to 32-bit to fix dropped packets
      
         - iavf: fix deadlock and loss of config during VF interface reset
      
         - ice: avoid bpf_prog refcount underflow
      
         - ocelot: fix broken PTP over IP and PTP API violations
      
        Misc:
      
         - marvell: mvpp2: increase MTU limit when XDP enabled"
      
      * tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
        net: dsa: microchip: implement multi-bridge support
        net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
        net: mscc: ocelot: set up traps for PTP packets
        net: ptp: add a definition for the UDP port for IEEE 1588 general messages
        net: mscc: ocelot: create a function that replaces an existing VCAP filter
        net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
        net: hns3: fix incorrect components info of ethtool --reset command
        net: hns3: fix one incorrect value of page pool info when queried by debugfs
        net: hns3: add check NULL address for page pool
        net: hns3: fix VF RSS failed problem after PF enable multi-TCs
        net: qed: fix the array may be out of bound
        net/smc: Don't call clcsock shutdown twice when smc shutdown
        net: vlan: fix underflow for the real_dev refcnt
        ptp: fix filter names in the documentation
        ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
        nfc: virtual_ncidev: change default device permissions
        net/sched: sch_ets: don't peek at classes beyond 'nbands'
        net: stmmac: Disable Tx queues when reconfiguring the interface
        selftests: tls: test for correct proto_ops
        tls: fix replacing proto_ops
        ...
    • net: dsa: microchip: implement multi-bridge support · b3612ccd
      Oleksij Rempel authored
      The current driver version can handle only one bridge at a time.
      Configuring two bridges on two different ports ends up with the hardware
      shorting these bridges together. To reproduce it:
      
      	ip l a name br0 type bridge
      	ip l a name br1 type bridge
      	ip l s dev br0 up
      	ip l s dev br1 up
      	ip l s lan1 master br0
      	ip l s dev lan1 up
      	ip l s lan2 master br1
      	ip l s dev lan2 up
      
      	Ping on lan1 and get response on lan2, which should not happen.
      
      This happened because the current driver version stores one global "Port VLAN
      Membership" and applies it to all ports that are members of any
      bridge.
      To solve this issue, we need to handle each port separately.
      
      This patch drops the global port member storage and calculates
      membership dynamically depending on STP state and bridge participation.
      
      Note: STP support was broken before this patch and should be fixed
      separately.
      
      Fixes: c2e86691 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
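The per-port calculation described in this commit can be modelled in user space roughly as follows. This is a minimal sketch and not the driver's code: `NUM_PORTS`, `struct port`, `enum stp_state`, and `port_members()` are hypothetical names, and the CPU port that a real DSA switch would also include is left out.

```c
#include <stdint.h>

#define NUM_PORTS 4      /* hypothetical port count for this sketch */
#define NO_BRIDGE (-1)   /* port is standalone, not under any bridge */

enum stp_state { STP_DISABLED, STP_BLOCKING, STP_FORWARDING };

struct port {
    int bridge_id;       /* NO_BRIDGE when the port is standalone */
    enum stp_state stp;
};

/*
 * Compute the set of ports that port `p` may forward to. Only ports in
 * the same bridge qualify, and only when both ends are forwarding.
 * Nothing is cached globally, so two bridges can never share a mask.
 */
static uint32_t port_members(const struct port ports[NUM_PORTS], int p)
{
    uint32_t mask = 0;

    if (ports[p].bridge_id == NO_BRIDGE || ports[p].stp != STP_FORWARDING)
        return 0;

    for (int i = 0; i < NUM_PORTS; i++) {
        if (i == p)
            continue;
        if (ports[i].bridge_id == ports[p].bridge_id &&
            ports[i].stp == STP_FORWARDING)
            mask |= UINT32_C(1) << i;
    }
    return mask;
}
```

With lan1 in br0 and lan2 in br1, as in the reproducer, both ports get an empty mask in this model, so a ping on one can no longer be answered on the other.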
    • Merge tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 5367cf1c
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix a NULL pointer dereference in the CPPC library code and a
        locking issue related to printing the names of ACPI device nodes in
        the device properties framework.
      
        Specifics:
      
         - Fix NULL pointer dereference in the CPPC library code occurring on
           hybrid systems without CPPC support (Rafael Wysocki).
      
         - Avoid attempts to acquire a semaphore with interrupts off when
           printing the names of ACPI device nodes and clean up code on top of
           that fix (Sakari Ailus)"
      
      * tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: CPPC: Add NULL pointer check to cppc_get_perf()
        ACPI: Make acpi_node_get_parent() local
        ACPI: Get acpi_device's parent from the parent field
    • Merge tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0ce629b1
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These address three issues in the intel_pstate driver and fix two
        problems related to hibernation.
      
        Specifics:
      
         - Make intel_pstate work correctly on Ice Lake server systems with
           out-of-band performance control enabled (Adamos Ttofari).
      
         - Fix EPP handling in intel_pstate during CPU offline and online in
           the active mode (Rafael Wysocki).
      
         - Make intel_pstate support ITMT on asymmetric systems with
           overclocking enabled (Srinivas Pandruvada).
      
         - Fix hibernation image saving when using the user space interface
           based on the snapshot special device file (Evan Green).
      
         - Make the hibernation code release the snapshot block device using
           the same mode that was used when acquiring it (Thomas Zeitlhofer)"
      
      * tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: hibernate: Fix snapshot partial write lengths
        PM: hibernate: use correct mode for swsusp_close()
        cpufreq: intel_pstate: ITMT support for overclocked system
        cpufreq: intel_pstate: Fix active mode offline/online EPP handling
        cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
    • Merge tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 925c9437
      Linus Torvalds authored
      Pull fuse fix from Miklos Szeredi:
       "Fix a regression caused by a bugfix in the previous release. The
        symptom is a VM_BUG_ON triggered from splice to the fuse device.
      
        Unfortunately the original bugfix was already backported to a number
        of stable releases, so this fix-fix will need to be backported as
        well"
      
      * tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: release pipe buf after last use
    • Merge branch 'fix-broken-ptp-over-ip-on-ocelot-switches' · 32c54497
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Fix broken PTP over IP on Ocelot switches
      
      Changes in v2: added patch 5, added Richard's ack for the whole series
      sans patch 5 which is new.
      
      Po Liu reported recently that timestamping PTP over IPv4 is broken using
      the felix driver on NXP LS1028A. This has been known for a while, of
      course, since it has always been broken. The reason is that IP PTP
      packets are currently treated as unknown IP multicast, which is not
      flooded to the CPU port in the ocelot driver design, so packets don't
      reach the ptp4l program.
      
      The series solves the problem by installing packet traps per port when
      the timestamping ioctl is called, depending on the RX filter selected
      (L2, L4 or both).
      ====================
      
      Link: https://lore.kernel.org/r/20211126172845.3149260-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: correctly report the timestamping RX filters in ethtool · c49a35ee
      Vladimir Oltean authored
      The driver doesn't support RX timestamping for non-PTP packets, but it
      declares that it does. Restrict the reported RX filters to PTP v2 over
      L2 and over L4.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: set up traps for PTP packets · 96ca08c0
      Vladimir Oltean authored
      IEEE 1588 support was declared too soon for the Ocelot switch. Out of
      reset, this switch does not apply any special treatment for PTP packets,
      i.e. when an event message is received, the natural tendency is to
      forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
      is under a bridge, since user space application stacks (written
      primarily for endpoint ports, not switches) like ptp4l expect that PTP
      messages are always received on AF_PACKET / AF_INET sockets (depending
      on the PTP transport being used), and never being autonomously
      forwarded. Any forwarding, if necessary (for example in Transparent
      Clock mode) is handled in software by ptp4l. Having the hardware forward
      these packets too will cause duplicates which will confuse endpoints
      connected to these switches.
      
      So PTP over L2 barely works, in the sense that PTP packets reach the CPU
      port, but they reach it via flooding, and therefore reach lots of other
      unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
      This is because the Ocelot switches have a separate destination port mask
      for unknown IP multicast (which PTP over IP is) flooding compared to
      unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
      the driver allows the CPU port to be in the PGID_MC port group, but not
      in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
      Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
      switches is not very powerful at all, so every penny they could save by
      not allowing flooding to the CPU port module matters. Unknown IP
      multicast did not make it.
      
      The de facto consensus is that when a switch is PTP-aware and an
      application stack for PTP is running, switches should have some sort of
      trapping mechanism for PTP packets, to extract them from the hardware
      data path. This avoids both problems:
      (a) PTP packets are no longer flooded to unwanted destinations
      (b) PTP over IP packets are no longer denied from reaching the CPU since
          they arrive there via a trap and not via flooding
      
      This is not the first time this change has been attempted. Last time, the
      feedback from Allan Nielsen and Andrew Lunn was that the traps should
      not be installed by default, and that PTP-unaware switching may be
      desired for some use cases:
      https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/
      
      To address that feedback, the present patch adds the necessary packet
      traps according to the RX filter configuration transmitted by user space
      through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
      keep 5 filters, which are amended each time RX timestamping is enabled
      or disabled on a port:
      - 1 for PTP over L2
      - 2 for PTP over IPv4 (UDP ports 319 and 320)
      - 2 for PTP over IPv6 (UDP ports 319 and 320)
      
      The cookie by which these filters (invisible to tc) are identified is
      strategically chosen such that it does not collide with the filters used
      for the ocelot-8021q tagging protocol by the Felix driver, or with the
      MRP traps set up by the Ocelot library.
      
      Other alternatives were considered, like patching user space to do
      something, but there are so many ways in which PTP packets could be made
      to reach the CPU, generically speaking, that "do what?" is a very valid
      question. The ptp4l program from the linuxptp stack already attempts to
      do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
      PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
      a dev_mc_add() on the interface, in the kernel:
      https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73
      https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c
      
      Reality shows that this is not sufficient in case the interface belongs
      to a switchdev driver, as dev_mc_add() does not show the intention to
      trap a packet to the CPU, but rather the intention to not drop it (it is
      strictly for RX filtering, same as promiscuous does not mean to send all
      traffic to the CPU, but to not drop traffic with unknown MAC DA). This
      topic is a can of worms in itself, and it would be great if user space
      could just stay out of it.
      
      On the other hand, setting up PTP traps privately within the driver is
      not new by any stretch of the imagination:
      https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833
      https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050
      https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21
      
      So this is the approach taken here as well. The difference here being
      that we prepare and destroy the traps per port, dynamically at runtime,
      as opposed to driver init time, because apparently, PTP-unaware
      forwarding is a use case.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Reported-by: Po Liu <po.liu@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
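The mapping from the selected RX filter to the five traps listed in this commit can be pictured with a small sketch. It only models which traps a port would want; the enum and function names are hypothetical and nothing here touches VCAP IS2.

```c
#include <stdbool.h>

/* One bit per trap; the commit message lists exactly these five. */
enum ptp_trap {
    PTP_TRAP_L2       = 1 << 0,  /* PTP over L2 */
    PTP_TRAP_IPV4_EV  = 1 << 1,  /* PTP over IPv4, UDP port 319 */
    PTP_TRAP_IPV4_GEN = 1 << 2,  /* PTP over IPv4, UDP port 320 */
    PTP_TRAP_IPV6_EV  = 1 << 3,  /* PTP over IPv6, UDP port 319 */
    PTP_TRAP_IPV6_GEN = 1 << 4,  /* PTP over IPv6, UDP port 320 */
};

/*
 * Return the traps a port needs for the requested RX filter:
 * L2 only, L4 only, both, or none when timestamping is disabled.
 */
static unsigned int ptp_traps_for_rx_filter(bool l2, bool l4)
{
    unsigned int traps = 0;

    if (l2)
        traps |= PTP_TRAP_L2;
    if (l4)
        traps |= PTP_TRAP_IPV4_EV | PTP_TRAP_IPV4_GEN |
                 PTP_TRAP_IPV6_EV | PTP_TRAP_IPV6_GEN;
    return traps;
}
```

Disabling RX timestamping on a port corresponds to calling the function with both arguments false, which yields no traps for that port.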
    • net: ptp: add a definition for the UDP port for IEEE 1588 general messages · ec15baec
      Vladimir Oltean authored
      As opposed to event messages (Sync, PdelayReq etc) which require
      timestamping, general messages (Announce, FollowUp etc) do not.
      In PTP they are part of different streams of data.
      
      IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP
      destination port assigned by IANA is 319 for event messages, and 320 for
      general messages. Yet the kernel seems to be missing the definition for
      general messages. This patch adds it.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
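The event/general split by UDP destination port can be illustrated as below. The port numbers 319 and 320 come from the commit message; the macro, enum, and function names are chosen for this sketch and are not claimed to match the kernel header.

```c
#include <stdint.h>

#define SKETCH_PTP_EV_PORT  319  /* event messages (Sync, PdelayReq, ...) */
#define SKETCH_PTP_GEN_PORT 320  /* general messages (Announce, FollowUp, ...) */

enum ptp_msg_class { PTP_CLASS_NOT_PTP, PTP_CLASS_EVENT, PTP_CLASS_GENERAL };

/* Classify a UDP datagram by destination port (host byte order). */
static enum ptp_msg_class ptp_class_by_udp_port(uint16_t dport)
{
    switch (dport) {
    case SKETCH_PTP_EV_PORT:
        return PTP_CLASS_EVENT;    /* needs a timestamp */
    case SKETCH_PTP_GEN_PORT:
        return PTP_CLASS_GENERAL;  /* no timestamp required */
    default:
        return PTP_CLASS_NOT_PTP;
    }
}
```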
    • net: mscc: ocelot: create a function that replaces an existing VCAP filter · 95706be1
      Vladimir Oltean authored
      VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
      tc flower offload on ocelot, among other things. The ingress port mask
      on which VCAP rules match is present as a bit field in the actual key of
      the rule. This means that it is possible for a rule to be shared among
      multiple source ports. When the rule is added one by one on each desired
      port, the ingress port mask of the key must be edited and rewritten
      to hardware.
      
      But the API in ocelot_vcap.c does not allow for this. For one thing,
      ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
      because ocelot_vcap_filter_add() works with a preallocated and
      prepopulated filter and programs it to hardware, and
      ocelot_vcap_filter_del() does both the job of removing the specified
      filter from hardware, as well as kfreeing it. That is to say, the only
      option of editing a filter in place, which is to delete it, modify the
      structure and add it back, does not work because it results in
      use-after-free.
      
      This patch introduces ocelot_vcap_filter_replace, which trivially
      reprograms a VCAP entry to hardware, at the exact same index at which it
      existed before, without modifying any list or allocating any memory.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
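The "reprogram in place" idea can be sketched in user space with an array standing in for the TCAM. All names here (`hw_table`, `struct filter`, `filter_replace()`, `filter_add_port()`) are hypothetical; the real ocelot_vcap_filter_replace() operates on the driver's own structures.

```c
#include <stdint.h>

#define HW_ENTRIES 8

struct filter {
    int index;                   /* position of the rule in hardware */
    uint32_t ingress_port_mask;  /* part of the rule's key */
};

static uint32_t hw_table[HW_ENTRIES];  /* stand-in for the TCAM */

/*
 * Rewrite the entry at the index it already occupies. No list is
 * modified and no memory is allocated or freed, so the caller's
 * filter pointer stays valid (unlike a delete followed by an add).
 */
static int filter_replace(const struct filter *f)
{
    if (f->index < 0 || f->index >= HW_ENTRIES)
        return -1;
    hw_table[f->index] = f->ingress_port_mask;
    return 0;
}

/* Share an existing rule with one more source port. */
static int filter_add_port(struct filter *f, int port)
{
    f->ingress_port_mask |= UINT32_C(1) << port;
    return filter_replace(f);
}
```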
    • net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP · 8a075464
      Vladimir Oltean authored
      The ocelot driver, when asked to timestamp all receiving packets, 1588
      v1 or NTP, says "nah, here's 1588 v2 for you".
      
      According to this discussion:
      https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647
      drivers that downgrade from a wider request to a narrower response (or
      even a response where the intersection with the request is empty) are
      buggy, and should return -ERANGE instead. This patch fixes that.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Suggested-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
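The rule "reject what you cannot honor instead of silently narrowing it" can be sketched as a simple validation step. The enum below is a hypothetical stand-in for the kernel's RX filter constants, and the set of accepted values is an assumption based on the PTP v2 over L2/L4 support mentioned earlier in this series.

```c
#include <errno.h>

/* Hypothetical subset of RX filter requests, for illustration only. */
enum rx_filter {
    RX_FILTER_NONE,
    RX_FILTER_ALL,
    RX_FILTER_NTP_ALL,
    RX_FILTER_PTP_V1_L4_EVENT,
    RX_FILTER_PTP_V2_L2_EVENT,
    RX_FILTER_PTP_V2_L4_EVENT,
    RX_FILTER_PTP_V2_EVENT,
};

/*
 * Accept only requests the hardware can satisfy exactly. Anything
 * wider (all packets, NTP, PTP v1) gets -ERANGE rather than being
 * downgraded to PTP v2 behind the caller's back.
 */
static int check_rx_filter(enum rx_filter req)
{
    switch (req) {
    case RX_FILTER_NONE:
    case RX_FILTER_PTP_V2_L2_EVENT:
    case RX_FILTER_PTP_V2_L4_EVENT:
    case RX_FILTER_PTP_V2_EVENT:
        return 0;
    default:
        return -ERANGE;
    }
}
```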
    • Merge branch 'net-hns3-add-some-fixes-for-net' · b32e521e
      Jakub Kicinski authored
      Guangbin Huang says:
      
      ====================
      net: hns3: add some fixes for -net
      
      This series adds some fixes for the HNS3 ethernet driver.
      ====================
      
      Link: https://lore.kernel.org/r/20211126120318.33921-1-huangguangbin2@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: fix incorrect components info of ethtool --reset command · 82229c4d
      Jie Wang authored
      Currently, the HNS3 driver doesn't clear the reset flags of components after
      successfully executing a reset, which causes the userspace info for
      "Components reset" and "Components not reset" to be incorrect.
      
      So fix this problem by clearing the corresponding reset flag after the reset process.
      
      Fixes: ddccc5e3 ("net: hns3: add support for triggering reset by ethtool")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: fix one incorrect value of page pool info when queried by debugfs · 9c147917
      Hao Chen authored
      Currently, when a user queries page pool info with the debugfs command
      "cat page_pool_info", the count of allocated pages for the page pool may be
      incorrect because of a memory inconsistency problem caused by compiler
      optimization.
      
      So this patch uses READ_ONCE() to read the value of pages_state_hold_cnt to
      fix this problem.
      
      Fixes: 850bfb91 ("net: hns3: debugfs add support dumping page pool info")
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
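A user-space approximation of what READ_ONCE() buys here is a single volatile load, which stops the compiler from caching, re-reading, or tearing the access. The macro and struct below are simplified stand-ins written for this sketch; the kernel's READ_ONCE() is more elaborate.

```c
#include <stdint.h>

/* Simplified stand-in: force exactly one load of a 32-bit value. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* Hypothetical mirror of the counter named in the commit message. */
struct pool_stats {
    uint32_t pages_state_hold_cnt;
};

/* Take one consistent snapshot of a counter another CPU may update. */
static uint32_t pool_hold_cnt_snapshot(const struct pool_stats *s)
{
    return READ_ONCE_U32(s->pages_state_hold_cnt);
}
```

A volatile access only constrains the compiler. It is not a memory barrier and does not make a read-modify-write sequence atomic.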
    • net: hns3: add check NULL address for page pool · b8af344c
      Hao Chen authored
      When page pool is not enabled, its address value is still NULL and page
      pool should not be accessed, so add a check for it.
      
      Fixes: 850bfb91 ("net: hns3: debugfs add support dumping page pool info")
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: fix VF RSS failed problem after PF enable multi-TCs · 8d2ad993
      Guangbin Huang authored
      When a PF is set to multi-TCs and a mapping between priorities and TCs
      is configured, the hardware activates these settings for this PF
      and its VFs.
      
      In this case, when a VF uses just one TC and its rx packets contain a priority
      that is not mapped to TC0, the hardware always puts such packets into
      queue 0, because the other TCs of the VF are not valid. As a result, these
      packets of the VF cannot use the RSS function.
      
      To fix this problem, set the tc mode of all unused TCs of the VF to the setting of
      TC0, so that rx packets with a priority that maps to an unused TC are directed to
      TC0.
      
      Fixes: e2cb1dec ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: qed: fix the array may be out of bound · 0435a4d0
      zhangyue authored
      If the variable 'p_bit->flags' is always 0,
      the loop condition is always 0.
      
      The variable 'j' may then become greater than or equal to 32.
      
      At that point, the access to the array 'p_aeu->bits[32]' may be out
      of bounds.
      
      Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
      Link: https://lore.kernel.org/r/20211125113610.273841-1-zhangyue1@kylinos.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
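The hazard described here is a loop whose exit depends on a counter that may never advance while a second counter is used as the array index. Below is a simplified model of that pattern with an explicit bound on the index. It assumes, for illustration only, that each descriptor's flags encode how many bits it covers; `DESC_LEN()` and the struct layout are hypothetical.

```c
#include <stddef.h>

#define NUM_DESCS 32

/* Hypothetical encoding: the low byte of flags is the bit length. */
#define DESC_LEN(flags) ((flags) & 0xffu)

struct bit_desc {
    unsigned int flags;
};

/*
 * Walk descriptors until 32 bits are covered. If every flags value is
 * 0, bit_idx never advances, so the extra `j < NUM_DESCS` test is what
 * keeps the index inside the array. Returns descriptors visited.
 */
static size_t walk_descs(const struct bit_desc descs[NUM_DESCS])
{
    size_t j;
    unsigned int bit_idx;

    for (j = 0, bit_idx = 0; bit_idx < 32 && j < NUM_DESCS; j++)
        bit_idx += DESC_LEN(descs[j].flags);

    return j;
}
```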
    • Merge tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 7e635452
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "One more fix to the lzo code, a missing put_page causing memory leaks
        when some error branches are taken"
      
      * tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix the memory leak caused in lzo_compress_pages()
    • net/smc: Don't call clcsock shutdown twice when smc shutdown · bacb6c1e
      Tony Lu authored
      When applications call shutdown() with SHUT_RDWR in userspace,
      smc_close_active() calls kernel_sock_shutdown(), and it is called
      twice in smc_shutdown().
      
      Fix this by checking sk_state before doing the clcsock shutdown, and
      avoid missing the application's call of smc_shutdown().
      
      Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/
      Fixes: 606a63c9 ("net/smc: Ensure the active closing peer first closes clcsock")
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: vlan: fix underflow for the real_dev refcnt · 01d9cc2d
      Ziyang Xuan authored
      Inject error before dev_hold(real_dev) in register_vlan_dev(),
      and execute the following testcase:
      
      ip link add dev dummy1 type dummy
      ip link add name dummy1.100 link dummy1 type vlan id 100
      ip link del dev dummy1
      
      When the dummy netdevice is removed, we will get a WARNING as follows:
      
      =======================================================================
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 2 PID: 0 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0
      
      and an endless loop of:
      
      =======================================================================
      unregister_netdevice: waiting for dummy1 to become free. Usage count = -1073741824
      
      That is because dev_put(real_dev) in vlan_dev_free() is called without
      dev_hold(real_dev) in register_vlan_dev(). This makes the refcnt of real_dev
      underflow.
      
      Move the dev_hold(real_dev) to vlan_dev_init() which is the call-back of
      ndo_init(). That makes dev_hold() and dev_put() for vlan's real_dev
      symmetrical.
      
      Fixes: 563bcbae ("net: vlan: fix a UAF in vlan_dev_real_dev()")
      Reported-by: Petr Machata <petrm@nvidia.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Link: https://lore.kernel.org/r/20211126015942.2918542-1-william.xuanziyang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
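The symmetry argument can be shown with a toy reference counter: if the hold happens in the init step, the put in the free step is always balanced, even when registration fails afterwards. The structs and functions below are a simplified model with hypothetical names, not the kernel's netdevice code.

```c
struct net_dev {
    int refcnt;
};

struct vlan_dev {
    struct net_dev *real_dev;
};

/* Init step: take the reference here, mirroring the fix. */
static void vlan_init(struct vlan_dev *v, struct net_dev *real)
{
    v->real_dev = real;
    real->refcnt++;
}

/* Free step: always drops exactly the reference taken in vlan_init(). */
static void vlan_free(struct vlan_dev *v)
{
    v->real_dev->refcnt--;
}

/*
 * Registration that can fail after init. Because the hold was taken
 * in vlan_init(), the error path's vlan_free() cannot underflow.
 */
static int vlan_register(struct vlan_dev *v, struct net_dev *real,
                         int inject_error)
{
    vlan_init(v, real);
    if (inject_error) {
        vlan_free(v);
        return -1;
    }
    return 0;
}
```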
    • Jakub Kicinski's avatar
      cbb91dcb
    • ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce() · 0276af21
      Julian Wiedmann authored
      ethtool_set_coalesce() now uses both the .get_coalesce() and
      .set_coalesce() callbacks. But the check for their availability is
      buggy, so changing the coalesce settings on a device where the driver
      provides only _one_ of the callbacks results in a NULL pointer
      dereference instead of an -EOPNOTSUPP.
      
      Fix the condition so that the availability of both callbacks is
      ensured. This also matches the netlink code.
      
      Note that reproducing this requires some effort - it only affects the
      legacy ioctl path, and needs a specific combination of driver options:
      - have .get_coalesce() and .coalesce_supported but no
       .set_coalesce(), or
      - have .set_coalesce() but no .get_coalesce(). Here, e.g., ethtool doesn't
        cause the crash as it first attempts to call ethtool_get_coalesce()
        and bails out on error.
      
      Fixes: f3ccfda1 ("ethtool: extend coalesce setting uAPI with CQE mode")
      Cc: Yufeng Mo <moyufeng@huawei.com>
      Cc: Huazhong Tan <tanhuazhong@huawei.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Link: https://lore.kernel.org/r/20211126175543.28000-1-jwi@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
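The corrected availability check amounts to requiring both callbacks before either is invoked. A minimal sketch, with a hypothetical ops struct and demo callbacks that exist only to exercise it:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much reduced stand-in for a driver's ethtool ops. */
struct coalesce_ops {
    int (*get_coalesce)(int *usecs);
    int (*set_coalesce)(int usecs);
};

/*
 * The set path reads the current settings first, so it needs BOTH
 * callbacks. A driver providing only one gets -EOPNOTSUPP instead
 * of a NULL function pointer call.
 */
static int set_coalesce_checked(const struct coalesce_ops *ops, int usecs)
{
    int cur, ret;

    if (!ops->get_coalesce || !ops->set_coalesce)
        return -EOPNOTSUPP;

    ret = ops->get_coalesce(&cur);
    if (ret)
        return ret;
    return ops->set_coalesce(usecs);
}

/* Trivial callbacks used only to exercise the sketch. */
static int demo_usecs;

static int demo_get(int *usecs)
{
    *usecs = demo_usecs;
    return 0;
}

static int demo_set(int usecs)
{
    demo_usecs = usecs;
    return 0;
}
```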
    • nfc: virtual_ncidev: change default device permissions · c26381f9
      Thadeu Lima de Souza Cascardo authored
      The device permissions are S_IALLUGO, which includes many unnecessary bits. Remove them
      and also remove read and write permissions from group and others.
      
      Before the change:
      crwsrwsrwt    1 0        0          10, 125 Nov 25 13:59 /dev/virtual_nci
      
      After the change:
      crw-------    1 0        0          10, 125 Nov 25 14:05 /dev/virtual_nci
      
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Reviewed-by: Bongsu Jeon <bongsu.jeon@samsung.com>
      Link: https://lore.kernel.org/r/20211125141457.716921-1-cascardo@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
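The two `ls` listings can be reproduced from the mode bits with a short sketch. The `MODE_*` constants are local definitions carrying the conventional Linux octal values, and `perm_string()` is a hypothetical helper; S_IALLUGO in the kernel is the union of all twelve bits, i.e. 07777.

```c
/* Conventional Linux mode bit values, defined locally for the sketch. */
#define MODE_SETUID 04000u
#define MODE_SETGID 02000u
#define MODE_STICKY 01000u
#define MODE_ALLUGO 07777u  /* every special and rwx bit set */
#define MODE_OWNER_RW 0600u /* read/write for the owner only */

/* Pick the execute-slot character, folding in a special bit. */
static char exec_char(unsigned int m, unsigned int xbit,
                      unsigned int special, char set_ch, char set_noexec)
{
    if (m & special)
        return (m & xbit) ? set_ch : set_noexec;
    return (m & xbit) ? 'x' : '-';
}

/* Render the nine permission characters the way `ls -l` shows them. */
static void perm_string(unsigned int m, char out[10])
{
    out[0] = (m & 0400u) ? 'r' : '-';
    out[1] = (m & 0200u) ? 'w' : '-';
    out[2] = exec_char(m, 0100u, MODE_SETUID, 's', 'S');
    out[3] = (m & 0040u) ? 'r' : '-';
    out[4] = (m & 0020u) ? 'w' : '-';
    out[5] = exec_char(m, 0010u, MODE_SETGID, 's', 'S');
    out[6] = (m & 0004u) ? 'r' : '-';
    out[7] = (m & 0002u) ? 'w' : '-';
    out[8] = exec_char(m, 0001u, MODE_STICKY, 't', 'T');
    out[9] = '\0';
}
```

Passing `MODE_ALLUGO` gives the `rwsrwsrwt` part of the "before" listing, and `MODE_OWNER_RW` gives the `rw-------` part of the "after" listing.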
    • net/sched: sch_ets: don't peek at classes beyond 'nbands' · de6d2592
      Davide Caratti authored
      When the number of DRR classes decreases, the round-robin active list can
      contain elements that have already been freed in ets_qdisc_change(). As a
      consequence, it's possible to see a NULL dereference crash, caused by the
      attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000018
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 1 PID: 910 Comm: mausezahn Not tainted 5.16.0-rc1+ #475
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:ets_qdisc_dequeue+0x129/0x2c0 [sch_ets]
       Code: c5 01 41 39 ad e4 02 00 00 0f 87 18 ff ff ff 49 8b 85 c0 02 00 00 49 39 c4 0f 84 ba 00 00 00 49 8b ad c0 02 00 00 48 8b 7d 10 <48> 8b 47 18 48 8b 40 38 0f ae e8 ff d0 48 89 c3 48 85 c0 0f 84 9d
       RSP: 0000:ffffbb36c0b5fdd8 EFLAGS: 00010287
       RAX: ffff956678efed30 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: 0000000000000002 RSI: ffffffff9b938dc9 RDI: 0000000000000000
       RBP: ffff956678efed30 R08: e2f3207fe360129c R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000001 R12: ffff956678efeac0
       R13: ffff956678efe800 R14: ffff956611545000 R15: ffff95667ac8f100
       FS:  00007f2aa9120740(0000) GS:ffff95667b800000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000018 CR3: 000000011070c000 CR4: 0000000000350ee0
       Call Trace:
        <TASK>
        qdisc_peek_dequeued+0x29/0x70 [sch_ets]
        tbf_dequeue+0x22/0x260 [sch_tbf]
        __qdisc_run+0x7f/0x630
        net_tx_action+0x290/0x4c0
        __do_softirq+0xee/0x4f8
        irq_exit_rcu+0xf4/0x130
        sysvec_apic_timer_interrupt+0x52/0xc0
        asm_sysvec_apic_timer_interrupt+0x12/0x20
       RIP: 0033:0x7f2aa7fc9ad4
       Code: b9 ff ff 48 8b 54 24 18 48 83 c4 08 48 89 ee 48 89 df 5b 5d e9 ed fc ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <53> 48 83 ec 10 48 8b 05 10 64 33 00 48 8b 00 48 85 c0 0f 85 84 00
       RSP: 002b:00007ffe5d33fab8 EFLAGS: 00000202
       RAX: 0000000000000002 RBX: 0000561f72c31460 RCX: 0000561f72c31720
       RDX: 0000000000000002 RSI: 0000561f72c31722 RDI: 0000561f72c31720
       RBP: 000000000000002a R08: 00007ffe5d33fa40 R09: 0000000000000014
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000561f7187e380
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000561f72c31460
        </TASK>
       Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common joydev virtio_balloon lpc_ich i2c_i801 i2c_smbus pcspkr ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel serio_raw libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
       CR2: 0000000000000018
      
      Ensuring that 'alist' was never zeroed [1] was not sufficient: we also need to
      remove from the active list those elements that are no longer SP or DRR.
      
      [1] https://lore.kernel.org/netdev/60d274838bf09777f0371253416e8af71360bc08.1633609148.git.dcaratti@redhat.com/
      
      v3: fix race between ets_qdisc_change() and ets_qdisc_dequeue() by delisting
          DRR classes beyond 'nbands' in ets_qdisc_change() with the qdisc lock
          acquired, thanks to Cong Wang.
      
      v2: when a NULL qdisc is found in the DRR active list, try to dequeue skb
          from the next list item.
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Fixes: dcc68b4d ("net: sch_ets: Add a new Qdisc")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Link: https://lore.kernel.org/r/7a5c496eed2d62241620bdbb83eb03fb9d571c99.1637762721.git.dcaratti@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      de6d2592
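The delisting idea in the sch_ets fix above can be sketched in user space. This is a toy model only: `toy_class`, `toy_sched` and the `toy_*` helpers are hypothetical names, not the kernel's `struct ets_class` or list API, there is no DRR deficit accounting, and the qdisc lock that the real fix relies on is not modelled because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BANDS 4

/* Hypothetical stand-in for an ETS class; not the kernel's struct ets_class. */
struct toy_class {
    int qlen;                 /* packets queued in the child qdisc */
    bool has_qdisc;           /* false once the band has been torn down */
    struct toy_class *prev;   /* links for the active list, NULL if delisted */
    struct toy_class *next;
};

struct toy_sched {
    struct toy_class classes[TOY_MAX_BANDS];
    struct toy_class active;  /* sentinel head of the active list */
    int nbands;
};

static void toy_init(struct toy_sched *q, int nbands)
{
    q->active.prev = q->active.next = &q->active;
    q->nbands = nbands;
    for (int i = 0; i < TOY_MAX_BANDS; i++) {
        q->classes[i].qlen = 0;
        q->classes[i].has_qdisc = i < nbands;
        q->classes[i].prev = q->classes[i].next = NULL;
    }
}

static void toy_delist(struct toy_class *cl)
{
    cl->prev->next = cl->next;
    cl->next->prev = cl->prev;
    cl->prev = cl->next = NULL;
}

/* Queue one packet; a class joins the active list on its first packet. */
static void toy_enqueue(struct toy_sched *q, int band)
{
    struct toy_class *cl = &q->classes[band];

    if (cl->qlen++ == 0) {
        cl->prev = q->active.prev;
        cl->next = &q->active;
        q->active.prev->next = cl;
        q->active.prev = cl;
    }
}

/* Shrink to new_nbands: delist every class beyond the new limit before its
 * child qdisc goes away, so dequeue can never find it on the active list. */
static void toy_change(struct toy_sched *q, int new_nbands)
{
    for (int i = new_nbands; i < q->nbands; i++) {
        struct toy_class *cl = &q->classes[i];

        if (cl->next)         /* still on the active list */
            toy_delist(cl);
        cl->qlen = 0;
        cl->has_qdisc = false;
    }
    q->nbands = new_nbands;
}

/* Returns the band served, or -1 if no class is active. */
static int toy_dequeue(struct toy_sched *q)
{
    struct toy_class *cl = q->active.next;

    if (cl == &q->active)
        return -1;
    assert(cl->has_qdisc);    /* in the kernel this is where the NULL deref hit */
    if (--cl->qlen == 0)
        toy_delist(cl);
    return (int)(cl - q->classes);
}
```

If `toy_change()` skipped the delisting step, a class with `has_qdisc == false` could stay on the active list and trip the assertion in `toy_dequeue()`, which mirrors the crash in the trace above.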
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-properties' · 2e13e5ae
      Rafael J. Wysocki authored
      Merge fix and cleanup related to the management of ACPI device
      properties for 5.16-rc3.
      
      * acpi-properties:
        ACPI: Make acpi_node_get_parent() local
        ACPI: Get acpi_device's parent from the parent field
      2e13e5ae
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-sleep' · 7803516d
      Rafael J. Wysocki authored
      Merge hibernation-related fixes for 5.16-rc3.
      
      * pm-sleep:
        PM: hibernate: Fix snapshot partial write lengths
        PM: hibernate: use correct mode for swsusp_close()
      7803516d
    • Yannick Vignon's avatar
      net: stmmac: Disable Tx queues when reconfiguring the interface · b270bfe6
      Yannick Vignon authored
      The Tx queues were not disabled in situations where the driver needed to
      stop the interface to apply a new configuration. This could result in a
      kernel panic when doing any of the 3 following actions:
      * reconfiguring the number of queues (ethtool -L)
      * reconfiguring the size of the ring buffers (ethtool -G)
      * installing/removing an XDP program (ip l set dev ethX xdp)
      
      Prevent the panic by making sure netif_tx_disable is called when stopping
      an interface.
      
      Without this patch, the following kernel panic can be observed when doing
      any of the actions above:
      
      Unable to handle kernel paging request at virtual address ffff80001238d040
      [....]
       Call trace:
        dwmac4_set_addr+0x8/0x10
        dev_hard_start_xmit+0xe4/0x1ac
        sch_direct_xmit+0xe8/0x39c
        __dev_queue_xmit+0x3ec/0xaf0
        dev_queue_xmit+0x14/0x20
      [...]
      [ end trace 0000000000000002 ]---
      
      Fixes: 5fabb012 ("net: stmmac: Add initial XDP support")
      Fixes: aa042f60 ("net: stmmac: Add support to Ethtool get/set ring parameters")
      Fixes: 0366f7e0 ("net: stmmac: add ethtool support for get/set channels")
      Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
      Link: https://lore.kernel.org/r/20211124154731.1676949-1-yannick.vignon@oss.nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b270bfe6
    • Linus Torvalds's avatar
      Merge tag 'char-misc-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 1bff7d7e
      Linus Torvalds authored
      Pull char/misc driver fix from Greg KH:
       "Here is a single binder driver fix for 5.16-rc3.
      
        It resolves a problem reported in the set of binder fixes that went
        into 5.16-rc1. It has been in linux-next for a while with no reported
        problems"
      
      * tag 'char-misc-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        binder: fix test regression due to sender_euid change
      1bff7d7e
    • Linus Torvalds's avatar
      Merge tag 'staging-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 70337441
      Linus Torvalds authored
      Pull staging fixes from Greg KH:
       "Here are some small staging driver fixes and one driver removal for
        5.16-rc3.
      
        The fixes resolve a number of small issues found in 5.16-rc1, nothing
        huge at all. The driver removal was due to a platform being removed in
        5.16-rc1, but this driver was forgotten about. It wasn't being built
        anymore so it's safe to delete.
      
        All have been in linux-next for a while with no reported problems"
      
      * tag 'staging-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
        staging: greybus: Add missing rwsem around snd_ctl_remove() calls
        staging: Remove Netlogic XLP network driver
        staging: r8188eu: fix a memory leak in rtw_wx_read32()
        staging: r8188eu: use GFP_ATOMIC under spinlock
        staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
        staging/fbtft: Fix backlight
        staging: r8188eu: Fix breakage introduced when 5G code was removed
      70337441
    • Linus Torvalds's avatar
      Merge tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · ba2cacc1
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a number of small USB fixes for reported problems for
        5.16-rc3
      
        They include:
      
         - typec driver fixes
      
         - new usb-serial driver ids
      
         - usb hub enumeration issues that were much reported
      
         - gadget driver fixes
      
         - dwc3 driver fix
      
         - chipidea driver fix
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: serial: option: add Fibocom FM101-GL variants
        usb: typec: tipd: Fix initialization sequence for cd321x
        usb: typec: tipd: Fix typo in cd321x_switch_power_state
        usb: hub: Fix locking issues with address0_mutex
        USB: serial: pl2303: fix GC type detection
        USB: serial: option: add Telit LE910S1 0x9200 composition
        usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference in probe
        usb: hub: Fix usb enumeration issue due to address0 race
        usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
        usb: dwc3: leave default DMA for PCI devices
        usb: dwc2: hcd_queue: Fix use of floating point literal
        usb: dwc3: gadget: Fix null pointer exception
        usb: gadget: udc-xilinx: Fix an error handling path in 'xudc_probe()'
        usb: xhci: tegra: Check padctrl interrupt presence in device tree
        usb: dwc2: gadget: Fix ISOC flow for elapsed frames
        usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer
        usb: dwc3: gadget: Ignore NoStream after End Transfer
        usb: dwc3: core: Revise GHWPARAMS9 offset
      ba2cacc1
    • Linus Torvalds's avatar
      Merge tag 'mmc-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · d3e64792
      Linus Torvalds authored
      Pull MMC host fixes from Ulf Hansson:
      
       - mmc_spi: Add SPI IDs to silence warning
      
       - sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
      
       - sdhci-esdhc-imx: Disable broken CMDQ for imx8qm/imx8qxp/imx8mm
      
      * tag 'mmc-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: spi: Add device-tree SPI IDs
        mmc: sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
        mmc: sdhci-esdhc-imx: disable CMDQ support
      d3e64792
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 80d75202
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "I2C has an interrupt storm fix for the i801, better timeout handling
        for the new virtio driver, and some documentation fixes this time"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        docs: i2c: smbus-protocol: mention the repeated start condition
        i2c: virtio: disable timeout handling
        i2c: i801: Fix interrupt storm from SMB_ALERT signal
        i2c: i801: Restore INTREN on unload
        dt-bindings: i2c: imx-lpi2c: Fix i.MX 8QM compatible matching
      80d75202
    • Linus Torvalds's avatar
      Merge tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 6b54698a
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - Kconfig fix to make it possible to control building of the privcmd
         driver
      
       - three fixes for issues identified by the kernel test robot
      
       - a five-patch series to simplify timeout handling for Xen PV driver
         initialization
      
       - two patches to fix error paths in xenstore/xenbus driver
         initialization
      
      * tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: make HYPERVISOR_set_debugreg() always_inline
        xen: make HYPERVISOR_get_debugreg() always_inline
        xen: detect uninitialized xenbus in xenbus_init
        xen: flag xen_snd_front to be not essential for system boot
        xen: flag pvcalls-front to be not essential for system boot
        xen: flag hvc_xen to be not essential for system boot
        xen: flag xen_drm_front to be not essential for system boot
        xen: add "not_essential" flag to struct xenbus_driver
        xen/pvh: add missing prototype to header
        xen: don't continue xenstore initialization in case of errors
        xen/privcmd: make option visible in Kconfig
      6b54698a
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · f17fb26d
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Three arm64 fixes.
      
        The main one is a fix to the way in which we evaluate the macro
        arguments to our uaccess routines, which we _think_ might be the root
        cause behind some unkillable tasks we've seen in the Android arm64 CI
        farm (testing is ongoing). In any case, it's worth fixing.
      
        Other than that, we've toned down an over-zealous VM_BUG_ON() and
        fixed ftrace stack unwinding in a bunch of cases.
      
        Summary:
      
         - Evaluate uaccess macro arguments outside of the critical section
      
         - Tighten up VM_BUG_ON() in pmd_populate_kernel() to avoid false positive
      
         - Fix ftrace stack unwinding using HAVE_FUNCTION_GRAPH_RET_ADDR_PTR"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: uaccess: avoid blocking within critical sections
        arm64: mm: Fix VM_BUG_ON(mm != &init_mm) for trans_pgd
        arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
      f17fb26d
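The first item in the arm64 summary above, evaluating macro arguments outside of the critical section, can be illustrated with a small user-space sketch. Everything here is hypothetical: `toy_put()`, `toy_enter()` and `toy_exit()` only model a window in which blocking is not allowed, and are not the arm64 uaccess macros.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_in_critical;       /* models the "uaccess enabled" window */
static int toy_calls_in_critical;  /* counts argument evaluations inside it */

static void toy_enter(void) { toy_in_critical = true; }
static void toy_exit(void)  { toy_in_critical = false; }

/*
 * Hypothetical store macro: both arguments are evaluated exactly once into
 * locals BEFORE the window opens, so an argument expression that might block
 * never runs inside it.
 */
#define toy_put(val, ptr)            \
    do {                             \
        int *p_ = (ptr);             \
        int v_ = (val);              \
        toy_enter();                 \
        *p_ = v_;                    \
        toy_exit();                  \
    } while (0)

/* An argument expression with side effects, e.g. a call that may block. */
static int toy_compute(int x)
{
    if (toy_in_critical)
        toy_calls_in_critical++;
    return x * 2;
}
```

Calling `toy_put(toy_compute(21), &slot)` runs `toy_compute()` before `toy_enter()`. A macro that expanded `(val)` directly between `toy_enter()` and `toy_exit()` would run it inside the window instead.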
    • Qu Wenruo's avatar
      btrfs: fix the memory leak caused in lzo_compress_pages() · daf87e95
      Qu Wenruo authored
      [BUG]
      Fstests generic/027 can easily trigger a slow but steady memory leak
      when run with the "-o compress=lzo" mount option.
      
      Normally a single run of generic/027 is enough to eat up at least 4G of RAM.
      
      [CAUSE]
      In commit d4088803 ("btrfs: subpage: make lzo_compress_pages()
      compatible") we changed how @page_in is released.
      
      But after that refactoring, @page_in is only released once all pages
      have been compressed.
      
      This leaves the error path not releasing @page_in. And the "error path"
      includes things like incompressible data, which is also treated as an
      error (-E2BIG).
      
      Thus it can cause a memory leak even if nothing went wrong.
      
      [FIX]
      Add a check under the @out label to release @page_in when needed, so when
      we hit any error, the input page is properly released.
      Reported-by: Josef Bacik <josef@toxicpanda.com>
      Fixes: d4088803 ("btrfs: subpage: make lzo_compress_pages() compatible")
      Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      daf87e95
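The [FIX] described above follows the common "release under the exit label" pattern. A minimal user-space sketch, assuming a toy reference-counted page: `toy_page`, `toy_compress_pages()` and the `fail_at` parameter are invented for illustration and do not reflect the real lzo_compress_pages() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical page with a reference count; not the kernel's struct page. */
struct toy_page {
    int refs;
};

static void toy_get_page(struct toy_page *p) { p->refs++; }
static void toy_put_page(struct toy_page *p) { p->refs--; }

/*
 * "Compress" npages input pages. fail_at >= 0 simulates the compressor
 * returning -E2BIG (incompressible data) while that page is held.
 */
static int toy_compress_pages(struct toy_page *pages, int npages, int fail_at)
{
    struct toy_page *page_in = NULL;
    int ret = 0;

    for (int i = 0; i < npages; i++) {
        page_in = &pages[i];
        toy_get_page(page_in);

        if (i == fail_at) {
            ret = -E2BIG;
            goto out;          /* early exit with page_in still held */
        }

        toy_put_page(page_in);
        page_in = NULL;        /* nothing left to release for this page */
    }
out:
    /* Release the input page if an early exit left it held. */
    if (page_in)
        toy_put_page(page_in);
    return ret;
}
```

Without the check under `out`, every early exit would leave one reference behind, which matches the steady leak described in the [BUG] section.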
    • Jakub Kicinski's avatar
      Merge branch 'tls-splice_read-fixes' · 49573ff7
      Jakub Kicinski authored
      Jakub Kicinski says:
      
      ====================
      tls: splice_read fixes
      
      As I work my way to unlocked and zero-copy TLS Rx the obvious bugs
      in the splice_read implementation get harder and harder to ignore.
      This is to say the fixes here are discovered by code inspection,
      I'm not aware of anyone actually using splice_read.
      ====================
      
      Link: https://lore.kernel.org/r/20211124232557.2039757-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      49573ff7
    • Jakub Kicinski's avatar
      selftests: tls: test for correct proto_ops · f884a342
      Jakub Kicinski authored
      The previous patch fixes callbacks being overridden incorrectly. Triggering
      the crash in sendpage_locked would be more spectacular but it's
      hard to get to, so take the easier path of proving this is broken
      and call getname. We're currently getting IPv4 socket info on an
      IPv6 socket.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f884a342
    • Jakub Kicinski's avatar
      tls: fix replacing proto_ops · f3911f73
      Jakub Kicinski authored
      We replace proto_ops whenever TLS is configured for RX. But our
      replacement also overrides sendpage_locked, which will crash
      unless TX is also configured. Similarly we plug both of those
      in for TLS_HW (NIC crypto offload) even though TLS_HW has a completely
      different implementation for TX.
      
      Last but not least we always plug in something based on inet_stream_ops
      even though a few of the callbacks differ for IPv6 (getname, release,
      bind).
      
      Use a callback building method similar to what we do for struct proto.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Fixes: d4ffb02d ("net/tls: enable sk_msg redirect to tls socket egress")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f3911f73
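The "callback building method" mentioned above can be pictured as a table of operation structs indexed by IP version, TX configuration and RX configuration, where each entry starts as a copy of the matching base ops and only overrides what its configuration needs. The sketch below is an assumption-laden toy: all names are hypothetical, the callbacks just return integers, and the TLS_HW case is left out entirely.

```c
#include <assert.h>

enum toy_ipver { TOY_V4, TOY_V6, TOY_NUM_IPVER };
enum toy_conf  { TOY_BASE, TOY_SW, TOY_NUM_CONF };

/* Hypothetical subset of socket callbacks, reduced to int-returning stubs. */
struct toy_ops {
    int (*getname)(void);
    int (*sendpage_locked)(void);
    int (*splice_read)(void);
};

static int v4_getname(void) { return 4; }
static int v6_getname(void) { return 6; }
static int plain_sendpage_locked(void) { return 0; }
static int plain_splice_read(void) { return 0; }
static int tls_sendpage_locked(void) { return 1; }
static int tls_splice_read(void) { return 1; }

/* Per-family base ops: IPv6 keeps its own getname. */
static const struct toy_ops toy_base[TOY_NUM_IPVER] = {
    [TOY_V4] = { v4_getname, plain_sendpage_locked, plain_splice_read },
    [TOY_V6] = { v6_getname, plain_sendpage_locked, plain_splice_read },
};

/* Indexed as [ip version][tx conf][rx conf]. */
static struct toy_ops toy_table[TOY_NUM_IPVER][TOY_NUM_CONF][TOY_NUM_CONF];

static void toy_build_ops(struct toy_ops ops[TOY_NUM_CONF][TOY_NUM_CONF],
                          const struct toy_ops *base)
{
    for (int tx = 0; tx < TOY_NUM_CONF; tx++) {
        for (int rx = 0; rx < TOY_NUM_CONF; rx++) {
            ops[tx][rx] = *base;
            /* Override the TX callback only when TX is configured. */
            if (tx == TOY_SW)
                ops[tx][rx].sendpage_locked = tls_sendpage_locked;
            /* Override the RX callback only when RX is configured. */
            if (rx == TOY_SW)
                ops[tx][rx].splice_read = tls_splice_read;
        }
    }
}

static void toy_build_all(void)
{
    for (int ip = 0; ip < TOY_NUM_IPVER; ip++)
        toy_build_ops(toy_table[ip], &toy_base[ip]);
}
```

With this layout an IPv6 socket that has only RX configured gets the TLS `splice_read`, keeps the plain `sendpage_locked`, and still reports IPv6 information through `getname`.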
    • Jakub Kicinski's avatar
      selftests: tls: test splicing decrypted records · 274af0f9
      Jakub Kicinski authored
      Add tests for half-received and peeked records.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      274af0f9
    • Jakub Kicinski's avatar
      tls: splice_read: fix accessing pre-processed records · e062fe99
      Jakub Kicinski authored
      recvmsg() will put peek()ed and partially read records onto the rx_list.
      splice_read() needs to consult that list otherwise it may miss data.
      Align with recvmsg() and also put partially-read records onto rx_list.
      tls_sw_advance_skb() is pretty pointless now and will be removed in
      net-next.
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e062fe99
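The rx_list behaviour described above can be modelled with two queues: records that were already processed (peeked or partially read) and records that have not been touched yet. This is a simplified sketch with made-up names (`toy_sock`, `toy_peek()`, `toy_splice_read()`); it copies bytes into a buffer rather than splicing, handles at most one record per call, and ignores decryption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_RECORDS 8

struct toy_record {
    const char *data;
    size_t len;
    size_t off;   /* bytes already consumed by a reader */
};

struct toy_sock {
    struct toy_record rx_list[TOY_MAX_RECORDS];  /* pre-processed records */
    int rx_count;
    struct toy_record incoming[TOY_MAX_RECORDS]; /* untouched records */
    int in_head, in_count;
};

/* Models recvmsg(MSG_PEEK): the record moves to rx_list but is not consumed. */
static size_t toy_peek(struct toy_sock *sk, char *buf, size_t len)
{
    if (sk->rx_count == 0) {
        if (sk->in_head == sk->in_count)
            return 0;
        sk->rx_list[sk->rx_count++] = sk->incoming[sk->in_head++];
    }
    const struct toy_record *rec = &sk->rx_list[0];
    size_t chunk = rec->len - rec->off;
    if (chunk > len)
        chunk = len;
    memcpy(buf, rec->data + rec->off, chunk);
    return chunk;
}

/* Reads from rx_list first; a partially read record stays on rx_list. */
static size_t toy_splice_read(struct toy_sock *sk, char *buf, size_t len)
{
    struct toy_record rec;
    int from_rx_list = sk->rx_count > 0;

    if (from_rx_list)
        rec = sk->rx_list[0];
    else if (sk->in_head < sk->in_count)
        rec = sk->incoming[sk->in_head++];
    else
        return 0;

    size_t chunk = rec.len - rec.off;
    if (chunk > len)
        chunk = len;
    memcpy(buf, rec.data + rec.off, chunk);
    rec.off += chunk;

    if (rec.off < rec.len) {
        /* Partially read: keep it at the head of rx_list for the next call. */
        if (from_rx_list)
            sk->rx_list[0] = rec;
        else
            sk->rx_list[sk->rx_count++] = rec;  /* rx_list was empty */
    } else if (from_rx_list) {
        /* Fully consumed: drop the head of rx_list. */
        memmove(&sk->rx_list[0], &sk->rx_list[1],
                (size_t)(sk->rx_count - 1) * sizeof(rec));
        sk->rx_count--;
    }
    return chunk;
}
```

A reader that ignored `rx_list` and went straight to `incoming` would skip any record that a previous peek or short read had already moved, which is the missed-data case the commit describes.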