1. 29 Nov, 2021 15 commits
  2. 26 Nov, 2021 25 commits
    • Merge tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c5c17547
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from netfilter.
      
        Current release - regressions:
      
         - r8169: fix incorrect mac address assignment
      
         - vlan: fix underflow for the real_dev refcnt when vlan creation
           fails
      
         - smc: avoid warning of possible recursive locking
      
        Current release - new code bugs:
      
         - vsock/virtio: suppress used length validation
      
         - neigh: fix crash in v6 module initialization error path
      
        Previous releases - regressions:
      
         - af_unix: fix change in behavior in read after shutdown
      
         - igb: fix netpoll exit with traffic, avoid warning
      
         - tls: fix splice_read() when starting mid-record
      
         - lan743x: fix deadlock in lan743x_phy_link_status_change()
      
         - marvell: prestera: fix bridge port operation
      
        Previous releases - always broken:
      
         - tcp_cubic: fix spurious Hystart ACK train detections for
           not-cwnd-limited flows
      
         - nexthop: fix refcount issues when replacing IPv6 groups
      
         - nexthop: fix null pointer dereference when IPv6 is not enabled
      
         - phylink: force link down and retrigger resolve on interface change
      
         - mptcp: fix delack timer length calculation and incorrect early
           clearing
      
         - ieee802154: handle iftypes as u32, prevent shift-out-of-bounds
      
         - nfc: virtual_ncidev: change default device permissions
      
         - netfilter: ctnetlink: fix error codes and flags used for kernel
           side filtering of dumps
      
         - netfilter: flowtable: fix IPv6 tunnel addr match
      
         - ncsi: align payload to 32-bit to fix dropped packets
      
         - iavf: fix deadlock and loss of config during VF interface reset
      
         - ice: avoid bpf_prog refcount underflow
      
         - ocelot: fix broken PTP over IP and PTP API violations
      
        Misc:
      
         - marvell: mvpp2: increase MTU limit when XDP enabled"
      
      * tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
        net: dsa: microchip: implement multi-bridge support
        net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
        net: mscc: ocelot: set up traps for PTP packets
        net: ptp: add a definition for the UDP port for IEEE 1588 general messages
        net: mscc: ocelot: create a function that replaces an existing VCAP filter
        net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
        net: hns3: fix incorrect components info of ethtool --reset command
        net: hns3: fix one incorrect value of page pool info when queried by debugfs
        net: hns3: add check NULL address for page pool
        net: hns3: fix VF RSS failed problem after PF enable multi-TCs
        net: qed: fix the array may be out of bound
        net/smc: Don't call clcsock shutdown twice when smc shutdown
        net: vlan: fix underflow for the real_dev refcnt
        ptp: fix filter names in the documentation
        ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
        nfc: virtual_ncidev: change default device permissions
        net/sched: sch_ets: don't peek at classes beyond 'nbands'
        net: stmmac: Disable Tx queues when reconfiguring the interface
        selftests: tls: test for correct proto_ops
        tls: fix replacing proto_ops
        ...
    • net: dsa: microchip: implement multi-bridge support · b3612ccd
      Oleksij Rempel authored
      The current driver version can handle only one bridge at a time.
      Configuring two bridges on two different ports ends up shorting these
      bridges in hardware. To reproduce it:
      
      	ip l a name br0 type bridge
      	ip l a name br1 type bridge
      	ip l s dev br0 up
      	ip l s dev br1 up
      	ip l s lan1 master br0
      	ip l s dev lan1 up
      	ip l s lan2 master br1
      	ip l s dev lan2 up
      
      	Ping on lan1 and get response on lan2, which should not happen.
      
      This happens because the current driver version stores one global "Port
      VLAN Membership" and applies it to all ports which are members of any
      bridge.
      To solve this issue, we need to handle each port separately.

      This patch drops the global port member storage and calculates
      membership dynamically depending on STP state and bridge participation.
      
      Note: STP support was broken before this patch and should be fixed
      separately.
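A minimal Python sketch of the per-port idea described above, not the driver's actual code. Each port's forwarding membership is derived from the bridge it has joined and from whether its STP state allows forwarding. The `Port` class, the `port_membership` helper, the bitmask layout and the choice of port 0 as CPU port are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

CPU_PORT = 0  # assumption: port 0 is the CPU port in this toy model


@dataclass
class Port:
    index: int
    bridge: Optional[str] = None  # name of the bridge this port joined, if any
    forwarding: bool = False      # True when the STP state allows forwarding


def port_membership(ports: List[Port], port: Port) -> int:
    """Return a bitmask of the ports that `port` may forward to."""
    mask = 1 << CPU_PORT  # assumption: user ports can always reach the CPU port
    if port.bridge is None or not port.forwarding:
        return mask
    for other in ports:
        if other.index == port.index:
            continue
        # Only forwarding ports of the same bridge become members.
        if other.bridge == port.bridge and other.forwarding:
            mask |= 1 << other.index
    return mask


ports = [
    Port(1, bridge="br0", forwarding=True),  # lan1
    Port(2, bridge="br1", forwarding=True),  # lan2
    Port(3, bridge="br0", forwarding=True),  # lan3
]
lan1_mask = port_membership(ports, ports[0])
lan2_mask = port_membership(ports, ports[1])
```

In this model lan1 and lan2 never appear in each other's mask, because they belong to different bridges, which is the isolation the reproduction steps above check for.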
      
      Fixes: c2e86691 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 5367cf1c
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix a NULL pointer dereference in the CPPC library code and a
        locking issue related to printing the names of ACPI device nodes in
        the device properties framework.
      
        Specifics:
      
         - Fix NULL pointer dereference in the CPPC library code occurring on
           hybrid systems without CPPC support (Rafael Wysocki).
      
         - Avoid attempts to acquire a semaphore with interrupts off when
           printing the names of ACPI device nodes and clean up code on top of
           that fix (Sakari Ailus)"
      
      * tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: CPPC: Add NULL pointer check to cppc_get_perf()
        ACPI: Make acpi_node_get_parent() local
        ACPI: Get acpi_device's parent from the parent field
    • Merge tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0ce629b1
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These address three issues in the intel_pstate driver and fix two
        problems related to hibernation.
      
        Specifics:
      
         - Make intel_pstate work correctly on Ice Lake server systems with
           out-of-band performance control enabled (Adamos Ttofari).
      
         - Fix EPP handling in intel_pstate during CPU offline and online in
           the active mode (Rafael Wysocki).
      
         - Make intel_pstate support ITMT on asymmetric systems with
           overclocking enabled (Srinivas Pandruvada).
      
         - Fix hibernation image saving when using the user space interface
           based on the snapshot special device file (Evan Green).
      
         - Make the hibernation code release the snapshot block device using
           the same mode that was used when acquiring it (Thomas Zeitlhofer)"
      
      * tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: hibernate: Fix snapshot partial write lengths
        PM: hibernate: use correct mode for swsusp_close()
        cpufreq: intel_pstate: ITMT support for overclocked system
        cpufreq: intel_pstate: Fix active mode offline/online EPP handling
        cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
    • Merge tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 925c9437
      Linus Torvalds authored
      Pull fuse fix from Miklos Szeredi:
       "Fix a regression caused by a bugfix in the previous release. The
        symptom is a VM_BUG_ON triggered from splice to the fuse device.
      
        Unfortunately the original bugfix was already backported to a number
        of stable releases, so this fix-fix will need to be backported as
        well"
      
      * tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: release pipe buf after last use
    • Merge branch 'fix-broken-ptp-over-ip-on-ocelot-switches' · 32c54497
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Fix broken PTP over IP on Ocelot switches
      
      Changes in v2: added patch 5, added Richard's ack for the whole series
      sans patch 5 which is new.
      
      Po Liu reported recently that timestamping PTP over IPv4 is broken using
      the felix driver on NXP LS1028A. This has been known for a while, of
      course, since it has always been broken. The reason is that IP PTP
      packets are currently treated as unknown IP multicast, which is not
      flooded to the CPU port in the ocelot driver design, so packets don't
      reach the ptp4l program.
      
      The series solves the problem by installing packet traps per port when
      the timestamping ioctl is called, depending on the RX filter selected
      (L2, L4 or both).
      ====================
      
      Link: https://lore.kernel.org/r/20211126172845.3149260-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: correctly report the timestamping RX filters in ethtool · c49a35ee
      Vladimir Oltean authored
      The driver doesn't support RX timestamping for non-PTP packets, but it
      declares that it does. Restrict the reported RX filters to PTP v2 over
      L2 and over L4.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: set up traps for PTP packets · 96ca08c0
      Vladimir Oltean authored
      IEEE 1588 support was declared too soon for the Ocelot switch. Out of
      reset, this switch does not apply any special treatment for PTP packets,
      i.e. when an event message is received, the natural tendency is to
      forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
      is under a bridge, since user space application stacks (written
      primarily for endpoint ports, not switches) like ptp4l expect that PTP
      messages are always received on AF_PACKET / AF_INET sockets (depending
      on the PTP transport being used), and never being autonomously
      forwarded. Any forwarding, if necessary (for example in Transparent
      Clock mode) is handled in software by ptp4l. Having the hardware forward
      these packets too will cause duplicates which will confuse endpoints
      connected to these switches.
      
      So PTP over L2 barely works, in the sense that PTP packets reach the CPU
      port, but they reach it via flooding, and therefore reach lots of other
      unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
      This is because the Ocelot switch has a separate destination port mask
      for unknown IP multicast (which PTP over IP is) flooding compared to
      unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
      the driver allows the CPU port to be in the PGID_MC port group, but not
      in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
      Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
      switches is not very powerful at all, so every penny they could save by
      not allowing flooding to the CPU port module matters. Unknown IP
      multicast did not make it.
      
      The de facto consensus is that when a switch is PTP-aware and an
      application stack for PTP is running, switches should have some sort of
      trapping mechanism for PTP packets, to extract them from the hardware
      data path. This avoids both problems:
      (a) PTP packets are no longer flooded to unwanted destinations
      (b) PTP over IP packets are no longer denied from reaching the CPU since
          they arrive there via a trap and not via flooding
      
      This is not the first time this change has been attempted. Last time, the
      feedback from Allan Nielsen and Andrew Lunn was that the traps should
      not be installed by default, and that PTP-unaware switching may be
      desired for some use cases:
      https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/
      
      To address that feedback, the present patch adds the necessary packet
      traps according to the RX filter configuration transmitted by user space
      through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
      keep 5 filters, which are amended each time RX timestamping is enabled
      or disabled on a port:
      - 1 for PTP over L2
      - 2 for PTP over IPv4 (UDP ports 319 and 320)
      - 2 for PTP over IPv6 (UDP ports 319 and 320)
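As a rough illustration of the list above, the following Python sketch maps an RX filter choice (L2, L4 or both) to the set of traps that would need to be active. It is not the driver's code. The function name and the trap names are hypothetical, and only the counts and the UDP port numbers come from the text.

```python
PTP_EV_PORT = 319   # UDP port for PTP event messages
PTP_GEN_PORT = 320  # UDP port for PTP general messages


def traps_for_rx_filter(l2: bool, l4: bool) -> set:
    """Return the hypothetical trap names needed for a given RX filter choice."""
    traps = set()
    if l2:
        traps.add("ptp-l2")  # one trap for PTP over L2
    if l4:
        # Two traps per IP family: one per UDP port.
        for family in ("ipv4", "ipv6"):
            for port in (PTP_EV_PORT, PTP_GEN_PORT):
                traps.add(f"ptp-{family}-udp-{port}")
    return traps


both = traps_for_rx_filter(l2=True, l4=True)
```

With both L2 and L4 selected, the set contains the five filters listed above; with RX timestamping disabled it is empty, which corresponds to PTP-unaware forwarding.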
      
      The cookie by which these filters (invisible to tc) are identified is
      strategically chosen such that it does not collide with the filters used
      for the ocelot-8021q tagging protocol by the Felix driver, or with the
      MRP traps set up by the Ocelot library.
      
      Other alternatives were considered, like patching user space to do
      something, but there are so many ways in which PTP packets could be made
      to reach the CPU, generically speaking, that "do what?" is a very valid
      question. The ptp4l program from the linuxptp stack already attempts to
      do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
      PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
      a dev_mc_add() on the interface, in the kernel:
      https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73
      https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c
      
      Reality shows that this is not sufficient in case the interface belongs
      to a switchdev driver, as dev_mc_add() does not show the intention to
      trap a packet to the CPU, but rather the intention to not drop it (it is
      strictly for RX filtering, same as promiscuous does not mean to send all
      traffic to the CPU, but to not drop traffic with unknown MAC DA). This
      topic is a can of worms in itself, and it would be great if user space
      could just stay out of it.
      
      On the other hand, setting up PTP traps privately within the driver is
      not new by any stretch of the imagination:
      https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833
      https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050
      https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21
      
      So this is the approach taken here as well. The difference here being
      that we prepare and destroy the traps per port, dynamically at runtime,
      as opposed to driver init time, because apparently, PTP-unaware
      forwarding is a use case.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Reported-by: Po Liu <po.liu@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ptp: add a definition for the UDP port for IEEE 1588 general messages · ec15baec
      Vladimir Oltean authored
      As opposed to event messages (Sync, PdelayReq etc) which require
      timestamping, general messages (Announce, FollowUp etc) do not.
      In PTP they are part of different streams of data.
      
      IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP
      destination port assigned by IANA is 319 for event messages, and 320 for
      general messages. Yet the kernel seems to be missing the definition for
      general messages. This patch adds it.
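The split between the two ports can be sketched in Python as follows. This is an illustration only: the constant names imitate kernel naming style but are assumptions here, and the message type lists follow the usual IEEE 1588 classification of event versus general messages.

```python
PTP_EV_PORT = 319   # event messages, which require timestamping
PTP_GEN_PORT = 320  # general messages, which do not

EVENT_MESSAGES = {"Sync", "Delay_Req", "Pdelay_Req", "Pdelay_Resp"}
GENERAL_MESSAGES = {
    "Announce", "Follow_Up", "Delay_Resp",
    "Pdelay_Resp_Follow_Up", "Signaling", "Management",
}


def udp_port_for(message_type: str) -> int:
    """Return the UDP destination port used for a PTP message type."""
    if message_type in EVENT_MESSAGES:
        return PTP_EV_PORT
    if message_type in GENERAL_MESSAGES:
        return PTP_GEN_PORT
    raise ValueError(f"unknown PTP message type: {message_type}")
```

A packet classifier that wants to catch all PTP over UDP traffic therefore needs to match both ports, not only port 319.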
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: create a function that replaces an existing VCAP filter · 95706be1
      Vladimir Oltean authored
      VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
      tc flower offload on ocelot, among other things. The ingress port mask
      on which VCAP rules match is present as a bit field in the actual key of
      the rule. This means that it is possible for a rule to be shared among
      multiple source ports. When the rule is added one by one on each desired
      port, the ingress port mask of the key must be edited and rewritten
      to hardware.
      
      But the API in ocelot_vcap.c does not allow for this. For one thing,
      ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
      because ocelot_vcap_filter_add() works with a preallocated and
      prepopulated filter and programs it to hardware, and
      ocelot_vcap_filter_del() does both the job of removing the specified
      filter from hardware, as well as kfreeing it. That is to say, the only
      option of editing a filter in place, which is to delete it, modify the
      structure and add it back, does not work because it results in
      use-after-free.
      
      This patch introduces ocelot_vcap_filter_replace, which trivially
      reprograms a VCAP entry to hardware, at the exact same index at which it
      existed before, without modifying any list or allocating any memory.
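The difference between "delete, modify, add back" and an in-place replace can be sketched with a toy table in Python. This is a simplified model under stated assumptions, not the ocelot_vcap.c implementation: the `Filter` and `FilterTable` classes and the `hw` dictionary standing in for the hardware entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare filters by identity, not by content
class Filter:
    cookie: int
    ingress_port_mask: int


class FilterTable:
    def __init__(self):
        self.entries = []  # software list; position mirrors the hardware index
        self.hw = {}       # hypothetical hardware view: index -> port mask

    def add(self, flt: Filter) -> None:
        """Append a preallocated filter and program it to 'hardware'."""
        self.entries.append(flt)
        self.hw[len(self.entries) - 1] = flt.ingress_port_mask

    def replace(self, flt: Filter) -> int:
        """Rewrite an existing filter at the index it already occupies."""
        index = self.entries.index(flt)  # no list change, no allocation
        self.hw[index] = flt.ingress_port_mask
        return index


table = FilterTable()
other = Filter(cookie=1, ingress_port_mask=0b0001)
trap = Filter(cookie=2, ingress_port_mask=0b0010)
table.add(other)
table.add(trap)

trap.ingress_port_mask |= 0b0100  # enable the same rule on one more port
index = table.replace(trap)
```

The filter keeps its position and the object stays alive across the update, which is the property the delete-then-add approach cannot provide when deletion also frees the structure.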
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP · 8a075464
      Vladimir Oltean authored
      The ocelot driver, when asked to timestamp all received packets, 1588
      v1 or NTP, says "nah, here's 1588 v2 for you".
      
      According to this discussion:
      https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647
      drivers that downgrade from a wider request to a narrower response (or
      even a response where the intersection with the request is empty) are
      buggy, and should return -ERANGE instead. This patch fixes that.
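The rule can be sketched in Python as below. This is only a model of the behaviour described, not the driver code: the `set_rx_filter` helper is hypothetical, and the exact contents of the supported set are an assumption made for the example (the filter names themselves are the kernel's HWTSTAMP_FILTER_* names).

```python
import errno

# Assumption for this sketch: only "none" and PTP v2 filters are supported.
SUPPORTED_RX_FILTERS = {
    "HWTSTAMP_FILTER_NONE",
    "HWTSTAMP_FILTER_PTP_V2_L2_EVENT",
    "HWTSTAMP_FILTER_PTP_V2_L4_EVENT",
    "HWTSTAMP_FILTER_PTP_V2_EVENT",
}


def set_rx_filter(requested: str):
    """Return (0, filter) on success, or (-ERANGE, None) if unsupported.

    The request is never silently narrowed to something else.
    """
    if requested not in SUPPORTED_RX_FILTERS:
        return -errno.ERANGE, None
    return 0, requested


rejected = set_rx_filter("HWTSTAMP_FILTER_ALL")
accepted = set_rx_filter("HWTSTAMP_FILTER_PTP_V2_EVENT")
```

A caller that asked for all packets, 1588 v1 or NTP gets an error it can act on, instead of a configuration it did not request.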
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Suggested-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-hns3-add-some-fixes-for-net' · b32e521e
      Jakub Kicinski authored
      Guangbin Huang says:
      
      ====================
      net: hns3: add some fixes for -net
      
      This series adds some fixes for the HNS3 ethernet driver.
      ====================
      
      Link: https://lore.kernel.org/r/20211126120318.33921-1-huangguangbin2@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: fix incorrect components info of ethtool --reset command · 82229c4d
      Jie Wang authored
      Currently, the HNS3 driver doesn't clear the reset flags of components
      after successfully executing a reset, which causes the userspace info of
      "Components reset" and "Components not reset" to be incorrect.

      So fix this problem by clearing the corresponding reset flag after the
      reset process.
      
      Fixes: ddccc5e3 ("net: hns3: add support for triggering reset by ethtool")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: fix one incorrect value of page pool info when queried by debugfs · 9c147917
      Hao Chen authored
      Currently, when a user queries page pool info with the debugfs command
      "cat page_pool_info", the count of allocated pages for the page pool may
      be incorrect because of a memory inconsistency problem caused by compiler
      optimization.

      So this patch uses READ_ONCE() to read the value of pages_state_hold_cnt
      to fix this problem.
      
      Fixes: 850bfb91 ("net: hns3: debugfs add support dumping page pool info")
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: add check NULL address for page pool · b8af344c
      Hao Chen authored
      When page pool is not enabled, its address value is still NULL and page
      pool should not be accessed, so add a check for it.
      
      Fixes: 850bfb91 ("net: hns3: debugfs add support dumping page pool info")
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: fix VF RSS failed problem after PF enable multi-TCs · 8d2ad993
      Guangbin Huang authored
      When the PF is set to multi-TCs and the mapping relationship between
      priorities and TCs is configured, the hardware will activate these
      settings for this PF and its VFs.

      In this case, when a VF uses just one TC and its rx packets contain a
      priority that is not mapped to TC0, the hardware always puts such packets
      into queue 0, because the other TCs of the VF are not valid. As a result,
      these packets of the VF cannot use the RSS function.

      To fix this problem, set the tc mode of all unused TCs of the VF to the
      setting of TC0, so that rx packets with a priority which maps to an unused
      TC will be directed to TC0.
      
      Fixes: e2cb1dec ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: qed: fix the array may be out of bound · 0435a4d0
      zhangyue authored
      If the variable 'p_bit->flags' is always 0,
      the loop condition is always 0.
      
      The variable 'j' may be greater than or equal to 32.
      
      In that case, the access to the array 'p_aeu->bits[32]' may be out
      of bounds.
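One way to read the problem is sketched below in Python. It rests on an assumption not stated in the message: that each descriptor's flags determine how many register bits it consumes, so flags of 0 never advance the bit position and only the entry index keeps growing. The function and variable names are hypothetical.

```python
MAX_BITS = 32  # the register is 32 bits wide; the table has 32 entries


def walk_attention_bits(lengths):
    """Visit table entries until all register bits are consumed.

    lengths[j] models the number of bits consumed by entry j
    (assumed to be derived from that entry's flags).
    """
    visited = []
    j = 0
    bit_idx = 0
    # Bounding only bit_idx is not enough: an entry of length 0 never
    # advances it, so the entry index j has to be bounded as well.
    while bit_idx < MAX_BITS and j < len(lengths):
        visited.append(j)
        bit_idx += lengths[j]
        j += 1
    return visited


all_zero = walk_attention_bits([0] * 32)
normal = walk_attention_bits([8] * 32)
```

With the extra bound, a table whose entries all have zero length stops after the last valid entry instead of indexing past the end of the array.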
      Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
      Link: https://lore.kernel.org/r/20211125113610.273841-1-zhangyue1@kylinos.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 7e635452
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "One more fix to the lzo code, a missing put_page causing memory leaks
        when some error branches are taken"
      
      * tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix the memory leak caused in lzo_compress_pages()
    • net/smc: Don't call clcsock shutdown twice when smc shutdown · bacb6c1e
      Tony Lu authored
      When applications call shutdown() with SHUT_RDWR in userspace,
      smc_close_active() calls kernel_sock_shutdown(), and it is called
      twice in smc_shutdown().
      
      Fix this by checking sk_state before doing the clcsock shutdown, which
      also avoids missing the application's call of smc_shutdown().
      
      Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/
      Fixes: 606a63c9 ("net/smc: Ensure the active closing peer first closes clcsock")
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: vlan: fix underflow for the real_dev refcnt · 01d9cc2d
      Ziyang Xuan authored
      Inject error before dev_hold(real_dev) in register_vlan_dev(),
      and execute the following testcase:
      
      ip link add dev dummy1 type dummy
      ip link add name dummy1.100 link dummy1 type vlan id 100
      ip link del dev dummy1
      
      When the dummy netdevice is removed, we will get a WARNING as follows:
      
      =======================================================================
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 2 PID: 0 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0
      
      and an endless loop of:
      
      =======================================================================
      unregister_netdevice: waiting for dummy1 to become free. Usage count = -1073741824
      
      That is because dev_put(real_dev) in vlan_dev_free() is called without
      dev_hold(real_dev) in register_vlan_dev(). This makes the refcnt of
      real_dev underflow.
      
      Move the dev_hold(real_dev) to vlan_dev_init() which is the call-back of
      ndo_init(). That makes dev_hold() and dev_put() for vlan's real_dev
      symmetrical.
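The asymmetry can be modelled with a toy reference counter in Python. This is a sketch of the ordering problem only, not kernel code: the `Dev` class and the `try_create_vlan` helper are hypothetical, and the model assumes, as described above, that the free path always drops one reference.

```python
class Dev:
    """Toy network device with a plain integer reference count."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0

    def hold(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1


def try_create_vlan(real_dev, hold_in_init, registration_fails):
    """Model one VLAN creation attempt on top of real_dev."""
    if hold_in_init:
        real_dev.hold()      # new placement: taken in the init callback
    if registration_fails:
        real_dev.put()       # the free path always drops the reference
        return False
    if not hold_in_init:
        real_dev.hold()      # old placement: only reached on success
    return True


old = Dev("dummy1")
try_create_vlan(old, hold_in_init=False, registration_fails=True)

new = Dev("dummy1")
try_create_vlan(new, hold_in_init=True, registration_fails=True)
```

In the old ordering a failed creation leaves the count one below where it started; with the hold moved into the init step, every put is preceded by a matching hold.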
      
      Fixes: 563bcbae ("net: vlan: fix a UAF in vlan_dev_real_dev()")
      Reported-by: Petr Machata <petrm@nvidia.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Link: https://lore.kernel.org/r/20211126015942.2918542-1-william.xuanziyang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Jakub Kicinski's avatar
      cbb91dcb
    • ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce() · 0276af21
      Julian Wiedmann authored
      ethtool_set_coalesce() now uses both the .get_coalesce() and
      .set_coalesce() callbacks. But the check for their availability is
      buggy, so changing the coalesce settings on a device where the driver
      provides only _one_ of the callbacks results in a NULL pointer
      dereference instead of an -EOPNOTSUPP.
      
      Fix the condition so that the availability of both callbacks is
      ensured. This also matches the netlink code.
      
      Note that reproducing this requires some effort - it only affects the
      legacy ioctl path, and needs a specific combination of driver options:
      - have .get_coalesce() and .coalesce_supported but no
       .set_coalesce(), or
      - have .set_coalesce() but no .get_coalesce(). Here eg. ethtool doesn't
        cause the crash as it first attempts to call ethtool_get_coalesce()
        and bails out on error.
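The corrected availability check can be sketched in Python as follows. This models the logic only and is not the ethtool ioctl code: the `ops` dictionary standing in for the driver's callback table, and the `fake_get` and `fake_set` helpers, are hypothetical.

```python
import errno


def set_coalesce(ops, new_settings):
    """Apply coalesce settings only if both callbacks are available."""
    get_cb = ops.get("get_coalesce")
    set_cb = ops.get("set_coalesce")
    # Both callbacks are used below, so both must exist.
    if get_cb is None or set_cb is None:
        return -errno.EOPNOTSUPP
    settings = get_cb()            # read the current values first
    settings.update(new_settings)  # then overlay the requested changes
    return set_cb(settings)


stored = {"rx_usecs": 10}


def fake_get():
    return dict(stored)


def fake_set(settings):
    stored.update(settings)
    return 0


only_get = set_coalesce({"get_coalesce": fake_get}, {"rx_usecs": 50})
only_set = set_coalesce({"set_coalesce": fake_set}, {"rx_usecs": 50})
both = set_coalesce(
    {"get_coalesce": fake_get, "set_coalesce": fake_set}, {"rx_usecs": 50}
)
```

A driver that provides only one of the two callbacks now gets a clean error code instead of a call through a missing function pointer.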
      
      Fixes: f3ccfda1 ("ethtool: extend coalesce setting uAPI with CQE mode")
      Cc: Yufeng Mo <moyufeng@huawei.com>
      Cc: Huazhong Tan <tanhuazhong@huawei.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Link: https://lore.kernel.org/r/20211126175543.28000-1-jwi@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: virtual_ncidev: change default device permissions · c26381f9
      Thadeu Lima de Souza Cascardo authored
      The device permissions are S_IALLUGO, with many unnecessary bits. Remove
      them and also remove read and write permissions from group and others.
      
      Before the change:
      crwsrwsrwt    1 0        0          10, 125 Nov 25 13:59 /dev/virtual_nci
      
      After the change:
      crw-------    1 0        0          10, 125 Nov 25 14:05 /dev/virtual_nci
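The two mode strings above can be reproduced with Python's standard `stat` module. In this sketch the numeric value 0o7777 for S_IALLUGO (setuid, setgid, sticky plus rwx for user, group and others) is written out by hand, since the `stat` module does not export that name.

```python
import stat

# S_IALLUGO: setuid, setgid, sticky and rwx for user, group and others.
S_IALLUGO = 0o7777

# Character device with every permission and special bit set.
before = stat.filemode(stat.S_IFCHR | S_IALLUGO)

# Character device readable and writable by the owner only.
after = stat.filemode(stat.S_IFCHR | stat.S_IRUSR | stat.S_IWUSR)

print(before)
print(after)
```

The first string shows the setuid, setgid and sticky bits as `s`, `s` and `t` in the execute positions, which is why the old listing looks so unusual for a device node.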
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Reviewed-by: Bongsu Jeon <bongsu.jeon@samsung.com>
      Link: https://lore.kernel.org/r/20211125141457.716921-1-cascardo@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: sch_ets: don't peek at classes beyond 'nbands' · de6d2592
      Davide Caratti authored
      When the number of DRR classes decreases, the round-robin active list can
      contain elements that have already been freed in ets_qdisc_change(). As a
      consequence, it's possible to see a NULL dereference crash, caused by the
      attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000018
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 1 PID: 910 Comm: mausezahn Not tainted 5.16.0-rc1+ #475
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:ets_qdisc_dequeue+0x129/0x2c0 [sch_ets]
       Code: c5 01 41 39 ad e4 02 00 00 0f 87 18 ff ff ff 49 8b 85 c0 02 00 00 49 39 c4 0f 84 ba 00 00 00 49 8b ad c0 02 00 00 48 8b 7d 10 <48> 8b 47 18 48 8b 40 38 0f ae e8 ff d0 48 89 c3 48 85 c0 0f 84 9d
       RSP: 0000:ffffbb36c0b5fdd8 EFLAGS: 00010287
       RAX: ffff956678efed30 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: 0000000000000002 RSI: ffffffff9b938dc9 RDI: 0000000000000000
       RBP: ffff956678efed30 R08: e2f3207fe360129c R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000001 R12: ffff956678efeac0
       R13: ffff956678efe800 R14: ffff956611545000 R15: ffff95667ac8f100
       FS:  00007f2aa9120740(0000) GS:ffff95667b800000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000018 CR3: 000000011070c000 CR4: 0000000000350ee0
       Call Trace:
        <TASK>
        qdisc_peek_dequeued+0x29/0x70 [sch_ets]
        tbf_dequeue+0x22/0x260 [sch_tbf]
        __qdisc_run+0x7f/0x630
        net_tx_action+0x290/0x4c0
        __do_softirq+0xee/0x4f8
        irq_exit_rcu+0xf4/0x130
        sysvec_apic_timer_interrupt+0x52/0xc0
        asm_sysvec_apic_timer_interrupt+0x12/0x20
       RIP: 0033:0x7f2aa7fc9ad4
       Code: b9 ff ff 48 8b 54 24 18 48 83 c4 08 48 89 ee 48 89 df 5b 5d e9 ed fc ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <53> 48 83 ec 10 48 8b 05 10 64 33 00 48 8b 00 48 85 c0 0f 85 84 00
       RSP: 002b:00007ffe5d33fab8 EFLAGS: 00000202
       RAX: 0000000000000002 RBX: 0000561f72c31460 RCX: 0000561f72c31720
       RDX: 0000000000000002 RSI: 0000561f72c31722 RDI: 0000561f72c31720
       RBP: 000000000000002a R08: 00007ffe5d33fa40 R09: 0000000000000014
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000561f7187e380
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000561f72c31460
        </TASK>
       Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common joydev virtio_balloon lpc_ich i2c_i801 i2c_smbus pcspkr ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel serio_raw libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
       CR2: 0000000000000018
      
      Ensuring that 'alist' was never zeroed [1] was not sufficient; we also need
      to remove from the active list those elements that are no longer SP or DRR.
      
      [1] https://lore.kernel.org/netdev/60d274838bf09777f0371253416e8af71360bc08.1633609148.git.dcaratti@redhat.com/
      
      v3: fix race between ets_qdisc_change() and ets_qdisc_dequeue() delisting
          DRR classes beyond 'nbands' in ets_qdisc_change() with the qdisc lock
          acquired, thanks to Cong Wang.
      
      v2: when a NULL qdisc is found in the DRR active list, try to dequeue skb
          from the next list item.
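The delisting idea can be sketched in Python with plain lists. This is a simplified model, not the sch_ets code: the `shrink_bands` and `peek` helpers are hypothetical, bands are identified by their index, and locking is left out entirely.

```python
def shrink_bands(qdiscs, active, nbands):
    """Model a reduction of the band count to nbands.

    Child qdiscs of bands >= nbands are released, and those bands are
    also removed from the active list so that nobody peeks at them later.
    """
    for band in range(nbands, len(qdiscs)):
        qdiscs[band] = None  # child qdisc is gone
    return [band for band in active if band < nbands]


def peek(qdiscs, active):
    """Peek at the head packet of the first active band, if any."""
    if not active:
        return None
    return qdiscs[active[0]]["head"]  # would fail if the qdisc were None


qdiscs = [{"head": "pkt-a"}, {"head": "pkt-b"}, {"head": "pkt-c"}]
active = [2, 0]  # band 2 is first in the round-robin order

active = shrink_bands(qdiscs, active, nbands=2)
head = peek(qdiscs, active)
```

Without the filtering step, band 2 would stay at the head of the active list with its child already released, which corresponds to the NULL dereference shown in the trace above.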
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Fixes: dcc68b4d ("net: sch_ets: Add a new Qdisc")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Link: https://lore.kernel.org/r/7a5c496eed2d62241620bdbb83eb03fb9d571c99.1637762721.git.dcaratti@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'acpi-properties' · 2e13e5ae
      Rafael J. Wysocki authored
      Merge fix and cleanup related to the management of ACPI device
      properties for 5.16-rc3.
      
      * acpi-properties:
        ACPI: Make acpi_node_get_parent() local
        ACPI: Get acpi_device's parent from the parent field