1. 01 Oct, 2023 7 commits
    • Eric Dumazet's avatar
      net_sched: sch_fq: change how @inactive is tracked · ee9af4e1
      Eric Dumazet authored
      Currently, when one fq qdisc has no more packets to send, it can still
      have some flows stored in its RR lists (q->new_flows & q->old_flows)
      
      This was a design choice, but what is a bit disturbing is that
      the inactive_flows counter does not include the count of empty flows
      in RR lists.
      
      As next patch needs to know better if there are active flows,
      this change makes inactive_flows exact.
      
      Before the patch, the following command on an empty qdisc could have returned:
      
      lpaa17:~# tc -s -d qd sh dev eth1 | grep inactive
        flows 1322 (inactive 1316 throttled 0)
        flows 1330 (inactive 1325 throttled 0)
        flows 1193 (inactive 1190 throttled 0)
        flows 1208 (inactive 1202 throttled 0)
      
      After the patch, we now have:
      
      lpaa17:~# tc -s -d qd sh dev eth1 | grep inactive
        flows 1322 (inactive 1322 throttled 0)
        flows 1330 (inactive 1330 throttled 0)
        flows 1193 (inactive 1193 throttled 0)
        flows 1208 (inactive 1208 throttled 0)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee9af4e1
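
The accounting change described in this commit can be pictured with a toy model. This is only a sketch: `struct toy_flow`, its fields, and both counting functions are hypothetical names for illustration and are not taken from sch_fq. It assumes the old counter skipped empty flows that were still linked in an RR list, as the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a flow; field names are hypothetical, not the kernel's. */
struct toy_flow {
	unsigned int qlen;  /* packets currently queued in this flow */
	bool in_rr_list;    /* still linked in new_flows or old_flows */
};

/* Old accounting: an empty flow counts only once it has left the RR lists. */
unsigned int count_inactive_old(const struct toy_flow *f, size_t n)
{
	unsigned int inactive = 0;

	for (size_t i = 0; i < n; i++)
		if (f[i].qlen == 0 && !f[i].in_rr_list)
			inactive++;
	return inactive;
}

/* New accounting: every flow without packets counts as inactive. */
unsigned int count_inactive_new(const struct toy_flow *f, size_t n)
{
	unsigned int inactive = 0;

	for (size_t i = 0; i < n; i++)
		if (f[i].qlen == 0)
			inactive++;
	return inactive;
}
```

With 1322 empty flows of which 6 are still in an RR list, the first function reports 1316 and the second 1322, which mirrors the first line of the two `tc` outputs above.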
    • Eric Dumazet's avatar
      net_sched: sch_fq: struct sched_data reorg · 54ff8ad6
      Eric Dumazet authored
      q->flows can be often modified, and q->timer_slack is read mostly.
      
      Exchange the two fields, so that the cache line containing
      quantum, initial_quantum, and other critical parameters
      stays clean (read-mostly).
      
      Move q->watchdog next to q->stat_throttled
      
      Add comments explaining how the structure is split in
      three different parts.
      
      pahole output before the patch:
      
      struct fq_sched_data {
      	struct fq_flow_head        new_flows;            /*     0  0x10 */
      	struct fq_flow_head        old_flows;            /*  0x10  0x10 */
      	struct rb_root             delayed;              /*  0x20   0x8 */
      	u64                        time_next_delayed_flow; /*  0x28   0x8 */
      	u64                        ktime_cache;          /*  0x30   0x8 */
      	unsigned long              unthrottle_latency_ns; /*  0x38   0x8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	struct fq_flow             internal __attribute__((__aligned__(64))); /*  0x40  0x80 */
      
      	/* XXX last struct has 16 bytes of padding */
      
      	/* --- cacheline 3 boundary (192 bytes) --- */
      	u32                        quantum;              /*  0xc0   0x4 */
      	u32                        initial_quantum;      /*  0xc4   0x4 */
      	u32                        flow_refill_delay;    /*  0xc8   0x4 */
      	u32                        flow_plimit;          /*  0xcc   0x4 */
      	unsigned long              flow_max_rate;        /*  0xd0   0x8 */
      	u64                        ce_threshold;         /*  0xd8   0x8 */
      	u64                        horizon;              /*  0xe0   0x8 */
      	u32                        orphan_mask;          /*  0xe8   0x4 */
      	u32                        low_rate_threshold;   /*  0xec   0x4 */
      	struct rb_root *           fq_root;              /*  0xf0   0x8 */
      	u8                         rate_enable;          /*  0xf8   0x1 */
      	u8                         fq_trees_log;         /*  0xf9   0x1 */
      	u8                         horizon_drop;         /*  0xfa   0x1 */
      
      	/* XXX 1 byte hole, try to pack */
      
      <bad>	u32                        flows;                /*  0xfc   0x4 */
      	/* --- cacheline 4 boundary (256 bytes) --- */
      	u32                        inactive_flows;       /* 0x100   0x4 */
      	u32                        throttled_flows;      /* 0x104   0x4 */
      	u64                        stat_gc_flows;        /* 0x108   0x8 */
      	u64                        stat_internal_packets; /* 0x110   0x8 */
      	u64                        stat_throttled;       /* 0x118   0x8 */
      	u64                        stat_ce_mark;         /* 0x120   0x8 */
      	u64                        stat_horizon_drops;   /* 0x128   0x8 */
      	u64                        stat_horizon_caps;    /* 0x130   0x8 */
      	u64                        stat_flows_plimit;    /* 0x138   0x8 */
      	/* --- cacheline 5 boundary (320 bytes) --- */
      	u64                        stat_pkts_too_long;   /* 0x140   0x8 */
      	u64                        stat_allocation_errors; /* 0x148   0x8 */
      <bad>	u32                        timer_slack;          /* 0x150   0x4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct qdisc_watchdog      watchdog;             /* 0x158  0x48 */
      
      	/* size: 448, cachelines: 7, members: 34 */
      	/* sum members: 411, holes: 2, sum holes: 5 */
      	/* padding: 32 */
      	/* paddings: 1, sum paddings: 16 */
      	/* forced alignments: 1 */
      };
      
      pahole output after the patch:
      
      struct fq_sched_data {
      	struct fq_flow_head        new_flows;            /*     0  0x10 */
      	struct fq_flow_head        old_flows;            /*  0x10  0x10 */
      	struct rb_root             delayed;              /*  0x20   0x8 */
      	u64                        time_next_delayed_flow; /*  0x28   0x8 */
      	u64                        ktime_cache;          /*  0x30   0x8 */
      	unsigned long              unthrottle_latency_ns; /*  0x38   0x8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	struct fq_flow             internal __attribute__((__aligned__(64))); /*  0x40  0x80 */
      
      	/* XXX last struct has 16 bytes of padding */
      
      	/* --- cacheline 3 boundary (192 bytes) --- */
      	u32                        quantum;              /*  0xc0   0x4 */
      	u32                        initial_quantum;      /*  0xc4   0x4 */
      	u32                        flow_refill_delay;    /*  0xc8   0x4 */
      	u32                        flow_plimit;          /*  0xcc   0x4 */
      	unsigned long              flow_max_rate;        /*  0xd0   0x8 */
      	u64                        ce_threshold;         /*  0xd8   0x8 */
      	u64                        horizon;              /*  0xe0   0x8 */
      	u32                        orphan_mask;          /*  0xe8   0x4 */
      	u32                        low_rate_threshold;   /*  0xec   0x4 */
      	struct rb_root *           fq_root;              /*  0xf0   0x8 */
      	u8                         rate_enable;          /*  0xf8   0x1 */
      	u8                         fq_trees_log;         /*  0xf9   0x1 */
      	u8                         horizon_drop;         /*  0xfa   0x1 */
      
      	/* XXX 1 byte hole, try to pack */
      
      <good>	u32                        timer_slack;          /*  0xfc   0x4 */
      	/* --- cacheline 4 boundary (256 bytes) --- */
      <good>	u32                        flows;                /* 0x100   0x4 */
      	u32                        inactive_flows;       /* 0x104   0x4 */
      	u32                        throttled_flows;      /* 0x108   0x4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	u64                        stat_throttled;       /* 0x110   0x8 */
      <better> struct qdisc_watchdog     watchdog;             /* 0x118  0x48 */
      	/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
      	u64                        stat_gc_flows;        /* 0x160   0x8 */
      	u64                        stat_internal_packets; /* 0x168   0x8 */
      	u64                        stat_ce_mark;         /* 0x170   0x8 */
      	u64                        stat_horizon_drops;   /* 0x178   0x8 */
      	/* --- cacheline 6 boundary (384 bytes) --- */
      	u64                        stat_horizon_caps;    /* 0x180   0x8 */
      	u64                        stat_flows_plimit;    /* 0x188   0x8 */
      	u64                        stat_pkts_too_long;   /* 0x190   0x8 */
      	u64                        stat_allocation_errors; /* 0x198   0x8 */
      
      	/* Force padding: */
      	u64                        :64;
      	u64                        :64;
      	u64                        :64;
      	u64                        :64;
      
      	/* size: 448, cachelines: 7, members: 34 */
      	/* sum members: 411, holes: 2, sum holes: 5 */
      	/* padding: 32 */
      	/* paddings: 1, sum paddings: 16 */
      	/* forced alignments: 1 */
      };
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54ff8ad6
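
To check which cache line a field lands on without pahole, the offsets can be divided by the line size. The sketch below assumes 64-byte cache lines, as in the pahole output above. `struct toy_sched_data` is a hypothetical miniature and not the real `fq_sched_data`; it only illustrates keeping read-mostly parameters apart from frequently written counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64u /* assumed cache line size */

/* Hypothetical miniature of the layout idea, not the real fq_sched_data. */
struct toy_sched_data {
	/* Read-mostly parameters share the first cache line. */
	uint32_t quantum;
	uint32_t initial_quantum;
	uint32_t timer_slack;
	/* Frequently written counters start on their own cache line. */
	_Alignas(64) uint32_t flows;
	uint32_t inactive_flows;
	uint64_t stat_throttled;
};

/* Index of the 64-byte cache line that a byte offset falls into. */
size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE_BYTES;
}
```

Applied to the offsets printed by pahole, `cacheline_of(0xfc)` is 3 and `cacheline_of(0x100)` is 4: before the patch `flows` shared line 3 with `quantum`, and after the patch `timer_slack` takes that slot while `flows` moves to line 4.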
    • Eric Dumazet's avatar
      net_sched: constify qdisc_priv() · 1add9073
      Eric Dumazet authored
      In order to propagate const qualifiers, we change qdisc_priv()
      to accept a possibly const argument.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1add9073
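
One way to let an accessor accept a possibly const argument in C11 is `_Generic`, which selects a result type based on the argument type. The sketch below is an illustration under that assumption; `struct toy_qdisc`, `toy_qdisc_priv()` and `toy_first_stat()` are hypothetical names, and the commit message above does not show how the real `qdisc_priv()` was changed.

```c
#include <assert.h>

/* Hypothetical stand-in for a qdisc; only the private area matters here. */
struct toy_qdisc {
	int handle;
	long privdata[4];
};

/* The constness of the returned pointer follows the constness of the argument. */
#define toy_qdisc_priv(q)                                              \
	_Generic((q),                                                   \
		const struct toy_qdisc *: (const void *)&(q)->privdata, \
		struct toy_qdisc *: (void *)&(q)->privdata)

/* A read-only helper can take a const pointer without casting const away. */
long toy_first_stat(const struct toy_qdisc *q)
{
	const long *priv = toy_qdisc_priv(q);

	return priv[0];
}
```

A caller holding a non-const `struct toy_qdisc *` still gets a writable `void *`, so existing writers keep compiling, while helpers that only read can declare their argument const.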
    • David S. Miller's avatar
      Merge branch 'tcp_delack_max' · 66ac08a7
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: add tcp_delack_max()
      
      First patches are adding const qualifiers to four existing helpers.
      
      Third patch adds a much needed companion feature to RTAX_RTO_MIN.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66ac08a7
    • Eric Dumazet's avatar
      tcp: derive delack_max from rto_min · bbf80d71
      Eric Dumazet authored
      While BPF allows setting icsk->icsk_delack_max
      and/or icsk->icsk_rto_min, we have an ip route
      attribute (RTAX_RTO_MIN) to be able to tune rto_min,
      but nothing to consequently adjust max delayed ack,
      which varies from 40 ms to 200 ms (TCP_DELACK_{MIN|MAX}).
      
      This makes RTAX_RTO_MIN of almost no practical use,
      unless customers are in big trouble.
      
      Modern datacenter communications want to set
      rto_min to ~5 ms, and the max delayed ack one jiffy
      smaller to avoid spurious retransmits.
      
      After this patch, an "rto_min 5" route attribute will
      effectively lower max delayed ack timers to 4 ms.
      
      Note in the following ss output, "rto:6 ... ato:4"
      
      $ ss -temoi dst XXXXXX
      State Recv-Q Send-Q           Local Address:Port       Peer Address:Port  Process
      ESTAB 0      0        [2002:a05:6608:295::]:52950   [2002:a05:6608:297::]:41597
           ino:255134 sk:1001 <->
               skmem:(r0,rb1707063,t872,tb262144,f0,w0,o0,bl0,d0) ts sack
       cubic wscale:8,8 rto:6 rtt:0.02/0.002 ato:4 mss:4096 pmtu:4500
       rcvmss:536 advmss:4096 cwnd:10 bytes_sent:54823160 bytes_acked:54823121
       bytes_received:54823120 segs_out:1370582 segs_in:1370580
       data_segs_out:1370579 data_segs_in:1370578 send 16.4Gbps
       pacing_rate 32.6Gbps delivery_rate 1.72Gbps delivered:1370579
       busy:26920ms unacked:1 rcv_rtt:34.615 rcv_space:65920
       rcv_ssthresh:65535 minrtt:0.015 snd_wnd:65536
      
      While we could argue this patch fixes a bug with RTAX_RTO_MIN,
      I do not add a Fixes: tag, so that we can soak it a bit before
      asking backports to stable branches.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbf80d71
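
The rule described in this commit, "max delayed ack is one jiffy smaller than rto_min", can be sketched as a small clamp. This is a simplified model and not the kernel function: `toy_delack_max()` and its parameters are hypothetical, all values are in jiffies, and one jiffy is assumed to be 1 ms (HZ=1000) so that it lines up with the "rto_min 5" example.

```c
#include <assert.h>

/*
 * Simplified model, all values in jiffies (assumed 1 ms each).
 * delack_max:        the socket's current max delayed ack
 * rto_min:           rto_min taken from the route, if any
 * has_route_rto_min: nonzero when the route actually carries rto_min
 */
unsigned int toy_delack_max(unsigned int delack_max, unsigned int rto_min,
			    int has_route_rto_min)
{
	if (has_route_rto_min) {
		/* One jiffy below rto_min, but never less than one jiffy. */
		unsigned int from_rto_min = rto_min > 1 ? rto_min - 1 : 1;

		if (from_rto_min < delack_max)
			delack_max = from_rto_min;
	}
	return delack_max;
}
```

With a default of 200 ms and a route carrying "rto_min 5", `toy_delack_max(200, 5, 1)` gives 4, matching the "ato:4" in the ss output; without the route attribute the value stays at 200.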
    • Eric Dumazet's avatar
      tcp: constify tcp_rto_min() and tcp_rto_min_us() argument · f68a181f
      Eric Dumazet authored
      Make clear these functions do not change any field from TCP socket.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f68a181f
    • Eric Dumazet's avatar
      net: constify sk_dst_get() and __sk_dst_get() argument · 5033f58d
      Eric Dumazet authored
      Both helpers only read fields from their socket argument.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5033f58d
  2. 30 Sep, 2023 1 commit
  3. 28 Sep, 2023 11 commits
  4. 22 Sep, 2023 5 commits
    • David S. Miller's avatar
      Merge branch 'mlxsw-multicast' · 5a1b322c
      David S. Miller authored
      Petr Machata says:
      
      ====================
      mlxsw: Improve blocks selection for IPv6 multicast forwarding
      
      Amit Cohen writes:
      
      The driver configures two ACL regions during initialization, these regions
      are used for IPv4 and IPv6 multicast forwarding. Entries residing in these
      two regions match on the {SIP, DIP, VRID} key elements.
      
      Currently, 9 key blocks are used for the IPv6 region. This can be improved by
      reducing the number of key blocks needed for the IPv6 region to 8. It is
      possible to use key blocks that mix subsets of the VRID element with
      subsets of the DIP element.
      
      To make this happen, we have to take into account the algorithm that chooses
      which key blocks will be used. It is lazy and not the optimal one, as finding
      the optimum is a complex task. It searches for the block that contains the most
      required elements, chooses it, removes the elements that appear in the chosen
      block, and starts again searching for the block that contains the most elements.
      
      To optimize the number of blocks for IPv6 multicast forwarding, handle
      the following:
      
      1. Add support for key blocks that mix subsets of the VRID element with
      subsets of the DIP element.
      
      2. Prevent the algorithm from choosing other blocks for VRID.
      Currently, we have the block 'ipv4_4' which contains 2 sub-elements of
      VRID. With the existing algorithm, this block might be chosen, then 8
      blocks must be chosen for SIP and DIP and we will get 9 blocks to match on
      {SIP, DIP, VRID}. Therefore, replace this block with a new block 'ipv4_5'
      that contains 1 element for VRID, this will not be chosen for IPv6 as VRID
      element will be broken to several sub-elements. In this way we can get 8
      blocks for IPv6 multicast forwarding.
      
      This improvement was tested and indeed 8 blocks are used instead of 9.
      
      v2:
      - Resending without changes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a1b322c
    • Amit Cohen's avatar
      mlxsw: Edit IPv6 key blocks to use one less block for multicast forwarding · 92953e7a
      Amit Cohen authored
      Two ACL regions that are configured by the driver during initialization are
      the ones used for IPv4 and IPv6 multicast forwarding. Entries residing
      in these two regions match on the {SIP, DIP, VRID} key elements.
      
      Currently for IPv6 region, 9 key blocks are used:
      * 4 for SIP - 'ipv4_1', 'ipv6_{3,4,5}'
      * 4 for DIP - 'ipv4_0', 'ipv6_{0,1,2/2b}'
      * 1 for VRID - 'ipv4_4b'
      
      This can be improved by reducing the number of key blocks needed for
      the IPv6 region to 8. It is possible to use key blocks that mix subsets of
      the VRID element with subsets of the DIP element.
      The following key blocks can be used:
      * 4 for SIP - 'ipv4_1', 'ipv6_{3,4,5}'
      * 1 for subset of DIP - 'ipv4_0'
      * 3 for the rest of DIP and subsets of VRID - 'ipv6_{0,1,2/2b}'
      
      To make this happen, add VRID sub-elements as part of existing keys -
      'ipv6_{0,1,2/2b}'. Note that one of the sub-elements is called
      VRID_ROUTER_MSB and does not contain bit numbers like the rest, as for
      Spectrum < 4 this element represents bits 8-10 and for Spectrum-4 it
      represents bits 8-11.
      
      Breaking VRID into 3 sub-elements makes the driver use one less block in
      IPv6 region for multicast forwarding. The sub-elements can be filled in
      blocks that are used for destination IP.
      
      The algorithm in the driver that chooses which key blocks will be used is
      lazy and not the optimal one. It searches for the block that contains the most
      required elements, chooses it, removes the elements that appear
      in the chosen block, and starts again searching for the block that contains the
      most elements.
      
      When key block 'ipv4_4' is defined, the algorithm might choose it, as it
      contains 2 sub-elements of VRID, then 8 blocks must be chosen for SIP and
      DIP and we get 9 blocks to match on {SIP, DIP, VRID}. That is why we had to
      remove key block 'ipv4_4' in a previous patch and use key block that
      contains one field for VRID.
      
      This improvement was tested and indeed 8 blocks are used instead of 9.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92953e7a
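
The lazy selection described above behaves like a greedy set cover. The following is a minimal sketch of that idea and not the mlxsw code: elements are modelled as bits in a mask, each block is the mask of elements it can match, and `toy_pick_blocks()` and `toy_popcount()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of set bits in a 32-bit mask. */
unsigned int toy_popcount(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Greedy pick: repeatedly choose the block that covers the most
 * still-needed elements, then drop those elements from the needed set.
 * Returns the number of blocks chosen, or 0 if some element is in no block.
 */
size_t toy_pick_blocks(const uint32_t *blocks, size_t nblocks, uint32_t needed)
{
	size_t chosen = 0;

	while (needed) {
		size_t best = nblocks;
		unsigned int best_hits = 0;

		for (size_t i = 0; i < nblocks; i++) {
			unsigned int hits = toy_popcount(blocks[i] & needed);

			if (hits > best_hits) {
				best_hits = hits;
				best = i;
			}
		}
		if (best == nblocks)
			return 0; /* a needed element appears in no block */
		needed &= ~blocks[best];
		chosen++;
	}
	return chosen;
}
```

With six needed elements (mask 0x3F) and blocks {0x0F, 0x15, 0x2A}, the greedy pass takes the four-element block first and ends up with 3 blocks, although 0x15 and 0x2A alone would cover everything. Removing the tempting block from the candidate list yields 2, which is the same kind of effect the series relies on when it removes 'ipv4_4'.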
    • Amit Cohen's avatar
      mlxsw: spectrum_acl_flex_keys: Add 'ipv4_5b' flex key · c6caabdf
      Amit Cohen authored
      The previous patch replaced the key block 'ipv4_4' with 'ipv4_5'. The
      corresponding block for Spectrum-4 is 'ipv4_4b'. To be consistent, replace
      key block 'ipv4_4b' with 'ipv4_5b'.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c6caabdf
    • Amit Cohen's avatar
      mlxsw: Add 'ipv4_5' flex key · c2f3e10a
      Amit Cohen authored
      Currently virtual router ID element is broken to two sub-elements -
      'VIRT_ROUTER_LSB' and 'VIRT_ROUTER_MSB'. It was broken as this field is
      broken in 'ipv4_4' flex key which is used for IPv4 in Spectrum < 4.
      For Spectrum-4, we use 'ipv4_4b' flex key which contains one field for
      virtual router, this key is not supported in older ASICs.
      
      Add 'ipv4_5' flex key which is supported in all ASICs and contains one
      field for virtual router. Then there is no reason to use 'VIRT_ROUTER_LSB'
      and 'VIRT_ROUTER_MSB', remove them and add one element 'VIRT_ROUTER' for
      this field.
      
      The motivation is to get rid of 'ipv4_4' flex key, as it might be chosen
      for IPv6 multicast forwarding region. This will not allow the improvement
      in a following patch. See more details in the cover letter and in a
      following patch.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c2f3e10a
    • Peter Lafreniere's avatar
      hamradio: baycom: remove useless link in Kconfig · 84c19e65
      Peter Lafreniere authored
      The Kconfig help text for baycom drivers suggests that more information
      on the hardware can be found at <https://www.baycom.de>. The website now
      includes no information on their ham radio products other than a mention
      that they were once produced by the company, saying:
      "The amateur radio equipment is now no longer part and business of BayCom GmbH"
      
      As there is no information relevant to the baycom driver on the site,
      remove the link.
      Signed-off-by: Peter Lafreniere <peter@n8pjl.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84c19e65
  5. 21 Sep, 2023 16 commits
    • Paolo Abeni's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · e9cbc890
      Paolo Abeni authored
      Cross-merge networking fixes after downstream PR.
      
      No conflicts.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      e9cbc890
    • Linus Torvalds's avatar
      Merge tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 27bbf45e
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from netfilter and bpf.
      
        Current release - regressions:
      
         - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE
      
         - netfilter: fix entries val in rule reset audit log
      
         - eth: stmmac: fix incorrect rxq|txq_stats reference
      
        Previous releases - regressions:
      
         - ipv4: fix null-deref in ipv4_link_failure
      
         - netfilter:
            - fix several GC related issues
            - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
      
         - eth: team: fix null-ptr-deref when team device type is changed
      
         - eth: i40e: fix VF VLAN offloading when port VLAN is configured
      
         - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB
      
        Previous releases - always broken:
      
         - core: fix ETH_P_1588 flow dissector
      
         - mptcp: fix several connection hang-up conditions
      
         - bpf:
            - avoid deadlock when using queue and stack maps from NMI
            - add override check to kprobe multi link attach
      
         - hsr: properly parse HSRv1 supervisor frames.
      
         - eth: igc: fix infinite initialization loop with early XDP redirect
      
         - eth: octeon_ep: fix tx dma unmap len values in SG
      
         - eth: hns3: fix GRE checksum offload issue"
      
      * tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
        sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
        igc: Expose tx-usecs coalesce setting to user
        octeontx2-pf: Do xdp_do_flush() after redirects.
        bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
        net: ena: Flush XDP packets on error.
        net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
        net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev'
        netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
        netfilter: nf_tables: fix memleak when more than 255 elements expired
        netfilter: nf_tables: disable toggling dormant table state more than once
        vxlan: Add missing entries to vxlan_get_size()
        net: rds: Fix possible NULL-pointer dereference
        team: fix null-ptr-deref when team device type is changed
        net: bridge: use DEV_STATS_INC()
        net: hns3: add 5ms delay before clear firmware reset irq source
        net: hns3: fix fail to delete tc flower rules during reset issue
        net: hns3: only enable unicast promisc when mac table full
        net: hns3: fix GRE checksum offload issue
        net: hns3: add cmdq check for vf periodic service task
        net: stmmac: fix incorrect rxq|txq_stats reference
        ...
      27bbf45e
    • Linus Torvalds's avatar
      Merge tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · b5cbe7c0
      Linus Torvalds authored
      Pull finegrained timestamp reverts from Christian Brauner:
       "Earlier this week we sent a few minor fixes for the multi-grained
        timestamp work in [1]. While we were polishing those up after Linus
        realized that there might be a nicer way to fix them we received a
        regression report in [2] that fine grained timestamps break gnulib
        tests and thus possibly other tools.
      
        The kernel will elide fine-grain timestamp updates when no one is
        actively querying for them to avoid performance impacts. So a sequence
        like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may
        result in timestamp f1 being older than the final f2 timestamp even
        though f1 was last written to, because the second write didn't update
        the timestamp.
      
        Such plot holes can lead to subtle bugs when programs compare
        timestamps. For example, the nap() function in [2] will estimate that
        it needs to wait one ns on a fine-grain timestamp enabled filesystem
        between subsequent calls to observe a timestamp change. But in general
        we don't update timestamps with more than one jiffy if we think that
        no one is actively querying for fine-grain timestamps to avoid
        performance impacts.
      
        While discussing various fixes the decision was to go back to the
        drawing board and ultimately to explore a solution that involves only
        exposing such fine-grained timestamps to nfs internally and never to
        userspace.
      
        As there are multiple solutions discussed the honest thing to do here
        is not to fix this up or disable it but to cleanly revert. The general
        infrastructure will probably come back but there is no reason to keep
        this code in mainline.
      
        The general changes to timestamp handling are valid and a good cleanup
        that will stay. The revert is fully bisectable"
      
      Link: https://lore.kernel.org/all/20230918-hirte-neuzugang-4c2324e7bae3@brauner [1]
      Link: https://lore.kernel.org/all/bf0524debb976627693e12ad23690094e4514303.camel@linuxfromscratch.org [2]
      
      * tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        Revert "fs: add infrastructure for multigrain timestamps"
        Revert "btrfs: convert to multigrain timestamps"
        Revert "ext4: switch to multigrain timestamps"
        Revert "xfs: switch to multigrain timestamps"
        Revert "tmpfs: add support for multigrain timestamps"
      b5cbe7c0
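
The ordering problem in the quoted sequence can be reproduced with a toy model. This sketch assumes a deliberately simplified rule: a write records a fine-grained time only if the file was queried since its last write, and otherwise records the coarse clock value. `struct toy_inode`, the `toy_*` functions and the 4 ms tick are hypothetical and do not come from the reverted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TICK_NS 4000000ull /* assumed coarse clock step */

struct toy_inode {
	uint64_t mtime_ns;
	bool queried; /* set by stat(), cleared by the next write() */
};

/* Coarse clock: the current time rounded down to a tick boundary. */
uint64_t toy_coarse(uint64_t now_ns)
{
	return now_ns - now_ns % TOY_TICK_NS;
}

/* A write stores a fine-grained time only if someone looked since the last write. */
void toy_write(struct toy_inode *ino, uint64_t now_ns)
{
	ino->mtime_ns = ino->queried ? now_ns : toy_coarse(now_ns);
	ino->queried = false;
}

/* A stat returns the stored time and marks the inode as queried. */
uint64_t toy_stat(struct toy_inode *ino)
{
	ino->queried = true;
	return ino->mtime_ns;
}
```

Running write(f1), stat(f2), write(f2), stat(f2), write(f1), stat(f1) within a single tick leaves f1 with the coarse tick time and f2 with a later fine-grained time, so f1 looks older even though it was written last.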
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 7bdfc1af
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - A fix for breakpoint handling which was using get_user() while atomic
      
       - Fix the Power10 HASHCHK handler which was using get_user() while
         atomic
      
       - A few build fixes for issues caused by recent changes
      
      Thanks to Benjamin Gray, Christophe Leroy, Kajol Jain, and Naveen N Rao.
      
      * tag 'powerpc-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/dexcr: Move HASHCHK trap handler
        powerpc/82xx: Select FSL_SOC
        powerpc: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION and FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
        powerpc/watchpoints: Annotate atomic context in more places
        powerpc/watchpoint: Disable pagefaults when getting user instruction
        powerpc/watchpoints: Disable preemption in thread_change_pc()
        powerpc/perf/hv-24x7: Update domain value check
      7bdfc1af
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.6a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 88a174a9
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - remove some unused functions in the Xen event channel handling
      
       - fix a regression (introduced during the merge window) when booting as
         Xen PV guest
      
       - small cleanup removing another strncpy() instance
      
      * tag 'for-linus-6.6a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/efi: refactor deprecated strncpy
        x86/xen: allow nesting of same lazy mode
        x86/xen: move paravirt lazy code
        arm/xen: remove lazy mode related definitions
        xen: simplify evtchn_do_upcall() call maze
      88a174a9
    • Linus Torvalds's avatar
      Merge tag 'fixes-2023-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock · fb8b1b93
      Linus Torvalds authored
      Pull memblock test fixes from Mike Rapoport:
       "Fix several compilation errors and warnings in memblock tests"
      
      * tag 'fixes-2023-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
        memblock tests: fix warning ‘struct seq_file’ declared inside parameter list
        memblock tests: fix warning: "__ALIGN_KERNEL" redefined
        memblock tests: Fix compilation errors.
      fb8b1b93
    • Linus Torvalds's avatar
      Merge tag 'sound-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 2af5acba
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A large collection of fixes around this time.
      
        All small and mostly trivial fixes.
      
         - Lots of fixes for the new -Wformat-truncation warnings
      
         - A fix in ALSA rawmidi core regression and UMP handling
      
         - Series of Cirrus codec fixes
      
         - ASoC Intel and Realtek codec fixes
      
         - Usual HD- and USB-audio quirks and AMD ASoC quirks"
      
      * tag 'sound-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (64 commits)
        ALSA: hda/realtek - ALC287 Realtek I2S speaker platform support
        ALSA: hda: cs35l56: Use the new RUNTIME_PM_OPS() macro
        ALSA: usb-audio: scarlett_gen2: Fix another -Wformat-truncation warning
        ALSA: rawmidi: Fix NULL dereference at proc read
        ASoC: SOF: core: Only call sof_ops_free() on remove if the probe was successful
        ASoC: SOF: Intel: MTL: Reduce the DSP init timeout
        ASoC: cs42l43: Add shared IRQ flag for shutters
        ASoC: imx-audmix: Fix return error with devm_clk_get()
        ASoC: hdaudio.c: Add missing check for devm_kstrdup
        ALSA: riptide: Fix -Wformat-truncation warning for longname string
        ALSA: cs4231: Fix -Wformat-truncation warning for longname string
        ALSA: ad1848: Fix -Wformat-truncation warning for longname string
        ALSA: hda: generic: Check potential mixer name string truncation
        ALSA: cmipci: Fix -Wformat-truncation warning
        ALSA: firewire: Fix -Wformat-truncation warning for MIDI stream names
        ALSA: firewire: Fix -Wformat-truncation warning for longname string
        ALSA: xen: Fix -Wformat-truncation warning
        ALSA: opti9x: Fix -Wformat-truncation warning
        ALSA: es1688: Fix -Wformat-truncation warning
        ALSA: cs4236: Fix -Wformat-truncation warning
        ...
      2af5acba
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-v6.6-rc3' of... · b300c0fd
      Linus Torvalds authored
      Merge tag 'hwmon-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fix from Guenter Roeck:
       "One patch to drop a non-existent alarm attribute in the nct6775 driver"
      
      * tag 'hwmon-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (nct6775) Fix non-existent ALARM warning
      b300c0fd
    • Colin Ian King's avatar
      net: dsa: sja1105: make read-only const arrays static · f30e5323
      Colin Ian King authored
      Don't populate read-only const arrays on the stack, instead make them
      static.
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20230919093606.24446-1-colin.i.king@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f30e5323
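
The pattern this commit applies looks roughly like the sketch below. The function name and table contents are made up for illustration and are not taken from the sja1105 driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup; the table contents are made up for illustration. */
int toy_speed_mbps(size_t idx)
{
	/*
	 * 'static const' keeps a single copy of the table in read-only data.
	 * A plain 'const' local array may instead be populated on the stack
	 * every time the function is called.
	 */
	static const int speeds[] = { 10, 100, 1000, 2500 };

	if (idx >= sizeof(speeds) / sizeof(speeds[0]))
		return -1; /* index out of range */
	return speeds[idx];
}
```

The behaviour is unchanged for callers; only the storage of the table differs.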
    • Yang Li's avatar
    • Paolo Abeni's avatar
      Merge branch 'vsock-virtio-vhost-msg_zerocopy-preparations' · 71b263e7
      Paolo Abeni authored
      Arseniy Krasnov says:
      
      ====================
      vsock/virtio/vhost: MSG_ZEROCOPY preparations
      
      This patchset is the first of three parts of a larger patchset for
      MSG_ZEROCOPY flag support:
      https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
      
      During review of this series, Stefano Garzarella <sgarzare@redhat.com>
      suggested splitting it into three parts to simplify review and merging:
      
      1) virtio and vhost updates (for fragged skbs) <--- this patchset
      2) AF_VSOCK updates (allow enabling MSG_ZEROCOPY mode and reading
         tx completions) and an update for Documentation/.
      3) Updates for tests and utils.
      
      This series enables handling of fragged skbs in the virtio and vhost parts.
      The new logic won't be triggered yet, because the SO_ZEROCOPY option still
      cannot be enabled at this point (the next batch of patches from the big
      set above will enable it).
      ====================
      
      Link: https://lore.kernel.org/r/20230916130918.4105122-1-avkrasnov@salutedevices.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      71b263e7
    • Arseniy Krasnov's avatar
      vsock/virtio: MSG_ZEROCOPY flag support · 581512a6
      Arseniy Krasnov authored
      This adds handling of the MSG_ZEROCOPY flag on the transmission path:
      
      1) If this flag is set and zerocopy transmission is possible (enabled
         in socket options and allowed by the transport), a non-linear
         skb is created and filled with the pages of the user's buffer.
         These pages are locked in memory by 'get_user_pages()'.
      2) Changes how the skb owner is set: 'skb_set_owner_w()' is called
         instead of 'skb_set_owner_sk_safe()'. The reason is that
         '__zerocopy_sg_from_iter()' increments the socket's 'sk_wmem_alloc', so
         decrementing this field correctly requires the proper skb destructor,
         'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
      3) Adds a new callback to 'struct virtio_transport': 'can_msgzerocopy'.
         If this callback is set, the transport needs an extra check before it
         can send the provided number of buffers in zerocopy mode. Currently, the
         only transport that needs this callback is virtio, because it
         adds new buffers to the virtio queue and must check that the
         number of these buffers is less than the size of the queue (as
         required by the virtio spec). The vhost and loopback transports don't
         need this check.
      Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      581512a6
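Point 3 of the commit above describes an optional per-transport check. The sketch below shows the general shape of such a check in plain C. It is an assumption-laden illustration, not the kernel code: `struct transport_ops`, `TX_QUEUE_SIZE`, `zerocopy_allowed()` and the `*_like_ops` tables are hypothetical, and only the callback name `can_msgzerocopy` comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transport's ops table. Only the callback
 * name 'can_msgzerocopy' is taken from the commit message. */
struct transport_ops {
	/* Optional: NULL means the transport needs no extra check. */
	bool (*can_msgzerocopy)(size_t bufs_num);
};

#define TX_QUEUE_SIZE 128 /* assumed queue size, for illustration only */

/* A virtio-like transport queues buffers, so the number of buffers
 * must stay below the queue size. */
static bool virtio_like_can_msgzerocopy(size_t bufs_num)
{
	return bufs_num < TX_QUEUE_SIZE;
}

/* Zerocopy is used only if the socket enabled it and the transport,
 * when it provides the callback, accepts the buffer count. */
bool zerocopy_allowed(const struct transport_ops *ops, bool sock_zerocopy,
		      size_t bufs_num)
{
	if (!sock_zerocopy)
		return false;
	if (ops->can_msgzerocopy)
		return ops->can_msgzerocopy(bufs_num);
	return true;
}

const struct transport_ops virtio_like_ops = {
	.can_msgzerocopy = virtio_like_can_msgzerocopy,
};

/* Transports such as loopback leave the callback unset. */
const struct transport_ops loopback_like_ops = {
	.can_msgzerocopy = NULL,
};
```

Keeping the callback optional means transports that do not queue buffers this way need no code changes.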
    • Arseniy Krasnov's avatar
      vsock/virtio: non-linear skb handling for tap · 4b0bf10e
      Arseniy Krasnov authored
      For the tap device, a new skb is created and data from the current skb is
      copied to it. This adds copying of data from a non-linear skb to the
      new skb.
      Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      4b0bf10e
    • Arseniy Krasnov's avatar
      vsock/virtio: support to send non-linear skb · 64c99d2d
      Arseniy Krasnov authored
      For a non-linear skb, use its pages from the fragment array as buffers in
      the virtio tx queue. These pages are already pinned by 'get_user_pages()'
      when such an skb is created.
      Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      64c99d2d
    • Arseniy Krasnov's avatar
      vsock/virtio/vhost: read data from non-linear skb · 0df7cd3c
      Arseniy Krasnov authored
      This is a preparation patch for MSG_ZEROCOPY support. It adds handling of
      non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with
      'skb_copy_datagram_iter()'. The main advantage of the latter is that it
      can handle the paged part of the skb by using 'kmap()' on each page, but if
      there are no pages in the skb, it behaves like a simple copy to the iov
      iterator. This patch also adds a new field to the control block of the skb:
      the current offset in the skb from which to read the next portion of data
      (whether the skb is linear or not). The idea behind this field is that
      'skb_copy_datagram_iter()' handles both types of skb internally; it
      just needs an offset from which to copy data from the given skb. This
      offset is incremented on each read from the skb. This approach
      simplifies handling of both linear and non-linear skbs, because otherwise
      a linear skb needs a call to 'skb_pull()' after reading data from it,
      while a non-linear skb needs its 'data_len' updated.
      Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0df7cd3c
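The offset-based read described in the commit above can be modelled in user-space C. This is a toy sketch, not the kernel implementation: `struct toy_buf`, `copy_segment()` and `toy_read()` are hypothetical names, and the buffer is only a simplified stand-in for an skb with a linear part and page fragments. It shows how a single read cursor lets the same copy routine serve both linear and non-linear data.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FRAGS 4

/* Toy stand-in for an skb: a linear part plus optional fragments.
 * All names here are hypothetical. */
struct toy_buf {
	const char *linear;
	size_t linear_len;
	const char *frags[MAX_FRAGS];
	size_t frag_len[MAX_FRAGS];
	size_t nr_frags;
	size_t offset; /* read cursor, in the role of the new control-block field */
};

/* Copy this segment's share of the request into dst. '*skip' is the
 * number of bytes still to be skipped before copying starts. */
static size_t copy_segment(char *dst, const char *seg, size_t seg_len,
			   size_t *skip, size_t want)
{
	size_t n;

	if (*skip >= seg_len) {
		*skip -= seg_len; /* cursor lies beyond this segment */
		return 0;
	}
	n = seg_len - *skip;
	if (n > want)
		n = want;
	memcpy(dst, seg + *skip, n);
	*skip = 0;
	return n;
}

/* Read up to 'len' bytes starting at the cursor, then advance it.
 * The caller never needs to know whether the data is linear or paged. */
size_t toy_read(struct toy_buf *b, char *dst, size_t len)
{
	size_t skip = b->offset, done = 0, i;

	done += copy_segment(dst, b->linear, b->linear_len, &skip, len);
	for (i = 0; i < b->nr_frags && done < len; i++)
		done += copy_segment(dst + done, b->frags[i], b->frag_len[i],
				     &skip, len - done);
	b->offset += done;
	return done;
}
```

Because only the cursor changes between reads, the buffer itself is left unmodified, which is the simplification the commit message attributes to the offset field.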
    • Paolo Abeni's avatar
      Merge tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · ecf43926
      Paolo Abeni authored
      Florian Westphal says:
      
      ====================
      netfilter updates for net
      
      The following three patches fix regressions in the netfilter subsystem:
      
      1. Reject attempts to repeatedly toggle the 'dormant' flag in a single
         transaction.  Doing so makes nf_tables lose track of the real state
         vs. the desired state.  This ends with an attempt to unregister hooks
         that were never registered in the first place, which yields a splat.
      
      2. Fix element counting in the new nftables garbage collection infra
         that came with 6.5:  More than 255 expired elements wrap a counter,
         which results in a memory leak.
      
      3. Since 6.4 ipset can BUG when a set is renamed while a CREATE command
         is in progress, fix from Jozsef Kadlecsik.
      
      * tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
        netfilter: nf_tables: fix memleak when more than 255 elements expired
        netfilter: nf_tables: disable toggling dormant table state more than once
      ====================
      
      Link: https://lore.kernel.org/r/20230920084156.4192-1-fw@strlen.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      ecf43926
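Item 2 of the netfilter pull above mentions a counter that wraps once more than 255 elements have expired. The sketch below shows that class of bug in plain C, assuming an 8-bit counter; the actual field and its type in nf_tables are not given in the text, and the function names are hypothetical.

```c
#include <stdint.h>

/* Count expired elements with an 8-bit counter (assumed width). */
uint8_t count_narrow(unsigned int expired)
{
	uint8_t count = 0;
	unsigned int i;

	for (i = 0; i < expired; i++)
		count++; /* wraps modulo 256 once it passes 255 */
	return count;
}

/* The same count with a counter wide enough for the input. */
unsigned int count_wide(unsigned int expired)
{
	unsigned int count = 0;
	unsigned int i;

	for (i = 0; i < expired; i++)
		count++;
	return count;
}
```

If the wrapped value is later used to decide how many elements to release, the elements it no longer accounts for are never freed, which matches the memory leak described above.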