1. 27 Jul, 2021 19 commits
    • phonet: use siocdevprivate · 4747c1a8
      Arnd Bergmann authored
      phonet has a single private ioctl that is broken in compat
      mode on big-endian machines today because the data returned
      from it is never copied back to user space.
      
      Move it over to the ndo_siocdevprivate callback, which also
      fixes the compat issue.
      
      Cc: Remi Denis-Courmont <courmisch@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Rémi Denis-Courmont <courmisch@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: use ndo_siocdevprivate · 561d8352
      Arnd Bergmann authored
      The bridge driver has an old set of ioctls using the SIOCDEVPRIVATE
      namespace that have never worked in compat mode and are explicitly
      forbidden already.
      
      Move them over to ndo_siocdevprivate and fix compat mode for these,
      because we can.
      
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Cc: bridge@lists.linux-foundation.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hostap: use ndo_siocdevprivate · 3f3fa534
      Arnd Bergmann authored
      hostap has a combination of iwpriv ioctls that do not work at
      all, and two SIOCDEVPRIVATE commands that work natively but
      lack a compat conversion handler.
      
      For the moment, move them over to the new ndo_siocdevprivate
      interface and return an error for compat mode.
      
      Cc: Jouni Malinen <j@w1.fi>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • staging: wlan-ng: use siocdevprivate · 3343c49a
      Arnd Bergmann authored
      wlan-ng has two private ioctls that correctly work in compat
      mode. Move these over to the new ndo_siocdevprivate mechanism.
      
      The p80211netdev_ethtool() function is commented out and
      has no use here, so it can be removed.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • staging: rtlwifi: use siocdevprivate · 89939e89
      Arnd Bergmann authored
      rtl8188eu has an "android private" ioctl command multiplexer
      that is not currently safe for use in compat mode because
      of its triple-indirect pointer.
      
      rtl8723bs uses a different interface on the SIOCDEVPRIVATE
      command, based on the iwpriv data structure.
      
      Both also have normal unreachable iwpriv commands, and all
      of the above should probably just get removed. For the
      moment, just switch over to the new interface.
      
      Cc: Larry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: split out SIOCDEVPRIVATE handling from dev_ioctl · b9067f5d
      Arnd Bergmann authored
      SIOCDEVPRIVATE ioctl commands are mainly used in really old
      drivers, and they have a number of problems:
      
      - They hide behind the normal .ndo_do_ioctl function that
        is also used for other things in modern drivers, so it's
        hard to spot a driver that actually uses one of these.
      
      - Since drivers use a number of different calling conventions,
        it is impossible to support compat mode for them in
        a generic way.
      
      - With all drivers using the same 16 command codes, there
        is no way to introspect the data being passed through
        things like strace.
      
      Add a new net_device_ops callback pointer, to address the
      first two of these. Separating them from .ndo_do_ioctl
      makes it easy to grep for drivers with a .ndo_siocdevprivate
      callback, and the unwieldy name hopefully makes it easier
      to spot in code review.
      
      By passing the ifreq structure and the ifr_data pointer
      separately, it is no longer necessary to overload these,
      and the driver can use either one for a given command.
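      A minimal userspace sketch of the split follows. It assumes heavily simplified stand-ins for struct ifreq and struct net_device_ops, and every `_model` name is hypothetical. Commands in the 16-entry SIOCDEVPRIVATE range (0x89F0 to 0x89FF, per <linux/sockios.h>) go to their own callback, which receives the ifreq and the data pointer as separate arguments. As far as we know, the kernel declares that callback as `int (*ndo_siocdevprivate)(struct net_device *dev, struct ifreq *ifr, void __user *data, int cmd)`. The sketch omits compat conversion, device-presence checks and any fallback to .ndo_do_ioctl.

```c
#include <assert.h>
#include <errno.h>

/* Start of the private ioctl range; drivers get 16 commands from here. */
#define SIOCDEVPRIVATE_MODEL 0x89F0
#define SIOCDEVPRIVATE_LAST  (SIOCDEVPRIVATE_MODEL + 15)

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct ifreq_model {
	char ifr_name[16];
	void *ifr_data;		/* a user pointer in the real ifreq */
};

struct net_device_model;

struct netdev_ops_model {
	int (*ndo_do_ioctl)(struct net_device_model *dev,
			    struct ifreq_model *ifr, int cmd);
	int (*ndo_siocdevprivate)(struct net_device_model *dev,
				  struct ifreq_model *ifr,
				  void *data, int cmd);
};

struct net_device_model {
	const struct netdev_ops_model *ops;
};

/* Route private commands to their own callback, everything else to do_ioctl. */
static int dev_ioctl_model(struct net_device_model *dev,
			   struct ifreq_model *ifr, int cmd)
{
	if (cmd >= SIOCDEVPRIVATE_MODEL && cmd <= SIOCDEVPRIVATE_LAST) {
		if (!dev->ops->ndo_siocdevprivate)
			return -EOPNOTSUPP;
		/* ifreq and data pointer are passed separately */
		return dev->ops->ndo_siocdevprivate(dev, ifr, ifr->ifr_data, cmd);
	}
	if (dev->ops->ndo_do_ioctl)
		return dev->ops->ndo_do_ioctl(dev, ifr, cmd);
	return -EOPNOTSUPP;
}

/* Hypothetical driver: only the first private command is implemented. */
static int demo_siocdevprivate(struct net_device_model *dev,
			       struct ifreq_model *ifr, void *data, int cmd)
{
	(void)dev;
	(void)ifr;
	if (cmd != SIOCDEVPRIVATE_MODEL)
		return -EOPNOTSUPP;
	*(int *)data = 42;	/* write a result through the data pointer */
	return 0;
}

static const struct netdev_ops_model demo_ops = {
	.ndo_siocdevprivate = demo_siocdevprivate,
};
```

      Because the private handler sits in its own field, a grep for ndo_siocdevprivate finds every driver that still relies on these commands.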
      
      Cc: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-rack' · 2fba2eae
      David S. Miller authored
      Neal Cardwell says:
      
      ====================
      more accurate DSACK processing for RACK-TLP
      
      This patch series includes two minor improvements to tighten up the accuracy of
      the processing of incoming DSACK information, so that RACK-TLP behavior is
      faster and more precise: first, to ensure we detect packet loss in some extra
      corner cases; and second, to avoid growing the RACK reordering window (and
      delaying fast recovery) in cases where it seems clear we don't need to.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: more accurately check DSACKs to grow RACK reordering window · a657db03
      Neal Cardwell authored
      Previously, a DSACK could expand the RACK reordering window when no
      reordering has been seen, and/or when the DSACK was due to an
      unnecessary TLP retransmit (rather than a spurious fast recovery due
      to reordering). This could result in unnecessarily growing the RACK
      reordering window and thus unnecessarily delaying RACK-based fast
      recovery episodes.
      
      To avoid these issues, this commit tightens the conditions under which
      a DSACK triggers the RACK reordering window to grow, so that a
      connection only expands its RACK reordering window if:
      
      (a) reordering has been seen in the connection
      (b) a DSACKed range does not match the most recent TLP retransmit
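      The sketch below expresses the two conditions above. It uses a hypothetical `struct rack_model` in place of the kernel's `struct tcp_sock`. It assumes that "matching the most recent TLP retransmit" can be approximated by the DSACK block ending exactly at the recorded TLP high sequence number, with 0 meaning that no TLP is outstanding. The real matching logic may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state, not the kernel's struct tcp_sock. */
struct rack_model {
	bool     reord_seen;	/* has reordering been observed? */
	uint32_t tlp_high_seq;	/* end seq of the last TLP retransmit, 0 if none */
	uint32_t reo_wnd_steps;	/* how far the reordering window has grown */
};

/* Assumption: a DSACK "matches" the TLP if it ends exactly at tlp_high_seq. */
static bool dsack_matches_tlp(const struct rack_model *r, uint32_t dsack_end)
{
	return r->tlp_high_seq != 0 && r->tlp_high_seq == dsack_end;
}

/* Grow only if (a) reordering was seen and (b) the DSACK is not for the TLP. */
static void rack_on_dsack(struct rack_model *r, uint32_t dsack_end)
{
	if (r->reord_seen && !dsack_matches_tlp(r, dsack_end))
		r->reo_wnd_steps++;
}
```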
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: more accurately detect spurious TLP probes · 63f367d9
      Yuchung Cheng authored
      Previously, a TLP was considered spurious if the sender received any
      DSACK during a TLP episode. This patch additionally checks that the
      DSACK sequences match the TLP's, to improve accuracy.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qdisc: add new field for qdisc_enqueue tracepoint · 409f386b
      Tonghao Zhang authored
      The qdisc_enqueue tracepoint can be used together with
      qdisc:qdisc_dequeue to measure packet latency in qdisc queues.
      
      Add a new field, txq, to it so that more information can be
      retrieved.
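      Measuring latency this way means pairing each enqueue event with the later dequeue event for the same packet on the same queue. The sketch below assumes flattened trace records with hypothetical field names (`ts_ns`, `skbaddr`, `txq`). It is not the tracepoint's actual record format, and the commit does not show the type of the real txq field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened tracepoint records; field names are illustrative. */
struct qdisc_event {
	uint64_t  ts_ns;	/* event timestamp in nanoseconds */
	uintptr_t skbaddr;	/* identifies the packet */
	uintptr_t txq;		/* identifies the transmit queue */
	int       is_enqueue;	/* 1 = qdisc_enqueue, 0 = qdisc_dequeue */
};

/*
 * Return the queueing latency of packet 'skbaddr' on queue 'txq', or -1 if
 * its enqueue event or a later dequeue event is missing from the trace.
 */
static int64_t qdisc_latency_ns(const struct qdisc_event *ev, size_t n,
				uintptr_t skbaddr, uintptr_t txq)
{
	int64_t enq = -1;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].skbaddr != skbaddr || ev[i].txq != txq)
			continue;
		if (ev[i].is_enqueue)
			enq = (int64_t)ev[i].ts_ns;
		else if (enq >= 0)
			return (int64_t)ev[i].ts_ns - enq;
	}
	return -1;
}
```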
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qed: remove unneeded return variables · ef17e2ac
      Jason Wang authored
      Some return variables are never changed before the function
      returns, so they are unneeded. Remove them and return their
      initial values directly.
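      The pattern being removed looks roughly like the hypothetical pair below. These are illustrations, not the qed functions themselves.

```c
#include <assert.h>

/* Before: 'rc' is initialised and returned, but nothing ever modifies it. */
static int example_setup_before(int *state)
{
	int rc = 0;

	*state = 1;	/* the real work; it never touches rc */
	return rc;
}

/* After: return the initial value directly and drop the variable. */
static int example_setup_after(int *state)
{
	*state = 1;
	return 0;
}
```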
      Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: networking: dpaa2: add documentation for the switch driver · d4b996f9
      Ioana Ciornei authored
      Add a documentation entry for the DPAA2 switch listing its
      requirements, features and some examples to go along with them.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ovs-upcall-issues' · 453a343c
      David S. Miller authored
      Mark Gray says:
      
      ====================
      openvswitch: per-cpu upcall patchwork issues
      
      Some issues were raised by patchwork at:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: fix sparse warning incorrect type · 076999e4
      Mark Gray authored
      Fix incorrect type in argument 1 (different address spaces).
      
      ../net/openvswitch/datapath.c:169:17: warning: incorrect type in argument 1 (different address spaces)
      ../net/openvswitch/datapath.c:169:17:    expected void const *
      ../net/openvswitch/datapath.c:169:17:    got struct dp_nlsk_pids [noderef] __rcu *upcall_portids
      
      Found at: https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159
      Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: fix alignment issues · 784dcfa5
      Mark Gray authored
      Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • e4252cb6
      Mark Gray authored
    • net: netlink: add the case when nlh is NULL · f9b282b3
      Yajun Deng authored
      Handle the case where nlh is NULL in nlmsg_report(), so that
      callers don't need to deal with it.
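      The following is a userspace model of the NULL-tolerant check. It assumes, as in the kernel's netlink helpers, that nlmsg_report() reports whether the request carried NLM_F_ECHO. `nlmsg_report_model()` is a hypothetical name. struct nlmsghdr and the flag constants come from <linux/netlink.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Report (echo) only if the request set NLM_F_ECHO; a missing header
 * is treated as "do not report" instead of being dereferenced.
 */
static int nlmsg_report_model(const struct nlmsghdr *nlh)
{
	return nlh ? !!(nlh->nlmsg_flags & NLM_F_ECHO) : 0;
}
```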
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: build all switchdev drivers as modules when the bridge is a module · b0e81817
      Vladimir Oltean authored
      Currently, all drivers depend on the bool CONFIG_NET_SWITCHDEV, but only
      the drivers that call some sort of function exported by the bridge, like
      br_vlan_enabled() or whatever, have an extra dependency on CONFIG_BRIDGE.
      
      Since the blamed commit, all switchdev drivers have a functional
      dependency upon switchdev_bridge_port_{,un}offload(), which is a pair of
      functions exported by the bridge module and not by the bridge-independent
      part of CONFIG_NET_SWITCHDEV.
      
      Problems appear when we have:
      
      CONFIG_BRIDGE=m
      CONFIG_NET_SWITCHDEV=y
      CONFIG_TI_CPSW_SWITCHDEV=y
      
      because cpsw, am65_cpsw and sparx5 will then be built-in but they will
      call a symbol exported by a loadable module. This is not possible and
      will result in the following build error:
      
      drivers/net/ethernet/ti/cpsw_new.o: in function `cpsw_netdevice_event':
      drivers/net/ethernet/ti/cpsw_new.c:1520: undefined reference to `switchdev_bridge_port_offload'
      drivers/net/ethernet/ti/cpsw_new.c:1537: undefined reference to `switchdev_bridge_port_unoffload'
      
      As mentioned, the other switchdev drivers don't suffer from this because
      switchdev_bridge_port_offload() is not the first symbol exported by the
      bridge that they are calling, so they already needed to deal with this
      in the same way.
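      The commit text does not show the Kconfig change itself. A common idiom for "built in only if the bridge is built in" is `depends on BRIDGE || BRIDGE=n`. Assuming the fix takes that shape, the C model below evaluates the expression using Kconfig's tristate rules: n < m < y, `||` takes the larger operand, and `SYM=n` yields y or n. All names are hypothetical.

```c
#include <assert.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "A || B" evaluates to the larger of the two operands. */
static enum tristate kc_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* "SYM=n" is a comparison: y when it holds, n otherwise. */
static enum tristate kc_equals_n(enum tristate sym)
{
	return sym == TRI_N ? TRI_Y : TRI_N;
}

/* Highest value allowed for a driver that "depends on BRIDGE || BRIDGE=n". */
static enum tristate driver_max(enum tristate bridge)
{
	return kc_or(bridge, kc_equals_n(bridge));
}
```

      With BRIDGE=m the dependency evaluates to m, so the driver can be at most a module, which is what the commit title asks for. With BRIDGE=y or BRIDGE=n, the dependency evaluates to y.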
      
      Fixes: 2f5dc00f ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded")
      Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Fix rxnfc copy to user buffer overflow · 9b29a161
      Saeed Mahameed authored
      In the cited commit, copy_to_user() was called with the wrong
      pointer: instead of the actual buffer pointer to copy from, a
      pointer to that pointer was passed, which triggers a buffer overflow
      call trace when running "ethtool -x ethX".
      
      Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed
      to the function, instead of a pointer to it.
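      The bug comes down to passing `&rxnfc`, an object the size of a pointer (8 bytes on x86-64), where `rxnfc` was meant. That matches the "8 < 192" in the trace below. This sketch emulates the kernel's object-size check with an explicit size argument. `checked_copy()` and `struct rxnfc_model` are hypothetical, and the model struct is simply padded to the 192 bytes the trace reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ethtool_rxnfc; only its size matters. */
struct rxnfc_model {
	uint32_t cmd;
	uint32_t flow_type;
	uint64_t data;
	uint8_t  rest[176];	/* pad to 192 bytes */
};

/*
 * Emulate the kernel's size check: refuse to copy 'n' bytes out of an
 * object that is only 'src_size' bytes long.
 */
static int checked_copy(void *dst, const void *src, size_t src_size, size_t n)
{
	if (src_size < n)
		return -1;	/* kernel: "Buffer overflow detected (src_size < n)" */
	memcpy(dst, src, n);
	return 0;
}

/* Buggy: the source is the pointer variable itself, sizeof(void *) bytes. */
static int copy_rxnfc_buggy(void *dst, const struct rxnfc_model *rxnfc)
{
	return checked_copy(dst, &rxnfc, sizeof(rxnfc), sizeof(*rxnfc));
}

/* Fixed: the source is the structure the pointer refers to. */
static int copy_rxnfc_fixed(void *dst, const struct rxnfc_model *rxnfc)
{
	return checked_copy(dst, rxnfc, sizeof(*rxnfc), sizeof(*rxnfc));
}
```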
      
      This fixes the call trace below:
      [   15.533533] ------------[ cut here ]------------
      [   15.539007] Buffer overflow detected (8 < 192)!
      [   15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20
      [   15.549308] Modules linked in:
      [   15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058
      [   15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [   15.558378] RIP: 0010:copy_overflow+0x15/0x20
      [   15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55
      [   15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286
      [   15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
      [   15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff
      [   15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff
      [   15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000
      [   15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000
      [   15.573584] FS:  00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000
      [   15.575639] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0
      [   15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   15.582441] Call Trace:
      [   15.582970]  ethtool_rxnfc_copy_to_user+0x30/0x46
      [   15.583815]  ethtool_get_rxnfc.cold+0x23/0x2b
      [   15.584584]  dev_ethtool+0x29c/0x25f0
      [   15.585286]  ? security_netlbl_sid_to_secattr+0x77/0xd0
      [   15.586728]  ? do_set_pte+0xc4/0x110
      [   15.587349]  ? _raw_spin_unlock+0x18/0x30
      [   15.588118]  ? __might_sleep+0x49/0x80
      [   15.588956]  dev_ioctl+0x2c1/0x490
      [   15.589616]  sock_ioctl+0x18e/0x330
      [   15.591143]  __x64_sys_ioctl+0x41c/0x990
      [   15.591823]  ? irqentry_exit_to_user_mode+0x9/0x20
      [   15.592657]  ? irqentry_exit+0x33/0x40
      [   15.593308]  ? exc_page_fault+0x32f/0x770
      [   15.593877]  ? exit_to_user_mode_prepare+0x3c/0x130
      [   15.594775]  do_syscall_64+0x35/0x80
      [   15.595397]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   15.596037] RIP: 0033:0x7f5381226d4b
      [   15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48
      [   15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [   15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b
      [   15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003
      [   15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001
      [   15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000
      [   15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000
      [   15.605042] ---[ end trace 325cf185e2795048 ]---
      
      Fixes: dd98d289 ("ethtool: improve compat ioctl handling")
      Reported-by: Shannon Nelson <snelson@pensando.io>
      CC: Arnd Bergmann <arnd@arndb.de>
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Tested-by: Shannon Nelson <snelson@pensando.io>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Jul, 2021 21 commits
    • Merge branch 'ipa-clock' · 268ca412
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: defer taking uC proxy clock
      
      This series rearranges some of the IPA initialization code.
      
      The first patch gets rid of two trivial setup and teardown
      functions, open-coding them in their callers instead.
      
      The second patch configures memory regions before endpoints.
      
      IPA interrupts do not depend on GSI being initialized.  Therefore
      they can be initialized in the config phase rather than waiting for
      setup.  The third patch moves this initialization earlier; memory
      regions must already be defined, so it's done after memory config.
      
      The microcontroller also has no dependency on GSI, though it does
      require IPA interrupts to be configured.  The fourth patch moves
      microcontroller initialization so it too happens during the config
      phase rather than setup.
      
      Finally, we currently take a "proxy clock" for the microcontroller
      during the config phase, dropping it only after we learn the
      microcontroller is initialized.  But microcontroller initialization
      is started by the modem, so there's no point in taking that clock
      reference before we know the modem has booted.  So the last patch
      arranges to wait to take the "proxy clock" for the microcontroller
      until we know the modem is about to boot.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: introduce ipa_uc_clock() · e2f154e6
      Alex Elder authored
      The first time it's booted, the modem loads and starts the
      IPA-resident microcontroller.  Once the microcontroller has
      completed its initialization, it notifies the AP it's "ready"
      by sending an INIT_COMPLETED response message.
      
      Until it receives that microcontroller message, the AP must ensure
      the IPA core clock remains operational.  Currently, a "proxy" clock
      reference is taken in ipa_uc_config() and dropped again once the
      message is received.
      
      However, there could be a long delay between when ipa_config()
      completes and when the modem actually starts.  And because the
      microcontroller gets loaded by the modem, there's no need to
      take the "proxy clock" until the modem first starts.
      
      Create a new function ipa_uc_clock() which takes the "proxy" clock
      reference for the microcontroller.  Call it when we get remoteproc
      SSR notification that the modem is about to start.  Keep an
      additional flag to record whether this proxy clock reference needs
      to be dropped at shutdown time, and issue a warning if we get the
      microcontroller message either before the clock reference is taken,
      or after it has already been dropped.
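      The sketch below models the proxy-reference bookkeeping described above. It assumes the reference is taken at most once (on the first modem start) and treats a warning as a counter. `struct uc_model` and the functions are hypothetical, not the IPA driver's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names do not match the IPA driver's fields. */
struct uc_model {
	int  clock_refs;	/* references held on the IPA core clock */
	bool proxy_taken;	/* proxy reference was taken at some point */
	bool proxy_held;	/* proxy reference is currently held */
	int  warnings;		/* stand-in for dev_warn() calls */
};

/* Modem is about to start: take the proxy reference the first time only. */
static void uc_clock(struct uc_model *uc)
{
	if (uc->proxy_taken)
		return;
	uc->clock_refs++;
	uc->proxy_taken = true;
	uc->proxy_held = true;
}

/* INIT_COMPLETED arrived: drop the proxy, warning if it is not held. */
static void uc_init_completed(struct uc_model *uc)
{
	if (!uc->proxy_held) {
		uc->warnings++;	/* too early (never taken) or too late (dropped) */
		return;
	}
	uc->clock_refs--;
	uc->proxy_held = false;
}

/* Shutdown: drop the proxy only if it is still held. */
static void uc_deconfig(struct uc_model *uc)
{
	if (uc->proxy_held) {
		uc->clock_refs--;
		uc->proxy_held = false;
	}
}
```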
      
      Drop the nearby use of "hh" length modifiers, which are no longer
      encouraged in the kernel.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: set up the microcontroller earlier · dc8f7e39
      Alex Elder authored
      Initializing the IPA-resident microcontroller requires the IPA
      clock, and sets up two IPA interrupt handlers, but this does not
      require GSI access.  The interrupt handlers also require the clock
      to be enabled, and require the IPA memory regions to be configured,
      but neither requires GSI access.  As a result, the microcontroller
      can be initialized during the "config" rather than "setup" phase of
      IPA initialization.
      
      Initialize the microcontroller in ipa_config() rather than
      ipa_setup(), and rename the called function ipa_uc_config().
      Do the inverse in ipa_deconfig() rather than ipa_teardown(),
      and rename the function for that case ipa_uc_deconfig().
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: set up IPA interrupts earlier · 1118a147
      Alex Elder authored
      Initialization of the IPA driver has several phases:
         - "init" phase can be done without any access to IPA hardware
         - "config" phase requires the IPA hardware to be clocked
         - "setup" phase requires the GSI layer to be functional
      
      Currently, initialization for the IPA interrupt handling code occurs
      in the setup phase.  It requires access to the IPA hardware but does
      not need GSI, so it can be moved to the config phase instead.
      
      Call the interrupt configuration function early in ipa_config()
      rather than from ipa_setup().  Rename ipa_interrupt_setup() to be
      ipa_interrupt_config(), and ipa_interrupt_teardown() to be
      ipa_interrupt_deconfig(), so their names properly indicate when
      they get called.
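      The config-phase ordering from this series (memory regions, then interrupts, then the microcontroller) can be sketched with the usual kernel goto-unwind pattern. The step names and the fault-injection variable are hypothetical. Only the ordering is taken from the messages in this series.

```c
#include <assert.h>
#include <string.h>

static char step_log[128];	/* hypothetical trace of steps, in order */
static int fail_at;		/* hypothetical fault injection: 1..3, 0 = none */

static int run_step(int n, const char *name)
{
	if (n == fail_at)
		return -1;
	strcat(step_log, name);
	strcat(step_log, " ");
	return 0;
}

static void undo_step(const char *name)
{
	strcat(step_log, "-");
	strcat(step_log, name);
	strcat(step_log, " ");
}

/*
 * Config phase: every step needs a clocked IPA but none needs GSI.
 * Interrupts need memory regions, and the microcontroller needs
 * interrupts. On failure, completed steps are undone in reverse.
 */
static int config_model(void)
{
	int ret;

	ret = run_step(1, "mem");
	if (ret)
		return ret;
	ret = run_step(2, "interrupt");
	if (ret)
		goto err_mem_deconfig;
	ret = run_step(3, "uc");
	if (ret)
		goto err_interrupt_deconfig;
	return 0;

err_interrupt_deconfig:
	undo_step("interrupt");
err_mem_deconfig:
	undo_step("mem");
	return ret;
}

/* Deconfig mirrors config in the opposite order. */
static void deconfig_model(void)
{
	undo_step("uc");
	undo_step("interrupt");
	undo_step("mem");
}
```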
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: configure memory regions early · 07e1f689
      Alex Elder authored
      IPA-resident memory is one of the most primitive resources that
      needs initialization, so call ipa_mem_config() early in
      ipa_config().
      
      This is in preparation for initializing the IPA-resident
      microcontroller earlier.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: kill ipa_modem_setup() · 63961f54
      Alex Elder authored
      The functions ipa_modem_setup() and ipa_modem_teardown() are trivial
      wrappers that call ipa_qmi_setup() and ipa_qmi_teardown().  Just
      call the QMI functions directly, and get rid of the wrappers.
      
      Improve the documentation of what setting up QMI does.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Fix out-of-bounds warnings · 323e0cb4
      Gustavo A. R. Silva authored
      Fix the following out-of-bounds warnings:
      
          net/core/flow_dissector.c: In function '__skb_flow_dissect':
      >> net/core/flow_dissector.c:1104:4: warning: 'memcpy' offset [24, 39] from the object at '<unknown>' is out of the bounds of referenced subobject 'saddr' with type 'struct in6_addr' at offset 8 [-Warray-bounds]
           1104 |    memcpy(&key_addrs->v6addrs, &iph->saddr,
                |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           1105 |           sizeof(key_addrs->v6addrs));
                |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~
          In file included from include/linux/ipv6.h:5,
                           from net/core/flow_dissector.c:6:
          include/uapi/linux/ipv6.h:133:18: note: subobject 'saddr' declared here
            133 |  struct in6_addr saddr;
                |                  ^~~~~
      >> net/core/flow_dissector.c:1059:4: warning: 'memcpy' offset [16, 19] from the object at '<unknown>' is out of the bounds of referenced subobject 'saddr' with type 'unsigned int' at offset 12 [-Warray-bounds]
           1059 |    memcpy(&key_addrs->v4addrs, &iph->saddr,
                |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           1060 |           sizeof(key_addrs->v4addrs));
                |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~
          In file included from include/linux/ip.h:17,
                           from net/core/flow_dissector.c:5:
          include/uapi/linux/ip.h:103:9: note: subobject 'saddr' declared here
            103 |  __be32 saddr;
                |         ^~~~~
      
      The problem is that the original code is trying to copy data into a
      couple of struct members adjacent to each other in a single call to
      memcpy().  So, the compiler legitimately complains about it. As these
      are just a couple of members, fix this by copying each one of them in
      separate calls to memcpy().
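      Here is a minimal sketch of the per-member fix, using hypothetical header and key layouts. As in an IPv4 header, `saddr` sits at offset 12 and is immediately followed by `daddr`. Each memcpy() now stays inside the member it names, so the compiler has nothing to flag.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts with adjacent address members. */
struct v4_hdr_model { uint8_t misc[12]; uint32_t saddr; uint32_t daddr; };
struct v4_key_model { uint32_t src; uint32_t dst; };

/*
 * A single memcpy() of sizeof(*key) bytes from &hdr->saddr would read
 * past 'saddr' into 'daddr'. Copy each member separately instead.
 */
static void copy_v4_addrs(struct v4_key_model *key,
			  const struct v4_hdr_model *hdr)
{
	memcpy(&key->src, &hdr->saddr, sizeof(key->src));
	memcpy(&key->dst, &hdr->daddr, sizeof(key->dst));
}
```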
      
      This helps with the ongoing efforts to globally enable -Warray-bounds
      and get us closer to being able to tighten the FORTIFY_SOURCE routines
      on memcpy().
      
      Link: https://github.com/KSPP/linux/issues/109
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/lkml/d5ae2e65-1f18-2577-246f-bada7eee6ccd@intel.com/
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs() · 6321c7ac
      Gustavo A. R. Silva authored
      Fix the following out-of-bounds warning:
      
          In function 'ip_copy_addrs',
              inlined from '__ip_queue_xmit' at net/ipv4/ip_output.c:517:2:
      net/ipv4/ip_output.c:449:2: warning: 'memcpy' offset [40, 43] from the object at 'fl' is out of the bounds of referenced subobject 'saddr' with type 'unsigned int' at offset 36 [-Warray-bounds]
            449 |  memcpy(&iph->saddr, &fl4->saddr,
                |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            450 |         sizeof(fl4->saddr) + sizeof(fl4->daddr));
                |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The problem is that the original code is trying to copy data into a
      couple of struct members adjacent to each other in a single call to
      memcpy(). This causes a legitimate compiler warning because memcpy()
      overruns the length of &iph->saddr and &fl4->saddr. As these are just
      a couple of struct members, fix this by using direct assignments,
      instead of memcpy().
      
      This helps with the ongoing efforts to globally enable -Warray-bounds
      and get us closer to being able to tighten the FORTIFY_SOURCE routines
      on memcpy().
      
      Link: https://github.com/KSPP/linux/issues/109
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/lkml/d5ae2e65-1f18-2577-246f-bada7eee6ccd@intel.com/
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: enable inline checksum offload for IPA v4.5+ · 22171146
      Alex Elder authored
      The RMNet and IPA drivers both support inline checksum offload now.
      So enable it for the TX and RX modem endpoints for IPA version 4.5+.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipa-kill-validation' · 2739bd76
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: kill IPA_VALIDATION
      
      A few months ago I proposed cleaning up some code that validates
      certain things conditionally, arguing that doing so once is enough,
      thus doing so always should not be necessary.
        https://lore.kernel.org/netdev/20210320141729.1956732-1-elder@linaro.org/
      Leon Romanovsky felt strongly that this was a mistake, and in the
      end I agreed to change my plans.
      
      This series finally completes what I said I would do about this,
      ultimately eliminating the IPA_VALIDATION symbol and conditional
      code entirely.
      
      The first patch both extends and simplifies some validation done for
      IPA immediate commands, and performs those tests unconditionally.
      
      The second patch fixes a bug that wasn't normally exposed because of
      the conditional compilation (a reason Leon was right about this).
      It makes filter and routing table validation occur unconditionally.
      
      The third eliminates the remaining conditionally-defined code and
      removes the line in the Makefile used to enable validation.
      
      And the fourth removes all comments containing ipa_assert()
      statements, replacing most of them with WARN_ON() calls.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: use WARN_ON() rather than assertions · 5bc55884
      Alex Elder authored
      I've added commented assertions to record certain properties that
      can be assumed to hold in certain places in the IPA code.  Convert
      these into real WARN_ON() calls so the assertions are actually
      checked, using the standard WARN_ON() mechanism.
      
      Where errors can be returned, return an error if a warning is
      triggered.
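      In the kernel, WARN_ON(cond) logs a warning and evaluates to the condition, which is what makes `if (WARN_ON(...)) return -EINVAL;` possible. The userspace stand-in below models that behaviour. `WARN_ON_MODEL` and `endpoint_check()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Log a warning when 'cond' is true, then yield 'cond' to the caller. */
static int warn_on_model(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s:%d: %s\n", file, line, expr);
	return cond;
}
#define WARN_ON_MODEL(c) warn_on_model(!!(c), #c, __FILE__, __LINE__)

/* Hypothetical check: an endpoint id must be below the endpoint count. */
static int endpoint_check(unsigned int id, unsigned int count)
{
	if (WARN_ON_MODEL(id >= count))
		return -EINVAL;	/* return an error when the warning fires */
	return 0;
}
```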
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: kill the remaining conditional validation code · 442d68eb
      Alex Elder authored
      There are only a few remaining spots that validate IPA code
      conditional on whether a symbol is defined at compile time.
      The checks are not expensive, so just build them always.
      
      This completes the removal of all CONFIG_VALIDATE/CONFIG_VALIDATION
      IPA code.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: always validate filter and route tables · 546948bf
      Alex Elder authored
      All checks in ipa_table_validate_build() are computed at build time,
      so build that unconditionally.
      
      In ipa_table_valid(), the calls to ipa_table_valid_one() are missing
      the IPA pointer parameter (a bug that shows up only when
      IPA_VALIDATE is defined).  Don't bother checking whether hashed
      table memory regions are valid if hashed tables are not supported.
      
      With those things fixed, have these table validation functions built
      unconditionally (not dependent on IPA_VALIDATE).
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: fix ipa_cmd_table_valid() · f2c1dac0
      Alex Elder authored
      Stop supporting different sizes for hashed and non-hashed filter or
      route tables.  Add BUILD_BUG_ON() calls to verify the sizes of the
      fields in the filter/route table initialization immediate command
      are the same.
      
      Add a check to ipa_cmd_table_valid() to ensure the size of the
      memory region being checked fits within the immediate command field
      that must hold it.
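
      Both checks can be sketched in standard C11, with _Static_assert
      standing in for BUILD_BUG_ON(). The structure and its 16-bit field
      names are hypothetical placeholders for the immediate command's size
      fields; only the technique (equal field widths checked at build time,
      region size checked against the field's maximum at run time) comes
      from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the size fields in a table-init immediate command;
 * the real field names and widths are defined by the IPA driver/hardware. */
struct hyp_table_init_cmd {
	uint16_t hash_rules_size;
	uint16_t nhash_rules_size;
};

#define HYP_FIELD_SIZE(type, member) sizeof(((type *)0)->member)

/* Build-time check in the spirit of BUILD_BUG_ON(): both fields must have
 * the same width, so one limit applies to hashed and non-hashed tables. */
_Static_assert(HYP_FIELD_SIZE(struct hyp_table_init_cmd, hash_rules_size) ==
	       HYP_FIELD_SIZE(struct hyp_table_init_cmd, nhash_rules_size),
	       "hashed and non-hashed size fields must match");

/* Largest value the (common) size field can hold. */
#define HYP_SIZE_FIELD_MAX \
	((1ULL << (8 * HYP_FIELD_SIZE(struct hyp_table_init_cmd, hash_rules_size))) - 1)

/* Run-time check: does the memory region size fit in the command field? */
static bool hyp_cmd_table_size_fits(uint32_t region_size)
{
	return region_size <= HYP_SIZE_FIELD_MAX;
}
```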
      
      Remove two Boolean parameters used only for error reporting.  This
      actually fixes a bug that would only show up if IPA_VALIDATE were
      defined.  Define ipa_cmd_table_valid() unconditionally (no longer
      dependent on IPA_VALIDATE).
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'sja1105-bridge-port-traffic-termination' · beeee08c
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Traffic termination for sja1105 ports under VLAN-aware bridge
      
      This set of patches updates the sja1105 DSA driver to be able to send
      and receive network stack packets on behalf of a VLAN-aware upper bridge
      interface.
      
      The reasons why this has traditionally been a problem are explained in
      the "Traffic support" section of Documentation/networking/dsa/sja1105.rst
      (the entire documentation will be revised in a separate patch series).
      
      The limitations that have prevented us from doing this so far have now
      been partially lifted by the bridge's ability to send a packet with
      skb->offload_fwd_mark = true, which means that the accelerator is
      allowed to look up its hardware FDB when sending a packet and deliver it
      to those destination ports. Basically skb->dev is now just a conduit to
      the switchdev driver's ndo_start_xmit(), and does not guarantee that the
      packet will really be transmitted on that port (but it will be
      transmitted where it should, nonetheless).
      
      Apart from the ability to perform IP termination on VLAN-aware bridges
      on top of sja1105 interfaces, we also gain the following features:
      - VLAN-aware software bridging between sja1105 ports and "foreign"
        (non-DSA) interfaces
      - software bridging between sja1105 bridge ports, and software LAG
        uppers of sja1105 ports (as long as the bridge is VLAN-aware)
      
      The only things that don't work are:
      1. to create an AF_PACKET socket on top of a sja1105 port that is under
         a VLAN-aware bridge. This is because the "imprecise RX" procedure
         selects an RX port for data plane* packets based on the assumption
         that the packet will land in the bridge's data path.  If ebtables
         rules are added to remove some packets from the bridge's data path,
         that assumption will be broken. Nonetheless, this is not a limitation
         that negatively impacts the known use cases with this switch.  If
         there were a way to impose user space restrictions against creating
         AF_PACKET sockets on this particular configuration, I could be
         interested in adding those restrictions, but I think there are other
         known broken configs already which are not checked by the kernel
         today (like for example that the bridge's rx_handler steals packets
         anyway from AF_PACKET sockets with exact-match ptype handlers, as
         opposed to ptype_all which are processed earlier; this is precisely
         the reason why ebtables rules are generally needed to avoid that).
      2. to send traffic on behalf of an 8021q upper of a standalone interface,
         while other sja1105 ports are part of a VLAN-aware bridge. This is
         because sja1105 sets ds->vlan_filtering_is_global = true, so we
         cannot make the standalone port ignore the VLAN header from the
         packet on RX, so we cannot make tag_8021q enforce its own pvid for
         the packets belonging to that port's 8021q upper. So we cannot
         determine in the first place that packets come from that port, unless
         we iterate through all 8021q uppers of all ports, and enforce
         uniqueness of VLAN IDs. I am not sure if this is what I want / if it
         is worth it, so currently all 8021q uppers are denied, regardless of
         whether the switch has ports under a VLAN-aware bridge or not
         (otherwise it becomes complicated even to track the state).
         Nonetheless, the VID uniqueness of all 8021q uppers does raise
         another question: what to do with VID 0, which has no 8021q upper,
         but the 8021q module adds it to our RX filter with vlan_vid_add().
         I am honestly not sure what to do. The best I can do is enable a
         hardware bit in sja1105 which reclassifies VID 0 frames to the PVID,
         and they will be sent on the CPU port using either the tag_8021q pvid
         of standalone ports, or the bridge pvid of VLAN-aware ports. So at
         the very least, those packets are still 'kinda' processed as if they
         were untagged, although the original VID 0 is lost. In my defence, Marvell
         appears to do the same thing with reclassifying VID 0 frames, see
         commit b8b79c41 ("net: dsa: mv88e6xxx: Fix adding vlan 0").
      
      *Control packets (currently hardcoded in sja1105 as link-local packets
      for MAC DA ranges 01-80-c2-xx-xx-xx and 01-1b-19-xx-xx-xx) are received
      based on packet traps and their precise source port is always known.
      
      I have taken one patch from Colin because my work conflicts with his,
      and integrating it all through the same series avoids that.
      ====================
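
      The VID 0 reclassification described in item 2 above can be modelled as
      a simple classification rule. This is a hypothetical sketch of the
      behaviour only, assuming untagged and VID 0 frames end up treated
      alike; it does not program any sja1105 register, and the function name
      is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of ingress VLAN classification with VID 0
 * reclassification enabled: priority-tagged (VID 0) and untagged frames are
 * both classified to the port's pvid, which is the tag_8021q pvid for a
 * standalone port or the bridge pvid for a VLAN-aware bridge port. */
static uint16_t hyp_classify_vid(bool tagged, uint16_t vid, uint16_t port_pvid)
{
	if (!tagged || vid == 0)
		return port_pvid;	/* the original VID 0 is not preserved */
	return vid;
}
```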
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vladimir Oltean
      Revert "net: dsa: Allow drivers to filter packets they can decode source port from" · edac6f63
      Vladimir Oltean authored
      This reverts commit cc1939e4.
      
      Currently 2 classes of DSA drivers are able to send/receive packets
      directly through the DSA master:
      - drivers with DSA_TAG_PROTO_NONE
      - sja1105
      
      Now that sja1105 has gained the ability to perform traffic termination
      even under the tricky case (VLAN-aware bridge), and that is much more
      functional (we can perform VLAN-aware bridging with foreign interfaces),
      there is no reason to keep this code in the receive path of the network
      core. So delete it.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vladimir Oltean
      net: dsa: sja1105: add bridge TX data plane offload based on tag_8021q · b6ad86e6
      Vladimir Oltean authored
      The main desire for having this feature in sja1105 is to support network
      stack termination for traffic coming from a VLAN-aware bridge.
      
      For sja1105, offloading the bridge data plane means sending packets
      as-is, with the proper VLAN tag, to the chip. The chip will look up its
      FDB and forward them to the correct destination port.
      
      But we support bridge data plane offload even for VLAN-unaware bridges,
      and the implementation there is different. In fact, VLAN-unaware
      bridging is governed by tag_8021q, so it makes sense to have the
      .bridge_fwd_offload_add() implementation fully within tag_8021q.
      The key difference is that we only support 1 VLAN-aware bridge, but we
      support multiple VLAN-unaware bridges. So we need to make sure that the
      forwarding domain is not crossed by packets injected from the stack.
      
      For this, we introduce the concept of a tag_8021q TX VLAN for bridge
      forwarding offload. As opposed to the regular TX VLANs which contain
      only 2 ports (the user port and the CPU port), a bridge data plane TX
      VLAN is "multicast" (or "imprecise"): it contains all the ports that are
      part of a certain bridge, and the hardware will select where the packet
      goes within this "imprecise" forwarding domain.
      
      Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use
      of the unique "bridge_num" provided by DSA for the data plane offload.
      We use the same 3 bits from the tag_8021q VLAN ID format to encode this
      bridge number.
      
      Note that these 3 bit positions have been used before for sub-VLANs in
      best-effort VLAN filtering mode. The difference is that for best-effort,
      the sub-VLANs were only valid on RX (and it was documented that the
      sub-VLAN field needed to be transmitted as zero). Whereas for the bridge
      data plane offload, these 3 bits are only valid on TX.
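
      Encoding the bridge number into the TX VLAN can be sketched with a few
      shifts and masks. The bit positions used here (VID bit 9 for the high
      bit, bits 5:4 for the low two bits) are an assumption for illustration;
      the authoritative tag_8021q VID layout is in net/dsa/tag_8021q.c, and
      the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout, for illustration only: a 3-bit bridge number stored in VID
 * bit 9 (its most significant bit) and VID bits 5:4 (its two low bits). */
#define HYP_BRIDGE_HI_SHIFT 9
#define HYP_BRIDGE_LO_SHIFT 4
#define HYP_BRIDGE_HI_MASK  (0x1u << HYP_BRIDGE_HI_SHIFT)
#define HYP_BRIDGE_LO_MASK  (0x3u << HYP_BRIDGE_LO_SHIFT)

/* Place bridge_num (0..7) into the 3 bits of a base TX VID, leaving the
 * other VID bits untouched. */
static uint16_t hyp_vid_set_bridge(uint16_t base_vid, unsigned int bridge_num)
{
	uint16_t vid = base_vid & ~(HYP_BRIDGE_HI_MASK | HYP_BRIDGE_LO_MASK);

	vid |= ((bridge_num >> 2) & 0x1u) << HYP_BRIDGE_HI_SHIFT;
	vid |= (bridge_num & 0x3u) << HYP_BRIDGE_LO_SHIFT;
	return vid;
}

/* Recover the bridge number from a TX VID built as above. */
static unsigned int hyp_vid_get_bridge(uint16_t vid)
{
	return (((vid & HYP_BRIDGE_HI_MASK) >> HYP_BRIDGE_HI_SHIFT) << 2) |
	       ((vid & HYP_BRIDGE_LO_MASK) >> HYP_BRIDGE_LO_SHIFT);
}
```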
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vladimir Oltean
      net: dsa: sja1105: add support for imprecise RX · 884be12f
      Vladimir Oltean authored
      This is already common knowledge by now, but the sja1105 does not have
      hardware support for DSA tagging for data plane packets, and tag_8021q
      sets up a unique pvid per port, transmitted as VLAN-tagged towards the
      CPU, for the source port to be decoded nonetheless.
      
      When the port is part of a VLAN-aware bridge, the pvid committed to
      hardware is taken from the bridge and not from tag_8021q, so we need to
      work with that the best we can.
      
      Configure the switches to send all packets to the CPU as VLAN-tagged
      (even ones that were originally untagged on the wire) and make use of
      dsa_untag_bridge_pvid() to get rid of it before we send those packets up
      the network stack.
      
      With the classified VLAN used by hardware known to the tagger, we first
      peek at the VID in an attempt to figure out if the packet was received
      from a VLAN-unaware port (standalone or under a VLAN-unaware bridge),
      in which case we can continue to call dsa_8021q_rcv(). If that is not
      the case, the packet probably came from a VLAN-aware bridge. So we call
      the DSA helper that finds for us a "designated bridge port" - one that
      is a member of the VLAN ID from the packet, and is in the proper STP
      state - basically these are all checks performed by br_handle_frame() in
      the software RX data path.
      
      The bridge will accept the packet as valid even if the source port
      may be wrong. So it may learn the MAC SA of the packet on the
      wrong port, and its software FDB will be out of sync with the hardware
      FDB. So replies towards this same MAC DA will not work, because the
      bridge will send towards a different netdev.
      
      This is where the bridge data plane offload ("imprecise TX") added by
      the next patch comes in handy. The software FDB is wrong, true, but the
      hardware FDB isn't, and by offloading the bridge forwarding plane we
      have a chance to right a wrong, and have the hardware look up the FDB
      for us for the reply packet. So it all cancels out.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vladimir Oltean
      net: dsa: sja1105: deny more than one VLAN-aware bridge · 19fa937a
      Vladimir Oltean authored
      With tag_sja1105.c's only ability being to perform an imprecise RX
      procedure and identify whether a packet comes from a VLAN-aware bridge
      or not, we have no way to determine whether a packet with VLAN ID 5
      comes from, say, br0 or br1. Actually we could, but it would mean that
      we need to restrict all VLANs from br0 to be different from all VLANs
      from br1, and this includes the default_pvid, which makes a setup with 2
      VLAN-aware bridges highly impractical.
      
      The fact of the matter is that this isn't even that big of a practical
      limitation, since even with a single VLAN-aware bridge we can pretty
      much enforce forwarding isolation based on the VLAN port membership.
      
      So in the end, tell the user that they need to model their setup using a
      single VLAN-aware bridge.
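
      The restriction amounts to a check like the one below, run when a
      bridge becomes VLAN-aware or when a port joins a VLAN-aware bridge. The
      port-state structure, the function name and the choice of -EBUSY are
      hypothetical; the driver's actual hook and error code may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port view: which bridge (if any) the port is under, and
 * whether that bridge has VLAN filtering enabled. */
struct hyp_port_state {
	const void *bridge;	/* NULL for a standalone port */
	bool vlan_aware;
};

/* Reject making 'bridge' VLAN-aware if another bridge spanning ports of the
 * same switch is already VLAN-aware. */
static int hyp_check_single_vlan_aware_bridge(const struct hyp_port_state *ports,
					      size_t num_ports,
					      const void *bridge)
{
	for (size_t i = 0; i < num_ports; i++) {
		if (ports[i].bridge && ports[i].bridge != bridge &&
		    ports[i].vlan_aware)
			return -EBUSY;	/* hypothetical error code choice */
	}
	return 0;
}
```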
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vladimir Oltean
      net: dsa: sja1105: deny 8021q uppers on ports · 4fbc08bd
      Vladimir Oltean authored
      Now that best-effort VLAN filtering is gone and we are left with the
      imprecise RX and imprecise TX in VLAN-aware mode, where the tagger
      just guesses the source port based on plausibility of the VLAN ID, 8021q
      uppers installed on top of a standalone port, while other ports of that
      switch are under a VLAN-aware bridge, don't quite "just work".
      
      In fact it could be possible to restrict the VLAN IDs used by the 8021q
      uppers to not be shared with VLAN IDs used by that VLAN-aware bridge,
      but then the tagger needs to be patched to search for 8021q uppers too,
      not just for the "designated bridge port" which will be introduced in a
      later patch.
      
      I haven't given a possible implementation full thought; it seems
      possible, but not worth the effort right now. The only certain thing is
      that currently the tagger won't be able to figure out the source port
      for these packets because they will come with the VLAN ID of the 8021q
      upper and are no longer retagged to a tag_8021q sub-VLAN like the best
      effort VLAN filtering code used to do. So just deny these for the
      moment.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vladimir Oltean
      net: dsa: sja1105: delete vlan delta save/restore logic · 6dfd23d3
      Vladimir Oltean authored
      With the best_effort_vlan_filtering mode now gone, the driver does not
      have 3 operating modes anymore (VLAN-unaware, VLAN-aware and best effort),
      but only 2.
      
      The idea is that we will gain support for network stack I/O through a
      VLAN-aware bridge, using the data plane offload framework (imprecise RX,
      imprecise TX). So the VLAN-aware use case will be more functional.
      
      But standalone ports that are part of the same switch when some other
      ports are under a VLAN-aware bridge should work too. Termination on
      those should work through the tag_8021q RX VLAN and TX VLAN.
      
      This was not possible using the old logic, because:
      - in VLAN-unaware mode, only the tag_8021q VLANs were committed to hw
      - in VLAN-aware mode, only the bridge VLANs were committed to hw
      - in best-effort VLAN mode, both the tag_8021q and bridge VLANs were
        committed to hw
      
      The strategy for the new VLAN-aware mode is to allow the bridge and the
      tag_8021q VLANs to coexist in the VLAN table at the same time.
      
      [ yes, we need to make sure that the bridge cannot install a tag_8021q
        VLAN, but ]
      
      This means that the save/restore logic introduced by commit ec5ae610
      ("net: dsa: sja1105: save/restore VLANs using a delta commit method")
      does not serve a purpose any longer. We can delete it and restore the
      old code that simply adds a VLAN to the VLAN table and calls it a day.
      
      Note that we keep the sja1105_commit_pvid() function from those days,
      but adapt it slightly. Ports that are under a VLAN-aware bridge use the
      bridge's pvid, ports that are standalone or under a VLAN-unaware bridge
      use the tag_8021q pvid, for local termination or VLAN-unaware forwarding.
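
      The pvid rule above can be written as a single selection. This is a
      hypothetical sketch: the structure and field names are invented for
      illustration, and it does not show how sja1105_commit_pvid() writes the
      result to the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state; the real driver keeps this elsewhere. */
struct hyp_pvid_port {
	bool under_vlan_aware_bridge;
	uint16_t bridge_pvid;		/* pvid configured by the bridge */
	uint16_t tag_8021q_pvid;	/* per-port pvid reserved by tag_8021q */
};

/* Ports under a VLAN-aware bridge use the bridge pvid; standalone ports and
 * ports under a VLAN-unaware bridge use the tag_8021q pvid. */
static uint16_t hyp_pick_pvid(const struct hyp_pvid_port *p)
{
	return p->under_vlan_aware_bridge ? p->bridge_pvid : p->tag_8021q_pvid;
}
```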
      
      Now, when the vlan_filtering property is toggled for the bridge, the
      pvid of the ports beneath it is the only thing that changes; we no
      longer delete some VLANs and restore others.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>