1. 12 May, 2020 13 commits
  2. 11 May, 2020 27 commits
    • Merge branch 'improve-msg_control-kernel-vs-user-pointer-handling' · 97cf0ef9
      David S. Miller authored
      Christoph Hellwig says:
      
      ====================
      improve msg_control kernel vs user pointer handling
      
      This series replaces the msg_control field in the kernel msghdr structure
      with an anonymous union and separate fields for kernel vs user
      pointers.  In addition to helping a bit with type safety and reducing
      sparse warnings, this also allows removing the set_fs() call in
      kernel_recvmsg, helping with an eventual entire removal of set_fs().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cleanly handle kernel vs user buffers for ->msg_control · 1f466e1f
      Christoph Hellwig authored
      The msg_control field in struct msghdr can either contain a user
      pointer when used with the recvmsg system call, or a kernel pointer
      when used with sendmsg.  To complicate things further kernel_recvmsg
      can stuff a kernel pointer in and then use set_fs to make the uaccess
      helpers accept it.
      
      Replace it with a union of a kernel pointer msg_control field, and
      a user pointer msg_control_user one, and allow kernel_recvmsg to operate
      on a proper kernel pointer using a bitfield to override the normal
      choice of a user pointer for recvmsg.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
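The union-plus-bitfield arrangement described in this commit can be sketched in userspace as follows. This is only a model under stated assumptions: `struct msghdr_model`, `model_copy_to_user()` and `put_control()` are hypothetical names, and the user pointer is a plain `void *` because the `__user` annotation only exists in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only, not the kernel's struct msghdr. */
struct msghdr_model {
	union {
		void *msg_control;      /* kernel pointer */
		void *msg_control_user; /* user pointer (void __user * in the kernel) */
	};
	bool msg_control_is_user : 1;   /* selects which union member is valid */
	size_t msg_controllen;
};

/* Hypothetical stand-in for copy_to_user(); returns 0 on success. */
static int model_copy_to_user(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* Copy control data out through whichever pointer the flag selects. */
static int put_control(struct msghdr_model *msg, const void *data, size_t len)
{
	assert(msg != NULL && data != NULL);
	if (len > msg->msg_controllen)
		return -1;
	if (msg->msg_control_is_user)
		return model_copy_to_user(msg->msg_control_user, data, len);
	memcpy(msg->msg_control, data, len);
	return 0;
}
```

Because the flag travels with the structure, a caller that holds a kernel buffer never has to make the user-access helpers accept it, which is the role set_fs() played before.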
    • net/scm: cleanup scm_detach_fds · 2618d530
      Christoph Hellwig authored
      Factor out two helpers to keep the code tidy.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add a CMSG_USER_DATA macro · 0462b6bd
      Christoph Hellwig authored
      Add a variant of CMSG_DATA that operates on user pointers to avoid
      sparse warnings about casting to/from user pointers.  Also fix up
      CMSG_DATA to rely on the gcc extension that allows void pointer
      arithmetic to cut down on the number of casts.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
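To illustrate the difference the void pointer arithmetic makes, here is a small sketch with a simplified header type. `struct cmsg_model` and the `MODEL_*` macros are hypothetical and are not the kernel's definitions; the second macro only compiles with GNU C, which treats `sizeof(void)` as 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified control message header, not the real struct cmsghdr. */
struct cmsg_model {
	size_t cmsg_len;   /* header plus payload length in bytes */
	int cmsg_level;
	int cmsg_type;
};

/* Cast-heavy form: convert to a char pointer, then step past the header. */
#define MODEL_CMSG_DATA_CAST(cmsg) \
	((void *)((char *)(cmsg) + sizeof(struct cmsg_model)))

/* GNU C form: arithmetic directly on a void pointer, one cast fewer. */
#define MODEL_CMSG_DATA(cmsg) \
	((void *)(cmsg) + sizeof(struct cmsg_model))

/* Read an int payload that directly follows the header. */
static int model_cmsg_read_int(const struct cmsg_model *cmsg)
{
	int value;

	assert(cmsg->cmsg_len >= sizeof(*cmsg) + sizeof(value));
	memcpy(&value, MODEL_CMSG_DATA(cmsg), sizeof(value));
	return value;
}
```

Both macros yield the same address; the second simply leans on the compiler extension instead of an intermediate cast.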
    • Merge branch 'net-dsa-Constify-two-tagger-ops' · 3242956b
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: dsa: Constify two tagger ops
      
      This patch series constifies the dsa_device_ops for ocelot and sja1105.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_sja1105: Constify dsa_device_ops · 097f0244
      Florian Fainelli authored
      sja1105_netdev_ops should be const since that is what the DSA layer
      expects.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: ocelot: Constify dsa_device_ops · 2fa3888b
      Florian Fainelli authored
      ocelot_netdev_ops should be const since that is what the DSA layer
      expects.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sfc-remove-nic_data-usage-in-common-code' · 9b1b31d5
      David S. Miller authored
      Edward Cree says:
      
      ====================
      sfc: remove nic_data usage in common code
      
      efx->nic_data should only be used from NIC-specific code (i.e. nic_type
       functions and things they call), in files like ef10[_sriov].c and
       siena.c.  This series refactors several nic_data usages from common
       code (mainly in mcdi_filters.c) into nic_type functions, in preparation
       for the upcoming ef100 driver which will use those functions but have
       its own struct layout for efx->nic_data distinct from ef10's.
      After this series, one nic_data usage (in ptp.c) remains; it wasn't
       clear to me how to fix it, and ef100 devices don't yet have PTP support
       (so the initial ef100 driver will not call that code).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: make firmware-variant printing a nic_type function · 9b46132c
      Edward Cree authored
      Instead of having efx_mcdi_print_fwver() look at efx_nic_rev and
       conditionally poke around inside ef10-specific nic_data, add a new
       efx->type->print_additional_fwver() method to do this work.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: make filter table probe caller responsible for adding VLANs · ed02112c
      Edward Cree authored
      By making the caller of efx_mcdi_filter_table_probe() loop over the
       vlan_list calling efx_mcdi_filter_add_vlan(), instead of doing it in
       efx_mcdi_filter_table_probe(), the latter avoids looking in ef10-
       specific nic_data.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: move rx_rss_context_exclusive into struct efx_mcdi_filter_table · dbf2c669
      Edward Cree authored
      It's both set and used solely by mcdi_filters.c, so there's no reason
       for it to be in ef10-specific nic_data.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: rework handling of (firmware) multicast chaining state · fd14e5fd
      Edward Cree authored
      Store the mc_chaining bit in struct efx_mcdi_filter_table, so that common
       code in mcdi_filters.c doesn't need to get it from ef10-specific nic_data.
      Also, probe the firmware workaround just before the call to
       efx_mcdi_filter_table_probe(), rather than in a random other part of the
       driver bringup, to ensure that (a) it gets probed in time and (b) it gets
       reprobed as necessary on resets, no matter how the surrounding code gets
       reorganised and reordered.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: move 'must restore' flags out of ef10-specific nic_data · e4fe938c
      Edward Cree authored
      Common code in mcdi_filters.c uses these flags, so by moving them to
       either struct efx_nic (in the case of must_realloc_vis) or struct
       efx_mcdi_filter_table (for must_restore_rss_contexts and
       must_restore_filters), decouple this code from ef10's nic_data.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: use efx_has_cap for capability checks outside of NIC-specific code · 484a75b1
      Edward Cree authored
      Removes some efx_ef10_nic_data references from common code.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: make capability checking a nic_type function · be904b85
      Tom Zhao authored
      Various MCDI functions (especially in filter handling) need to check the
       datapath caps, but those live in nic_data (since they don't exist on
       Siena).  Decouple from ef10-specific data structures by adding check_caps
       to the nic_type, to allow using these functions from non-ef10 drivers.
      
      Also add a convenience macro efx_has_cap() to reduce the amount of
       boilerplate involved in calling it.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
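The decoupling described in this commit, a per-NIC-type capability callback plus a convenience macro, might look roughly like the sketch below. All names here (`nic_model`, `nic_type_model`, `nic_has_cap`, the `CAP_*` bits and the two example NIC families) are hypothetical stand-ins, not the sfc driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct nic_model;

/* Per-NIC-type operations; each NIC family supplies its own check_caps. */
struct nic_type_model {
	bool (*check_caps)(const struct nic_model *nic, unsigned int flag);
};

struct nic_model {
	const struct nic_type_model *type;
	void *nic_data; /* layout known only to NIC-specific code */
};

/* Hypothetical capability bit numbers. */
enum { CAP_RX_RSS = 0, CAP_VLAN_FILTER = 1 };

/* Convenience wrapper: paste the short name onto the CAP_ prefix. */
#define nic_has_cap(nic, name) ((nic)->type->check_caps((nic), CAP_##name))

/* An "ef10-like" family keeps its capability word in private data. */
struct ef10_like_data {
	uint32_t datapath_caps;
};

static bool ef10_like_check_caps(const struct nic_model *nic, unsigned int flag)
{
	const struct ef10_like_data *d = nic->nic_data;

	assert(flag < 32);
	return (d->datapath_caps & (UINT32_C(1) << flag)) != 0;
}

static const struct nic_type_model ef10_like_type = {
	.check_caps = ef10_like_check_caps,
};

/* A family without these capabilities always answers false. */
static bool no_caps(const struct nic_model *nic, unsigned int flag)
{
	(void)nic;
	(void)flag;
	return false;
}

static const struct nic_type_model siena_like_type = {
	.check_caps = no_caps,
};
```

Common code then calls `nic_has_cap(nic, VLAN_FILTER)` without ever dereferencing `nic_data`, so a driver with a different private layout only needs to provide its own callback.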
    • sfc: move vport_id to struct efx_nic · dfcabb07
      Edward Cree authored
      Remove some usage of ef10-specific nic_data structs from common MCDI
       functions, in preparation for using them from a non-EF10 driver.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-Optimize-the-qed-allocations-inside-kdump-kernel' · a90f704a
      David S. Miller authored
      Bhupesh Sharma says:
      
      ====================
      net: Optimize the qed* allocations inside kdump kernel
      
      Changes since v1:
      ----------------
      - v1 can be seen here: http://lists.infradead.org/pipermail/kexec/2020-May/024935.html
      - Addressed review comments received on v1:
        * Removed unnecessary parentheses.
        * Used a different macro for minimum RX/TX ring count value in kdump
          kernel.
      
      Since kdump kernel(s) run under severe memory constraint with the
      basic idea being to save the crashdump vmcore reliably when the primary
      kernel panics/hangs, large memory allocations done by a network driver
      can cause the crashkernel to panic with OOM.
      
      The qed* drivers take up approximately 214MB memory when run in the
      kdump kernel with the default configuration settings presently used in
      the driver. With a usual crashkernel size of 512M, this allocation
      is equal to almost half of the total crashkernel size allocated.
      
      See some logs obtained via memstrack tool (see [1]) below:
       dracut-pre-pivot[676]: ======== Report format module_summary: ========
       dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
       dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)
      
      This patchset tries to reduce the overall memory allocation profile of
      the qed* drivers when they run in the kdump kernel. With these
      optimizations we can see a saving of approx 85M in the kdump kernel:
       dracut-pre-pivot[671]: ======== Report format module_summary: ========
       dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
       <..snip..>
       dracut-pre-pivot[671]: Module qede using 4.6MB (73 pages), peak allocation 4.6MB (74 pages)
      
      And the kdump kernel can save vmcore successfully via both ssh and nfs
      interfaces.
      
      This patchset contains two patches:
      [PATCH 1/2] - Reduces the default TX and RX ring count in kdump kernel.
      [PATCH 2/2] - Disables qed SRIOV feature in kdump kernel (as it is
                    normally not a supported kdump target for saving
                    vmcore).
      
      [1]. Memstrack tool: https://github.com/ryncsn/memstrack
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qed: Disable SRIOV functionality inside kdump kernel · 37d4f8a6
      Bhupesh Sharma authored
      Since we have kdump kernel(s) running under severe memory constraint
      it makes sense to disable the qed SRIOV functionality when running the
      kdump kernel as kdump configurations on several distributions don't
      support SRIOV targets for saving the vmcore (see [1] for example).
      
      Currently the qed SRIOV functionality ends up consuming memory in
      the kdump kernel, when we don't really use the same.
      
      An example log seen in the kdump kernel with the SRIOV functionality
      enabled can be seen below (obtained via memstrack tool, see [2]):
       dracut-pre-pivot[676]: ======== Report format module_summary: ========
       dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
      
      This patch disables the SRIOV functionality inside kdump kernel and with
      the same applied the memory consumption goes down:
       dracut-pre-pivot[671]: ======== Report format module_summary: ========
       dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
      
      [1]. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/installing-and-configuring-kdump_managing-monitoring-and-updating-the-kernel#supported-kdump-targets_supported-kdump-configurations-and-targets
      [2]. Memstrack tool: https://github.com/ryncsn/memstrack
      
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Ariel Elior <aelior@marvell.com>
      Cc: GR-everest-linux-l2@marvell.com
      Cc: Manish Chopra <manishc@marvell.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qed*: Reduce RX and TX default ring count when running inside kdump kernel · 73e03097
      Bhupesh Sharma authored
      Normally kdump kernel(s) run under severe memory constraint with the
      basic idea being to save the crashdump vmcore reliably when the primary
      kernel panics/hangs.
      
      Currently the qed* ethernet driver ends up consuming a lot of memory in
      the kdump kernel, leading to kdump kernel panic when one tries to save
      the vmcore via ssh/nfs (thus utilizing the services of the underlying
      qed* network interfaces).
      
      An example OOM message log seen in the kdump kernel can be seen here
      [1], with crashkernel size reservation of 512M.
      
      Using tools like memstrack (see [2]), we can track the modules taking up
      the bulk of memory in the kdump kernel and organize the memory usage
      output as per 'highest allocator first'. An example log for the OOM case
      indicates that the qed* modules end up allocating approximately 216M
      memory, which is a large part of the total crashkernel size:
      
       dracut-pre-pivot[676]: ======== Report format module_summary: ========
       dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
       dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)
      
      This patch reduces the default RX and TX ring count from 1024 to 64
      when running inside kdump kernel, which leads to a significant memory
      saving.
      
      An example log with the patch applied shows the reduced memory
      allocation in the kdump kernel:
       dracut-pre-pivot[674]: ======== Report format module_summary: ========
       dracut-pre-pivot[674]: Module qed using 141.8MB (2268 pages), peak allocation 141.8MB (2268 pages)
       <..snip..>
       dracut-pre-pivot[674]: Module qede using 4.8MB (76 pages), peak allocation 4.9MB (78 pages)
      
      Tested crashdump vmcore save via ssh/nfs protocol using underlying qed*
      network interface after applying this patch.
      
      [1] OOM log:
      ------------
      
       kworker/0:6: page allocation failure: order:6,
       mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
       kworker/0:6 cpuset=/ mems_allowed=0
       CPU: 0 PID: 145 Comm: kworker/0:6 Not tainted 4.18.0-109.el8.aarch64 #1
       Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL025
       01/18/2019
       Workqueue: events work_for_cpu_fn
       Call trace:
        dump_backtrace+0x0/0x188
        show_stack+0x24/0x30
        dump_stack+0x90/0xb4
        warn_alloc+0xf4/0x178
        __alloc_pages_nodemask+0xcac/0xd58
        alloc_pages_current+0x8c/0xf8
        kmalloc_order_trace+0x38/0x108
        qed_iov_alloc+0x40/0x248 [qed]
        qed_resc_alloc+0x224/0x518 [qed]
        qed_slowpath_start+0x254/0x928 [qed]
         __qede_probe+0xf8/0x5e0 [qede]
        qede_probe+0x68/0xd8 [qede]
        local_pci_probe+0x44/0xa8
        work_for_cpu_fn+0x20/0x30
        process_one_work+0x1ac/0x3e8
        worker_thread+0x44/0x448
        kthread+0x130/0x138
        ret_from_fork+0x10/0x18
        Cannot start slowpath
        qede: probe of 0000:05:00.1 failed with error -12
      
      [2]. Memstrack tool: https://github.com/ryncsn/memstrack
      
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Ariel Elior <aelior@marvell.com>
      Cc: GR-everest-linux-l2@marvell.com
      Cc: Manish Chopra <manishc@marvell.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
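The ring count reduction in this commit amounts to choosing a smaller default when running as a kdump kernel. A minimal sketch follows, assuming a boolean `in_kdump` input in place of whatever detection the driver really uses; the macro and function names are hypothetical, and only the values 1024 and 64 come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_COUNT_DEFAULT 1024u /* normal default, per the commit message */
#define RING_COUNT_KDUMP     64u /* reduced default, per the commit message */

/* in_kdump stands in for the driver's real kdump detection. */
static unsigned int default_ring_count(bool in_kdump)
{
	return in_kdump ? RING_COUNT_KDUMP : RING_COUNT_DEFAULT;
}

/* Bytes needed for a ring of `entries` descriptors of `desc_size` bytes. */
static size_t ring_bytes(unsigned int entries, size_t desc_size)
{
	assert(desc_size > 0);
	return (size_t)entries * desc_size;
}
```

Since ring memory scales linearly with the entry count, going from 1024 to 64 shrinks each ring by a factor of 16 whatever the descriptor size is.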
    • hinic: add link_ksettings ethtool_ops support · 01f2b3da
      Luo bin authored
      Add a set_link_ksettings implementation and improve the implementation
      of get_link_ksettings.
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: Replace zero-length array with flexible-array · 9c8255c8
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get rid of those sorts of issues completely.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
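As a small illustration of why allocations are unaffected, the following sketch allocates a structure ending in a flexible array member. It reuses the `struct foo` / `struct boo` names from the message above; the `count` field and the `foo_alloc()` and `foo_at()` helpers are additions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;
	size_t count;       /* added for the example: number of elements */
	struct boo array[]; /* flexible array member, must come last */
};

/* Allocate a zeroed struct foo with room for `count` trailing elements. */
static struct foo *foo_alloc(size_t count)
{
	struct foo *f;

	/* Reject sizes whose byte count would overflow size_t. */
	if (count > (SIZE_MAX - sizeof(*f)) / sizeof(f->array[0]))
		return NULL;
	f = calloc(1, sizeof(*f) + count * sizeof(f->array[0]));
	if (f)
		f->count = count;
	return f;
}

/* Bounds-checked access to one trailing element. */
static struct boo *foo_at(struct foo *f, size_t i)
{
	assert(f != NULL && i < f->count);
	return &f->array[i];
}
```

The size computation uses `sizeof(*f)` for the fixed part and the element size for the tail, so it reads the same whether the last member is declared `array[0]` or `array[]`.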
    • net: atarilance: Replace zero-length array with flexible-array · c2dfc7d2
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get rid of those sorts of issues completely.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Replace zero-length array with flexible-array · 0fa39d6d
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get rid of those sorts of issues completely.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cross-chip-bridging-for-disjoint-dsa-trees' · a6f0b26d
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      This series adds support for boards where DSA switches of multiple types
      are cascaded together. Actually this type of setup was brought up before
      on netdev, and it looks like utilizing disjoint trees is the way to go:
      
      https://lkml.org/lkml/2019/7/7/225
      
      The trouble with disjoint trees (prior to this patch series) is that only
      bridging of ports within the same hardware switch can be offloaded.
      After scratching my head for a while, it looks like the easiest way to
      support hardware bridging between different DSA trees is to bridge their
      DSA masters and extend the crosschip bridging operations.
      
      I have given some thought to bridging the DSA masters with the slaves
      themselves, but given the hardware topology described in the commit
      message of patch 4/4, virtually any number (and combination) of bridges
      (forwarding domains) can be created on top of those 3x4-port front-panel
      switches. So it becomes a lot less obvious, when the front-panel ports
      are enslaved to more than 1 bridge, which bridge the DSA masters should
      be enslaved to.
      
      So the least awkward approach was to just create a completely separate
      bridge for the DSA masters, whose entire purpose is to permit hardware
      forwarding between the discrete switches beneath it.
      
      This is a direct resend of v3, which was deferred due to lack of review.
      In the meantime Florian has reviewed and tested some of them.
      
      v1 was submitted here:
      https://patchwork.ozlabs.org/project/netdev/cover/20200429161952.17769-1-olteanv@gmail.com/
      
      v2 was submitted here:
      https://patchwork.ozlabs.org/project/netdev/cover/20200430202542.11797-1-olteanv@gmail.com/
      
      v3 was submitted here:
      https://patchwork.ozlabs.org/project/netdev/cover/20200503221228.10928-1-olteanv@gmail.com/
      ====================
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: sja1105: implement cross-chip bridging operations · ac02a451
      Vladimir Oltean authored
      sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart
      and which is compatible with cascading. A complete description of this
      tagging format is in net/dsa/tag_8021q.c, but a quick summary is that
      each external-facing port tags incoming frames with a unique pvid, and
      this special VLAN is transmitted as tagged towards the inside of the
      system, and as untagged towards the exterior. The tag encodes the switch
      id and the source port index.
      
      This means that cross-chip bridging for dsa_8021q only entails adding
      the dsa_8021q pvids of one switch to the RX filter of the other
      switches. Everything else falls naturally into place, as long as the
      bottom-end of ports (the leaves in the tree) is comprised exclusively of
      dsa_8021q-compatible switches (i.e. sja1105). Otherwise, there would be
      a chance that a front-panel switch transmits a packet tagged with a
      dsa_8021q header, which it wouldn't be able to remove, and which
      would hence "leak" out.
      
      The only use case I tested (due to lack of board availability) was when
      the sja1105 switches are part of disjoint trees (however, this doesn't
      change the fact that multiple sja1105 switches still need unique switch
      identifiers in such a system). But in principle, even "true" single-tree
      setups (with DSA links) should work just as well, except for a small
      change which I can't test: dsa_towards_port should be used instead of
      dsa_upstream_port (I made the assumption that the routing port that any
      sja1105 should use towards its neighbours is the CPU port. That might
      not hold true in other setups).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
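The idea that the pvid encodes the switch id and the source port can be sketched with a simple bit-packing scheme. The layout below (a base value, 3 bits of switch id, 4 bits of port) is a hypothetical one chosen for illustration; the layout actually used is the one documented in net/dsa/tag_8021q.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define MODEL_PORT_BITS   4u
#define MODEL_PORT_MASK   ((1u << MODEL_PORT_BITS) - 1u)
#define MODEL_SWITCH_BITS 3u
#define MODEL_SWITCH_MASK ((1u << MODEL_SWITCH_BITS) - 1u)
#define MODEL_VID_BASE    0x400u /* keeps these VIDs apart from low VLAN ids */

/* Build the VID that a given port of a given switch would tag with. */
static uint16_t model_rx_vid(unsigned int switch_id, unsigned int port)
{
	assert(switch_id <= MODEL_SWITCH_MASK && port <= MODEL_PORT_MASK);
	return (uint16_t)(MODEL_VID_BASE | (switch_id << MODEL_PORT_BITS) | port);
}

/* Recover the switch id from a VID built by model_rx_vid(). */
static unsigned int model_vid_to_switch_id(uint16_t vid)
{
	return (vid >> MODEL_PORT_BITS) & MODEL_SWITCH_MASK;
}

/* Recover the source port from a VID built by model_rx_vid(). */
static unsigned int model_vid_to_port(uint16_t vid)
{
	return vid & MODEL_PORT_MASK;
}
```

Under such a scheme, cross-chip bridging reduces to each switch accepting the VIDs generated for the other switches' ports, which is why globally unique switch ids matter.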
    • net: dsa: introduce a dsa_switch_find function · 3b7bc1f0
      Vladimir Oltean authored
      Somewhat similar to dsa_tree_find, dsa_switch_find returns a dsa_switch
      structure pointer by searching for its tree index and switch index (the
      parameters from dsa,member). To be used, for example, by drivers that
      implement .crosschip_bridge_join and need a reference to the other
      switch indicated by the tree_index and sw_index arguments.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
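A lookup keyed on the pair (tree index, switch index) can be sketched as below. This is a userspace model under stated assumptions: `tree_model`, `switch_model` and `model_switch_find()` are hypothetical names, and a flat array stands in for whatever lists the DSA core really walks.

```c
#include <assert.h>
#include <stddef.h>

struct tree_model {
	unsigned int index; /* first cell of dsa,member */
};

struct switch_model {
	const struct tree_model *tree;
	unsigned int index; /* second cell of dsa,member */
};

/* Return the switch matching both indices, or NULL if there is none. */
static struct switch_model *
model_switch_find(struct switch_model *switches, size_t n,
		  unsigned int tree_index, unsigned int sw_index)
{
	assert(switches != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (switches[i].tree->index == tree_index &&
		    switches[i].index == sw_index)
			return &switches[i];
	}
	return NULL;
}
```

A driver handling a cross-chip bridge event would pass the tree_index and sw_index it received and must be prepared for a NULL result when the other switch is not registered.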
    • net: dsa: permit cross-chip bridging between all trees in the system · f66a6a69
      Vladimir Oltean authored
      One way of utilizing DSA is by cascading switches which do not all have
      compatible taggers. Consider the following real-life topology:
      
            +---------------------------------------------------------------+
            | LS1028A                                                       |
            |               +------------------------------+                |
            |               |      DSA master for Felix    |                |
            |               |(internal ENETC port 2: eno2) |                |
            |  +------------+------------------------------+-------------+  |
            |  | Felix embedded L2 switch                                |  |
            |  |                                                         |  |
            |  | +--------------+   +--------------+   +--------------+  |  |
            |  | |DSA master for|   |DSA master for|   |DSA master for|  |  |
            |  | |  SJA1105 1   |   |  SJA1105 2   |   |  SJA1105 3   |  |  |
            |  | |(Felix port 1)|   |(Felix port 2)|   |(Felix port 3)|  |  |
            +--+-+--------------+---+--------------+---+--------------+--+--+
      
      +-----------------------+ +-----------------------+ +-----------------------+
      |   SJA1105 switch 1    | |   SJA1105 switch 2    | |   SJA1105 switch 3    |
      +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
      |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3|
      +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
      
      The above can be described in the device tree as follows (obviously not
      complete):
      
      mscc_felix {
      	dsa,member = <0 0>;
      	ports {
      		port@4 {
      			ethernet = <&enetc_port2>;
      		};
      	};
      };
      
      sja1105_switch1 {
      	dsa,member = <1 1>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port1>;
      		};
      	};
      };
      
      sja1105_switch2 {
      	dsa,member = <2 2>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port2>;
      		};
      	};
      };
      
      sja1105_switch3 {
      	dsa,member = <3 3>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port3>;
      		};
      	};
      };
      
      Basically we instantiate one DSA switch tree for every hardware switch
      in the system, but we still give them globally unique switch IDs (will
      come back to that later). Having 3 disjoint switch trees makes the
      tagger drivers "just work", because net devices are registered for the
      3 Felix DSA master ports, and they are also DSA slave ports to the ENETC
      port. So packets received on the ENETC port are stripped of their
      stacked DSA tags one by one.
      
      Currently, hardware bridging between ports on the same sja1105 chip is
      possible, but switching between sja1105 ports on different chips is
      handled by the software bridge. This is fine, but we can do better.
      
      In fact, the dsa_8021q tag used by sja1105 is compatible with cascading.
      In other words, a sja1105 switch can correctly parse and route a packet
      containing a dsa_8021q tag. So if we could enable hardware bridging on
      the Felix DSA master ports, cross-chip bridging could be completely
      offloaded.
      
      Such a system would be used as follows:
      
      ip link add dev br0 type bridge && ip link set dev br0 up
      for port in sw1p0 sw1p1 sw1p2 sw1p3 \
      	    sw2p0 sw2p1 sw2p2 sw2p3 \
      	    sw3p0 sw3p1 sw3p2 sw3p3; do
      	ip link set dev $port master br0
      done
      
      The above makes switching between ports on the same row be performed in
      hardware, and between ports on different rows in software. Now assume
      the Felix switch ports are called swp0, swp1, swp2. By running the
      following extra commands:
      
      ip link add dev br1 type bridge && ip link set dev br1 up
      for port in swp0 swp1 swp2; do
      	ip link set dev $port master br1
      done
      
      the CPU no longer sees packets which traverse sja1105 switch boundaries
      and can be forwarded directly by Felix. The br1 bridge would not be used
      for any sort of traffic termination.
      
      For this to work, we need to give drivers an opportunity to listen for
      bridging events on DSA trees other than their own, and pass that other
      tree index as argument. I have made the assumption, for the moment, that
      the other existing DSA notifiers don't need to be broadcast to other
      trees. That assumption might turn out to be incorrect. But in the
      meantime, introduce a dsa_broadcast function, similar in purpose to
      dsa_port_notify, which is used only by the bridging notifiers.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
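The broadcast described in the last paragraph, delivering a bridging event to every tree along with the index of the tree it originated from, can be modelled as follows. The types and functions (`bridge_event`, `dst_model`, `model_tree_notify()`, `model_broadcast()`) are hypothetical and only mirror the idea, not the kernel's notifier machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Event carrying the originating tree and switch indices. */
struct bridge_event {
	unsigned int tree_index;
	unsigned int sw_index;
	int port;
};

/* One DSA switch tree, with counters standing in for driver callbacks. */
struct dst_model {
	unsigned int index;
	unsigned int events_seen;  /* every event delivered to this tree */
	unsigned int foreign_seen; /* events that came from another tree */
};

/* Deliver one event to one tree; returns 0 on success. */
static int model_tree_notify(struct dst_model *dst, const struct bridge_event *ev)
{
	dst->events_seen++;
	if (ev->tree_index != dst->index)
		dst->foreign_seen++;
	return 0;
}

/* Deliver the event to every tree, stopping at the first error. */
static int model_broadcast(struct dst_model *trees, size_t n,
			   const struct bridge_event *ev)
{
	assert(ev != NULL);
	for (size_t i = 0; i < n; i++) {
		int err = model_tree_notify(&trees[i], ev);

		if (err)
			return err;
	}
	return 0;
}
```

Passing the originating tree index inside the event is what lets a receiving driver tell a cross-tree join apart from one inside its own tree.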