1. 30 Jul, 2021 12 commits
    • devlink: Break parameter notification sequence to be before/after unload/load driver · 05a7f4a8
      Leon Romanovsky authored
      Changing namespaces during devlink reload calls driver unload before
      devlink parameters are accessed. The commands below cause a
      use-after-free bug when trying to get the flow steering mode.
      
       * ip netns add n1
       * devlink dev reload pci/0000:00:09.0 netns n1
      
       ==================================================================
       BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
       Read of size 4 at addr ffff888009d04308 by task devlink/275
      
       CPU: 6 PID: 275 Comm: devlink Not tainted 5.12.0-rc2+ #2853
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x93/0xc2
        print_address_description.constprop.0+0x18/0x140
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        kasan_report.cold+0x7c/0xd8
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        devlink_nl_param_fill+0x1c8/0xe80
        ? __free_pages_ok+0x37a/0x8a0
        ? devlink_flash_update_timeout_notify+0xd0/0xd0
        ? lock_acquire+0x1a9/0x6d0
        ? fs_reclaim_acquire+0xb7/0x160
        ? lock_is_held_type+0x98/0x110
        ? 0xffffffff81000000
        ? lock_release+0x1f9/0x6c0
        ? fs_reclaim_release+0xa1/0xf0
        ? lock_downgrade+0x6d0/0x6d0
        ? lock_is_held_type+0x98/0x110
        ? lock_is_held_type+0x98/0x110
        ? memset+0x20/0x40
        ? __build_skb_around+0x1f8/0x2b0
        devlink_param_notify+0x6d/0x180
        devlink_reload+0x1c3/0x520
        ? devlink_remote_reload_actions_performed+0x30/0x30
        ? mutex_trylock+0x24b/0x2d0
        ? devlink_nl_cmd_reload+0x62b/0x1070
        devlink_nl_cmd_reload+0x66d/0x1070
        ? devlink_reload+0x520/0x520
        ? devlink_get_from_attrs+0x1bc/0x260
        ? devlink_nl_pre_doit+0x64/0x4d0
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        ? mutex_lock_io_nested+0x1130/0x1130
        ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
        ? security_capable+0x51/0x90
        genl_rcv_msg+0x27f/0x4a0
        ? genl_get_cmd+0x3c0/0x3c0
        ? lock_acquire+0x1a9/0x6d0
        ? devlink_reload+0x520/0x520
        ? lock_release+0x6c0/0x6c0
        netlink_rcv_skb+0x11d/0x340
        ? genl_get_cmd+0x3c0/0x3c0
        ? netlink_ack+0x9f0/0x9f0
        ? lock_release+0x1f9/0x6c0
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        ? netlink_attachskb+0x730/0x730
        ? _copy_from_iter_full+0x178/0x650
        ? __alloc_skb+0x113/0x2b0
        netlink_sendmsg+0x6f1/0xbd0
        ? netlink_unicast+0x700/0x700
        ? lock_is_held_type+0x98/0x110
        ? netlink_unicast+0x700/0x700
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x193/0x240
        ? __x64_sys_getpeername+0xb0/0xb0
        ? do_sys_openat2+0x10b/0x370
        ? __up_read+0x1a1/0x7b0
        ? do_user_addr_fault+0x219/0xdc0
        ? __x64_sys_openat+0x120/0x1d0
        ? __x64_sys_open+0x1a0/0x1a0
        __x64_sys_sendto+0xdd/0x1b0
        ? syscall_enter_from_user_mode+0x1d/0x50
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7fc69d0af14a
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
       RSP: 002b:00007ffc1d8292f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fc69d0af14a
       RDX: 0000000000000038 RSI: 0000555f57c56440 RDI: 0000000000000003
       RBP: 0000555f57c56410 R08: 00007fc69d17b200 R09: 000000000000000c
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
       Allocated by task 146:
        kasan_save_stack+0x1b/0x40
        __kasan_kmalloc+0x99/0xc0
        mlx5_init_fs+0xf0/0x1c50 [mlx5_core]
        mlx5_load+0xd2/0x180 [mlx5_core]
        mlx5_init_one+0x2f6/0x450 [mlx5_core]
        probe_one+0x47d/0x6e0 [mlx5_core]
        pci_device_probe+0x2a0/0x4a0
        really_probe+0x20a/0xc90
        driver_probe_device+0xd8/0x380
        device_driver_attach+0x1df/0x250
        __driver_attach+0xff/0x240
        bus_for_each_dev+0x11e/0x1a0
        bus_add_driver+0x309/0x570
        driver_register+0x1ee/0x380
        0xffffffffa06b8062
        do_one_initcall+0xd5/0x410
        do_init_module+0x1c8/0x760
        load_module+0x6d8b/0x9650
        __do_sys_finit_module+0x118/0x1b0
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       Freed by task 275:
        kasan_save_stack+0x1b/0x40
        kasan_set_track+0x1c/0x30
        kasan_set_free_info+0x20/0x30
        __kasan_slab_free+0x102/0x140
        slab_free_freelist_hook+0x74/0x1b0
        kfree+0xd7/0x2a0
        mlx5_unload+0x16/0xb0 [mlx5_core]
        mlx5_unload_one+0xae/0x120 [mlx5_core]
        mlx5_devlink_reload_down+0x1bc/0x380 [mlx5_core]
        devlink_reload+0x141/0x520
        devlink_nl_cmd_reload+0x66d/0x1070
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        genl_rcv_msg+0x27f/0x4a0
        netlink_rcv_skb+0x11d/0x340
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        netlink_sendmsg+0x6f1/0xbd0
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x193/0x240
        __x64_sys_sendto+0xdd/0x1b0
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       The buggy address belongs to the object at ffff888009d04300
        which belongs to the cache kmalloc-128 of size 128
       The buggy address is located 8 bytes inside of
        128-byte region [ffff888009d04300, ffff888009d04380)
       The buggy address belongs to the page:
       page:0000000086a64ecc refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888009d04000 pfn:0x9d04
       head:0000000086a64ecc order:1 compound_mapcount:0
       flags: 0x4000000000010200(slab|head)
       raw: 4000000000010200 ffffea0000203980 0000000200000002 ffff8880050428c0
       raw: ffff888009d04000 000000008020001d 00000001ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       >ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
        ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ==================================================================
      
      The right solution for devlink reload is to notify about deletion of
      parameters, unload the driver, change net namespaces, load the driver,
      and notify about addition of parameters.
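
      The ordering above can be sketched with a small userspace model. This is
      only an illustration under stated assumptions: every name below
      (model_dev, model_reload and so on) is hypothetical and none of it is the
      devlink or mlx5 code. The point is that both notifications run while the
      driver state is allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
    int *fs_mode;       /* driver state: allocated on load, freed on unload */
    int netns;          /* id of the namespace the device lives in */
    int notifications;  /* parameter notifications sent so far */
    int stale_reads;    /* notifications attempted with no driver state */
};

void model_load(struct model_dev *d)
{
    d->fs_mode = malloc(sizeof(*d->fs_mode));
    if (d->fs_mode)
        *d->fs_mode = 1;
}

void model_unload(struct model_dev *d)
{
    free(d->fs_mode);
    d->fs_mode = NULL;
}

/* Stands in for a notification that calls the driver's "get" callback. */
void model_param_notify(struct model_dev *d)
{
    d->notifications++;
    if (!d->fs_mode)
        d->stale_reads++;   /* the model's analogue of the use-after-free */
}

/* The five steps in the order given by the commit message. */
void model_reload(struct model_dev *d, int dest_netns)
{
    model_param_notify(d);  /* 1. notify deletion while state is valid */
    model_unload(d);        /* 2. unload the driver */
    d->netns = dest_netns;  /* 3. change net namespace */
    model_load(d);          /* 4. load the driver */
    model_param_notify(d);  /* 5. notify addition with fresh state */
}
```

      Moving either model_param_notify() call between steps 2 and 4 makes
      stale_reads non-zero, which corresponds to the read that KASAN reported.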
      
      Fixes: 070c63f2 ("net: devlink: allow to change namespaces during reload")
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sk_buff: avoid potentially clearing 'slow_gro' field · a432934a
      Paolo Abeni authored
      If skb_dst_set_noref() is invoked with a NULL dst, the 'slow_gro'
      field is cleared, too. That could lead to wrong behavior if
      the skb later enters the GRO stage.
      
      Fix the potential issue by preserving a non-zero value of
      the 'slow_gro' field.
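
      A minimal sketch of the difference, assuming a stripped-down stand-in for
      the skb (model_skb and both setter names are hypothetical, not the
      kernel's definitions): a plain assignment clears the flag on a NULL dst,
      while an OR-assignment can only set it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two skb fields involved. */
struct model_skb {
    const void *dst;
    unsigned int slow_gro : 1;
};

/* Behaviour described as buggy: a NULL dst resets the flag. */
void model_dst_set_overwrite(struct model_skb *skb, const void *dst)
{
    skb->dst = dst;
    skb->slow_gro = !!dst;
}

/* Behaviour described as the fix: a non-zero flag is preserved. */
void model_dst_set_preserve(struct model_skb *skb, const void *dst)
{
    skb->dst = dst;
    skb->slow_gro |= !!dst;
}
```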
      
      Additionally, fix a comment typo.
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Fixes: 8a886b14 ("sk_buff: track dst status in slow_gro")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/aa42529252dc8bb02bd42e8629427040d1058537.1627662501.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: netlink: Remove unused function · bc830525
      Yajun Deng authored
      lockdep_genl_is_held() and its caller are not used now, so just remove them.
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Link: https://lore.kernel.org/r/20210729074854.8968-1-yajun.deng@linux.dev
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'nfc-constify-pointed-data-missed-part' · 373a1f2b
      Jakub Kicinski authored
      Krzysztof Kozlowski says:
      
      ====================
      nfc: constify pointed data - missed part
      
      This was previously sent [1] but got lost. It was a prerequisite to part two of NFC const [2].
      
      Changes since v2:
      1. Drop patch previously 7/8 which causes new warnings "warning: Using
         plain integer as NULL pointer".
      
      Changes since v1:
      1. Add patch 1/8 fixing up nfcmrvl_spi_parse_dt()
      
      [1] https://lore.kernel.org/lkml/20210726145224.146006-1-krzysztof.kozlowski@canonical.com/
      [2] https://lore.kernel.org/linux-nfc/20210729104022.47761-1-krzysztof.kozlowski@canonical.com/T/#m199fbdde180fa005a10addf28479fcbdc6263eab
      ====================
      
      Link: https://lore.kernel.org/r/20210730144202.255890-1-krzysztof.kozlowski@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: hci: cleanup unneeded spaces · 77411df5
      Krzysztof Kozlowski authored
      No need for multiple spaces in variable declaration (the code does not
      use them in other places).  No functional change.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: nci: constify several pointers to u8, sk_buff and other structs · ddecf555
      Krzysztof Kozlowski authored
      Several functions receive pointers to u8, sk_buff or other structs but
      do not modify the contents so make them const.  This allows doing the
      same for local variables and in total makes the code a little bit safer.
      
      This makes const also data passed as "unsigned long opt" argument to
      nci_request() function.  Usual flow for such functions is:
      1. Receive "u8 *" and store it (the pointer) in a structure
         allocated on stack (e.g. struct nci_set_config_param),
      2. Call nci_request() or __nci_request() passing a callback function and
         the pointer to the structure via an "unsigned long opt",
      3. nci_request() calls the callback which dereferences "unsigned long
         opt" in a read-only way.
      
      This converts all above paths to use proper pointer to const data, so
      entire flow is safer.
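
      The three-step flow above can be modelled in plain C. This is a sketch
      under assumptions, not the NCI code: all model_* names are hypothetical,
      the callback returns a byte sum only so the read-only access is
      observable, and it assumes a pointer fits in an unsigned long, as it
      does in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/* Step 1: the "u8 *" is stored, as a pointer to const, in a stack structure. */
struct model_set_config_param {
    u8 id;
    size_t len;
    const u8 *val;
};

typedef int (*model_req_fn)(unsigned long opt);

/* Step 3: the request helper only forwards "opt" to the callback. */
int model_request(model_req_fn req, unsigned long opt)
{
    return req(opt);
}

/* The callback dereferences "opt" in a read-only way. */
int model_set_config_req(unsigned long opt)
{
    const struct model_set_config_param *param =
        (const struct model_set_config_param *)opt;
    int sum = 0;

    for (size_t i = 0; i < param->len; i++)
        sum += param->val[i];
    return sum;
}

/* Step 2: pass the callback and the structure's address as "opt". */
int model_set_config(u8 id, size_t len, const u8 *val)
{
    struct model_set_config_param param = { .id = id, .len = len, .val = val };

    return model_request(model_set_config_req, (unsigned long)&param);
}
```

      Because every pointer on the path is to const data, an accidental write
      in the callback is rejected by the compiler.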
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: constify local pointer variables · f2479c0a
      Krzysztof Kozlowski authored
      A few pointers to struct nfc_target and struct nfc_se can be made const.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: constify several pointers to u8, char and sk_buff · 3df40eb3
      Krzysztof Kozlowski authored
      Several functions receive pointers to u8, char or sk_buff but do not
      modify the contents so make them const.  This allows doing the same for
      local variables and in total makes the code a little bit safer.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: hci: annotate nfc_llc_init() as __init · 4932c378
      Krzysztof Kozlowski authored
      The nfc_llc_init() is used only in other __init annotated context.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: annotate af_nfc_exit() as __exit · bf6cd772
      Krzysztof Kozlowski authored
      The af_nfc_exit() is used only in other __exit annotated context
      (nfc_exit()).
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: mrvl: correct nfcmrvl_spi_parse_dt() device_node argument · 3833b874
      Krzysztof Kozlowski authored
      The device_node in nfcmrvl_spi_parse_dt() cannot be const as it is
      passed to OF functions which modify it.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: convert fib_treeref from int to refcount_t · 79976892
      Yajun Deng authored
      The refcount_t type should be used instead of int when fib_treeref is
      used as a reference counter, which avoids use-after-free risks.
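
      To illustrate why a dedicated counter type helps, here is a simplified,
      single-threaded model. It is an assumption-laden sketch, not the kernel's
      refcount_t (which is atomic and also emits warnings): the model_* names
      are hypothetical. Unlike a plain int, it refuses to take a reference on
      an object whose count already reached zero and never wraps around.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model; not the kernel's refcount_t. */
struct model_refcount {
    unsigned int refs;
};

/* Take a reference. Refuses to revive a counter that already hit zero and
 * pins the counter at UINT_MAX instead of wrapping around. */
bool model_ref_get(struct model_refcount *r)
{
    if (r->refs == 0)
        return false;
    if (r->refs != UINT_MAX)
        r->refs++;
    return true;
}

/* Drop a reference. Returns true only when the caller released the last
 * reference and may free the object. A pinned counter is never released. */
bool model_ref_put(struct model_refcount *r)
{
    if (r->refs == 0 || r->refs == UINT_MAX)
        return false;
    r->refs--;
    return r->refs == 0;
}
```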
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20210729071350.28919-1-yajun.deng@linux.dev
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 29 Jul, 2021 28 commits
    • bcm63xx_enet: delete a redundant assignment · 3e12361b
      Tang Bin authored
      In the function bcm_enetsw_probe(), 'ret' will be assigned by
      bcm_enet_change_mtu(), so 'ret = 0' makes no sense.
      Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
      Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge · bea79078
      Vladimir Oltean authored
      DSA has recently gained the ability to deal gracefully with upper
      interfaces it cannot offload, such as the bridge, bonding or team
      drivers. When such uppers exist, the ports are still in standalone mode
      as far as the hardware is concerned.
      
      But when we deliver packets to the software bridge in order for that to
      do the forwarding, there is an unpleasant surprise in that the bridge
      will refuse to forward them. This is because we unconditionally set
      skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
      were already forwarded in hardware by us.
      
      Since dp->bridge_dev is populated only when there is hardware offload
      for it, but not in the software fallback case, let's introduce a new
      helper that can be called from the tagger data path which sets the
      skb->offload_fwd_mark accordingly to zero when there is no hardware
      offload for bridging. This lets the bridge forward packets back to other
      interfaces of our switch, if needed.
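
      The rule such a helper applies can be sketched as follows. The structs
      and the function name are hypothetical stand-ins, not the DSA
      definitions. The only assumption taken from the text is that the port's
      bridge_dev pointer is non-NULL exactly when bridging is offloaded in
      hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the port and the received frame. */
struct model_port {
    const void *bridge_dev;  /* set only when the bridge is offloaded */
};

struct model_frame {
    bool offload_fwd_mark;   /* true: hardware already forwarded this frame */
};

/* Mark the frame as hardware-forwarded only if the bridge is offloaded. */
void model_set_offload_fwd_mark(const struct model_port *dp,
                                struct model_frame *frame)
{
    frame->offload_fwd_mark = dp->bridge_dev != NULL;
}
```

      With the mark left at zero in the software fallback case, the software
      bridge treats the frame as not yet forwarded and forwards it itself.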
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipvlan: Add handling of NETDEV_UP events · 57fb346c
      Di Zhu authored
      When an ipvlan device is created on a bond device, the link state
      of the ipvlan device may be abnormal. This is because the bonding device
      allows a physical network card device to be added in the down state, so
      no NETDEV_CHANGE event is sent to other listeners and ipvlan has no
      chance to update its link status.
      
      The following steps can cause such problems:
      	1) bond0 is down
      	2) ip link add link bond0 name ipvlan type ipvlan mode l2
      	3) echo +enp2s7 >/sys/class/net/bond0/bonding/slaves
      	4) ip link set bond0 up
      
      After these steps, the ip link command shows that ipvlan has NO-CARRIER:
        ipvlan@bond0: <NO-CARRIER, BROADCAST,MULTICAST,UP,M-DOWN> mtu ...>
      
      We can deal with this problem like VLAN: Add handling of NETDEV_UP
      events. If we receive NETDEV_UP event, we will update the link status
      of the ipvlan.
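
      A minimal sketch of that event handling, assuming a simplified notifier
      in userspace (the enum, structs and function are hypothetical, not the
      ipvlan driver code): NETDEV_UP is handled the same way as NETDEV_CHANGE,
      by copying the lower device's link state to the ipvlan device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes standing in for the netdev notifier events. */
enum model_event {
    MODEL_NETDEV_UP,
    MODEL_NETDEV_CHANGE,
    MODEL_NETDEV_OTHER,
};

struct model_lower_dev {
    bool carrier;   /* link state of the device below, e.g. bond0 */
};

struct model_ipvlan_dev {
    bool carrier;   /* link state reported by the ipvlan device */
};

/* Propagate the lower device's link state on UP as well as on CHANGE. */
void model_ipvlan_event(struct model_ipvlan_dev *ipvlan,
                        const struct model_lower_dev *lower,
                        enum model_event event)
{
    switch (event) {
    case MODEL_NETDEV_UP:       /* the newly handled event */
    case MODEL_NETDEV_CHANGE:
        ipvlan->carrier = lower->carrier;
        break;
    default:
        break;                  /* other events leave the state untouched */
    }
}
```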
      Signed-off-by: Di Zhu <zhudi21@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: store the last executed chain also for clsact egress · 3aa26055
      Davide Caratti authored
      Currently, only 'ingress' and 'clsact ingress' qdiscs store the tc 'chain
      id' in the skb extension. However, userspace programs (like ovs) are able
      to set up egress rules, and the datapath gets confused in case it doesn't
      find the 'chain id' for a packet that's "recirculated" by tc.
      Change tcf_classify() to have the same semantic as tcf_classify_ingress()
      so that a single function can be called in ingress / egress, using the tc
      ingress / egress block respectively.
      Suggested-by: Alaa Hleilel <alaa@nvidia.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dpaa2-switch-add-mirroring-support' · b2492d50
      David S. Miller authored
      Ioana Ciornei says:
      
      ====================
      dpaa2-switch: add mirroring support
      
      This patch set adds per port and per VLAN mirroring in dpaa2-switch.
      
      The first 4 patches are just cosmetic changes. We renamed the
      dpaa2_switch_acl_tbl structure into dpaa2_switch_filter_block so that we
      can reuse it for filters that do not use the ACL table and reorganized
      the addition of trap, redirect and drop filters into a separate
      function. All this just to make for a more streamlined addition of the
      support for mirroring.
      
      The next 4 patches are actually adding the advertised support. Mirroring
      rules can be added in shared blocks, the driver will replicate the same
      configuration on all the switch ports part of the same block.
      
      The last patch documents the feature, presents its behavior and
      limitations and gives a couple of examples.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: networking: dpaa2: document mirroring support on the switch · d1626a1c
      Ioana Ciornei authored
      Document the mirroring capabilities of the dpaa2-switch driver,
      any restrictions that are imposed and some example commands.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-switch: offload shared block mirror filters when binding to a port · 7a91f907
      Ioana Ciornei authored
      When mirroring rules are added in shared filter blocks, the same
      mirroring rule has to be configured on all the switch ports that are
      part of the same block.
      
      In case a switch port joins a shared block after mirroring filters have
      been already added to it, then all the mirror rules should be offloaded
      to the port. The reverse, removal of mirroring rules, has to be done at
      block unbind.
      
      For this purpose, the dpaa2_switch_block_offload_mirror() and
      dpaa2_switch_block_unoffload_mirror() functions are added and called
      upon binding and unbinding a switch port to/from a block.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-switch: add VLAN based mirroring · 0f3faece
      Ioana Ciornei authored
      Using the infrastructure added in the previous patch, extend tc-flower
      support with FLOW_ACTION_MIRRED based on VLAN.
      
      Tested with:
      
      tc qdisc add dev eth8 ingress_block 1 clsact
      tc filter add block 1 ingress protocol 802.1q flower skip_sw \
      	vlan_id 100 action mirred egress mirror dev eth6
      tc filter del block 1 ingress pref 49152
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-switch: add support for port mirroring · e0ead825
      Ioana Ciornei authored
      Add support for per port mirroring for the DPAA2 switch. We support
      only a single mirror port, therefore we allow mirroring rules only as
      long as the destination port is always the same.
      
      Unlike all the actions (drop, redirect, trap) already supported by the
      dpaa2-switch driver, adding mirroring filters in shared blocks is not
      achieved by a singular ACL entry added in a table shared by the ports.
      This is why, when a new mirror filter is added in a block, we have to go
      through all the switch ports sharing it and configure the filter
      individually on all.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-switch: add API for setting up mirroring · cbc2a889
      Ioana Ciornei authored
      Add the necessary MC API for setting up and configuring the mirroring
      feature on the DPSW DPAA2 object.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-switch: reorganize dpaa2_switch_cls_matchall_replace · 3fa5514a
      Ioana Ciornei authored
      Extract the necessary steps to offload a filter by using the ACL table
      in a separate function - dpaa2_switch_cls_matchall_replace_acl().
      
      This is intended to help with the code readability when the mirroring
      support is added.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-switch: reorganize dpaa2_switch_cls_flower_replace · c5f6d490
      Ioana Ciornei authored
      Extract the necessary steps to offload a filter by using the ACL table
      in a separate function - dpaa2_switch_cls_flower_replace_acl().
      This is intended to help with the code readability when the mirroring
      support is added.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-switch: rename dpaa2_switch_acl_tbl into filter_block · adcb7aa3
      Ioana Ciornei authored
      Until now, shared filter blocks were implemented only by ACL tables
      shared between ports. Going forward, when the mirroring support will be
      added, this will not be true anymore.
      
      Rename the dpaa2_switch_acl_tbl into dpaa2_switch_filter_block so that
      we make it clear that the structure is used not only for filters that
      use the ACL table but will be used for all the filters that are added in
      a block.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-switch: rename dpaa2_switch_tc_parse_action to specify the ACL · 3b5d8b44
      Ioana Ciornei authored
      Until now, the dpaa2_switch_tc_parse_action() function was used for all
      the supported tc actions since all of them were implemented by adding
      ACL table entries. In the next commits, the dpaa2-switch driver will
      gain mirroring support which is not using the same HW feature.
      
      Make sure that we specify the ACL in the function name so that we make
      it clear that it's only used for specific actions.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Remove the qede module version · 88ea96f8
      Shai Malin authored
      Removing the qede module version which is not needed and not allowed
      with inbox drivers.
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Remove the qed module version · 7a3febed
      Shai Malin authored
      Removing the qed module version which is not needed and not allowed
      with inbox drivers.
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sja110-vlan-fixes' · 3bdf4d61
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      NXP SJA1105 VLAN regressions
      
      These are 3 patches to fix issues seen with some more varied testing
      done after the changes in the "Traffic termination for sja1105 ports
      under VLAN-aware bridge" series were made:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210726165536.1338471-1-vladimir.oltean@nxp.com/
      
      Issue 1: traffic no longer works on a port after leaving a VLAN-aware bridge
      Issue 2: untagged traffic not dropped if pvid is absent from a VLAN-aware port
      Issue 3: PTP and STP broken on ports under a VLAN-aware bridge
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_sja1105: fix control packets on SJA1110 being received on an imprecise port · 04a17583
      Vladimir Oltean authored
      On RX, a control packet with SJA1110 will have:
      - an in-band control extension (DSA tag) composed of a header and an
        optional trailer (if it is a timestamp frame). We can (and do) deduce
        the source port and switch id from this.
      - a VLAN header, which can either be the tag_8021q RX VLAN (pvid) or the
        bridge VLAN. The sja1105_vlan_rcv() function attempts to deduce the
        source port and switch id a second time from this.
      
      The basic idea is that even though we don't need the source port
      information from the tag_8021q header if it's a control packet, we do
      need to strip that header before we pass it on to the network stack.
      
      The problem is that we call sja1105_vlan_rcv for ports under VLAN-aware
      bridges, and that function tells us it couldn't identify a tag_8021q
      header, so we need to perform imprecise RX by VID. Well, we don't,
      because we already know the source port and switch ID.
      
      This patch drops the return value from sja1105_vlan_rcv and we just look
      at the source_port and switch_id values from sja1105_rcv and sja1110_rcv
      which were initialized to -1. If they are still -1 it means we need to
      perform imprecise RX.
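
      The sentinel check described above can be sketched like this. The struct
      and function names are hypothetical and the decode step is reduced to a
      plain assignment. Only the "-1 means still unknown" convention is taken
      from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for what the receive path has learned so far. */
struct model_rx_info {
    int source_port;
    int switch_id;
};

/* Both values start out as -1, meaning "not identified yet". */
void model_rx_init(struct model_rx_info *rx)
{
    rx->source_port = -1;
    rx->switch_id = -1;
}

/* Stands in for any decoder (DSA tag or tag_8021q header) that succeeds. */
void model_rx_identify(struct model_rx_info *rx, int source_port, int switch_id)
{
    rx->source_port = source_port;
    rx->switch_id = switch_id;
}

/* Imprecise RX is needed only if no decoder filled in the values. */
bool model_needs_imprecise_rx(const struct model_rx_info *rx)
{
    return rx->source_port == -1 || rx->switch_id == -1;
}
```

      A control packet whose DSA tag was already decoded therefore never falls
      back to imprecise RX, regardless of what the VLAN header lookup finds.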
      
      Fixes: 884be12f ("net: dsa: sja1105: add support for imprecise RX")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: make sure untagged packets are dropped on ingress ports with no pvid · bef0746c
      Vladimir Oltean authored
      Surprisingly, this configuration:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp2 master br0
      bridge vlan del dev swp2 vid 1
      
      still has the sja1105 switch sending untagged packets to the CPU (and
      failing to decode them, since dsa_find_designated_bridge_port_by_vid
      searches by VID 1 and rightfully finds no bridge VLAN 1 on a port).
      
      Dumping the switch configuration, the VLANs are managed properly:
      - the pvid of swp2 is 1 in the MAC Configuration Table, but
      - only the CPU port is in the port membership of VLANID 1 in the VLAN
        Lookup Table
      
      When the ingress packets are tagged with VID 1, they are properly
      dropped. But when they are untagged, they are able to reach the CPU
      port. Also, when the pvid in the MAC Configuration Table is changed to
      e.g. 55 (an unused VLAN), the untagged packets are also dropped.
      
      So it looks like:
      - the switch bypasses ingress VLAN membership checks for untagged traffic
      - the reason why the untagged traffic is dropped when I make the pvid 55
        is due to the lack of valid destination ports in VLAN 55, rather than
        an ingress membership violation
      - the ingress VLAN membership checks are only done for VLAN-tagged traffic
      
      Interesting. It looks like there is an explicit bit to drop untagged
      traffic, so we should probably be using that to preserve user expectations.
      
      Note that only VLAN-aware ports should drop untagged packets due to no
      pvid - when VLAN-unaware, the software bridge doesn't do this even if
      there is no pvid on any bridge port and on the bridge itself. So the new
      sja1105_drop_untagged() function cannot simply be called with "false"
      from sja1105_bridge_vlan_add() and with "true" from sja1105_bridge_vlan_del.
      Instead, we need to also consider the VLAN awareness state. That means
      we need to hook the "drop untagged" setting in all the same places where
      the "commit pvid" logic is, and it needs to factor in all the state when
      flipping the "drop untagged" bit: is our current pvid in the VLAN Lookup
      Table, and is the current port in that VLAN's port membership list?
      VLAN-unaware ports will never drop untagged frames because these checks
      always succeed by construction, and the tag_8021q VLANs cannot be changed
      by the user.
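
      The decision stated above (is the current pvid in the VLAN Lookup Table,
      and is the port a member of that VLAN?) can be written as one predicate.
      This is a sketch with hypothetical types and names, not the sja1105
      driver code. The table layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_VIDS  4096
#define MODEL_NUM_PORTS 8

/* Hypothetical model of one VLAN Lookup Table entry. */
struct model_vlan_entry {
    bool present;            /* the VID exists in the table */
    unsigned int port_mask;  /* bit N set: port N is a member */
};

/* Hypothetical model of the switch state the decision depends on. */
struct model_switch {
    int pvid[MODEL_NUM_PORTS];
    struct model_vlan_entry vlans[MODEL_NUM_VIDS];
};

/* Drop untagged frames unless the port is a member of its own pvid. */
bool model_drop_untagged(const struct model_switch *sw, int port)
{
    const struct model_vlan_entry *v = &sw->vlans[sw->pvid[port]];
    bool member = v->present && (v->port_mask & (1u << port)) != 0;

    return !member;
}
```

      For a VLAN-unaware port the pvid is a tag_8021q VLAN in which the port is
      always a member, so the predicate is false by construction.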
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Vladimir Oltean's avatar
      net: dsa: sja1105: reset the port pvid when leaving a VLAN-aware bridge · cde8078e
      Vladimir Oltean authored
      Now that we no longer have the ultra-central sja1105_build_vlan_table(),
      we need to be more careful about checking all corner cases manually.
      
      For example, when a port leaves a VLAN-aware bridge, it becomes
      standalone so its pvid should become a tag_8021q RX VLAN again. However,
      sja1105_commit_pvid() only gets called from sja1105_bridge_vlan_add()
      and from sja1105_vlan_filtering(), and no VLAN awareness change takes
      place (VLAN filtering is a global setting for sja1105, so the switch
      remains VLAN-aware overall).
      
      This means that we need to put another sja1105_commit_pvid() call in
      sja1105_bridge_member().
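
      A rough model of why the extra call is needed, written as standalone C.
      All names here (struct model_port, model_commit_pvid(),
      model_bridge_leave()) and the pvid values are hypothetical; the sketch
      only assumes that the committed pvid is the bridge pvid while the port
      is under a VLAN-aware bridge and the tag_8021q pvid otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state; not the driver's real structures. */
struct model_port {
	bool bridged;			/* port is a bridge member */
	bool bridge_vlan_aware;		/* that bridge does VLAN filtering */
	uint16_t bridge_pvid;		/* pvid configured through the bridge */
	uint16_t tag_8021q_pvid;	/* pvid used while standalone */
	uint16_t committed_pvid;	/* what the hardware currently uses */
};

/* Pick the pvid that matches the port's current state and "write" it. */
void model_commit_pvid(struct model_port *p)
{
	assert(p);

	if (p->bridged && p->bridge_vlan_aware)
		p->committed_pvid = p->bridge_pvid;
	else
		p->committed_pvid = p->tag_8021q_pvid;
}

/* Leaving the bridge changes which pvid applies, so commit again here.
 * Without this call the stale bridge pvid would stay in hardware.
 */
void model_bridge_leave(struct model_port *p)
{
	p->bridged = false;
	model_commit_pvid(p);
}
```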
      
      Fixes: 6dfd23d3 ("net: dsa: sja1105: delete vlan delta save/restore logic")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cde8078e
    • David S. Miller's avatar
      Merge branch 'mctp' · e5fe3a5f
      David S. Miller authored
      Jeremy Kerr says:
      
      ====================
      Add Management Component Transport Protocol support
      
      This series adds core MCTP support to the kernel. From the Kconfig
      description:
      
        Management Component Transport Protocol (MCTP) is an in-system
        protocol for communicating between management controllers and
        their managed devices (peripherals, host processors, etc.). The
        protocol is defined by DMTF specification DSP0236.
      
        This option enables core MCTP support. For communicating with other
        devices, you'll want to enable a driver for a specific hardware
        channel.
      
      This implementation allows a sockets-based API for sending and receiving
      MCTP messages via sendmsg/recvmsg on SOCK_DGRAM sockets. Kernel stack
      control is all via netlink, using existing RTM_* messages. The userspace
      ABI change is fairly small; just the necessary AF_/ETH_P_/ARPHRD_
      constants, a new sockaddr, and a new netlink attribute.
      
      For MAINTAINERS, I've just included netdev@ as the list entry. I'm happy
      to alter this based on preferences here - an alternative would be the
      OpenBMC list (the main user of the MCTP interface), or we can create a
      new list entirely.
      
      We have a couple of interface drivers almost ready to go at the moment,
      but those can wait until the core code has some review.
      
      This is v4 of the series; v1 and v2 were both RFC.
      
      selinux folks: CCing 01/15 due to the new PF_MCTP protocol family.
      
      linux-doc folks: CCing 15/15 for the new MCTP overview document.
      
      Review, comments, questions etc. are most welcome.
      
      Cheers,
      
      Jeremy
      
      v2:
       - change to match spec terminology: controller -> component
       - require specific capabilities for bind() & sendmsg()
       - add address and tag definitions to uapi
       - add selinux AF_MCTP table definitions
       - remove strict cflags; warnings are present in common headers
      
      v3:
       - require caps for MCTP bind() & send()
       - comment typo fixes
       - switch to an array for local EIDs
       - fix addrinfo dump iteration & error path
       - add RTM_DELADDR
       - remove GENMASK() and BIT() from uapi
      
      v4:
       - drop tun patch; that can be submitted separately
       - keep nipa happy: add maintainer CCs, including doc and selinux
       - net-next rebase
       - Include AF_MCTP in af_family_slock_keys and pf_family_names
       - Introduce MODULE_ definitions earlier
       - upstream change: set_link_af no longer called with RTNL held
       - add kdoc for net_device.mctp_ptr
       - don't inline mctp_rt_match_eid
       - require rtm_type == RTN_UNICAST in route management handlers
       - remove unused RTAX policy table
       - fix mctp_sock->keys rcu annotations
       - fix spurious rcu_read_unlock in route input
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5fe3a5f
    • Jeremy Kerr's avatar
      mctp: Add MCTP overview document · 6a2d98b1
      Jeremy Kerr authored
      This change adds a brief document about the sockets API provided for
      sending and receiving MCTP messages from userspace.
      
      This is roughly based on the OpenBMC design document, at:
      
        https://github.com/openbmc/docs/blob/master/designs/mctp/mctp-kernel.md

      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a2d98b1
    • Matt Johnston's avatar
      mctp: Allow per-netns default networks · 03f2bbc4
      Matt Johnston authored
      Currently we have a compile-time default network
      (MCTP_INITIAL_DEFAULT_NET). This change introduces a default_net field
      on the net namespace, allowing future configuration for new interfaces.
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      03f2bbc4
    • Matt Johnston's avatar
      mctp: Add dest neighbour lladdr to route output · 26ab3fca
      Matt Johnston authored
      Now that we have a neighbour implementation, hook it up to the output
      path to set the dest hardware address for outgoing packets.
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26ab3fca
    • Jeremy Kerr's avatar
      mctp: Implement message fragmentation & reassembly · 4a992bbd
      Jeremy Kerr authored
      This change implements MCTP fragmentation (based on route & device MTU),
      and corresponding reassembly.
      
      The MCTP specification only allows for fragmentation on the originating
      message endpoint, and reassembly on the destination endpoint -
      intermediate nodes do not need to reassemble/refragment.  Consequently,
      we only fragment in the local transmit path, and reassemble
      locally-bound packets. Messages are required to be in-order, so we
      simply cancel reassembly on out-of-order or missing packets.
      
      In the fragmentation path, we just break up the message into MTU-sized
      fragments; the skb structure is a simple copy for now, which we can later
      improve with a shared data implementation.
      
      For reassembly, we keep track of incoming message fragments using the
      existing tag infrastructure, allocating a key on the (src,dest,tag)
      tuple, and reassemble matching fragments into a skb->frag_list.
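
      The scheme can be illustrated with a small userspace sketch. It is not
      the kernel implementation: it uses flat buffers instead of skbs, tracks
      a single reassembly context rather than one per (src,dest,tag) key, and
      every sk_* name and buffer size is made up for the example. The SOM/EOM
      bits and the 2-bit packet sequence number follow the flags byte of the
      MCTP transport header as I understand DSP0236.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_FLAG_SOM	0x80	/* start of message */
#define SK_FLAG_EOM	0x40	/* end of message */
#define SK_SEQ_SHIFT	4
#define SK_SEQ_MASK	0x3	/* 2-bit sequence number, wraps mod 4 */

struct sk_frag {
	uint8_t flags_seq;	/* SOM, EOM and sequence bits */
	size_t len;
	uint8_t data[64];
};

/* Split @msg into fragments carrying at most @max_payload bytes each.
 * Returns the number of fragments written to @out, or -1 on error.
 */
int sk_fragment(const uint8_t *msg, size_t len, size_t max_payload,
		struct sk_frag *out, int max_frags)
{
	size_t off = 0;
	uint8_t seq = 0;
	int n = 0;

	assert(msg && out);
	if (!len || !max_payload || max_payload > sizeof(out->data))
		return -1;

	while (off < len) {
		size_t chunk = len - off < max_payload ? len - off : max_payload;

		if (n == max_frags)
			return -1;
		out[n].flags_seq = (uint8_t)(seq << SK_SEQ_SHIFT);
		if (off == 0)
			out[n].flags_seq |= SK_FLAG_SOM;
		if (off + chunk == len)
			out[n].flags_seq |= SK_FLAG_EOM;
		memcpy(out[n].data, msg + off, chunk);
		out[n].len = chunk;
		off += chunk;
		seq = (seq + 1) & SK_SEQ_MASK;
		n++;
	}
	return n;
}

struct sk_reasm {
	int active;		/* a message is being reassembled */
	uint8_t next_seq;	/* sequence number expected next */
	size_t len;
	uint8_t buf[256];
};

/* Feed one fragment into the context. Returns 1 when a complete message
 * is in r->buf, 0 if more fragments are needed, and -1 if reassembly was
 * cancelled (no SOM seen, out-of-order sequence, or overflow).
 */
int sk_reassemble(struct sk_reasm *r, const struct sk_frag *f)
{
	uint8_t seq = (f->flags_seq >> SK_SEQ_SHIFT) & SK_SEQ_MASK;

	if (f->flags_seq & SK_FLAG_SOM) {
		r->active = 1;
		r->len = 0;
		r->next_seq = seq;
	}
	if (!r->active || seq != r->next_seq ||
	    r->len + f->len > sizeof(r->buf)) {
		r->active = 0;	/* simply cancel, as described above */
		return -1;
	}
	memcpy(r->buf + r->len, f->data, f->len);
	r->len += f->len;
	r->next_seq = (seq + 1) & SK_SEQ_MASK;
	if (f->flags_seq & SK_FLAG_EOM) {
		r->active = 0;
		return 1;
	}
	return 0;
}
```

      Feeding the fragments back in order returns 0 until the EOM fragment,
      which returns 1; skipping a fragment makes the next call return -1 and
      drops the partial message.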
      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a992bbd
    • Jeremy Kerr's avatar
      mctp: Populate socket implementation · 833ef3b9
      Jeremy Kerr authored
      Start filling out the socket syscalls: bind, sendmsg & recvmsg.
      
      This requires an input route implementation, so we add to
      mctp_route_input, allowing lookups on binds & message tags. This just
      handles single-packet messages at present; we will add fragmentation in
      a future change.
      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      833ef3b9
    • Matt Johnston's avatar
      mctp: Add neighbour netlink interface · 831119f8
      Matt Johnston authored
      This change adds the netlink interfaces for manipulating the MCTP
      neighbour table.
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      831119f8
    • Matt Johnston's avatar
      mctp: Add neighbour implementation · 4d8b9319
      Matt Johnston authored
      Add an initial neighbour table implementation, to be used in the route
      output path.
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d8b9319