1. 21 Oct, 2022 11 commits
  2. 19 Oct, 2022 29 commits
    • Alexei Starovoitov's avatar
      Merge branch 'bpf,x64: Use BMI2 for shifts' · 04a8f9d7
      Alexei Starovoitov authored
      Jie Meng says:
      
      ====================
      
      With baseline x64 instruction set, shift count can only be an immediate
      or in %cl. The implicit dependency on %cl makes it necessary to shuffle
      registers around and/or add push/pop operations.
      
      BMI2 provides shift instructions that can use any general register as
      the shift count, saving us instructions and a few bytes in most cases.
      
      Suboptimal codegen when %ecx is source and/or destination is also
      addressed and unnecessary instructions are removed.
      
      test_progs: Summary: 267/1340 PASSED, 25 SKIPPED, 0 FAILED
      test_progs-no_alu32: Summary: 267/1333 PASSED, 26 SKIPPED, 0 FAILED
      test_verifier: Summary: 1367 PASSED, 636 SKIPPED, 0 FAILED (same result
       with or without BMI2)
      test_maps: OK, 0 SKIPPED
      lib/test_bpf:
        test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
        test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
        test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
      ---
      v4 -> v5:
      - More comments regarding instruction encoding
      v3 -> v4:
      - Fixed a regression when BMI2 isn't available
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Jie Meng's avatar
      bpf: add selftests for lsh, rsh, arsh with reg operand · 8662de23
      Jie Meng authored
      Current tests cover only shifts with an immediate as the source
      operand/shift counts; add a new test case to cover register operand.
      Signed-off-by: Jie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-4-jmeng@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Jie Meng's avatar
      bpf,x64: use shrx/sarx/shlx when available · 77d8f5d4
      Jie Meng authored
      BMI2 provides 3 shift instructions (shrx, sarx and shlx) that use VEX
      encoding but target general purpose registers [1]. They allow the shift
      count in any general purpose register and have the same performance as
      non BMI2 shift instructions [2].
      
      Instead of shr/sar/shl that implicitly use %cl (lowest 8 bits of %rcx),
      emit their more flexible alternatives provided in BMI2 when advantageous;
      keep using the non BMI2 instructions when shift count is already in
      BPF_REG_4/%rcx as non BMI2 instructions are shorter.
      
      To summarize, when BMI2 is available:
      -------------------------------------------------
                  |   arbitrary dst
      =================================================
      src == ecx  |   shl dst, cl
      -------------------------------------------------
      src != ecx  |   shlx dst, dst, src
      -------------------------------------------------
      
      And no additional register shuffling is needed.
      
      A concrete example between non BMI2 and BMI2 codegen.  To shift %rsi by
      %rdi:
      
      Without BMI2:
      
       ef3:   push   %rcx
              51
       ef4:   mov    %rdi,%rcx
              48 89 f9
       ef7:   shl    %cl,%rsi
              48 d3 e6
       efa:   pop    %rcx
              59
      
      With BMI2:
      
       f0b:   shlx   %rdi,%rsi,%rsi
              c4 e2 c1 f7 f6
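
      The five bytes of the BMI2 form above can be reproduced by hand. The
      sketch below is not the kernel's emitter; encode_shlx64() and the
      register enum are hypothetical names introduced only for illustration.
      It assumes the 64-bit register-to-register form of shlx
      (VEX.LZ.66.0F38.W1 F7 /r), where the destination goes in ModRM.reg,
      the value being shifted in ModRM.rm, and the shift count in the
      inverted VEX.vvvv field.

```c
#include <stdint.h>

/* x86-64 register numbers as used in ModRM/VEX fields (rax=0 ... r15=15). */
enum x64_reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
               R8, R9, R10, R11, R12, R13, R14, R15 };

/*
 * Hypothetical helper: encode the 64-bit instruction
 *     shlx %count, %src, %dst        (AT&T operand order)
 * into out[0..4] and return the encoded length.
 */
static int encode_shlx64(enum x64_reg dst, enum x64_reg src,
                         enum x64_reg count, uint8_t out[5])
{
    uint8_t r = (dst >> 3) & 1;   /* high bit of dst, extends ModRM.reg */
    uint8_t b = (src >> 3) & 1;   /* high bit of src, extends ModRM.rm  */

    out[0] = 0xC4;                /* 3-byte VEX prefix */
    /* R, X, B are stored inverted; X is unused here; opcode map 0F38 = 2 */
    out[1] = (uint8_t)(((r ^ 1) << 7) | (1 << 6) | ((b ^ 1) << 5) | 0x02);
    /* W=1 (64-bit), vvvv = inverted count register, L=0, pp=01 (66) */
    out[2] = (uint8_t)(0x80 | ((~count & 0xF) << 3) | 0x01);
    out[3] = 0xF7;                /* opcode */
    /* ModRM: mod=11 (register direct), reg=dst, rm=src */
    out[4] = (uint8_t)(0xC0 | ((dst & 7) << 3) | (src & 7));
    return 5;
}
```

      Calling encode_shlx64(RSI, RSI, RDI, buf) yields c4 e2 c1 f7 f6,
      matching the disassembly above. Swapping only the pp bits would select
      sarx or shrx instead, which this sketch does not cover.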
      
      [1] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
      [2] https://www.agner.org/optimize/instruction_tables.pdf

      Signed-off-by: Jie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-3-jmeng@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Jie Meng's avatar
      bpf,x64: avoid unnecessary instructions when shift dest is ecx · 81b35e7c
      Jie Meng authored
      x64 JIT produces redundant instructions when a shift operation's
      destination register is BPF_REG_4/ecx and this patch removes them.
      
      Specifically, when dest reg is BPF_REG_4 but the src isn't, we
      needn't push and pop ecx around shift only to get it overwritten
      by r11 immediately afterwards.
      
      In the rare case when both dest and src registers are BPF_REG_4,
      a single shift instruction is sufficient and we don't need the
      two MOV instructions around the shift.
      
      To summarize using shift left as an example, without patch:
      -------------------------------------------------
                  |   dst == ecx     |    dst != ecx
      =================================================
      src == ecx  |   mov r11, ecx   |    shl dst, cl
                  |   shl r11, ecx   |
                  |   mov ecx, r11   |
      -------------------------------------------------
      src != ecx  |   mov r11, ecx   |    push ecx
                  |   push ecx       |    mov ecx, src
                  |   mov ecx, src   |    shl dst, cl
                  |   shl r11, cl    |    pop ecx
                  |   pop ecx        |
                  |   mov ecx, r11   |
      -------------------------------------------------
      
      With patch:
      -------------------------------------------------
                  |   dst == ecx     |    dst != ecx
      =================================================
      src == ecx  |   shl ecx, cl    |    shl dst, cl
      -------------------------------------------------
      src != ecx  |   mov r11, ecx   |    push ecx
                  |   mov ecx, src   |    mov ecx, src
                  |   shl r11, cl    |    shl dst, cl
                  |   mov ecx, r11   |    pop ecx
      -------------------------------------------------
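
      The two tables can be summarized as instruction counts per case. The
      following is only a model of the tables above, not JIT code;
      shift_insn_count() is a hypothetical name, and the counts are read
      directly off the rows shown.

```c
#include <stdbool.h>

/*
 * Illustrative model: number of x86 instructions emitted for a shift by a
 * register count (no BMI2), before and after the patch.
 */
static int shift_insn_count(bool dst_is_rcx, bool src_is_rcx, bool patched)
{
    if (src_is_rcx) {
        if (!dst_is_rcx)
            return 1;             /* shl dst, cl */
        /* before: mov r11,ecx; shl; mov ecx,r11   after: shl ecx, cl */
        return patched ? 1 : 3;
    }
    if (dst_is_rcx) {
        /* before: mov, push, mov, shl, pop, mov   after: push/pop dropped */
        return patched ? 4 : 6;
    }
    /* push ecx; mov ecx,src; shl dst,cl; pop ecx (unchanged by the patch) */
    return 4;
}
```

      Only the dst == ecx column changes: the patch saves two instructions
      in each of its rows and leaves the dst != ecx column untouched.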
      Signed-off-by: Jie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-2-jmeng@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Alexei Starovoitov's avatar
      Merge branch 'libbpf: support non-mmap()'able data sections' · 7d8d5355
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Make libbpf more conservative in using BPF_F_MMAPABLE flag with internal BPF
      array maps that are backing global data sections. See patch #2 for full
      description and justification.
      
      Changes in this patch set support having bpf_spinlock, kptr, rb_tree nodes and
      other "special" variables as global variables. Combining this with libbpf's
      existing support for multiple custom .data.* sections allows BPF programs to
      utilize multiple spinlock/rbtree_node/kptr variables in a pretty natural way
      by just putting all such variables into separate data sections (and thus ARRAY
      maps).
      
      v1->v2:
        - address Stanislav's feedback, adds acks.
      ====================
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Andrii Nakryiko's avatar
      libbpf: add non-mmapable data section selftest · 2f968e9f
      Andrii Nakryiko authored
      Add non-mmapable data section to test_skeleton selftest and make sure it
      really isn't mmapable by trying to mmap() it anyways.
      
      Also make sure that libbpf doesn't report BPF_F_MMAPABLE flag to users.
      
      Additionally, some manual testing was performed to confirm that this
      feature works as intended.
      
      Looking at created map through bpftool shows that flags passed to kernel are
      indeed zero:
      
        $ bpftool map show
        ...
        1782: array  name .data.non_mmapa  flags 0x0
                key 4B  value 16B  max_entries 1  memlock 4096B
                btf_id 1169
                pids test_progs(8311)
        ...
      
      Checking BTF uploaded to kernel for this map shows that zero_key and
      zero_value are indeed marked as static, even though zero_key is actually
      original global (but STV_HIDDEN) variable:
      
        $ bpftool btf dump id 1169
        ...
        [51] VAR 'zero_key' type_id=2, linkage=static
        [52] VAR 'zero_value' type_id=7, linkage=static
        ...
        [62] DATASEC '.data.non_mmapable' size=16 vlen=2
                type_id=51 offset=0 size=4 (VAR 'zero_key')
                type_id=52 offset=4 size=12 (VAR 'zero_value')
        ...
      
      And original BTF does have zero_key marked as linkage=global:
      
        $ bpftool btf dump file test_skeleton.bpf.linked3.o
        ...
        [51] VAR 'zero_key' type_id=2, linkage=global
        [52] VAR 'zero_value' type_id=7, linkage=static
        ...
        [62] DATASEC '.data.non_mmapable' size=16 vlen=2
                type_id=51 offset=0 size=4 (VAR 'zero_key')
                type_id=52 offset=4 size=12 (VAR 'zero_value')
      
      Bpftool didn't require any changes at all because it checks whether internal
      map is mmapable already, but just to double-check generated skeleton, we
      see that .data.non_mmapable neither sets mmaped pointer nor has
      a corresponding field in the skeleton:
      
        $ grep non_mmapable test_skeleton.skel.h
                        struct bpf_map *data_non_mmapable;
                s->maps[7].name = ".data.non_mmapable";
                s->maps[7].map = &obj->maps.data_non_mmapable;
      
      But .data.read_mostly has all of those things:
      
        $ grep read_mostly test_skeleton.skel.h
                        struct bpf_map *data_read_mostly;
                struct test_skeleton__data_read_mostly {
                        int read_mostly_var;
                } *data_read_mostly;
                s->maps[6].name = ".data.read_mostly";
                s->maps[6].map = &obj->maps.data_read_mostly;
                s->maps[6].mmaped = (void **)&obj->data_read_mostly;
                _Static_assert(sizeof(s->data_read_mostly->read_mostly_var) == 4, "unexpected size of 'read_mostly_var'");
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221019002816.359650-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Andrii Nakryiko's avatar
      libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars · 4fcac46c
      Andrii Nakryiko authored
      Teach libbpf to not add BPF_F_MMAPABLE flag unnecessarily for ARRAY maps
      that are backing data sections, if such data sections don't expose any
      variables to user-space. Exposed variables are those that have
      STB_GLOBAL or STB_WEAK ELF binding and correspond to BTF VAR's
      BTF_VAR_GLOBAL_ALLOCATED linkage.
      
      The overall idea is that if some data section doesn't have any variable that
      is exposed through BPF skeleton, then there is no reason to make such
      BPF array mmapable. Making BPF array mmapable is not a free no-op
      action, because BPF verifier doesn't allow users to put special objects
      (such as BPF spin locks, RB tree nodes, linked list nodes, kptrs, etc;
      anything that has a sensitive internal state that should not be modified
      arbitrarily from user space) into mmapable arrays, as there is no way to
      prevent user space from corrupting such sensitive state through direct
      memory access through memory-mapped region.
      
      By making sure that libbpf doesn't add BPF_F_MMAPABLE flag to BPF array
      maps corresponding to data sections that only have static variables
      (which are not supposed to be visible to user space according to libbpf
      and BPF skeleton rules), users now can have spinlocks, kptrs, etc in
      either default .bss/.data sections or custom .data.* sections (assuming
      there are no global variables in such sections).
      
      The only possible hiccup with this approach is the need to use global
      variables during BPF static linking, even if it's not intended to be
      shared with user space through BPF skeleton. To allow such scenarios,
      extend libbpf's STV_HIDDEN ELF visibility attribute handling to
      variables. Libbpf is already treating global hidden BPF subprograms as
      static subprograms and adjusts BTF accordingly to make BPF verifier
      verify such subprograms as static subprograms with preserving entire BPF
      verifier state between subprog calls. This patch teaches libbpf to treat
      global hidden variables as static ones and adjust BTF information
      accordingly as well. This allows sharing variables between multiple
      object files during static linking, but still keep them internal to BPF
      program and not get them exposed through BPF skeleton.
      
      Note that if the user has some advanced scenario where they absolutely
      need BPF_F_MMAPABLE flag on .data/.bss/.rodata BPF array map despite
      only having static variables, they still can achieve this by forcing it
      through explicit bpf_map__set_map_flags() API.
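
      The rule described above can be sketched as a small predicate. This
      is a simplified model, not libbpf's code: the enums stand in for the
      ELF STB_* binding and STV_* visibility values, and struct var_sym,
      var_is_exposed() and datasec_wants_mmapable() are hypothetical names
      used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum sym_bind { BIND_LOCAL, BIND_GLOBAL, BIND_WEAK };  /* stand-in for STB_* */
enum sym_vis  { VIS_DEFAULT, VIS_HIDDEN };             /* stand-in for STV_* */

/* Hypothetical description of one variable in a data section. */
struct var_sym {
    const char *name;
    enum sym_bind bind;
    enum sym_vis vis;
};

/* A variable is exposed to user space if it is global or weak and not
 * hidden; hidden globals are treated like static variables. */
static bool var_is_exposed(const struct var_sym *v)
{
    if (v->vis == VIS_HIDDEN)
        return false;
    return v->bind == BIND_GLOBAL || v->bind == BIND_WEAK;
}

/* The backing array map is made mmapable only if at least one variable
 * in the section is exposed. */
static bool datasec_wants_mmapable(const struct var_sym *vars, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (var_is_exposed(&vars[i]))
            return true;
    return false;
}
```

      Under this model, a section holding only a hidden global and a static
      variable stays non-mmapable, while a section with a single ordinary
      global variable is made mmapable.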
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20221019002816.359650-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Andrii Nakryiko's avatar
      libbpf: clean up and refactor BTF fixup step · f33f742d
      Andrii Nakryiko authored
      Refactor libbpf's BTF fixup step during BPF object open phase. The only
      functional change is that we now ignore BTF_VAR_GLOBAL_EXTERN variables
      during fix up, not just BTF_VAR_STATIC ones, which shouldn't cause any
      change in behavior as there shouldn't be any extern variable in data
      sections for valid BPF object anyways.
      
      Otherwise it's just collapsing two functions that have no reason to be
      separate, and switching find_elf_var_offset() helper to return entire
      symbol pointer, not just its offset. This will be used by next patch to
      get ELF symbol visibility.
      
      While refactoring, also "normalize" debug messages inside
      btf_fixup_datasec() to follow general libbpf style and print out data
      section name consistently, where it's available.
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221019002816.359650-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Daniel Müller's avatar
      bpf/docs: Summarize CI system and deny lists · 81bfcc3f
      Daniel Müller authored
      This change adds a brief summary of the BPF continuous integration (CI)
      to the BPF selftest documentation. The summary focuses not so much on
      actual workings of the CI, as it is maintained outside of the
      repository, but aims to document the few bits of it that are sourced
      from this repository and that developers may want to adjust as part of
      patch submissions: the BPF kernel configuration and the deny list
      file(s).
      
      Changelog:
      - v1->v2:
        - use s390x instead of s390 for consistency
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20221018164015.1970862-1-deso@posteo.net
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • Daniel Müller's avatar
      samples/bpf: Fix typos in README · 2c4d72d6
      Daniel Müller authored
      This change fixes some typos found in the BPF samples README file.
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20221018163231.1926462-1-deso@posteo.net
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • Shaomin Deng's avatar
      01dea954
    • Gerhard Engleder's avatar
      samples/bpf: Fix MAC address swapping in xdp2_kern · 7a698edf
      Gerhard Engleder authored
      xdp2_kern rewrites and forwards packets out on the same interface.
      Forwarding still works, but the rewrite broke when xdp multibuffer
      support was added.
      
      With xdp multibuffer, a local copy of the packet was introduced. The
      MAC address is now swapped in the local copy, but the local copy is not
      written back.
      
      Fix MAC address swapping by adding write back of the modified packet.
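
      The bug pattern can be illustrated outside of BPF. In the sketch
      below, plain memcpy() stands in for the load and store steps of the
      sample program; swap_src_dst_mac() and rewrite_packet() are
      hypothetical names and do not mirror the actual xdp2_kern code.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Swap the destination and source MAC addresses at the start of an
 * Ethernet header. */
static void swap_src_dst_mac(uint8_t *eth)
{
    uint8_t tmp[MAC_LEN];

    memcpy(tmp, eth, MAC_LEN);
    memcpy(eth, eth + MAC_LEN, MAC_LEN);
    memcpy(eth + MAC_LEN, tmp, MAC_LEN);
}

/* Modify a local copy of the header, then write it back to the packet. */
static void rewrite_packet(uint8_t *pkt, size_t len)
{
    uint8_t local[2 * MAC_LEN];

    if (len < sizeof(local))
        return;
    memcpy(local, pkt, sizeof(local));   /* load header into local copy */
    swap_src_dst_mac(local);             /* swap happens in the copy only */
    memcpy(pkt, local, sizeof(local));   /* write-back: the step that was missing */
}
```

      Without the final memcpy(), the function would return with the packet
      unchanged, which is the behavior the commit describes.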
      
      Fixes: 77225174 ("samples/bpf: fixup some tools to be able to support xdp multibuffer")
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Link: https://lore.kernel.org/r/20221015213050.65222-1-gerhard@engleder-embedded.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • Gerhard Engleder's avatar
      samples/bpf: Fix map iteration in xdp1_user · 05ee658c
      Gerhard Engleder authored
      BPF map iteration in xdp1_user results in endless loop without any
      output, because the return value of bpf_map_get_next_key() is checked
      against the wrong value.
      
      Other callers of bpf_map_get_next_key() continue the iteration only
      while it returns 0. xdp1_user instead continues while the return value
      is not equal to -1. This is wrong for a function which can return
      arbitrary negative errno values, because a return value of e.g. -2
      results in an endless loop.
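
      A minimal sketch of the correct loop condition follows. To keep it
      self-contained, fake_get_next_key() is a hypothetical stand-in for
      bpf_map_get_next_key() that drops the map fd argument and walks a
      fixed set of integer keys; like the real function under strict libbpf
      1.0 behavior, it returns 0 on success and a negative errno at the end.

```c
#include <errno.h>
#include <stddef.h>

#define NKEYS 3

/* Stand-in for bpf_map_get_next_key(): a NULL key means "start from the
 * first key"; returns 0 on success, -ENOENT after the last key. */
static int fake_get_next_key(const int *key, int *next_key)
{
    int next = key ? *key + 1 : 0;

    if (next >= NKEYS)
        return -ENOENT;
    *next_key = next;
    return 0;
}

/* Iterate with the "== 0" convention and count the keys visited. */
static int count_keys(void)
{
    int key = 0, next_key = 0, n = 0;
    const int *cur = NULL;

    while (fake_get_next_key(cur, &next_key) == 0) {
        key = next_key;
        cur = &key;
        n++;
    }
    return n;
}
```

      With a "!= -1" condition the loop above would never terminate,
      because the end-of-map return value is -ENOENT rather than -1.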
      
      With this fix xdp1_user is printing statistics again:
      proto 0:          1 pkt/s
      proto 0:          1 pkt/s
      proto 17:     107383 pkt/s
      proto 17:     881655 pkt/s
      proto 17:     882083 pkt/s
      proto 17:     881758 pkt/s
      
      Fixes: bd054102 ("libbpf: enforce strict libbpf 1.0 behaviors")
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Acked-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20221013200922.17167-1-gerhard@engleder-embedded.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • Alexandru Tachici's avatar
      net: ethernet: adi: adin1110: Fix SPI transfers · a526a3cc
      Alexandru Tachici authored
      No need to use more than one SPI transfer for reads.
      Use only one from now on, as ADIN1110/2111 does not tolerate
      CS changes during reads.
      
      The BCM2711/2708 SPI controllers worked fine, but the NXP
      IMX8MM could not keep CS lowered during SPI bursts.
      
      This change aims to make the ADIN1110/2111 driver compatible
      with both SPI controllers, without any loss of bandwidth/other
      capabilities.
      
      Fixes: bc93e19d ("net: ethernet: adi: Add ADIN1110 support")
      Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch 'net-bridge-mc-cleanups' · ac3208fb
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      bridge: A few multicast cleanups
      
      Clean up a few issues spotted while working on the bridge multicast code
      and running its selftests.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ido Schimmel's avatar
      bridge: mcast: Simplify MDB entry creation · d1942cd4
      Ido Schimmel authored
      Before creating a new MDB entry, br_multicast_new_group() will call
      br_mdb_ip_get() to see if one exists and return it if so.
      
      Therefore, simply call br_multicast_new_group() and omit the call to
      br_mdb_ip_get().
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ido Schimmel's avatar
      bridge: mcast: Use spin_lock() instead of spin_lock_bh() · 262985fa
      Ido Schimmel authored
      IGMPv3 / MLDv2 Membership Reports are only processed from the data path
      with softIRQ disabled, so there is no need to call spin_lock_bh(). Use
      spin_lock() instead.
      
      This is consistent with how other IGMP / MLD packets are processed.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ido Schimmel's avatar
      selftests: bridge_igmp: Remove unnecessary address deletion · b526b2ea
      Ido Schimmel authored
      The test group address is added and removed in v2reportleave_test().
      There is no need to delete it again during cleanup as it results in the
      following error message:
      
       # bash -x ./bridge_igmp.sh
       [...]
       + cleanup
       + pre_cleanup
       [...]
       + ip address del dev swp4 239.10.10.10/32
       RTNETLINK answers: Cannot assign requested address
       + h2_destroy
      
      Solve by removing the unnecessary address deletion.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ido Schimmel's avatar
      selftests: bridge_vlan_mcast: Delete qdiscs during cleanup · 6fb1faa1
      Ido Schimmel authored
      The qdiscs are added during setup, but not deleted during cleanup,
      resulting in the following error messages:
      
       # ./bridge_vlan_mcast.sh
       [...]
       # ./bridge_vlan_mcast.sh
       Error: Exclusivity flag on, cannot modify.
       Error: Exclusivity flag on, cannot modify.
      
      Solve by deleting the qdiscs during cleanup.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch 'dpaa-phylink' · 5cacb2c7
      David S. Miller authored
      Sean Anderson says:
      
      ====================
      net: dpaa: Convert to phylink
      
      This series converts the DPAA driver to phylink.
      
      I have tried to maintain backwards compatibility with existing device
      trees wherever possible. However, one area where I was unable to
      achieve this was with QSGMII. Please refer to patch 2 for details.
      
      All mac drivers have now been converted. I would greatly appreciate if
      anyone has T-series or P-series boards they can test/debug this series
      on. I only have an LS1046ARDB. Everything but QSGMII should work without
      breakage; QSGMII needs patches 7 and 8. For this reason, the last 4
      patches in this series should be applied together (and should not go
      through separate trees).
      
      Changes in v7:
      - provide phylink_validate_mask_caps() helper
      - Fix oops if memac_pcs_create returned -EPROBE_DEFER
      - Fix using pcs-names instead of pcs-handle-names
      - Fix not checking for -ENODATA when looking for sgmii pcs
      - Fix 81-character line
      - Simplify memac_validate with phylink_validate_mask_caps
      
      Changes in v6:
      - Remove unnecessary $ref from renesas,rzn1-a5psw
      - Remove unnecessary type from pcs-handle-names
      - Add maxItems to pcs-handle
      - Fix 81-character line
      - Fix uninitialized variable in dtsec_mac_config
      
      Changes in v5:
      - Add Lynx PCS binding
      
      Changes in v4:
      - Use pcs-handle-names instead of pcs-names, as discussed
      - Don't fail if phy support was not compiled in
      - Split off rate adaptation series
      - Split off DPAA "preparation" series
      - Split off Lynx 10G support
      - t208x: Mark MAC1 and MAC2 as 10G
      - Add XFI PCS for t208x MAC1/MAC2
      
      Changes in v3:
      - Expand pcs-handle to an array
      - Add vendor prefix 'fsl,' to rgmii and mii properties.
      - Set maxItems for pcs-names
      - Remove phy-* properties from example because dt-schema complains and I
        can't be bothered to figure out how to make it work.
      - Add pcs-handle as a preferred version of pcsphy-handle
      - Deprecate pcsphy-handle
      - Remove mii/rmii properties
      - Put the PCS mdiodev only after we are done with it (since the PCS
        does not perform a get itself).
      - Remove _return label from memac_initialization in favor of returning
        directly
      - Fix grabbing the default PCS not checking for -ENODATA from
        of_property_match_string
      - Set DTSEC_ECNTRL_R100M in dtsec_link_up instead of dtsec_mac_config
      - Remove rmii/mii properties
      - Replace 1000Base... with 1000BASE... to match IEEE capitalization
      - Add compatibles for QSGMII PCSs
      - Split arm and powerpcs dts updates
      
      Changes in v2:
      - Better document how we select which PCS to use in the default case
      - Move PCS_LYNX dependency to fman Kconfig
      - Remove unused variable slow_10g_if
      - Restrict valid link modes based on the phy interface. This is easier
        to set up, and mostly captures what I intended to do the first time.
        We now have a custom validate which restricts half-duplex for some SoCs
        for RGMII, but generally just uses the default phylink validate.
      - Configure the SerDes in enable/disable
      - Properly implement all ethtool ops and ioctls. These were mostly
        stubbed out just enough to compile last time.
      - Convert 10GEC and dTSEC as well
      - Fix capitalization of mEMAC in commit messages
      - Add nodes for QSGMII PCSs
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sean Anderson's avatar
      arm64: dts: layerscape: Add nodes for QSGMII PCSs · 4e748b1b
      Sean Anderson authored
      Now that we actually read registers from QSGMII PCSs, it's important
      that we have the correct address (instead of hoping that we're the MAC
      with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
      PCSs.  The exact mapping of QSGMII to MACs depends on the SoC.
      
      Since the first QSGMII PCSs share an address with the SGMII and XFI
      PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
      on the bus.
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sean Anderson's avatar
      powerpc: dts: qoriq: Add nodes for QSGMII PCSs · 4e31b808
      Sean Anderson authored
      Now that we actually read registers from QSGMII PCSs, it's important
      that we have the correct address (instead of hoping that we're the MAC
      with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
      PCSs. They have the same addresses on all SoCs (e.g. if QSGMIIA is
      present it's used for MACs 1 through 4).
      
      Since the first QSGMII PCSs share an address with the SGMII and XFI
      PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
      on the bus.
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sean Anderson's avatar
      powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G · 36926a7d
      Sean Anderson authored
      On the T208X SoCs, MAC1 and MAC2 support XGMII. Add some new MAC dtsi
      fragments, and mark the QMAN ports as 10G.
      
      Fixes: da414bb9 ("powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)")
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sean Anderson's avatar
      net: dpaa: Convert to phylink · 5d93cfcf
      Sean Anderson authored
      This converts DPAA to phylink. All macs are converted. This should work
      with no device tree modifications (including those made in this series),
      except for QSGMII (as noted previously).
      
      The mEMAC configuration is one of the trickier areas. I have tried to
      capture all the restrictions across the various models. Most of the time,
      we assume that if the serdes supports a mode or the phy-interface-mode
      specifies it, then we support it. The only place we can't do this is
      (RG)MII, since there's no serdes. In that case, we rely on a (new)
      devicetree property. There are also several cases where half-duplex is
      broken. Unfortunately, only a single compatible is used for the MAC, so we
      have to use the board compatible instead.
      
      The 10GEC conversion is very straightforward, since it only supports XAUI.
      There is generally nothing to configure.
      
      The dTSEC conversion is broadly similar to mEMAC, but is simpler because we
      don't support configuring the SerDes (though this can be easily added) and
      we don't have multiple PCSs. From what I can tell, there's nothing
      different in the driver or documentation between SGMII and 1000BASE-X
      except for the advertising. Similarly, I couldn't find anything about
      2500BASE-X. In both cases, I treat them like SGMII. These modes aren't used
      by any in-tree boards. Similarly, despite being mentioned in the driver, I
      couldn't find any documented SoCs which supported QSGMII.  I have left it
      unimplemented for now.
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sean Anderson's avatar
      net: fman: memac: Use lynx pcs driver · a7c2a32e
      Sean Anderson authored
      Although not stated in the datasheet, as far as I can tell PCS for mEMACs
      is a "Lynx." By reusing the existing driver, we can remove the PCS
      management code from the memac driver. This requires calling some PCS
      functions manually which phylink would usually do for us, but we will let
      it do that soon.
      
      One problem is that we don't actually have a PCS for QSGMII. We pretend
      that each mEMAC's MDIO bus has four QSGMII PCSs, but this is not the case.
      Only the "base" mEMAC's MDIO bus has the four QSGMII PCSs. This is not an
      issue yet, because we never get the PCS state. However, it will be once the
      conversion to phylink is complete, since the links will appear to never
      come up. To get around this, we allow specifying multiple PCSs in pcsphy.
      This breaks backwards compatibility with old device trees, but only for
      QSGMII. IMO this is the only reasonable way to figure out what the actual
      QSGMII PCS is.
      
      Additionally, we now also support a separate XFI PCS. This can allow the
      SerDes driver to set different addresses for the SGMII and XFI PCSs so they
      can be accessed at the same time.
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fman: memac: Add serdes support · 0fc83bd7
      Sean Anderson authored
      This adds support for using a serdes which has to be configured. This is
      primarily in preparation for phylink conversion, which will then change the
      serdes mode dynamically.
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0fc83bd7
    • net: phylink: provide phylink_validate_mask_caps() helper · f392a184
      Russell King (Oracle) authored
      Provide a helper that restricts the link modes according to the
      phylink capabilities.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      [rebased on net-next/master and added documentation]
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f392a184
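
The idea of restricting link modes by capabilities can be illustrated with a toy model. The sketch below is not the kernel implementation: the kernel works on ethtool link-mode bitmaps, whereas here the capability bits, mode bits, `struct link_state`, `caps_to_modes()` and `validate_mask_caps()` are all hypothetical names chosen for illustration. It only shows the general pattern of building a mask from capabilities and AND-ing it into both the supported and the advertised sets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits (assumption, not the kernel's MAC_* values). */
enum {
	CAP_10   = 1u << 0,
	CAP_100  = 1u << 1,
	CAP_1000 = 1u << 2,
	CAP_10G  = 1u << 3,
};

/* Hypothetical link-mode bits (the kernel uses a larger bitmap instead). */
enum {
	MODE_10_FULL   = 1u << 0,
	MODE_100_FULL  = 1u << 1,
	MODE_1000_FULL = 1u << 2,
	MODE_10G_KR    = 1u << 3,
	MODE_AUTONEG   = 1u << 4,
};

/* Simplified stand-in for a link state that carries advertised modes. */
struct link_state {
	uint32_t advertising;
};

/* Translate capability bits into the set of link modes they permit. */
uint32_t caps_to_modes(uint32_t caps)
{
	uint32_t modes = MODE_AUTONEG; /* this model always permits autoneg */

	if (caps & CAP_10)
		modes |= MODE_10_FULL;
	if (caps & CAP_100)
		modes |= MODE_100_FULL;
	if (caps & CAP_1000)
		modes |= MODE_1000_FULL;
	if (caps & CAP_10G)
		modes |= MODE_10G_KR;
	return modes;
}

/* Drop every supported/advertised mode the capabilities do not permit. */
void validate_mask_caps(uint32_t *supported, struct link_state *state,
			uint32_t caps)
{
	uint32_t mask = caps_to_modes(caps);

	*supported &= mask;
	state->advertising &= mask;
}
```

A caller would start from the full set of modes it might support, then pass the capabilities that apply to the current interface; whatever survives the mask is what can be offered to the link partner.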
    • dt-bindings: net: fman: Add additional interface properties · 045d0501
      Sean Anderson authored
      At the moment, mEMACs are configured almost completely based on the
      phy-connection-type. That is, if the phy interface is RGMII, it is assumed
      that RGMII is supported. For some interfaces, it is assumed that the
      RCW/bootloader has set up the SerDes properly. This is generally OK, but
      restricts runtime reconfiguration. The actual link state is never
      reported.
      
      To address these shortcomings, the driver will need additional
      information. First, it needs to know how to access the PCS/PMAs (in
      order to configure them and get the link status). The SGMII PCS/PMA is
      the only currently-described PCS/PMA. Add the XFI and QSGMII PCS/PMAs as
      well. The XFI (and 10GBASE-KR) PCS/PMA is a c45 "phy" which sits on the
      same MDIO bus as the SGMII PCS/PMA. By default they will have conflicting
      addresses, but they are also not enabled at the same time by default.
      Therefore, we can let the XFI PCS/PMA be the default when
      phy-connection-type is xgmii. This will allow for
      backwards-compatibility.
      
      QSGMII, however, cannot work with the current binding. This is because
      the QSGMII PCS/PMAs are only present on one MAC's MDIO bus. At the
      moment this is worked around by having every MAC write to the PCS/PMA
      addresses (without checking if they are present). This only works if
      each MAC has the same configuration, and only if we don't need to know
      the status. Because the QSGMII PCS/PMA will typically be located on a
      different MDIO bus than the MAC's SGMII PCS/PMA, there is no fallback
      for the QSGMII PCS/PMA.
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      045d0501
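
The fallback rules described in this message can be sketched as a small lookup. This is a simplified model under stated assumptions, not the driver's code: `struct mac_node`, `enum pcs_kind` and `resolve_pcs()` are hypothetical, and PCS handles are represented as plain strings. It assumes an old device tree provides a single unnamed PCS handle, which is read as the XFI PCS when phy-connection-type is xgmii and as the SGMII PCS otherwise, while QSGMII never falls back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical PCS kinds, one per PCS/PMA mentioned in the message. */
enum pcs_kind { PCS_SGMII, PCS_QSGMII, PCS_XFI, PCS_KIND_COUNT };

/* Hypothetical, flattened view of a MAC's device tree node. */
struct mac_node {
	const char *phy_connection_type;       /* e.g. "sgmii", "xgmii" */
	const char *named_pcs[PCS_KIND_COUNT]; /* explicit handles, or NULL */
	const char *legacy_pcs;                /* single unnamed handle, or NULL */
};

/* Return the PCS to use for `kind`, or NULL if none can be determined. */
const char *resolve_pcs(const struct mac_node *mac, enum pcs_kind kind)
{
	int is_xgmii;

	/* An explicitly named PCS always wins. */
	if (mac->named_pcs[kind])
		return mac->named_pcs[kind];
	if (!mac->legacy_pcs)
		return NULL;

	is_xgmii = strcmp(mac->phy_connection_type, "xgmii") == 0;
	if (kind == PCS_XFI)
		return is_xgmii ? mac->legacy_pcs : NULL;
	if (kind == PCS_SGMII)
		return is_xgmii ? NULL : mac->legacy_pcs;

	/* QSGMII: the PCS may live on another MAC's MDIO bus, so no fallback. */
	return NULL;
}
```

The asymmetry is the point of the design: the SGMII and XFI PCSs sit on the MAC's own MDIO bus, so a single legacy handle can be reinterpreted safely, whereas a QSGMII PCS has to be named explicitly.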
    • dt-bindings: net: Add Lynx PCS binding · 00af103d
      Sean Anderson authored
      This binding is fairly bare-bones for now, since the Lynx driver doesn't
      parse any properties (or match based on the compatible). We just need it
      in order to prevent the PCS nodes from having phy devices attached to
      them. This is not really a problem, but it is a bit inefficient.
      
      This binding is really for three separate PCSs (SGMII, QSGMII, and XFI).
      However, the driver treats all of them the same. This works because the
      SGMII and XFI devices typically use the same address, and the SerDes
      driver (or RCW) muxes between them. The QSGMII PCSs have the same
      register layout as the SGMII PCSs. To do things properly, we'd probably
      do something like
      
      	ethernet-pcs@0 {
      		#pcs-cells = <1>;
      		compatible = "fsl,lynx-pcs";
      		reg = <0>, <1>, <2>, <3>;
      	};
      
      but that would add complexity, and we can describe the hardware just
      fine using separate PCSs for now.
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      00af103d