1. 15 Dec, 2019 1 commit
    • Paul Chaignon's avatar
      bpftool: Match several programs with same tag · ec202509
      Paul Chaignon authored
      When several BPF programs have the same tag, bpftool matches only the
      first (in ID order).  This patch changes that behavior such that dump and
      show commands return all matched programs.  Commands that require a single
      program (e.g., pin and attach) will error out if given a tag that matches
      several.  bpftool prog dump will also error out if file or visual are
      given and several programs have the given tag.
      
      In the case of the dump command, a program header is added before each
      dump only if the tag matches several programs; this patch doesn't change
      the output if a single program matches.  The output when several
      programs match thus looks as follows.
      
      $ ./bpftool prog dump xlated tag 6deef7357e7b4530
      3: cgroup_skb  tag 6deef7357e7b4530  gpl
         0: (bf) r6 = r1
         [...]
         7: (95) exit
      
      4: cgroup_skb  tag 6deef7357e7b4530  gpl
         0: (bf) r6 = r1
         [...]
         7: (95) exit
      Signed-off-by: default avatarPaul Chaignon <paul.chaignon@orange.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/fb1fe943202659a69cd21dd5b907c205af1e1e22.1576263640.git.paul.chaignon@gmail.com
      ec202509
  2. 13 Dec, 2019 24 commits
  3. 12 Dec, 2019 2 commits
  4. 11 Dec, 2019 13 commits
    • Daniel Borkmann's avatar
      bpf: Emit audit messages upon successful prog load and unload · bae141f5
      Daniel Borkmann authored
      Allow for audit messages to be emitted upon BPF program load and
      unload for having a timeline of events. The load itself is in
      syscall context, so additional info about the process initiating
      the BPF prog creation can be logged and later directly correlated
      to the unload event.
      
      The only info really needed from BPF side is the globally unique
      prog ID where then audit user space tooling can query / dump all
      info needed about the specific BPF program right upon load event
      and enrich the record, thus these changes needed here can be kept
      small and non-intrusive to the core.
      
      Raw example output:
      
        # auditctl -D
        # auditctl -a always,exit -F arch=x86_64 -S bpf
        # ausearch --start recent -m 1334
        ...
        ----
        time->Wed Nov 27 16:04:13 2019
        type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf"
        type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321   \
          success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477    \
          pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001    \
          egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf"                \
          exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf"                  \
          subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
        type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD
        ----
        time->Wed Nov 27 16:04:13 2019
        type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD
        ...
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Co-developed-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Link: https://lore.kernel.org/bpf/20191206214934.11319-1-jolsa@kernel.org
      bae141f5
    • Stanislav Fomichev's avatar
      bpf: Switch to offsetofend in BPF_PROG_TEST_RUN · b590cb5f
      Stanislav Fomichev authored
      Switch existing pattern of "offsetof(..., member) + FIELD_SIZEOF(...,
      member)' to "offsetofend(..., member)" which does exactly what
      we need without all the copy-paste.
      Suggested-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20191210191933.105321-1-sdf@google.com
      b590cb5f
    • Andrii Nakryiko's avatar
      libbpf: Bump libpf current version to v0.0.7 · 09c4708d
      Andrii Nakryiko authored
      New development cycles starts, bump to v0.0.7 proactively.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191209224022.3544519-1-andriin@fb.com
      09c4708d
    • Russell King's avatar
      ARM: net: bpf: Improve prologue code sequence · c4533128
      Russell King authored
      Improve the prologue code sequence to be able to take advantage of
      64-bit stores, changing the code from:
      
        push    {r4, r5, r6, r7, r8, r9, fp, lr}
        mov     fp, sp
        sub     ip, sp, #80     ; 0x50
        sub     sp, sp, #600    ; 0x258
        str     ip, [fp, #-100] ; 0xffffff9c
        mov     r6, #0
        str     r6, [fp, #-96]  ; 0xffffffa0
        mov     r4, #0
        mov     r3, r4
        mov     r2, r0
        str     r4, [fp, #-104] ; 0xffffff98
        str     r4, [fp, #-108] ; 0xffffff94
      
      to the tighter:
      
        push    {r4, r5, r6, r7, r8, r9, fp, lr}
        mov     fp, sp
        mov     r3, #0
        sub     r2, sp, #80     ; 0x50
        sub     sp, sp, #600    ; 0x258
        strd    r2, [fp, #-100] ; 0xffffff9c
        mov     r2, #0
        strd    r2, [fp, #-108] ; 0xffffff94
        mov     r2, r0
      
      resulting in a saving of three instructions.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/E1ieH2g-0004ih-Rb@rmk-PC.armlinux.org.uk
      c4533128
    • Shahjada Abul Husain's avatar
      cxgb4: add support for high priority filters · c2193999
      Shahjada Abul Husain authored
      T6 has a separate region known as high priority filter region
      that allows classifying packets going through ULD path. So,
      query firmware for HPFILTER resources and enable the high
      priority offload filter support when it is available.
      Signed-off-by: default avatarShahjada Abul Husain <shahjada@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2193999
    • Chen Wandun's avatar
      enetc: remove variable 'tc_max_sized_frame' set but not used · 6525b5ef
      Chen Wandun authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      drivers/net/ethernet/freescale/enetc/enetc_qos.c: In function enetc_setup_tc_cbs:
      drivers/net/ethernet/freescale/enetc/enetc_qos.c:195:6: warning: variable tc_max_sized_frame set but not used [-Wunused-but-set-variable]
      
      Fixes: c431047c ("enetc: add support Credit Based Shaper(CBS) for hardware offload")
      Signed-off-by: default avatarChen Wandun <chenwandun@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6525b5ef
    • Jakub Kicinski's avatar
      nfp: add support for TLV device stats · ca866ee8
      Jakub Kicinski authored
      Device stats are currently hard coded in the PCI BAR0 layout.
      Add a ability to read them from the TLV area instead.
      Names for the stats are maintained by the driver, and their
      meaning documented. This allows us to more easily add and
      remove device stats.
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca866ee8
    • Kuniyuki Iwashima's avatar
      tcp: Cleanup duplicate initialization of sk->sk_state. · 5000b28b
      Kuniyuki Iwashima authored
      When a TCP socket is created, sk->sk_state is initialized twice as
      TCP_CLOSE in sock_init_data() and tcp_init_sock(). The tcp_init_sock() is
      always called after the sock_init_data(), so it is not necessary to update
      sk->sk_state in the tcp_init_sock().
      
      Before v2.1.8, the code of the two functions was in the inet_create(). In
      the patch of v2.1.8, the tcp_v4/v6_init_sock() were added and the code of
      initialization of sk->state was duplicated.
      Signed-off-by: default avatarKuniyuki Iwashima <kuni1840@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5000b28b
    • Michael Walle's avatar
      enetc: add software timestamping · 4caefbce
      Michael Walle authored
      Provide a software TX timestamp and add it to the ethtool query
      interface.
      
      skb_tx_timestamp() is also needed if one would like to use PHY
      timestamping.
      Signed-off-by: default avatarMichael Walle <michael@walle.cc>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4caefbce
    • David S. Miller's avatar
      Merge branch 'tipc-introduce-variable-window-congestion-control' · bb9d8454
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: introduce variable window congestion control
      
      We improve thoughput greatly by introducing a variety of the Reno
      congestion control algorithm at the link level.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bb9d8454
    • Jon Maloy's avatar
      tipc: introduce variable window congestion control · 16ad3f40
      Jon Maloy authored
      We introduce a simple variable window congestion control for links.
      The algorithm is inspired by the Reno algorithm, covering both 'slow
      start', 'congestion avoidance', and 'fast recovery' modes.
      
      - We introduce hard lower and upper window limits per link, still
        different and configurable per bearer type.
      
      - We introduce a 'slow start theshold' variable, initially set to
        the maximum window size.
      
      - We let a link start at the minimum congestion window, i.e. in slow
        start mode, and then let is grow rapidly (+1 per rceived ACK) until
        it reaches the slow start threshold and enters congestion avoidance
        mode.
      
      - In congestion avoidance mode we increment the congestion window for
        each window-size number of acked packets, up to a possible maximum
        equal to the configured maximum window.
      
      - For each non-duplicate NACK received, we drop back to fast recovery
        mode, by setting the both the slow start threshold to and the
        congestion window to (current_congestion_window / 2).
      
      - If the timeout handler finds that the transmit queue has not moved
        since the previous timeout, it drops the link back to slow start
        and forces a probe containing the last sent sequence number to the
        sent to the peer, so that this can discover the stale situation.
      
      This change does in reality have effect only on unicast ethernet
      transport, as we have seen that there is no room whatsoever for
      increasing the window max size for the UDP bearer.
      For now, we also choose to keep the limits for the broadcast link
      unchanged and equal.
      
      This algorithm seems to give a 50-100% throughput improvement for
      messages larger than MTU.
      Suggested-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      16ad3f40
    • Jon Maloy's avatar
      tipc: eliminate more unnecessary nacks and retransmissions · d3b09995
      Jon Maloy authored
      When we increase the link tranmsit window we often observe the following
      scenario:
      
      1) A STATE message bypasses a sequence of traffic packets and arrives
         far ahead of those to the receiver. STATE messages contain a
         'peers_nxt_snt' field to indicate which was the last packet sent
         from the peer. This mechanism is intended as a last resort for the
         receiver to detect missing packets, e.g., during very low traffic
         when there is no packet flow to help early loss detection.
      3) The receiving link compares the 'peer_nxt_snt' field to its own
         'rcv_nxt', finds that there is a gap, and immediately sends a
         NACK message back to the peer.
      4) When this NACKs arrives at the sender, all the requested
         retransmissions are performed, since it is a first-time request.
      
      Just like in the scenario described in the previous commit this leads
      to many redundant retransmissions, with decreased throughput as a
      consequence.
      
      We fix this by adding two more conditions before we send a NACK in
      this sitution. First, the deferred queue must be empty, so we cannot
      assume that the potential packet loss has already been detected by
      other means. Second, we check the 'peers_snd_nxt' field only in probe/
      probe_reply messages, thus turning this into a true mechanism of last
      resort as it was really meant to be.
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d3b09995
    • Jon Maloy's avatar
      tipc: eliminate gap indicator from ACK messages · 02288248
      Jon Maloy authored
      When we increase the link send window we sometimes observe the
      following scenario:
      
      1) A packet #N arrives out of order far ahead of a sequence of older
         packets which are still under way. The packet is added to the
         deferred queue.
      2) The missing packets arrive in sequence, and for each 16th of them
         an ACK is sent back to the receiver, as it should be.
      3) When building those ACK messages, it is checked if there is a gap
         between the link's 'rcv_nxt' and the first packet in the deferred
         queue. This is always the case until packet number #N-1 arrives, and
         a 'gap' indicator is added, effectively turning them into NACK
         messages.
      4) When those NACKs arrive at the sender, all the requested
         retransmissions are done, since it is a first-time request.
      
      This sometimes leads to a huge amount of redundant retransmissions,
      causing a drop in max throughput. This problem gets worse when we
      in a later commit introduce variable window congestion control,
      since it drops the link back to 'fast recovery' much more often
      than necessary.
      
      We now fix this by not sending any 'gap' indicator in regular ACK
      messages. We already have a mechanism for sending explicit NACKs
      in place, and this is sufficient to keep up the packet flow.
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      02288248