1. 12 Apr, 2019 40 commits
    • selftests/bpf: C based test for sysctl and strtoX · 7568f4cb
      Andrey Ignatov authored
      Add C based test for a few bpf_sysctl_* helpers and bpf_strtoul.
      
      Make sure that sysctl can be identified by name and that multiple
      integers can be parsed from sysctl value with bpf_strtoul.
      
      net/ipv4/tcp_mem is chosen as the test sysctl: it contains 3 unsigned
      longs, all of which are parsed and compared (val[0] < val[1] < val[2]).
      
      Example of output:
        # ./test_sysctl
        ...
        Test case: C prog: deny all writes .. [PASS]
        Test case: C prog: deny access by name .. [PASS]
        Test case: C prog: read tcp_mem .. [PASS]
        Summary: 39 PASSED, 0 FAILED
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test bpf_strtol and bpf_strtoul helpers · 8549ddc8
      Andrey Ignatov authored
      Test that the bpf_strtol and bpf_strtoul helpers can be used to convert a
      provided buffer to long or unsigned long respectively, and return both
      the correct result and the number of consumed bytes, or a proper errno.
      
      Example of output:
        # ./test_sysctl
        ..
        Test case: bpf_strtoul one number string .. [PASS]
        Test case: bpf_strtoul multi number string .. [PASS]
        Test case: bpf_strtoul buf_len = 0, reject .. [PASS]
        Test case: bpf_strtoul supported base, ok .. [PASS]
        Test case: bpf_strtoul unsupported base, EINVAL .. [PASS]
        Test case: bpf_strtoul buf with spaces only, EINVAL .. [PASS]
        Test case: bpf_strtoul negative number, EINVAL .. [PASS]
        Test case: bpf_strtol negative number, ok .. [PASS]
        Test case: bpf_strtol hex number, ok .. [PASS]
        Test case: bpf_strtol max long .. [PASS]
        Test case: bpf_strtol overflow, ERANGE .. [PASS]
        Summary: 36 PASSED, 0 FAILED
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test ARG_PTR_TO_LONG arg type · c2d5f12e
      Andrey Ignatov authored
      Test that verifier handles new argument types properly, including
      uninitialized or partially initialized value, misaligned stack access,
      etc.
      
      Example of output:
        #456/p ARG_PTR_TO_LONG uninitialized OK
        #457/p ARG_PTR_TO_LONG half-uninitialized OK
        #458/p ARG_PTR_TO_LONG misaligned OK
        #459/p ARG_PTR_TO_LONG size < sizeof(long) OK
        #460/p ARG_PTR_TO_LONG initialized OK
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.h · 99f57973
      Andrey Ignatov authored
      Add bpf_sysctl_* and bpf_strtoX helpers to bpf_helpers.h.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Sync bpf.h to tools/ · b457e553
      Andrey Ignatov authored
      Sync bpf_strtoX related bpf UAPI changes to tools/.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Introduce bpf_strtol and bpf_strtoul helpers · d7a4cb9b
      Andrey Ignatov authored
      Add bpf_strtol and bpf_strtoul to convert a string to long and unsigned
      long respectively. They are similar to user space strtol(3) and
      strtoul(3) with a few changes to the API:
      
      * instead of NUL-terminated C string the helpers expect buffer and
        buffer length;
      
      * resulting long or unsigned long is returned in a separate
        result-argument;
      
      * return value is used to indicate success or failure, on success number
        of consumed bytes is returned that can be used to identify position to
        read next if the buffer is expected to contain multiple integers;
      
      * instead of *base* argument, *flags* is used that provides base in 5
        LSB, other bits are reserved for future use;
      
      * number of supported bases is limited.
      
      Documentation for the new helpers is provided in bpf.h UAPI.
      
      The helpers are made available to BPF_PROG_TYPE_CGROUP_SYSCTL programs to
      be able to convert string input to e.g. "ulongvec" output.
      
      E.g. "net/ipv4/tcp_mem" consists of three ulong integers. They can be
      parsed by calling bpf_strtoul three times.
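
The calling convention described above (buffer plus length in, result through a pointer, consumed bytes as the return value) can be modelled in user space. The sketch below is not the kernel helper: `parse_ulong` and `parse_ulong_vec` are hypothetical names, only base 10 is handled, and the *flags* argument is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the described API, base 10 only.
 * Returns the number of consumed bytes on success or a negative errno.
 */
static int parse_ulong(const char *buf, size_t buf_len, unsigned long *res)
{
	size_t i = 0;
	unsigned long val = 0;

	assert(buf != NULL && res != NULL);

	/* Skip leading whitespace without reading past buf_len. */
	while (i < buf_len && isspace((unsigned char)buf[i]))
		i++;
	/* Empty buffer, spaces only, or a non-digit such as '-'. */
	if (i == buf_len || !isdigit((unsigned char)buf[i]))
		return -EINVAL;

	for (; i < buf_len && isdigit((unsigned char)buf[i]); i++) {
		unsigned long digit = (unsigned long)(buf[i] - '0');

		/* Reject the digit if val * 10 + digit would overflow. */
		if (val > (ULONG_MAX - digit) / 10)
			return -ERANGE;
		val = val * 10 + digit;
	}

	*res = val;
	return (int)i;
}

/* Parse `count` integers from one buffer, advancing by the consumed bytes. */
static int parse_ulong_vec(const char *buf, size_t buf_len,
			   unsigned long *out, size_t count)
{
	size_t off = 0;

	for (size_t n = 0; n < count; n++) {
		int ret = parse_ulong(buf + off, buf_len - off, &out[n]);

		if (ret < 0)
			return ret;
		off += (size_t)ret;
	}
	return (int)off;
}
```

Calling `parse_ulong_vec(buf, len, val, 3)` on a tcp_mem-style string mirrors the "call three times" pattern: each call starts where the previous one stopped, so no NUL terminator or intermediate copy is needed.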
      
      Implementation notes:
      
      Implementation includes "../../lib/kstrtox.h" to reuse integer parsing
      functions. It's done exactly the same way as fs/proc/base.c already does.
      
      Unfortunately the existing kstrtoX functions can't be used directly since
      they fail if any invalid character is present right after the integer in
      the string. The existing simple_strtoX functions can't be used either
      since they're obsolete and don't handle overflow properly.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types · 57c3bb72
      Andrey Ignatov authored
      Currently the way to pass result from BPF helper to BPF program is to
      provide memory area defined by pointer and size: func(void *, size_t).
      
      It works great for generic use-case, but for simple types, such as int,
      it's overkill and consumes two arguments when it could use just one.
      
      Introduce new argument types ARG_PTR_TO_INT and ARG_PTR_TO_LONG to be
      able to pass result from helper to program via pointer to int and long
      correspondingly: func(int *) or func(long *).
      
      New argument types are similar to ARG_PTR_TO_MEM with the following
      differences:
      * they don't require corresponding ARG_CONST_SIZE argument, predefined
        access sizes are used instead (32bit for int, 64bit for long);
      * it's possible to use more than one such argument in a helper;
      * provided pointers have to be aligned.
      
      It's easy to introduce similar ARG_PTR_TO_CHAR and ARG_PTR_TO_SHORT
      argument types. It's not done due to lack of use-case though.
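
The difference between the two calling styles can be illustrated in plain C. This is only a user-space analogy, not verifier code: `get_value_mem`, `get_value_long` and `is_long_aligned` are hypothetical names, and the alignment check merely expresses the "pointers have to be aligned" rule as an ordinary predicate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic style: pointer plus size, two arguments for one result. */
static int get_value_mem(void *dst, size_t size)
{
	long v = 42;

	assert(dst != NULL);
	if (size != sizeof(v))
		return -1;
	memcpy(dst, &v, sizeof(v));
	return 0;
}

/* Typed style: one argument, the access size is implied by the type. */
static int get_value_long(long *res)
{
	assert(res != NULL);
	*res = 42;
	return 0;
}

/* The alignment requirement written as an explicit check. */
static int is_long_aligned(const void *p)
{
	return ((uintptr_t)p % _Alignof(long)) == 0;
}
```

With the typed style the caller cannot pass a mismatched size, which is the kind of error the generic style has to detect at run time.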
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test file_pos field in bpf_sysctl ctx · 9a1027e5
      Andrey Ignatov authored
      Test access to file_pos field of bpf_sysctl context, both read (incl.
      narrow read) and write.
      
        # ./test_sysctl
        ...
        Test case: ctx:file_pos sysctl:read read ok .. [PASS]
        Test case: ctx:file_pos sysctl:read read ok narrow .. [PASS]
        Test case: ctx:file_pos sysctl:read write ok .. [PASS]
        ...
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpers · 786047dd
      Andrey Ignatov authored
      Test that new value provided by user space on sysctl write can be read
      by bpf_sysctl_get_new_value and overridden by bpf_sysctl_set_new_value.
      
        # ./test_sysctl
        ...
        Test case: sysctl_get_new_value sysctl:read EINVAL .. [PASS]
        Test case: sysctl_get_new_value sysctl:write ok .. [PASS]
        Test case: sysctl_get_new_value sysctl:write ok long .. [PASS]
        Test case: sysctl_get_new_value sysctl:write E2BIG .. [PASS]
        Test case: sysctl_set_new_value sysctl:read EINVAL .. [PASS]
        Test case: sysctl_set_new_value sysctl:write ok .. [PASS]
        Summary: 22 PASSED, 0 FAILED
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test sysctl_get_current_value helper · 11ff34f7
      Andrey Ignatov authored
      Test sysctl_get_current_value on sysctl read and write, buffers with
      enough space and too small buffers to get E2BIG and truncated result,
      etc.
      
        # ./test_sysctl
        ...
        Test case: sysctl_get_current_value sysctl:read ok, gt .. [PASS]
        Test case: sysctl_get_current_value sysctl:read ok, eq .. [PASS]
        Test case: sysctl_get_current_value sysctl:read E2BIG truncated ..  [PASS]
        Test case: sysctl_get_current_value sysctl:read EINVAL .. [PASS]
        Test case: sysctl_get_current_value sysctl:write ok .. [PASS]
        Summary: 16 PASSED, 0 FAILED
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test bpf_sysctl_get_name helper · 6041c67f
      Andrey Ignatov authored
      Test w/ and w/o BPF_F_SYSCTL_BASE_NAME, buffers with enough space and
      too small buffers to get E2BIG and truncated result, etc.
      
        # ./test_sysctl
        ...
        Test case: sysctl_get_name sysctl_value:base ok .. [PASS]
        Test case: sysctl_get_name sysctl_value:base E2BIG truncated .. [PASS]
        Test case: sysctl_get_name sysctl:full ok .. [PASS]
        Test case: sysctl_get_name sysctl:full E2BIG truncated .. [PASS]
        Test case: sysctl_get_name sysctl:full E2BIG truncated small .. [PASS]
        Summary: 11 PASSED, 0 FAILED
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test BPF_CGROUP_SYSCTL · 1f5fa9ab
      Andrey Ignatov authored
      Add unit test for BPF_PROG_TYPE_CGROUP_SYSCTL program type.
      
      Test that program can allow/deny access.
      Test both valid and invalid accesses to ctx->write.
      
      Example of output:
        # ./test_sysctl
        Test case: sysctl wrong attach_type .. [PASS]
        Test case: sysctl:read allow all .. [PASS]
        Test case: sysctl:read deny all .. [PASS]
        Test case: ctx:write sysctl:read read ok .. [PASS]
        Test case: ctx:write sysctl:write read ok .. [PASS]
        Test case: ctx:write sysctl:read write reject .. [PASS]
        Summary: 6 PASSED, 0 FAILED
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test sysctl section name · 7007af63
      Andrey Ignatov authored
      Add unit test to verify that program and attach types are properly
      identified for "cgroup/sysctl" section name.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: Support sysctl hook · 063cc9f0
      Andrey Ignatov authored
      Support BPF_PROG_TYPE_CGROUP_SYSCTL program in libbpf: identifying
      program and attach types by section name, probe.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Sync bpf.h to tools/ · 196398d4
      Andrey Ignatov authored
      Sync BPF_PROG_TYPE_CGROUP_SYSCTL related bpf UAPI changes to tools/.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add file_pos field to bpf_sysctl ctx · e1550bfe
      Andrey Ignatov authored
      Add file_pos field to bpf_sysctl context to read and write sysctl file
      position at which sysctl is being accessed (read or written).
      
      The field can be used to e.g. override whole sysctl value on write to
      sysctl even when sys_write is called by user space with file_pos > 0. Or
      BPF program may reject such accesses.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Introduce bpf_sysctl_{get,set}_new_value helpers · 4e63acdf
      Andrey Ignatov authored
      Add helpers to work with new value being written to sysctl by user
      space.
      
      bpf_sysctl_get_new_value() copies value being written to sysctl into
      provided buffer.
      
      bpf_sysctl_set_new_value() overrides the new value being written by user
      space with the one from the provided buffer. The buffer should contain a
      string representation of the value, similar to what can be seen in
      /proc/sys/.
      
      Both helpers can be used only on sysctl write.
      
      File position matters and can be managed by an interface that will be
      introduced separately. E.g. if user space calls sys_write to a file in
      /proc/sys/ at file position = X, where X > 0, then the value set by
      bpf_sysctl_set_new_value() will be written starting from X. If program
      wants to override whole value with specified buffer, file position has
      to be set to zero.
      
      Documentation for the new helpers is provided in bpf.h UAPI.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Introduce bpf_sysctl_get_current_value helper · 1d11b301
      Andrey Ignatov authored
      Add bpf_sysctl_get_current_value() helper to copy the current sysctl value
      into a buffer provided by the BPF_PROG_TYPE_CGROUP_SYSCTL program.
      
      It provides same string as user space can see by reading corresponding
      file in /proc/sys/, including new line, etc.
      
      Documentation for the new helper is provided in bpf.h UAPI.
      
      Since current value is kept in ctl_table->data in a parsed form,
      ctl_table->proc_handler() with write=0 is called to read that data and
      convert it to a string. Such a string can later be parsed by a program
      using helpers that will be introduced separately.
      
      Unfortunately it's not trivial to provide an API to access parsed data due
      to the variety of data representations (string, intvec, uintvec, ulongvec,
      custom structures, even NULL, etc). Instead it's assumed that users know
      how to handle the specific sysctl they're interested in and that
      appropriate helpers can be used.
      
      Since ctl_table->proc_handler() expects a __user buffer, the
      kernel-allocated buffer where the value is stored is converted to __user.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Introduce bpf_sysctl_get_name helper · 808649fb
      Andrey Ignatov authored
      Add bpf_sysctl_get_name() helper to copy the sysctl name (/proc/sys/ entry)
      into a buffer provided by the BPF_PROG_TYPE_CGROUP_SYSCTL program.
      
      By default full name (w/o /proc/sys/) is copied, e.g. "net/ipv4/tcp_mem".
      
      If BPF_F_SYSCTL_BASE_NAME flag is set, only base name will be copied,
      e.g. "tcp_mem".
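
The full-name versus base-name behaviour can be sketched in user space. `copy_sysctl_name` is a hypothetical stand-in, not the kernel helper, and the truncate-then-return `-E2BIG` behaviour is an assumption inferred from the "E2BIG truncated" test case names elsewhere in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model: copy either the full sysctl name or only
 * its last path component into buf. Returns the copied length (without the
 * trailing NUL), or -E2BIG after writing a truncated, NUL-terminated copy.
 */
static int copy_sysctl_name(const char *full_name, int base_only,
			    char *buf, size_t buf_len)
{
	const char *name = full_name;
	size_t len;

	assert(full_name != NULL && buf != NULL);
	if (buf_len == 0)
		return -E2BIG;

	if (base_only) {
		const char *slash = strrchr(full_name, '/');

		if (slash)
			name = slash + 1;
	}

	len = strlen(name);
	if (len >= buf_len) {
		/* Not enough room: keep what fits and report truncation. */
		memcpy(buf, name, buf_len - 1);
		buf[buf_len - 1] = '\0';
		return -E2BIG;
	}
	memcpy(buf, name, len + 1);
	return (int)len;
}
```

A policy that only cares about "tcp_mem" regardless of its directory would use the base-name form, while one that must distinguish entries with the same base name would compare the full name.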
      
      Documentation for the new helper is provided in bpf.h UAPI.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Sysctl hook · 7b146ceb
      Andrey Ignatov authored
      Containerized applications may run as root, which may create problems
      for the whole host. Specifically such applications may change a sysctl and
      affect applications in other containers.
      
      Furthermore in existing infrastructure it may not be possible to just
      completely disable writing to sysctl, instead such a process should be
      gradual with ability to log what sysctl are being changed by a
      container, investigate, limit the set of writable sysctl to currently
      used ones (so that new ones can not be changed) and eventually reduce
      this set to zero.
      
      The patch introduces new program type BPF_PROG_TYPE_CGROUP_SYSCTL and
      attach type BPF_CGROUP_SYSCTL to solve these problems on cgroup basis.
      
      New program type has access to following minimal context:
      	struct bpf_sysctl {
      		__u32	write;
      	};
      
      Where @write indicates whether sysctl is being read (= 0) or written (=
      1).
      
      Helpers to access sysctl name and value will be introduced separately.
      
      BPF_CGROUP_SYSCTL attach point is added to sysctl code right before
      passing control to ctl_table->proc_handler so that BPF program can
      either allow or deny access to sysctl.
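
The gradual lock-down described above (allow reads, restrict writes to a shrinking allowlist) can be modelled as a plain decision function. This is a user-space sketch, not a loadable BPF program: `sysctl_ctx_model` and `sysctl_policy` are hypothetical, the sysctl name is passed in directly because the name helper is introduced in a later patch, and the 1 = allow / 0 = deny return convention is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the minimal context shown in the commit message. */
struct sysctl_ctx_model {
	unsigned int write;	/* 0 = sysctl is read, 1 = sysctl is written */
};

/*
 * Hypothetical policy: return 1 to allow the access, 0 to deny it.
 * Reads always pass; writes pass only for names on the allowlist.
 */
static int sysctl_policy(const struct sysctl_ctx_model *ctx, const char *name,
			 const char *const *allowed, size_t n_allowed)
{
	assert(ctx != NULL && name != NULL);

	if (!ctx->write)
		return 1;

	for (size_t i = 0; i < n_allowed; i++)
		if (strcmp(name, allowed[i]) == 0)
			return 1;

	return 0;
}
```

Shrinking the allowlist to zero entries gives the end state the message describes: every write is denied while reads keep working.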
      Suggested-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add base proto function for cgroup-bpf programs · b1cd609d
      Andrey Ignatov authored
      Currently kernel/bpf/cgroup.c contains only one program type and one
      proto function cgroup_dev_func_proto(). It'd be useful to have base
      proto function that can be reused for new cgroup-bpf program types
      coming soon.
      
      Introduce cgroup_base_func_proto().
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'smc-next' · e0a092eb
      David S. Miller authored
      Ursula Braun says:
      
      ====================
      net/smc: patches 2019-04-12
      
      here are patches for SMC:
      * patch 1 improves behavior of non-blocking connect
      * patches 2, 3, 5, 7, and 8 improve connecting return codes
      * patches 4 and 6 are cleanups without functional change
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: improve smc_conn_create reason codes · 7a62725a
      Karsten Graul authored
      Rework smc_conn_create() to always return a valid DECLINE reason code.
      This removes the need to translate the return codes in 4 different
      places and makes it easy to add more detailed return codes by changing
      smc_conn_create() only.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: improve smc_listen_work reason codes · 9aa68d29
      Karsten Graul authored
      Rework smc_listen_work() to provide improved reason codes when an
      SMC connection is declined. This allows better debugging on user side.
      This also adds 3 more detailed reason codes in smc_clc.h to indicate
      what type of device was not found (ism or rdma or both), or if ism
      cannot talk to the peer.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: code cleanup smc_listen_work · 228bae05
      Karsten Graul authored
      In smc_listen_work() the variables rc and reason_code are defined which
      have the same meaning. Eliminate reason_code in favor of the shorter
      name rc. No functional changes.
      Rename the functions smc_check_ism() and smc_check_rdma() to
      smc_find_ism_device() and smc_find_rdma_device() to make their purpose
      clearer. No functional changes.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: cleanup of get vlan id · fba7e8ef
      Karsten Graul authored
      The vlan_id of the underlying CLC socket was retrieved two times
      during processing of the listen handshaking. Change this to get the
      vlan id one time in connect and in listen processing, and reuse the id.
      And add a new CLC DECLINE return code for the case when the retrieval
      of the vlan id failed.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: consolidate function parameters · bc36d2fc
      Karsten Graul authored
      During initialization of an SMC socket a lot of function parameters need
      to get passed down the function call path. Consolidate the parameters
      in a helper struct so there are few enough parameters that they can all
      be passed in registers.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: check for ip prefix and subnet · 59886697
      Karsten Graul authored
      The check for a matching ip prefix and subnet was only done for SMC-R
      in smc_listen_rdma_check() but not when an SMC-D connection was
      possible. Rename the function into smc_listen_prfx_check() and move its
      call to a place where it is called for both SMC variants.
      And add a new CLC DECLINE reason for the case when the IP prefix or
      subnet check fails so the reason for the failing SMC connection can be
      found out more easily.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fallback to TCP after connect problems · 4ada81fd
      Karsten Graul authored
      Correct the CLC decline reason codes for internal problems so that they
      do not have the sign bit set; negative reason codes are interpreted as
      not eligible for TCP fallback.
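
The sign-bit issue can be shown with a small check. The function names and both code values below are illustrative assumptions, not taken from the patch: the point is only that a 32-bit code with bit 31 set reads as negative once stored in a signed int, which the fallback logic treats as a hard error.

```c
#include <assert.h>
#include <stdint.h>

/* True if bit 31 is set, i.e. the code is negative when held in an int. */
static int has_sign_bit(uint32_t code)
{
	return (code & UINT32_C(0x80000000)) != 0;
}

/* Hypothetical predicate: only non-negative codes may fall back to TCP. */
static int fallback_possible(uint32_t code)
{
	return !has_sign_bit(code);
}
```

For example, an illustrative code of 0x99990000 fails the check while 0x09990000 passes it, so clearing the top bit is enough to make a decline eligible for fallback.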
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: nonblocking connect rework · 50717a37
      Ursula Braun authored
      For nonblocking sockets move the kernel_connect() from the connect
      worker into the initial smc_connect part to return kernel_connect()
      errors other than -EINPROGRESS to user space.
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: add reference from xenvif to backend_info to facilitate coredump analysis · 6dc400af
      Dongli Zhang authored
      During coredump analysis, it is not easy to obtain the address of
      backend_info in xen-netback.
      
      So far there are two ways to obtain backend_info:
      
      1. Do what xenbus_device_find() does for vmcore to find the xenbus_device
      and then derive it from dev_get_drvdata().
      
      2. Extract backend_info from callstack of xenwatch (e.g., netback_remove()
      or frontend_changed()).
      
      This patch adds a reference from xenvif to backend_info so that it is
      much easier to obtain backend_info during coredump analysis.
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sctp-skb-list' · 8af9f729
      David S. Miller authored
      David Miller says:
      
      ====================
      SCTP: Event skb list overhaul.
      
      This patch series eliminates the explicit reference to the skb list
      implementation via skb->prev dereferences.
      
      The approach used is to pass a non-empty skb list around instead of an
      event skb object which may or may not be on a list.
      
      I'd like to thank Marcelo Leitner, Xin Long, and Neil Horman for
      reviewing previous versions of this series.
      
      Testing would be very much appreciated, in addition to the review of
      course.
      
      v4 --> v5: Rebase to net-next
      
      v3 --> v4: Fix the logic in patch #4 so that we don't miss cases
                 where we should add event to the on-stack temp list.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Pass sk_buff_head explicitly to sctp_ulpq_tail_event(). · 013b96ec
      David Miller authored
      Now the SKB list implementation assumption can be removed.
      
      And now that we know that the list head is always non-NULL
      we can remove the code blocks dealing with that as well.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Make sctp_enqueue_event take an skb list. · 178ca044
      David Miller authored
      Pass this, instead of an event.  Then everything trickles down and we
      always have events on a non-empty list.
      
      Then we need a list-creating stub to place into .enqueue_event for sctp_stream_interleave_1.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Use helper for sctp_ulpq_tail_event() when hooked up to ->enqueue_event · 5e8f641d
      David Miller authored
      This way we can make sure events sent this way to
      sctp_ulpq_tail_event() are on a list as well.  Now all such code paths
      are fully covered.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Always pass skbs on a list to sctp_ulpq_tail_event(). · 925b9374
      David Miller authored
      This way we can simplify the logic and remove assumptions
      about the implementation of skb lists.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Remove superfluous test in sctp_ulpq_reasm_drain(). · 0eff1052
      David Miller authored
      Inside the loop, we always start with event non-NULL.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: fix filter net reference counting · 9994677c
      Vlad Buslov authored
      Fix net reference counting in fl_change() and remove redundant call to
      tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to get net
      before releasing exts and deallocating a filter, so this code caused flower
      classifier to obtain net twice per filter that is being deleted.
      
      Implementation of __fl_delete() called tcf_exts_get_net() to pass its
      result as 'async' flag to fl_mask_put(). However, 'async' flag is redundant
      and only complicates fl_mask_put() implementation. This functionality seems
      to be copied from filter cleanup code, where it was added by Cong with
      following explanation:
      
          This patchset tries to fix the race between call_rcu() and
          cleanup_net() again. Without holding the netns refcnt the
          tc_action_net_exit() in netns workqueue could be called before
          filter destroy works in tc filter workqueue. This patchset
          moves the netns refcnt from tc actions to tcf_exts, without
          breaking per-netns tc actions.
      
      This doesn't apply to flower mask, which doesn't call any tc action code
      during cleanup. Simplify fl_mask_put() by removing the flag parameter and
      always use tcf_queue_work() to free mask objects.
      
      Fixes: 06177558 ("net: sched: flower: introduce reference counting for filters")
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: Add debugging options to pmtu.sh · 56490b62
      David Ahern authored
      pmtu.sh script runs a number of tests and dumps a summary of pass/fail.
      If a test fails, it is near impossible to debug why. For example:
      
          TEST: ipv6: PMTU exceptions                       [FAIL]
      
      There are a lot of commands run behind the scenes for this test. Which
      one is failing?
      
      Add a VERBOSE option to show commands that are run and any output from
      those commands. Add a PAUSE_ON_FAIL option to halt the script if a test
      fails allowing users to poke around with the setup in the failed state.
      
      In the process, rename tracing to TRACING and move declaration to top
      with the new variables.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · bb23581b
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2019-04-12
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Improve BPF verifier scalability for large programs through two
         optimizations: i) remove verifier states that are not useful in pruning,
         ii) stop walking parentage chain once first LIVE_READ is seen. Combined
         gives approx 20x speedup. Increase limits for accepting large programs
         under root, and add various stress tests, from Alexei.
      
      2) Implement global data support in BPF. This enables static global variables
         for .data, .rodata and .bss sections to be properly handled which allows
         for more natural program development. This also opens up the possibility
         to optimize program workflow by compiling ELFs only once and later only
         rewriting section data before reload, from Daniel and with test cases and
         libbpf refactoring from Joe.
      
      3) Add config option to generate BTF type info for vmlinux as part of the
         kernel build process. DWARF debug info is converted via pahole to BTF.
         Latter relies on libbpf and makes use of BTF deduplication algorithm which
         results in 100x savings compared to DWARF data. Resulting .BTF section is
         typically about 2MB in size, from Andrii.
      
      4) Add BPF verifier support for stack access with variable offset from
         helpers and add various test cases along with it, from Andrey.
      
      5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
         so that L2 encapsulation can be used for tc tunnels, from Alan.
      
      6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
         users can define a subset of allowed __sk_buff fields that get fed into
         the test program, from Stanislav.
      
      7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
         various UBSAN warnings in bpftool, from Yonghong.
      
      8) Generate a pkg-config file for libbpf, from Luca.
      
      9) Dump program's BTF id in bpftool, from Prashant.
      
      10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
          program, from Magnus.
      
      11) kallsyms related fixes for the case when symbols are not present in
          BPF selftests and samples, from Daniel
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>