- 15 May, 2020 40 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller authored
Move the bpf verifier trace check into the new switch statement in HEAD. Resolve the overlapping changes in hinic, where bug fixes overlap the addition of VF support. Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) Fix sk_psock reference count leak on receive, from Xiyu Yang. 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven. 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this, from Maciej Żenczykowski. 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo Abeni. 5) Fix netprio cgroup v2 leak, from Zefan Li. 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal. 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet. 8) Various bpf probe handling fixes, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) selftests: mptcp: pm: rm the right tmp file dpaa2-eth: properly handle buffer size restrictions bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range bpf: Restrict bpf_probe_read{, str}() only to archs where they work MAINTAINERS: Mark networking drivers as Maintained. ipmr: Add lockdep expression to ipmr_for_each_table macro ipmr: Fix RCU list debugging warning drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810 tcp: fix error recovery in tcp_zerocopy_receive() MAINTAINERS: Add Jakub to networking drivers. MAINTAINERS: another add of Karsten Graul for S390 networking drivers: ipa: fix typos for ipa_smp2p structure doc pppoe: only process PADT targeted at local interfaces selftests/bpf: Enforce returning 0 for fentry/fexit programs bpf: Enforce returning 0 for fentry/fexit progs net: stmmac: fix num_por initialization security: Fix the default value of secid_to_secctx hook libbpf: Fix register naming in PT_REGS s390 macros ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds authored
Pull rdma fixes from Jason Gunthorpe: "A few minor bug fixes for user visible defects, and one regression: - Various bugs from static checkers and syzkaller - Add missing error checking in mlx4 - Prevent RTNL lock recursion in i40iw - Fix segfault in cxgb4 in peer abort cases - Fix a regression added in 5.7 where the IB_EVENT_DEVICE_FATAL could be lost, and wasn't delivered to all the FDs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Move IB_EVENT_DEVICE_FATAL to destroy_uobj RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL event RDMA/iw_cxgb4: Fix incorrect function parameters RDMA/core: Fix double put of resource IB/core: Fix potential NULL pointer dereference in pkey cache IB/hfi1: Fix another case where pq is left on waitlist IB/i40iw: Remove bogus call to netdev_master_upper_dev_get() IB/mlx4: Test return value of calls to ib_get_cached_pkey RDMA/rxe: Always return ERR_PTR from rxe_create_mmap_info() i40iw: Fix error handling in i40iw_manage_arp_cache()
-
Linus Torvalds authored
Merge tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - lkdtm runner fixes to prevent dmesg clearing and shellcheck errors - ftrace test handling when test module doesn't exist - nsfs test fix to replace zero-length array with flexible-array - dmabuf-heaps test fix to return clear error value * tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/lkdtm: Use grep -E instead of egrep selftests/lkdtm: Don't clear dmesg when running tests selftests/ftrace: mark irqsoff_tracer.tc test as unresolved if the test module does not exist tools/testing: Replace zero-length array with flexible-array kselftests: dmabuf-heaps: Fix confused return value on expected error testing
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: "A handful of build fixes, all found by Huawei's autobuilder. None of these patches should have any functional impact on kernels that build, and they're mostly related to various features intermingling with !MMU. While some of these might be better hoisted to generic code, it seems better to have the simple fixes in the meanwhile. As far as I know these are the only outstanding patches for 5.7" * tag 'riscv-for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: mmiowb: Fix implicit declaration of function 'smp_processor_id' riscv: pgtable: Fix __kernel_map_pages build error if NOMMU riscv: Make SYS_SUPPORTS_HUGETLBFS depends on MMU riscv: Disable ARCH_HAS_DEBUG_VIRTUAL if NOMMU riscv: Add pgprot_writecombine/device and PAGE_SHARED defination if NOMMU riscv: stacktrace: Fix undefined reference to `walk_stackframe' riscv: Fix unmet direct dependencies built based on SOC_VIRT riscv: perf: RISCV_BASE_PMU should be independent riscv: perf_event: Make some funciton static
-
David S. Miller authored
Paolo Abeni says: ==================== mptcp: fix MP_JOIN failure handling Currently if we hit an MP_JOIN failure on the third ack, the child socket is closed with reset, but the request socket is not deleted, causing weird behaviors. The main problem is that MPTCP's MP_JOIN code needs to plug it's own 'valid 3rd ack' checks and the current TCP callbacks do not allow that. This series tries to address the above shortcoming introducing a new MPTCP specific bit in a 'struct tcp_request_sock' hole, and leveraging that to allow tcp_check_req releasing the request socket when needed. The above allows cleaning-up a bit current MPTCP hooking in tcp_check_req(). An alternative solution, possibly cleaner but more invasive, would be changing the 'bool *own_req' syn_recv_sock() argument into 'int *req_status' and let MPTCP set it to 'REQ_DROP'. v1 -> v2: - be more conservative about drop_req initialization RFC -> v1: - move the drop_req bit inside tcp_request_sock (Eric) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Currently, on MP_JOIN failure we reset the child socket, but leave the request socket untouched. tcp_check_req will deal with it according to the 'tcp_abort_on_overflow' sysctl value - by default the req socket will stay alive. The above leads to inconsistent behavior on MP JOIN failure, and bad listener overflow accounting. This patch addresses the issue leveraging the infrastructure just introduced to ask the TCP stack to drop the req on failure. The child socket is not freed anymore by subflow_syn_recv_sock(), instead it's moved to a dead state and will be disposed by the next sock_put done by the TCP stack, so that listener overflow accounting is not affected by MP JOIN failure. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Move the steps to prepare an inet_connection_sock for forced disposal inside a separate helper. No functional changes inteded, this will just simplify the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
MP_JOIN subflows must not land into the accept queue. Currently tcp_check_req() calls an mptcp specific helper to detect such scenario. Such helper leverages the subflow context to check for MP_JOIN subflows. We need to deal also with MP JOIN failures, even when the subflow context is not available due allocation failure. A possible solution would be changing the syn_recv_sock() signature to allow returning a more descriptive action/ error code and deal with that in tcp_check_req(). Since the above need is MPTCP specific, this patch instead uses a TCP request socket hole to add a MPTCP specific flag. Such flag is used by the MPTCP syn_recv_sock() to tell tcp_check_req() how to deal with the request socket. This change is a no-op for !MPTCP build, and makes the MPTCP code simpler. It allows also the next patch to deal correctly with MP JOIN failure. v1 -> v2: - be more conservative on drop_req initialization (Mat) RFC -> v1: - move the drop_req bit inside tcp_request_sock (Eric) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 fix from Catalin Marinas: "Fix flush_icache_range() second argument in machine_kexec() to be an address rather than size" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: fix the flush_icache_range arguments in machine_kexec
-
Oleksij Rempel authored
A typical 100Base-T1 link should be always connected. If the link is in a shot or open state, it is a failure. In most cases, we won't be able to automatically handle this issue, but we need to log it or notify user (if possible). With this patch, the cable will be tested on "ip l s dev .. up" attempt and send ethnl notification to the user space. This patch was tested with TJA1102 PHY and "ethtool --monitor" command. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Alexei Starovoitov says: ==================== pull-request: bpf 2020-05-15 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 2 day(s) which contain a total of 14 files changed, 137 insertions(+), 43 deletions(-). The main changes are: 1) Fix secid_to_secctx LSM hook default value, from Anders. 2) Fix bug in mmap of bpf array, from Andrii. 3) Restrict bpf_probe_read to archs where they work, from Daniel. 4) Enforce returning 0 for fentry/fexit progs, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kevin Lo authored
The BCM54811 PHY shares many similarities with the already supported BCM54810 PHY but additionally requires some semi-unique configuration. Signed-off-by: Kevin Lo <kevlo@kevlo.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Rahul Lakkireddy says: ==================== cxgb4: improve and tune TC-MQPRIO offload Patch 1 improves the Tx path's credit request and recovery mechanism when running under heavy load. Patch 2 adds ability to tune the burst buffer sizes of all traffic classes to improve performance for <= 1500 MTU, under heavy load. Patch 3 adds support to track EOTIDs and dump software queue contexts used by TC-MQPRIO offload. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rahul Lakkireddy authored
Rework and add support for dumping EOTID software context used by TC-MQPRIO. Also track number of EOTIDs in use. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rahul Lakkireddy authored
For each traffic class, firmware handles up to 4 * MTU amount of data per burst cycle. Under heavy load, this small buffer size is a bottleneck when buffering large TSO packets in <= 1500 MTU case. Increase the burst buffer size to 8 * MTU when supported. Also, keep the driver's traffic class configuration API similar to the firmware API counterpart. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rahul Lakkireddy authored
Request credit update for every half credits consumed, including the current request. Also, avoid re-trying to post packets when there are no credits left. The credit update reply via interrupt will eventually restore the credits and will invoke the Tx path again. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller authored
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-15 The following pull-request contains BPF updates for your *net-next* tree. We've added 37 non-merge commits during the last 1 day(s) which contain a total of 67 files changed, 741 insertions(+), 252 deletions(-). The main changes are: 1) bpf_xdp_adjust_tail() now allows to grow the tail as well, from Jesper. 2) bpftool can probe CONFIG_HZ, from Daniel. 3) CAP_BPF is introduced to isolate user processes that use BPF infra and to secure BPF networking services by dropping CAP_SYS_ADMIN requirement in certain cases, from Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
DENG Qingfang authored
Allow DSA to add VLAN entries even if VLAN filtering is disabled, so enabling it will not block the traffic of existent ports in the bridge Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vlad Buslov says: ==================== Implement classifier-action terse dump mode Output rate of current upstream kernel TC filter dump implementation if relatively low (~100k rules/sec depending on configuration). This constraint impacts performance of software switch implementation that rely on TC for their datapath implementation and periodically call TC filter dump to update rules stats. Moreover, TC filter dump output a lot of static data that don't change during the filter lifecycle (filter key, specific action details, etc.) which constitutes significant portion of payload on resulting netlink packets and increases amount of syscalls necessary to dump all filters on particular Qdisc. In order to significantly improve filter dump rate this patch sets implement new mode of TC filter dump operation named "terse dump" mode. In this mode only parameters necessary to identify the filter (handle, action cookie, etc.) and data that can change during filter lifecycle (filter flags, action stats, etc.) are preserved in dump output while everything else is omitted. Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires individual classifier support (new tcf_proto_ops->terse_dump() callback). Support for action terse dump is implemented in act API and don't require changing individual action implementations. The following table provides performance comparison between regular filter dump and new terse dump mode for two classifier-action profiles: one minimal config with L2 flower classifier and single gact action and another heavier config with L2+5tuple flower classifier with tunnel_key+mirred actions. Classifier-action type | dump | terse dump | X improvement | (rules/sec) | (rules/sec) | -----------------------------+-------------+-------------+--------------- L2 with gact | 141.8 | 293.2 | 2.07 L2+5tuple tunnel_key+mirred | 76.4 | 198.8 | 2.60 Benchmark details: to measure the rate tc filter dump and terse dump commands are invoked on ingress Qdisc that have one million filters configured using following commands. > time sudo tc -s filter show dev ens1f0 ingress >/dev/null > time sudo tc -s filter show terse dev ens1f0 ingress >/dev/null Value in results table is calculated by dividing 1000000 total rules by "real" time reported by time command. Setup details: 2x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 32GB memory ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matthieu Baerts authored
"$err" is a variable pointing to a temp file. "$out" is not: only used as a local variable in "check()" and representing the output of a command line. Fixes: eedbc685 (selftests: add PM netlink functional tests) Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ioana Ciornei authored
Depending on the WRIOP version, the buffer size on the RX path must by a multiple of 64 or 256. Handle this restriction properly by aligning down the buffer size to the necessary value. Also, use the new buffer size dynamically computed instead of the compile time one. Fixes: 27c87486 ("dpaa2-eth: Use a single page per Rx buffer") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
Implement two basic tests to verify terse dump functionality of flower classifier: - Test that verifies that terse dump works. - Test that verifies that terse dump doesn't print filter key. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
Implement tcf_proto_ops->terse_dump() callback for flower classifier. Only dump handle, flags and action data in terse mode. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
Extend tcf_action_dump() with boolean argument 'terse' that is used to request terse-mode action dump. In terse mode only essential data needed to identify particular action (action kind, cookie, etc.) and its stats is put to resulting skb and everything else is omitted. Implement tcf_exts_terse_dump() helper in cls API that is intended to be used to request terse dump of all exts (actions) attached to the filter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option is intended to be used to improve performance of TC filter dump when userland only needs to obtain stats and not the whole classifier/action data. Extend struct tcf_proto_ops with new terse_dump() callback that must be defined by supporting classifier implementations. Support of the options in specific classifiers and actions is implemented in following patches in the series. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tobias Waldekranz authored
The assumption that a device node is associated either with the netdev's device, or the parent of that device, does not hold for all drivers. E.g. Freescale's DPAA has two layers of platform devices above the netdev. Instead, recursively walk up the tree from the netdev, allowing any parent to match against the sought after node. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Linus Torvalds authored
Merge tag 'hwmon-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Fix ADC access synchronization problem with da9052 driver - Fix temperature limit and status reporting in nct7904 driver - Fix drivetemp temperature reporting if SCT is supported but SCT data tables are not. * tag 'hwmon-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (da9052) Synchronize access with mfd hwmon: (nct7904) Fix incorrect range of temperature limit registers hwmon: (nct7904) Read all SMI status registers in probe function hwmon: (drivetemp) Fix SCT support if SCT data tables are not supported
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "Things look good and calming down; the only change to ALSA core is the fix for racy rawmidi buffer accesses spotted by syzkaller, and the rest are all small device-specific quirks for HD-audio and USB-audio devices" * tag 'sound-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek - Limit int mic boost for Thinkpad T530 ALSA: hda/realtek - Add COEF workaround for ASUS ZenBook UX431DA ALSA: hda/realtek: Enable headset mic of ASUS UX581LV with ALC295 ALSA: hda/realtek - Enable headset mic of ASUS UX550GE with ALC295 ALSA: hda/realtek - Enable headset mic of ASUS GL503VM with ALC295 ALSA: hda/realtek: Add quirk for Samsung Notebook ALSA: rawmidi: Fix racy buffer resize under concurrent accesses ALSA: usb-audio: add mapping for ASRock TRX40 Creator ALSA: hda/realtek - Fix S3 pop noise on Dell Wyse Revert "ALSA: hda/realtek: Fix pop noise on ALC225" ALSA: firewire-lib: fix 'function sizeof not defined' error of tracepoints format ALSA: usb-audio: Add control message quirk delay for Kingston HyperX headset
-
git://anongit.freedesktop.org/drm/drmLinus Torvalds authored
Pull drm fixes from Dave Airlie: "As mentioned last week an i915 PR came in late, but I left it, so the i915 bits of this cover 2 weeks, which is why it's likely a bit larger than usual. Otherwise it's mostly amdgpu fixes, one tegra fix, one meson fix. i915: - Handle idling during i915_gem_evict_something busy loops (Chris) - Mark current submissions with a weak-dependency (Chris) - Propagate error from completed fences (Chris) - Fixes on execlist to avoid GPU hang situation (Chris) - Fixes couple deadlocks (Chris) - Timeslice preemption fixes (Chris) - Fix Display Port interrupt handling on Tiger Lake (Imre) - Reduce debug noise around Frame Buffer Compression (Peter) - Fix logic around IPC W/a for Coffee Lake and Kaby Lake (Sultan) - Avoid dereferencing a dead context (Chris) tegra: - tegra120/4 smmu fixes amdgpu: - Clockgating fixes - Fix fbdev with scatter/gather display - S4 fix for navi - Soft recovery for gfx10 - Freesync fixes - Atomic check cursor fix - Add a gfxoff quirk - MST fix amdkfd: - Fix GEM reference counting meson: - error code propogation fix" * tag 'drm-fixes-2020-05-15' of git://anongit.freedesktop.org/drm/drm: (29 commits) drm/i915: Handle idling during i915_gem_evict_something busy loops drm/meson: pm resume add return errno branch drm/amd/amdgpu: Update update_config() logic drm/amd/amdgpu: add raven1 part to the gfxoff quirk list drm/i915: Mark concurrent submissions with a weak-dependency drm/i915: Propagate error from completed fences drm/i915/gvt: Fix kernel oops for 3-level ppgtt guest drm/i915/gvt: Init DPLL/DDI vreg for virtual display instead of inheritance. drm/amd/display: add basic atomic check for cursor plane drm/amd/display: Fix vblank and pageflip event handling for FreeSync drm/amdgpu: implement soft_recovery for gfx10 drm/amdgpu: enable hibernate support on Navi1X drm/amdgpu: Use GEM obj reference for KFD BOs drm/amdgpu: force fbdev into vram drm/amd/powerplay: perform PG ungate prior to CG ungate drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate drm/amdgpu: disable MGCG/MGLS also on gfx CG ungate drm/i915/execlists: Track inflight CCID drm/i915/execlists: Avoid reusing the same logical CCID drm/i915/gem: Remove object_is_locked assertion from unpin_from_display_plane ...
-
Daniel Borkmann authored
Alexei Starovoitov says: ==================== v6->v7: - permit SK_REUSEPORT program type under CAP_BPF as suggested by Marek Majkowski. It's equivalent to SOCKET_FILTER which is unpriv. v5->v6: - split allow_ptr_leaks into four flags. - retain bpf_jit_limit under cap_sys_admin. - fixed few other issues spotted by Daniel. v4->v5: Split BPF operations that are allowed under CAP_SYS_ADMIN into combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN and keep some of them under CAP_SYS_ADMIN. The user process has to have - CAP_BPF to create maps, do other sys_bpf() commands and load SK_REUSEPORT progs. Note: dev_map, sock_hash, sock_map map types still require CAP_NET_ADMIN. That could be relaxed in the future. - CAP_BPF and CAP_PERFMON to load tracing programs. - CAP_BPF and CAP_NET_ADMIN to load networking programs. (or CAP_SYS_ADMIN for backward compatibility). CAP_BPF solves three main goals: 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF. More on this below. This is the major difference vs v4 set back from Sep 2019. 2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN prevents pointer leaks and arbitrary kernel memory access. 3. enables fuzzers to exercise all of the verifier logic. Eventually finding bugs and making BPF infra more secure. Currently fuzzers run in unpriv. They will be able to run with CAP_BPF. The patchset is long overdue follow-up from the last plumbers conference. Comparing to what was discussed at LPC the CAP* checks at attach time are gone. For tracing progs the CAP_SYS_ADMIN check was done at load time only. There was no check at attach time. For networking and cgroup progs CAP_SYS_ADMIN was required at load time and CAP_NET_ADMIN at attach time, but there are several ways to bypass CAP_NET_ADMIN: - if networking prog is using tail_call writing FD into prog_array will effectively attach it, but bpf_map_update_elem is an unprivileged operation. - freplace prog with CAP_SYS_ADMIN can replace networking prog Consolidating all CAP checks at load time makes security model similar to open() syscall. Once the user got an FD it can do everything with it. read/write/poll don't check permissions. The same way when bpf_prog_load command returns an FD the user can do everything (including attaching, detaching, and bpf_test_run). The important design decision is to allow ID->FD transition for CAP_SYS_ADMIN only. What it means that user processes can run with CAP_BPF and CAP_NET_ADMIN and they will not be able to affect each other unless they pass FDs via scm_rights or via pinning in bpffs. ID->FD is a mechanism for human override and introspection. An admin can do 'sudo bpftool prog ...'. It's possible to enforce via LSM that only bpftool binary does bpf syscall with CAP_SYS_ADMIN and the rest of user space processes do bpf syscall with CAP_BPF isolating bpf objects (progs, maps, links) that are owned by such processes from each other. Another significant change from LPC is that the verifier checks are split into four flags. The allow_ptr_leaks flag allows pointer manipulations. The bpf_capable flag enables all modern verifier features like bpf-to-bpf calls, BTF, bounded loops, dead code elimination, etc. All the goodness. The bypass_spec_v1 flag enables indirect stack access from bpf programs and disables speculative analysis and bpf array mitigations. The bypass_spec_v4 flag disables store sanitation. That allows networking progs with CAP_BPF + CAP_NET_ADMIN enjoy modern verifier features while being more secure. Some networking progs may need CAP_BPF + CAP_NET_ADMIN + CAP_PERFMON, since subtracting pointers (like skb->data_end - skb->data) is a pointer leak, but the verifier may get smarter in the future. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Alexei Starovoitov authored
Make all test_verifier test exercise CAP_BPF and CAP_PERFMON Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200513230355.7858-4-alexei.starovoitov@gmail.com
-
Alexei Starovoitov authored
Implement permissions as stated in uapi/linux/capability.h In order to do that the verifier allow_ptr_leaks flag is split into four flags and they are set as: env->allow_ptr_leaks = bpf_allow_ptr_leaks(); env->bypass_spec_v1 = bpf_bypass_spec_v1(); env->bypass_spec_v4 = bpf_bypass_spec_v4(); env->bpf_capable = bpf_capable(); The first three currently equivalent to perfmon_capable(), since leaking kernel pointers and reading kernel memory via side channel attacks is roughly equivalent to reading kernel memory with cap_perfmon. 'bpf_capable' enables bounded loops, precision tracking, bpf to bpf calls and other verifier features. 'allow_ptr_leaks' enable ptr leaks, ptr conversions, subtraction of pointers. 'bypass_spec_v1' disables speculative analysis in the verifier, run time mitigations in bpf array, and enables indirect variable access in bpf programs. 'bypass_spec_v4' disables emission of sanitation code by the verifier. That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will have speculative checks done by the verifier and other spectre mitigation applied. Such networking BPF program will not be able to leak kernel pointers and will not be able to access arbitrary kernel memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com
-
Alexei Starovoitov authored
Split BPF operations that are allowed under CAP_SYS_ADMIN into combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN. For backward compatibility include them in CAP_SYS_ADMIN as well. The end result provides simple safety model for applications that use BPF: - to load tracing program types BPF_PROG_TYPE_{KPROBE, TRACEPOINT, PERF_EVENT, RAW_TRACEPOINT, etc} use CAP_BPF and CAP_PERFMON - to load networking program types BPF_PROG_TYPE_{SCHED_CLS, XDP, SK_SKB, etc} use CAP_BPF and CAP_NET_ADMIN There are few exceptions from this rule: - bpf_trace_printk() is allowed in networking programs, but it's using tracing mechanism, hence this helper needs additional CAP_PERFMON if networking program is using this helper. - BPF_F_ZERO_SEED flag for hash/lru map is allowed under CAP_SYS_ADMIN only to discourage production use. - BPF HW offload is allowed under CAP_SYS_ADMIN. - bpf_probe_write_user() is allowed under CAP_SYS_ADMIN only. CAPs are not checked at attach/detach time with two exceptions: - loading BPF_PROG_TYPE_CGROUP_SKB is allowed for unprivileged users, hence CAP_NET_ADMIN is required at attach time. - flow_dissector detach doesn't check prog FD at detach, hence CAP_NET_ADMIN is required at detach time. CAP_SYS_ADMIN is required to iterate BPF objects (progs, maps, links) via get_next_id command and convert them to file descriptor via GET_FD_BY_ID command. This restriction guarantees that mutliple tasks with CAP_BPF are not able to affect each other. That leads to clean isolation of tasks. For example: task A with CAP_BPF and CAP_NET_ADMIN loads and attaches a firewall via bpf_link. task B with the same capabilities cannot detach that firewall unless task A explicitly passed link FD to task B via scm_rights or bpffs. CAP_SYS_ADMIN can still detach/unload everything. Two networking user apps with CAP_SYS_ADMIN and CAP_NET_ADMIN can accidentely mess with each other programs and maps. Two networking user apps with CAP_NET_ADMIN and CAP_BPF cannot affect each other. CAP_NET_ADMIN + CAP_BPF allows networking programs access only packet data. Such networking progs cannot access arbitrary kernel memory or leak pointers. bpftool, bpftrace, bcc tools binaries should NOT be installed with CAP_BPF and CAP_PERFMON, since unpriv users will be able to read kernel secrets. But users with these two permissions will be able to use these tracing tools. CAP_PERFMON is least secure, since it allows kprobes and kernel memory access. CAP_NET_ADMIN can stop network traffic via iproute2. CAP_BPF is the safest from security point of view and harmless on its own. Having CAP_BPF and/or CAP_NET_ADMIN is not enough to write into arbitrary map and if that map is used by firewall-like bpf prog. CAP_BPF allows many bpf prog_load commands in parallel. The verifier may consume large amount of memory and significantly slow down the system. Existing unprivileged BPF operations are not affected. In particular unprivileged users are allowed to load socket_filter and cg_skb program types and to create array, hash, prog_array, map-in-map map types. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200513230355.7858-2-alexei.starovoitov@gmail.com
-
Daniel Borkmann authored
In Cilium we've recently switched to make use of bpf_jiffies64() for parts of our tc and XDP datapath since bpf_ktime_get_ns() is more expensive and high-precision is not needed for our timeouts we have anyway. Our agent has a probe manager which picks up the json of bpftool's feature probe and we also use the macro output in our C programs e.g. to have workarounds when helpers are not available on older kernels. Extend the kernel config info dump to also include the kernel's CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a macro dump such that CONFIG_HZ can be propagated to BPF C code as a simple define if available via config. Latter allows to have _compile- time_ resolution of jiffies <-> sec conversion in our code since all are propagated as known constants. Given we cannot generally assume availability of kconfig everywhere, we also have a kernel hz probe [0] as a fallback. Potentially, bpftool could have an integrated probe fallback as well, although to derive it, we might need to place it under 'bpftool feature probe full' or similar given it would slow down the probing process overall. Yet 'full' doesn't fit either for us since we don't want to pollute the kernel log with warning messages from bpf_probe_write_user() and bpf_trace_printk() on agent startup; I've left it out for the time being. [0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.cSigned-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200513075849.20868-1-daniel@iogearbox.net
-
Alexei Starovoitov authored
Daniel Borkmann says: ==================== Small set of fixes in order to restrict BPF helpers for tracing which are broken on archs with overlapping address ranges as per discussion in [0]. I've targetted this for -bpf tree so they can be routed as fixes. Thanks! v1 -> v2: - switch to reusable %pks, %pus format specifiers (Yonghong) - fixate %s on kernel_ds probing for archs with overlapping addr space [0] https://lore.kernel.org/bpf/CAHk-=wjJKo0GVixYLmqPn-Q22WFu0xHaBSjKEo7e7Yw72y5SPQ@mail.gmail.com/T/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Usage of plain %s conversion specifier in bpf_trace_printk() suffers from the very same issue as bpf_probe_read{,str}() helpers, that is, it is broken on archs with overlapping address ranges. While the helpers have been addressed through work in 6ae08ae3 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers"), we need an option for bpf_trace_printk() as well to fix it. Similarly as with the helpers, force users to make an explicit choice by adding %pks and %pus specifier to bpf_trace_printk() which will then pick the corresponding strncpy_from_unsafe*() variant to perform the access under KERNEL_DS or USER_DS. The %pk* (kernel specifier) and %pu* (user specifier) can later also be extended for other objects aside strings that are probed and printed under tracing, and reused out of other facilities like bpf_seq_printf() or BTF based type printing. Existing behavior of %s for current users is still kept working for archs where it is not broken and therefore gated through CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. For archs not having this property we fall-back to pick probing under KERNEL_DS as a sensible default. Fixes: 8d3b7dce ("bpf: add support for %s specifier to bpf_trace_printk()") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Link: https://lore.kernel.org/bpf/20200515101118.6508-4-daniel@iogearbox.net
-
Daniel Borkmann authored
Given bpf_probe_read{,str}() BPF helpers are now only available under CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE, we need to add the drop-in replacements of bpf_probe_read_{kernel,user}_str() to do_refine_retval_range() as well to avoid hitting the same issue as in 849fa506 ("bpf/verifier: refine retval R0 state for bpf_get_stack helper"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200515101118.6508-3-daniel@iogearbox.net
-
Daniel Borkmann authored
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs with overlapping address ranges, we should really take the next step to disable them from BPF use there. To generally fix the situation, we've recently added new helper variants bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str(). For details on them, see 6ae08ae3 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,kernel}_str helpers"). Given bpf_probe_read{,str}() have been around for ~5 years by now, there are plenty of users at least on x86 still relying on them today, so we cannot remove them entirely w/o breaking the BPF tracing ecosystem. However, their use should be restricted to archs with non-overlapping address ranges where they are working in their current form. Therefore, move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and have x86, arm64, arm select it (other archs supporting it can follow-up on it as well). For the remaining archs, they can workaround easily by relying on the feature probe from bpftool which spills out defines that can be used out of BPF C code to implement the drop-in replacement for old/new kernels via: bpftool feature probe macro Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
-
git://anongit.freedesktop.org/drm/drm-miscDave Airlie authored
Just one meson patch this time to propagate an error code Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200514073538.wvdtv5s2mt4wdrdj@gilmour.lan
-