1. 13 Oct, 2015 5 commits
    • Edward Jee's avatar
      sock: support per-packet fwmark · f28ea365
      Edward Jee authored
      It's useful to allow users to set fwmark for an individual packet,
      without changing the socket state. The function this patch adds in
      sock layer can be used by the protocols that need such a feature.
      Signed-off-by: default avatarEdward Hyunkoo Jee <edjee@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f28ea365
    • David S. Miller's avatar
      Merge branch 'bpf-unprivileged' · c1bf5fe0
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpf: unprivileged
      
      v1-v2:
      - this set logically depends on cb patch
        "bpf: fix cb access in socket filter programs":
        http://patchwork.ozlabs.org/patch/527391/
        which is must have to allow unprivileged programs.
        Thanks Daniel for finding that issue.
      - refactored sysctl to be similar to 'modules_disabled'
      - dropped bpf_trace_printk
      - split tests into separate patch and added more tests
        based on discussion
      
      v1 cover letter:
      I think it is time to liberate eBPF from CAP_SYS_ADMIN.
      As was discussed when eBPF was first introduced two years ago
      the only piece missing in eBPF verifier is 'pointer leak detection'
      to make it available to non-root users.
      Patch 1 adds this pointer analysis.
      The eBPF programs, obviously, need to see and operate on kernel addresses,
      but with these extra checks they won't be able to pass these addresses
      to user space.
      Patch 2 adds accounting of kernel memory used by programs and maps.
      It changes behavoir for existing root users, but I think it needs
      to be done consistently for both root and non-root, since today
      programs and maps are only limited by number of open FDs (RLIMIT_NOFILE).
      Patch 2 accounts program's and map's kernel memory as RLIMIT_MEMLOCK.
      
      Unprivileged eBPF is only meaningful for 'socket filter'-like programs.
      eBPF programs for tracing and TC classifiers/actions will stay root only.
      
      In parallel the bpf fuzzing effort is ongoing and so far
      we've found only one verifier bug and that was already fixed.
      The 'constant blinding' pass also being worked on.
      It will obfuscate constant-like values that are part of eBPF ISA
      to make jit spraying attacks even harder.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1bf5fe0
    • Alexei Starovoitov's avatar
      bpf: add unprivileged bpf tests · bf508877
      Alexei Starovoitov authored
      Add new tests samples/bpf/test_verifier:
      
      unpriv: return pointer
        checks that pointer cannot be returned from the eBPF program
      
      unpriv: add const to pointer
      unpriv: add pointer to pointer
      unpriv: neg pointer
        checks that pointer arithmetic is disallowed
      
      unpriv: cmp pointer with const
      unpriv: cmp pointer with pointer
        checks that comparison of pointers is disallowed
        Only one case allowed 'void *value = bpf_map_lookup_elem(..); if (value == 0) ...'
      
      unpriv: check that printk is disallowed
        since bpf_trace_printk is not available to unprivileged
      
      unpriv: pass pointer to helper function
        checks that pointers cannot be passed to functions that expect integers
        If function expects a pointer the verifier allows only that type of pointer.
        Like 1st argument of bpf_map_lookup_elem() must be pointer to map.
        (applies to non-root as well)
      
      unpriv: indirectly pass pointer on stack to helper function
        checks that pointer stored into stack cannot be used as part of key
        passed into bpf_map_lookup_elem()
      
      unpriv: mangle pointer on stack 1
      unpriv: mangle pointer on stack 2
        checks that writing into stack slot that already contains a pointer
        is disallowed
      
      unpriv: read pointer from stack in small chunks
        checks that < 8 byte read from stack slot that contains a pointer is
        disallowed
      
      unpriv: write pointer into ctx
        checks that storing pointers into skb->fields is disallowed
      
      unpriv: write pointer into map elem value
        checks that storing pointers into element values is disallowed
        For example:
        int bpf_prog(struct __sk_buff *skb)
        {
          u32 key = 0;
          u64 *value = bpf_map_lookup_elem(&map, &key);
          if (value)
             *value = (u64) skb;
        }
        will be rejected.
      
      unpriv: partial copy of pointer
        checks that doing 32-bit register mov from register containing
        a pointer is disallowed
      
      unpriv: pass pointer to tail_call
        checks that passing pointer as an index into bpf_tail_call
        is disallowed
      
      unpriv: cmp map pointer with zero
        checks that comparing map pointer with constant is disallowed
      
      unpriv: write into frame pointer
        checks that frame pointer is read-only (applies to root too)
      
      unpriv: cmp of frame pointer
        checks that R10 cannot be using in comparison
      
      unpriv: cmp of stack pointer
        checks that Rx = R10 - imm is ok, but comparing Rx is not
      
      unpriv: obfuscate stack pointer
        checks that Rx = R10 - imm is ok, but Rx -= imm is not
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf508877
    • Alexei Starovoitov's avatar
      bpf: charge user for creation of BPF maps and programs · aaac3ba9
      Alexei Starovoitov authored
      since eBPF programs and maps use kernel memory consider it 'locked' memory
      from user accounting point of view and charge it against RLIMIT_MEMLOCK limit.
      This limit is typically set to 64Kbytes by distros, so almost all
      bpf+tracing programs would need to increase it, since they use maps,
      but kernel charges maximum map size upfront.
      For example the hash map of 1024 elements will be charged as 64Kbyte.
      It's inconvenient for current users and changes current behavior for root,
      but probably worth doing to be consistent root vs non-root.
      
      Similar accounting logic is done by mmap of perf_event.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aaac3ba9
    • Alexei Starovoitov's avatar
      bpf: enable non-root eBPF programs · 1be7f75d
      Alexei Starovoitov authored
      In order to let unprivileged users load and execute eBPF programs
      teach verifier to prevent pointer leaks.
      Verifier will prevent
      - any arithmetic on pointers
        (except R10+Imm which is used to compute stack addresses)
      - comparison of pointers
        (except if (map_value_ptr == 0) ... )
      - passing pointers to helper functions
      - indirectly passing pointers in stack to helper functions
      - returning pointer from bpf program
      - storing pointers into ctx or maps
      
      Spill/fill of pointers into stack is allowed, but mangling
      of pointers stored in the stack or reading them byte by byte is not.
      
      Within bpf programs the pointers do exist, since programs need to
      be able to access maps, pass skb pointer to LD_ABS insns, etc
      but programs cannot pass such pointer values to the outside
      or obfuscate them.
      
      Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
      so that socket filters (tcpdump), af_packet (quic acceleration)
      and future kcm can use it.
      tracing and tc cls/act program types still require root permissions,
      since tracing actually needs to be able to see all kernel pointers
      and tc is for root only.
      
      For example, the following unprivileged socket filter program is allowed:
      int bpf_prog1(struct __sk_buff *skb)
      {
        u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
        u64 *value = bpf_map_lookup_elem(&my_map, &index);
      
        if (value)
      	*value += skb->len;
        return 0;
      }
      
      but the following program is not:
      int bpf_prog1(struct __sk_buff *skb)
      {
        u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
        u64 *value = bpf_map_lookup_elem(&my_map, &index);
      
        if (value)
      	*value += (u64) skb;
        return 0;
      }
      since it would leak the kernel address into the map.
      
      Unprivileged socket filter bpf programs have access to the
      following helper functions:
      - map lookup/update/delete (but they cannot store kernel pointers into them)
      - get_random (it's already exposed to unprivileged user space)
      - get_smp_processor_id
      - tail_call into another socket filter program
      - ktime_get_ns
      
      The feature is controlled by sysctl kernel.unprivileged_bpf_disabled.
      This toggle defaults to off (0), but can be set true (1).  Once true,
      bpf programs and maps cannot be accessed from unprivileged process,
      and the toggle cannot be set back to false.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1be7f75d
  2. 12 Oct, 2015 9 commits
  3. 11 Oct, 2015 11 commits
  4. 09 Oct, 2015 15 commits
    • David S. Miller's avatar
      Merge branch 'net-non-modular' · d49ae37c
      David S. Miller authored
      Paul Gortmaker says:
      
      ====================
      make non-modular code explicitly non-modular
      
      [v2: drop m68k patches that Geert converted to modules; add one ARM
       driver patch ; update net-next baseline to today; switch to ARM
       for build testing.]
      
      In a previous merge window, we made changes to allow better
      delineation between modular and non-modular code in commit
      0fd972a7 ("module: relocate module_init
      from init.h to module.h").  This allows us to now ensure module code
      looks modular and non-modular code does not accidentally look modular
      just to avoid suffering build breakage.
      
      Here we target code that is, by nature of their Makefile and/or
      Kconfig settings, only available to be built-in, but implicitly
      presenting itself as being possibly modular by way of using modular
      headers, macros, and functions.
      
      The goal here is to remove that illusion of modularity from these
      files, but in a way that leaves the actual runtime unchanged.
      In doing so, we remove code that has never been tested and adds
      no value to the tree.  And we continue the process of expecting a
      level of consistency between the Kconfig/Makefile of code and the
      code in use itself.
      
      Fortuntately the net subsystem has relatively few instances, given
      the overall amount of code and drivers it contains.  For comparison
      there are over 300 instances tree wide, resulting in a possible net
      removal of on the order of 5000 lines of unused code.
      
      Build tested on net-next from today, on ARM, since that is the arch
      where the one ethernet driver changed here is available.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d49ae37c
    • Paul Gortmaker's avatar
      drivers/net/ethernet: make ti/cpsw-phy-sel.c explicitly non-modular · b3c8ec35
      Paul Gortmaker authored
      The Kconfig currently controlling compilation of this code is:
      
      drivers/net/ethernet/ti/Kconfig:config TI_CPSW_PHY_SEL
      drivers/net/ethernet/ti/Kconfig:        bool "TI CPSW Switch Phy sel Support"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Lets remove the couple traces of modularity so that when reading the
      driver there is no doubt it is builtin-only.
      
      Since module_platform_driver() uses the same init level priority as
      builtin_platform_driver() the init ordering remains unchanged with
      this commit.
      
      Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code.
      
      We also delete the MODULE_LICENSE tag etc. since all that information
      was (or is now) contained at the top of the file in the comments.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Varka Bhadram <varkabhadram@gmail.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b3c8ec35
    • Paul Gortmaker's avatar
      net/sched: make sch_blackhole.c explicitly non-modular · 075640e3
      Paul Gortmaker authored
      The Kconfig currently controlling compilation of this code is:
      
      net/sched/Kconfig:menuconfig NET_SCHED
      net/sched/Kconfig:      bool "QoS and/or fair queueing"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Lets remove the modular code that is essentially orphaned, so that
      when reading the driver there is no doubt it is builtin-only.
      
      Since module_init translates to device_initcall in the non-modular
      case, the init ordering remains unchanged with this commit.  We can
      change to one of the other priority initcalls (subsys?) at any later
      date, if desired.
      
      We also delete the MODULE_LICENSE tag since all that information
      is already contained at the top of the file in the comments.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      075640e3
    • Paul Gortmaker's avatar
      net/dcb: make dcbnl.c explicitly non-modular · 36b9ad80
      Paul Gortmaker authored
      The Kconfig currently controlling compilation of this code is:
      
      net/dcb/Kconfig:config DCB
      net/dcb/Kconfig:        bool "Data Center Bridging support"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Lets remove the modular code that is essentially orphaned, so that
      when reading the driver there is no doubt it is builtin-only.
      
      Since module_init translates to device_initcall in the non-modular
      case, the init ordering remains unchanged with this commit.  We can
      change to one of the other priority initcalls (subsys?) at any later
      date, if desired.
      
      We also delete the MODULE_LICENSE tag etc. since all that information
      is (or is now) already contained at the top of the file in the comments.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Anish Bhatt <anish@chelsio.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Shani Michaeli <shanim@mellanox.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      36b9ad80
    • Paul Gortmaker's avatar
      net/core: make sock_diag.c explicitly non-modular · b6191aee
      Paul Gortmaker authored
      The Makefile currently controlling compilation of this code lists
      it under "obj-y" ...meaning that it currently is not being built as
      a module by anyone.
      
      Lets remove the modular code that is essentially orphaned, so that
      when reading the driver there is no doubt it is builtin-only.
      
      Since module_init translates to device_initcall in the non-modular
      case, the init ordering remains unchanged with this commit.  We can
      change to one of the other priority initcalls (subsys?) at any later
      date, if desired.
      
      We can't remove module.h since the file uses other module related
      stuff even though it is not modular itself.
      
      We move the information from the MODULE_LICENSE tag to the top of the
      file, since that information is not captured anywhere else.  The
      MODULE_ALIAS_NET_PF_PROTO becomes a no-op in the non modular case, so
      it is removed.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Craig Gallek <kraig@google.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6191aee
    • David S. Miller's avatar
      Merge branch 'net-bool' · 4d886d65
      David S. Miller authored
      Yaowei Bai says:
      
      ====================
      net: small improvement
      
      This patchset makes several functions in net return bool to improve
      readability and/or simplicity because these functions only use one
      or zero as their return value.
      
      No functional changes.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d886d65
    • Yaowei Bai's avatar
      net/core: lockdep_rtnl_is_held can be boolean · 0cbf3343
      Yaowei Bai authored
      This patch makes lockdep_rtnl_is_held return bool due to this
      particular function only using either one or zero as its return
      value.
      
      In another patch lockdep_is_held is also made return bool.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0cbf3343
    • Yaowei Bai's avatar
      net/inetdevice: bad_mask can be boolean · f06cc7b2
      Yaowei Bai authored
      This patch makes bad_mask return bool due to this particular function
      only using either one or zero as its return value.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f06cc7b2
    • Yaowei Bai's avatar
      net/inetdevice: inet_ifa_match can be boolean · c3225164
      Yaowei Bai authored
      This patch makes inet_ifa_match return bool due to this
      particular function only using either one or zero as its return
      value.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3225164
    • Yaowei Bai's avatar
      net/dccp: dccp_bad_service_code can be boolean · 45ae74f5
      Yaowei Bai authored
      This patch makes dccp_bad_service_code return bool due to these
      particular functions only using either one or zero as their return
      value.
      
      dccp_list_has_service is also been made return bool in this patchset.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45ae74f5
    • Yaowei Bai's avatar
      net/dccp: dccp_list_has_service can be boolean · 0c6119d9
      Yaowei Bai authored
      This patch makes dccp_list_has_service return bool due to this
      particular function only using either one or zero as its return
      value.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c6119d9
    • Yaowei Bai's avatar
      net/can: can_dropped_invalid_skb can be boolean · d6fbaea5
      Yaowei Bai authored
      This patch makes can_dropped_invalid_skb return bool due to this
      particular function only using either one or zero as its return
      value.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Acked-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d6fbaea5
    • Yaowei Bai's avatar
      net/nfnetlink: lockdep_nfnl_is_held can be boolean · 875e0829
      Yaowei Bai authored
      This patch makes lockdep_nfnl_is_held return bool to improve
      readability due to this particular function only using either
      one or zero as its return value.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      875e0829
    • Yaowei Bai's avatar
      net/ieee80211: ieee80211_is_* can be boolean · 35498edc
      Yaowei Bai authored
      This patch makes ieee80211_is_* return bool to improve
      readability due to these particular functions only using either
      one or zero as their return value.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35498edc
    • Yaowei Bai's avatar
      net/netlink: lockdep_genl_is_held can be boolean · 61d03535
      Yaowei Bai authored
      This patch makes lockdep_genl_is_held return bool to improve
      readability due to this particular function only using either
      one or zero as its return value.
      
      No functional change.
      Signed-off-by: default avatarYaowei Bai <bywxiaobai@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61d03535