1. 21 Oct, 2017 17 commits
  2. 20 Oct, 2017 23 commits
    • Merge branch 'bpf-lsm-hooks' · 7f9ad2ac
      David S. Miller authored
      Chenbo Feng says:
      
      ====================
      bpf: security: New file mode and LSM hooks for eBPF object permission control
      
      Much like files and sockets, eBPF objects are accessed, controlled, and
      shared via a file descriptor (FD). Unlike files and sockets, the
      existing mechanism for eBPF object access control is very limited.
      Currently there are two options for granting access to eBPF
      operations: grant access to all processes, or only CAP_SYS_ADMIN
      processes. The CAP_SYS_ADMIN-only mode is not ideal because most users
      do not have this capability and granting a user CAP_SYS_ADMIN grants too
      many other security-sensitive permissions. It also unnecessarily allows
      all CAP_SYS_ADMIN processes access to eBPF functionality. Allowing all
      processes to access eBPF objects is also undesirable since it could
      allow unprivileged processes to consume kernel memory, and it opens up
      additional attack surface in the kernel.
      
      Adding LSM hooks maintains the status quo for systems which do not use
      an LSM, preserving compatibility with userspace, while allowing security
      modules to choose how best to handle permissions on eBPF objects. Here
      is a possible use case for the LSM hooks with the SELinux module:
      
      The network-control daemon (netd) creates and loads an eBPF object for
      network packet filtering and analysis. It passes the object FD to an
      unprivileged network monitor app (netmonitor), which is not allowed to
      create, modify or load eBPF objects, but is allowed to read the traffic
      stats from the map.
      
      SELinux could use these hooks to grant the following permissions:
      allow netd self:bpf_map { create read write };
      allow netmonitor netd:fd use;
      allow netmonitor netd:bpf_map read;
      
      In this patch series, a file mode is added to the bpf map to store the
      access mode. With these file mode flags, a map can be obtained read
      only, write only, or read and write. With the help of this file mode,
      several security hooks can be added to the eBPF syscall implementations
      to do permission checks. These LSM hooks mainly check the process's
      privileges before it obtains the fd for a specific bpf object, whether
      from a file location or from an eBPF id. Besides that, a general check
      hook is also implemented at the start of the bpf syscall so that each
      security module can have its own implementation for the rest of the
      bpf object related functionality.
      
      In order to store the ownership and security information about eBPF
      maps, a security field pointer is added to struct bpf_map. The last
      two patches implement the SELinux checks on the newly introduced hooks,
      plus an additional check when an eBPF object is passed between
      processes using a unix socket or binder IPC.
      
      Change since V1:
      
       - Whitelist the new bpf flags in the map allocate check.
       - Added bpf selftest for the new flags.
       - Added two new security hooks for copying the security information from
         the bpf object security struct to file security struct
       - Simplified the checking action when bpf fd is passed between processes.
      
       Change since V2:
      
       - Fixed the line break problem for map flags check
       - Fixed the typo in selinux check of file mode.
       - Merge bpf_map and bpf_prog into one selinux class
       - Added bpf_type and bpf_sid into file security struct to store the
         security information when generating the fd.
       - Add the hook to bpf_map_new_fd and bpf_prog_new_fd.
      
       Change since V3:
      
       - Return the actual error from security check instead of -EPERM
       - Move the hooks into anon_inode_getfd() to avoid getting the file again
         after the bpf object file is installed with an fd.
       - Removed the bpf_sid field inside file_security_struct to reduce the
         cache size.
      
       Change since V4:
      
       - Rename bpf av prog_use to prog_run to distinguish from fd_use.
       - Remove the bpf_type field inside file_security_struct and use bpf fops
         to identify the bpf object instead.
      
       Change since v5:
      
       - Fixed the incorrect selinux class name for SECCLASS_BPF
      
       Change since v7:
      
       - Fixed the build error caused by xt_bpf module.
       - Add flags check for bpf_obj_get() and bpf_map_get_fd_by_id() to make it
         uapi-wise.
       - Add the flags field to the bpf_obj_get_user function when BPF_SYSCALL
         is not configured.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selinux: bpf: Add addtional check for bpf object file receive · f66e448c
      Chenbo Feng authored
      Introduce a bpf object related check when sending and receiving files
      through a unix domain socket as well as binder. It checks whether the
      receiving process has the privilege to read/write the bpf map or use the
      bpf program. This check is necessary because bpf maps and programs use
      an anonymous inode as their shared inode, so the normal way of checking
      files and sockets when they are passed between processes cannot work
      properly on eBPF objects. This check only works when BPF_SYSCALL is
      configured.
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Reviewed-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selinux: bpf: Add selinux check for eBPF syscall operations · ec27c356
      Chenbo Feng authored
      Implement the actual checks introduced to eBPF related syscalls. This
      implementation uses the security field inside the bpf object to store a
      sid that identifies the bpf object. When processes try to access the
      object, SELinux checks whether they have the right privileges. The
      creation of eBPF objects is also checked at the general bpf check hook,
      and new cmds introduced to the eBPF domain can also be checked there.
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
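
      A minimal sketch of the kind of check this commit describes, assuming
      the access mode of the requested fd is translated into a set of required
      permissions and compared with what policy grants. Every constant and
      function name below is hypothetical and exists only for illustration;
      none of them are the kernel's or SELinux's definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical access-mode bits of the fd being requested. */
#define SK_FMODE_READ      0x1u
#define SK_FMODE_WRITE     0x2u

/* Hypothetical permission bits that a policy could grant on a map. */
#define SK_PERM_MAP_READ   0x4u
#define SK_PERM_MAP_WRITE  0x8u

/* Translate the requested access mode into the permissions it needs. */
uint32_t sk_required_perms(unsigned int fmode)
{
    uint32_t need = 0;

    if (fmode & SK_FMODE_READ)
        need |= SK_PERM_MAP_READ;
    if (fmode & SK_FMODE_WRITE)
        need |= SK_PERM_MAP_WRITE;
    return need;
}

/* Return 0 if every needed permission is granted, -EACCES otherwise. */
int sk_check_map_access(uint32_t granted, unsigned int fmode)
{
    uint32_t need = sk_required_perms(fmode);

    return (granted & need) == need ? 0 : -EACCES;
}
```

      Applied to the netmonitor example from the cover letter, a domain that
      is granted only read would pass a read-only request and be refused a
      read/write one.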
    • security: bpf: Add LSM hooks for bpf object related syscall · afdb09c7
      Chenbo Feng authored
      Introduce several LSM hooks for the syscalls that allow userspace to
      access eBPF objects such as eBPF programs and eBPF maps. The security
      check aims to enforce per-object security protection for eBPF objects,
      so that only processes with the right privileges can read/write a
      specific map or use a specific eBPF program. Besides that, a general
      security hook is added before the multiplexer of the bpf syscall to
      check the cmd and the attribute used for the command. The actual
      security module can decide which commands need to be checked and how
      each cmd should be checked.
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Acked-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add tests for eBPF file mode · e043325b
      Chenbo Feng authored
      Two related tests are added to the bpf selftests to test read-only and
      write-only maps. The tests verify that the read-only and write-only
      flags work on hash maps.
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add file mode configuration into bpf maps · 6e71b04a
      Chenbo Feng authored
      Introduce the map read/write flags to the eBPF syscalls that return a
      map fd. The flags are used to set up the file mode when constructing a
      new file descriptor for bpf maps. To avoid breaking backward
      compatibility, f_flags is set to O_RDWR if the flag passed by the
      syscall is 0. Otherwise it should be O_RDONLY or O_WRONLY. When
      userspace wants to modify or read the map content, the kernel checks
      the file mode to see whether the operation is allowed.
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
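
      The flag handling described in this commit could be sketched roughly as
      follows. This is a userspace model, not the kernel code: the SK_ flag
      values are placeholders, the function names are made up, and rejecting a
      request that sets both flags is an assumption the commit message does
      not spell out.

```c
#include <errno.h>
#include <fcntl.h>

/* Placeholder flag bits for this sketch; the real ones live in the bpf uapi. */
#define SK_BPF_F_RDONLY (1U << 3)
#define SK_BPF_F_WRONLY (1U << 4)

/* Map the flags passed to the syscall to an open(2)-style access mode. */
int sk_bpf_file_flag(unsigned int flags)
{
    /* Assumption: asking for read-only and write-only at once is invalid. */
    if ((flags & SK_BPF_F_RDONLY) && (flags & SK_BPF_F_WRONLY))
        return -EINVAL;
    if (flags & SK_BPF_F_RDONLY)
        return O_RDONLY;
    if (flags & SK_BPF_F_WRONLY)
        return O_WRONLY;
    /* A flag value of 0 keeps the old behaviour: read and write. */
    return O_RDWR;
}

/* Reading map content needs an fd opened O_RDONLY or O_RDWR. */
int sk_map_may_read(int file_flags)
{
    int acc = file_flags & O_ACCMODE;

    return acc == O_RDONLY || acc == O_RDWR;
}

/* Modifying map content needs an fd opened O_WRONLY or O_RDWR. */
int sk_map_may_write(int file_flags)
{
    int acc = file_flags & O_ACCMODE;

    return acc == O_WRONLY || acc == O_RDWR;
}
```

      An fd obtained with the read-only flag would then pass the read check
      and fail the write check, while an fd obtained with flags of 0 passes
      both.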
    • net-tun: fix panics at dismantle time · aec72f33
      Eric Dumazet authored
      syzkaller got crashes at dismantle time [1]
      
      It is not correct to test (tun->flags & IFF_NAPI) in tun_napi_disable()
      and tun_napi_del(): each tun_file can have a different mode, depending
      on how it was created.
      
      Similarly I have changed tun_get_user() and tun_poll_controller()
      to use the new tfile->napi_enabled boolean.
      
      [  154.331360] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  154.339220] IP: [<ffffffff9634cad6>] hrtimer_active+0x26/0x60
      [  154.344983] PGD 0
      [  154.347009] Oops: 0000 [#1] SMP
      [  154.350680] gsmi: Log Shutdown Reason 0x03
      [  154.379572] task: ffff994719150dc0 ti: ffff99475c0ae000 task.ti: ffff99475c0ae000
      [  154.387043] RIP: 0010:[<ffffffff9634cad6>]  [<ffffffff9634cad6>] hrtimer_active+0x26/0x60
      [  154.395232] RSP: 0018:ffff99475c0afce8  EFLAGS: 00010246
      [  154.400542] RAX: ffff994754850ac0 RBX: ffff994753e65408 RCX: ffff994753e65388
      [  154.407666] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff994753e65408
      [  154.414790] RBP: ffff99475c0afce8 R08: 0000000000000000 R09: 0000000000000000
      [  154.421921] R10: ffff99475f6f5910 R11: 0000000000000001 R12: 0000000000000000
      [  154.429044] R13: ffff99417deab668 R14: ffff99417deaa780 R15: ffff99475f45dde0
      [  154.436174] FS:  0000000000000000(0000) GS:ffff994767a00000(0000) knlGS:0000000000000000
      [  154.444249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  154.449986] CR2: 0000000000000000 CR3: 00000005a8a0e000 CR4: 0000000000022670
      [  154.457110] Stack:
      [  154.459120]  ffff99475c0afd28 ffffffff9634d614 1000000000000000 0000000000000000
      [  154.466598]  ffffe54240000000 ffff994753e65408 ffff994753e653a8 ffff99417deab668
      [  154.474067]  ffff99475c0afd48 ffffffff9634d6fd ffff99474c2be678 ffff994753e65398
      [  154.481537] Call Trace:
      [  154.483985]  [<ffffffff9634d614>] hrtimer_try_to_cancel+0x24/0xf0
      [  154.490074]  [<ffffffff9634d6fd>] hrtimer_cancel+0x1d/0x30
      [  154.495563]  [<ffffffff96860b3c>] napi_disable+0x3c/0x70
      [  154.500875]  [<ffffffff9678ae62>] __tun_detach+0xd2/0x360
      [  154.506272]  [<ffffffff9678b117>] tun_chr_close+0x27/0x40
      [  154.511669]  [<ffffffff9646ebe6>] __fput+0xd6/0x1e0
      [  154.516548]  [<ffffffff9646ed3e>] ____fput+0xe/0x10
      [  154.521429]  [<ffffffff963035a2>] task_work_run+0x72/0x90
      [  154.526827]  [<ffffffff962e9407>] do_exit+0x317/0xb60
      [  154.531879]  [<ffffffff962e9c8f>] do_group_exit+0x3f/0xa0
      [  154.537275]  [<ffffffff962e9d07>] SyS_exit_group+0x17/0x20
      [  154.542769]  [<ffffffff969784be>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      Fixes: 94317099 ("net-tun: enable NAPI for TUN/TAP driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
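
      The idea behind the fix can be illustrated with a small userspace model,
      assuming each queue file records at creation time whether NAPI was set
      up for it. The struct and function names are hypothetical stand-ins, and
      a counter takes the place of the real napi_disable() call.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-queue file state. */
struct sk_tun_file {
    bool napi_enabled;      /* recorded when this file was set up */
    int napi_disable_calls; /* counts what would be napi_disable() calls */
};

/* Record, per file, whether NAPI was initialised for it. */
void sk_tun_napi_init(struct sk_tun_file *tfile, bool napi_en)
{
    tfile->napi_enabled = napi_en;
    tfile->napi_disable_calls = 0;
}

/*
 * Consult the per-file boolean rather than a device-wide flag, so a file
 * that never had NAPI set up is never torn down as if it had.
 */
void sk_tun_napi_disable(struct sk_tun_file *tfile)
{
    if (tfile->napi_enabled)
        tfile->napi_disable_calls++;
}
```

      Two files attached to the same device can then be dismantled safely even
      if only one of them was created with NAPI.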
    • net: ipv4: Change fib notifiers to take a fib_alias · 6eba87c7
      David Ahern authored
      All of the notifier data (fib_info, tos, type and table id) are
      contained in the fib_alias. Pass it to the notifier instead of passing
      each item separately, shortening the argument list by 3.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: socket option to set TCP fast open key · 1fba70e5
      Yuchung Cheng authored
      New socket option TCP_FASTOPEN_KEY to allow different keys per
      listener.  The listener by default uses the global key until the
      socket option is set.  The key is 16 bytes of binary data. This
      option has no effect on regular non-listener TCP sockets.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
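
      From userspace, setting a per-listener key might look like the sketch
      below. The helper name and the length check are illustrative
      assumptions; the fallback definition of TCP_FASTOPEN_KEY is only there
      for libc headers that predate the option.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef TCP_FASTOPEN_KEY
#define TCP_FASTOPEN_KEY 33 /* fallback for older libc headers */
#endif

#define SK_TFO_KEY_LEN 16 /* the commit describes a 16-byte binary key */

/*
 * Hypothetical helper: install a Fast Open key on one listening socket.
 * Returns 0 on success, or -1 with errno set.
 */
int sk_set_listener_tfo_key(int listen_fd, const unsigned char *key, size_t len)
{
    /* Refuse anything that is not exactly one key's worth of bytes. */
    if (key == NULL || len != SK_TFO_KEY_LEN) {
        errno = EINVAL;
        return -1;
    }
    return setsockopt(listen_fd, IPPROTO_TCP, TCP_FASTOPEN_KEY,
                      key, SK_TFO_KEY_LEN);
}
```

      A server would call this on a listening socket; listeners that never
      set the option keep using the global key, as the commit message notes.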
    • Merge branch 'mlxsw-extack' · ce12f7dd
      David S. Miller authored
      David Ahern says:
      
      ====================
      mlxsw: spectrum_router: Add extack messages for RIF and VRF overflow
      
      Currently, exceeding the number of VRF instances or the number of router
      interfaces either fails with a non-intuitive EBUSY:
          $ ip li set swp1s1.6 vrf vrf-1s1-6 up
          RTNETLINK answers: Device or resource busy
      
      or fails silently (IPv6) since the checks are done in a work queue. This
      set adds support for the address validator notifier to spectrum which
      allows ext-ack based messages to be returned on failure.
      
      To make that happen the IPv6 version needs to be converted from atomic
      to blocking (patch 2), and then support for extack needs to be added
      to the notifier (patch 3). Patch 1 reworks the locking in ipv6_add_addr
      to work better in the atomic and non-atomic code paths. Patches 4 and 5
      add the validator notifier to spectrum and then plumb the extack argument
      through spectrum_router.
      
      With this set, VRF overflows fail with:
         $ ip li set swp1s1.6 vrf vrf-1s1-6 up
         Error: spectrum: Exceeded number of supported VRF.
      
      and RIF overflows fail with:
         $ ip addr add dev swp1s2.191 10.12.191.1/24
         Error: spectrum: Exceeded number of supported router interfaces.
      
      v2 -> v3
      - fix surrounding context of patch 4 which was altered by c30f5d01
      
      v1 -> v2
      - fix error path in ipv6_add_addr: reset rt to NULL (Ido comment) and
        add in6_dev_put on ifa once the hold has been done
      
      RFC -> v1
      - addressed various comments from Ido
      - refactored ipv6_add_addr to allow ifa's to be allocated with
        GFP_KERNEL as requested by DaveM
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Add extack message for RIF and VRF overflow · f8fa9b4e
      David Ahern authored
      Add extack argument down to mlxsw_sp_rif_create and mlxsw_sp_vr_create
      to set an error message on RIF or VR overflow. Now on overflow of
      either resource the user gets an informative message as opposed to
      failing with EBUSY.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
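
      A rough userspace model of reporting an overflow through an extack-style
      message instead of a bare error code. The structs and the function are
      hypothetical; the message text is the one quoted in the cover letter,
      and keeping -EBUSY as the return value is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the extended-ack message carrier. */
struct sk_extack {
    const char *msg;
};

/* Hypothetical view of the router's interface budget. */
struct sk_router {
    unsigned int rifs_in_use;
    unsigned int max_rifs;
};

/* Check the budget before creating a router interface. */
int sk_rif_create_check(const struct sk_router *r, struct sk_extack *extack)
{
    if (r->rifs_in_use >= r->max_rifs) {
        /* Give the user a reason, not just an errno value. */
        if (extack != NULL)
            extack->msg = "spectrum: Exceeded number of supported router interfaces";
        return -EBUSY;
    }
    return 0;
}
```

      The caller can then print the message when one is present and fall back
      to the numeric error otherwise.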
    • mlxsw: spectrum: router: Add support for address validator notifier · 89d5dd2e
      David Ahern authored
      Add support for inetaddr_validator and inet6addr_validator. The
      notifiers provide a means for validating ipv4 and ipv6 addresses
      before the addresses are installed and on failure the error
      is propagated back to the user.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add extack to validator_info structs used for address notifier · de95e047
      David Ahern authored
      Add extack to in_validator_info and in6_validator_info. Update the one
      user of each, ipvlan, to return an error message for failures.
      
      Only manual configuration of an address is plumbed in the IPv6 code path.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: Make inet6addr_validator a blocking notifier · ff7883ea
      David Ahern authored
      inet6addr_validator chain was added by commit 3ad7d246 ("Ipvlan
      should return an error when an address is already in use") to allow
      address validation before changes are committed and to be able to
      fail the address change with an error back to the user. The address
      validation is not done for addresses received from router
      advertisements.
      
      Handling RAs in softirq context is the only reason for the notifier
      chain to be atomic versus blocking. Since the only current user, ipvlan,
      of the validator chain ignores softirq context, the notifier can be made
      blocking and simply not invoked for softirq path.
      
      The blocking option is needed by spectrum, for example, to validate
      resources when adding an address to an interface.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: addrconf: cleanup locking in ipv6_add_addr · f3d9832e
      David Ahern authored
      ipv6_add_addr is called in process context with rtnl lock held
      (e.g., manual config of an address) or during softirq processing
      (e.g., autoconf and address from a router advertisement).
      
      Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry
      and does not call unlock until exit, minus the call around the address
      validator notifier. Similarly, addrconf_hash_lock is taken after the
      validator notifier and held until exit. This forces the allocation of
      inet6_ifaddr to always be atomic.
      
      Refactor ipv6_add_addr as follows:
      1. Add an input boolean to discriminate the call path (process context
         or softirq). This new flag controls whether the alloc can be done
         with GFP_KERNEL or GFP_ATOMIC.
      
      2. Move the rcu_read_lock_bh and unlock calls only around functions that
         do rcu updates.
      
      3. Remove the in6_dev_hold and put added by 3ad7d246 ("Ipvlan should
         return an error when an address is already in use."). This was done
         presumably because rcu_read_unlock_bh needs to be called before calling
         the validator. Since rcu_read_lock is not needed before the validator
         runs, revert the hold and put added by 3ad7d246 and only do the
         hold when setting ifp->idev.
      
      4. Move the duplicate address check and the insertion of the new address
         in the global address hash into a helper. The helper is called after
         an ifa is allocated and filled in.
      
      This allows the ifa for manually configured addresses to be done with
      GFP_KERNEL and reduces the overall amount of time with rcu_read_lock held
      and hash table spinlock held.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
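
      Steps 1 and 4 of the refactor can be modelled in userspace roughly as
      below. Everything here is a simplified, hypothetical stand-in: the enum
      replaces the real GFP flags, a small fixed array replaces the global
      address hash, and the -EEXIST and -ENOSPC return values are assumptions
      made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the two allocation modes named in the commit. */
enum sk_gfp { SK_GFP_ATOMIC, SK_GFP_KERNEL };

#define SK_MAX_ADDRS 8

/* Toy replacement for the global address hash. */
struct sk_addr_table {
    unsigned int addrs[SK_MAX_ADDRS];
    size_t count;
};

/* Step 1: the call path decides whether the allocation may sleep. */
enum sk_gfp sk_alloc_mode(bool can_block)
{
    return can_block ? SK_GFP_KERNEL : SK_GFP_ATOMIC;
}

/* Step 4: duplicate check and insertion live together in one helper. */
int sk_add_addr_hash(struct sk_addr_table *t, unsigned int addr)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (t->addrs[i] == addr)
            return -EEXIST; /* address is already present */
    if (t->count == SK_MAX_ADDRS)
        return -ENOSPC; /* toy table is full */
    t->addrs[t->count++] = addr;
    return 0;
}
```

      In this model a manually configured address would use
      sk_alloc_mode(true), while the softirq path would pass false.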
    • Merge branch 's390-next' · 6b1f8eda
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/net: updates 2017-10-18
      
      please apply some additional robustness fixes and cleanups for 4.15.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: don't dump control cmd twice · 52c44d29
      Julian Wiedmann authored
      A few lines down, qeth_prepare_control_data() makes further changes to
      the control cmd buffer, and then also writes a trace entry for it.
      So the first entry just pollutes the trace file with intermediate data;
      drop it.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: support GRO flush timer · 978759e8
      Julian Wiedmann authored
      Switch to napi_complete_done(), and thus enable delayed GRO flushing.
      The timeout is configured via /sys/class/net/<if>/gro_flush_timeout.
      
      Default timeout is 0, so no change in behaviour.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: try harder to get packets from RX buffer · 864c17c3
      Julian Wiedmann authored
      Current code bails out when two subsequent buffer elements hold
      insufficient data to contain a qeth_hdr packet descriptor.
      This seems reasonable, but it would be legal for quirky hardware to
      leave a few elements empty and then present packets in a subsequent
      element. These packets would currently be dropped.
      
      So make sure to check all buffer elements, until we hit the LAST_ENTRY
      indication.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
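
      The scanning rule described in this commit could be sketched as follows,
      assuming a buffer is an array of elements that each carry a length and a
      last-entry marker. The struct layout, the descriptor size and the
      function name are hypothetical; only the rule of walking every element
      until the last-entry indication comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_HDR_LEN 32 /* hypothetical size of a packet descriptor */

/* Hypothetical view of one buffer element. */
struct sk_element {
    size_t length;   /* bytes of data in this element */
    bool last_entry; /* marks the final element of the buffer */
};

/*
 * Return the index of the first element large enough to hold a packet
 * descriptor, or -1 if none exists. Short or empty elements are skipped
 * rather than treated as the end of the data.
 */
int sk_find_first_packet(const struct sk_element *el, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (el[i].length >= SK_HDR_LEN)
            return (int)i;
        if (el[i].last_entry)
            break; /* nothing can follow the last entry */
    }
    return -1;
}
```

      A buffer with several empty leading elements and a packet in a later
      element is then still processed instead of being dropped.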
    • s390/qeth: consolidate skb allocation · 8d68af6a
      Julian Wiedmann authored
      Move the allocation of SG skbs into the main path. This allows for
      a little code sharing, and handling ENOMEM from within one place.
      
      As a side effect, L2 SG skbs now get the proper amount of additional
      headroom (read: zero) instead of the hard-coded ETH_HLEN.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: clean up page frag creation · b6f72f96
      Julian Wiedmann authored
      Replace the open-coded skb_add_rx_frag(), and use a fall-through
      to remove some duplicated code.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: no VLAN support on OSM · 9400c53f
      Julian Wiedmann authored
      Instead of silently discarding VLAN registration requests on OSM,
      just indicate that this card type doesn't support VLAN.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: don't verify device when setting MAC address · 857d8ee2
      Julian Wiedmann authored
      There's no reason why l2_set_mac_address() should ever be called for
      a netdevice that's not owned by qeth. It's certainly not required for
      VLAN devices, which have their own netdev_ops.
      
      Also:
      1) we don't do such validation for any of the other netdev_ops routines.
      2) the code in question clearly has never been actually exercised;
         it's broken. After determining that the device is not owned
         by qeth, it would still use dev->ml_priv to write a qeth trace entry.
      
      Remove the check, and its helper that walked the global card list.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>