1. 21 Jul, 2018 12 commits
    • Merge branch 'Make-sys-class-net-per-net-namespace-objects-belong-to-container' · c59e18b8
      David S. Miller authored
      Tyler Hicks says:
      
      ====================
      Make /sys/class/net per net namespace objects belong to container
      
      This is a revival of an older patch set from Dmitry Torokhov:
      
       https://lore.kernel.org/lkml/1471386795-32918-1-git-send-email-dmitry.torokhov@gmail.com/
      
      My submission of v2 is here:
      
       https://lore.kernel.org/lkml/1531497949-1766-1-git-send-email-tyhicks@canonical.com/
      
      Here's Dmitry's description:
      
       There are objects in /sys hierarchy (/sys/class/net/) that logically
       belong to a namespace/container. Unfortunately all sysfs objects start
       their life belonging to global root, and while we could change
       ownership manually, keeping track of all objects that come and go is
       cumbersome. It would be better if kernel created them using correct
       uid/gid from the beginning.
      
       This series changes kernfs to allow creating objects with arbitrary
       uid/gid, adds get_ownership() callback to ktype structure so subsystems
       could supply their own logic (likely tied to namespace support) for
       determining ownership of kobjects, and adjusts sysfs code to make use
       of this information. Lastly net-sysfs is adjusted to make sure that
       objects in net namespace are owned by the root user from the owning
       user namespace.
      
       Note that we do not adjust ownership of objects moved into a new
       namespace (as when moving a network device into a container) as
       userspace can easily do it.
      
      I'm reviving this patch set because we would like this feature for
      system containers. One specific use case that we have is that libvirt is
      unable to configure its bridge device inside of a system container due
      to the bridge files in /sys/class/net/ being owned by init root instead
      of container root. The last two patches in this set are patches that
      I've added to Dmitry's original set to allow such configuration of the
      bridge device.
      
      Eric had previously provided feedback that he didn't favor these changes
      affecting all layers of the stack and that most of the changes could
      remain local to drivers/base/core.c. That feedback is certainly sensible
      but I wanted to send out v2 of the patch set without making that large
      of a change since quite a bit of time has passed and the bridge changes
      in the last patch of this set show that not all of the changes will be
      local to drivers/base/core.c. I'm happy to make the changes if the
      original request still stands.
      
      * Changes since v2:
        - Added my Co-Developed-by and Signed-off-by tags to all of Dmitry's
          patches that I've modified
        - Patch 1 received build failure fixes in
          arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
        - Patch 2 was updated to drop the declaration of sysfs_add_file() from
          sysfs.h since the patch removed all other uses of the function
        - Patch 5 is a new patch that prevents tx_maxrate from being written
          to from inside of a container
          + Maybe I'm being too cautious here but the restriction can always
            be loosened up later
        - Patches 6 and 7 were updated to make net_ns_get_ownership() always
          initialize uid and gid, even when the network namespace is NULL, so
          that it isn't a dangerous function to reuse
          + Requested by Christian Brauner
        - I've looked at all sysfs attributes affected by this patch set and
          feel comfortable about the changes. There are quite a few affected
          attributes that don't have any capable()/ns_capable() checks in
          their store operations (per_bond_attrs, at91_sysfs_attrs,
          sysfs_grcan_attrs, ican3_sysfs_attrs, cdc_ncm_sysfs_attrs,
          qmi_wwan_sysfs_attrs) but I think this is acceptable. It means that
          container root, rather than specifically CAP_NET_ADMIN inside of the
          network namespace that the device belongs to, can write to those
          device attributes. It's the same situation that those devices have
          today in that init root is able to write to the attributes without
          necessarily having CAP_NET_ADMIN. I think that this should probably
          be fixed in order to be consistent with what netdev_store() does by
          verifying CAP_NET_ADMIN in the network namespace but that it doesn't
          need to happen in this patch set.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: make sure objects belong to container's owner · 705e0dea
      Tyler Hicks authored
      When creating various bridge objects in /sys/class/net/... make sure
      that they belong to the container's owner instead of global root (if
      they belong to a container/namespace).
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: create reusable function for getting ownership info of sysfs inodes · fbdeaed4
      Tyler Hicks authored
      Make net_ns_get_ownership() reusable by networking code outside of core.
      This is useful, for example, to allow bridge related sysfs files to be
      owned by container root.
      
      Add a function comment since this is a potentially dangerous function to
      use given the way that kobject_get_ownership() works by initializing uid
      and gid before calling .get_ownership().
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
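
The commit above describes two behaviours: kobject_get_ownership() initializes uid and gid before calling .get_ownership(), and (per the cover letter) net_ns_get_ownership() always initializes both values, even when the network namespace is NULL. A minimal userspace model of that contract might look like the following. It is a Python sketch, not the kernel implementation: only the function names come from the commit messages, while the `Owner`, `NetNamespace`, and `KObject` classes, the `netdev_get_ownership` helper, and the host ID 100000 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

GLOBAL_ROOT_UID = 0
GLOBAL_ROOT_GID = 0


@dataclass
class Owner:
    uid: int
    gid: int


@dataclass
class NetNamespace:
    # Hypothetical stand-in: host IDs that namespace root maps to.
    # None models "root has no mapping in the owning user namespace".
    root_uid: Optional[int]
    root_gid: Optional[int]


def net_ns_get_ownership(net: Optional[NetNamespace], owner: Owner) -> None:
    """Model: always write both fields, so stale input never leaks through."""
    owner.uid = GLOBAL_ROOT_UID
    owner.gid = GLOBAL_ROOT_GID
    if net is None:
        return
    if net.root_uid is not None:
        owner.uid = net.root_uid
    if net.root_gid is not None:
        owner.gid = net.root_gid


@dataclass
class KObject:
    name: str
    net: Optional[NetNamespace] = None
    get_ownership: Optional[Callable[["KObject", Owner], None]] = None


def kobject_get_ownership(kobj: KObject) -> Owner:
    """Model: start from global root, then let the optional callback override."""
    owner = Owner(GLOBAL_ROOT_UID, GLOBAL_ROOT_GID)
    if kobj.get_ownership is not None:
        kobj.get_ownership(kobj, owner)
    return owner


def netdev_get_ownership(kobj: KObject, owner: Owner) -> None:
    # Hypothetical callback for a network device: defer to its namespace.
    net_ns_get_ownership(kobj.net, owner)
```

In this model a device such as `KObject("br0", net=NetNamespace(100000, 100000), get_ownership=netdev_get_ownership)` resolves to the container root's IDs, while an object without a callback stays owned by global root.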
    • net-sysfs: make sure objects belong to container's owner · b0e37c0d
      Dmitry Torokhov authored
      When creating various objects in /sys/class/net/... make sure that they
      belong to container's owner instead of global root (if they belong to a
      container/namespace).
      Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
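
One way to observe the effect of this change from userspace is to look at who owns the entries under /sys/class/net. The snippet below is a small sketch for that purpose; the helper names `owner_of` and `net_sysfs_owners` are made up for illustration, and it relies only on `os.stat` and `os.listdir`. IDs are reported as the calling process sees them, so the numbers differ depending on whether it runs on the host or inside a user namespace.

```python
import os
from typing import Dict, Tuple


def owner_of(path: str) -> Tuple[int, int]:
    """Return (uid, gid) of path, following symlinks as os.stat does."""
    st = os.stat(path)
    return st.st_uid, st.st_gid


def net_sysfs_owners(root: str = "/sys/class/net") -> Dict[str, Tuple[int, int]]:
    """Map each entry under root to its (uid, gid); empty if root is absent."""
    owners: Dict[str, Tuple[int, int]] = {}
    if not os.path.isdir(root):
        return owners
    for name in sorted(os.listdir(root)):
        try:
            owners[name] = owner_of(os.path.join(root, name))
        except OSError:
            # Entries can disappear while we iterate (devices come and go).
            continue
    return owners


if __name__ == "__main__":
    for name, (uid, gid) in net_sysfs_owners().items():
        print(f"{name}: uid={uid} gid={gid}")
```

The `root` parameter exists only so the helper can be pointed at another directory when /sys is not available.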
    • net-sysfs: require net admin in the init ns for setting tx_maxrate · 3033fced
      Tyler Hicks authored
      An upcoming change will allow container root to open some /sys/class/net
      files for writing. The tx_maxrate attribute can result in changes
      to actual hardware devices so err on the side of caution by requiring
      CAP_NET_ADMIN in the init namespace in the corresponding attribute store
      operation.
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • driver core: set up ownership of class devices in sysfs · 9944e894
      Dmitry Torokhov authored
      Plumb in get_ownership() callback for devices belonging to a class so that
      they can be created with uid/gid different from global root. This will
      allow network devices in a container to belong to container's root and not
      global root.
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kobject: kset_create_and_add() - fetch ownership info from parent · d028b6f7
      Dmitry Torokhov authored
      This change implements get_ownership() for ksets created with
      kset_create_and_add() call by fetching ownership data from parent kobject.
      This is done mostly for benefit of "queues" attribute of net devices so
      that corresponding directory belongs to container's root instead of global
      root for network devices in a container.
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
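
The parent-delegation idea in this commit can be illustrated with a short userspace model: a kset created under a kobject reports whatever ownership its parent reports, so a "queues" directory follows its net device. This is a Python sketch under stated assumptions, not the kernel code; `KObject`, `kobject_owner`, `inherit_from_parent`, and the host ID 100000 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Owner = Tuple[int, int]  # (uid, gid)
GLOBAL_ROOT: Owner = (0, 0)


@dataclass
class KObject:
    name: str
    parent: Optional["KObject"] = None
    get_ownership: Optional[Callable[["KObject"], Owner]] = None


def kobject_owner(kobj: KObject) -> Owner:
    """Global root unless the object supplies its own ownership callback."""
    if kobj.get_ownership is None:
        return GLOBAL_ROOT
    return kobj.get_ownership(kobj)


def inherit_from_parent(kobj: KObject) -> Owner:
    """Callback for a created kset: reuse whatever the parent reports."""
    if kobj.parent is None:
        return GLOBAL_ROOT
    return kobject_owner(kobj.parent)


# A net device owned by container root (host uid/gid 100000 is made up),
# with a "queues" kset underneath it.
netdev = KObject("eth0", get_ownership=lambda _kobj: (100000, 100000))
queues = KObject("queues", parent=netdev, get_ownership=inherit_from_parent)
```

Here `kobject_owner(queues)` yields the same pair as `kobject_owner(netdev)`, and a kset with no parent falls back to global root.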
    • sysfs, kobject: allow creating kobject belonging to arbitrary users · 5f81880d
      Dmitry Torokhov authored
      Normally kobjects and their sysfs representation belong to global root,
      however it is not necessarily the case for objects in separate namespaces.
      For example, objects in separate network namespace logically belong to the
      container's root and not global root.
      
      This change lays groundwork for allowing network namespace objects
      ownership to be transferred to container's root user by defining
      get_ownership() callback in ktype structure and using it in sysfs code to
      retrieve desired uid/gid when creating sysfs objects for given kobject.
      Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kernfs: allow creating kernfs objects with arbitrary uid/gid · 488dee96
      Dmitry Torokhov authored
      This change allows creating kernfs files and directories with arbitrary
      uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
      kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
      The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
      and always create objects belonging to the global root.
      
      When creating symlinks ownership (uid/gid) is taken from the target kernfs
      object.
      Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Init backlog NAPI's gro_hash. · 7c4ec749
      David S. Miller authored
      Based upon a patch by Sean Tranchetti.
      
      Fixes: d4546c25 ("net: Convert GRO SKB handling to list_head.")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 99d20a46
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for your net-next
      tree:
      
      1) No need to set ttl from reject action for the bridge family, from
         Taehee Yoo.
      
      2) Use a fixed timeout for flows that are passed up from the flowtable
         to conntrack, from Florian Westphal.
      
      3) More preparation patches for tproxy support for nf_tables, from Mate
         Eckl.
      
      4) Remove unnecessary indirection in core IPv6 checksum function, from
         Florian Westphal.
      
      5) Use nf_ct_get_tuplepr() from openvswitch, instead of opencoding it.
         From Florian Westphal.
      
      6) socket match now selects socket infrastructure, instead of depending
         on it. From Mate Eckl.
      
      7) Patch series to simplify conntrack tuple building/parsing from packet
         path and ctnetlink, from Florian Westphal.
      
      8) Fetch timeout policy from protocol helpers, instead of doing it from
         core, from Florian Westphal.
      
      9) Merge IPv4 and IPv6 protocol trackers into conntrack core, from
         Florian Westphal.
      
      10) Depend on CONFIG_NF_TABLES_IPV6 and CONFIG_IP6_NF_IPTABLES
          respectively, instead of IPV6. Patch from Mate Eckl.
      
      11) Add specific function for garbage collection in conncount,
          from Yi-Hung Wei.
      
      12) Catch number of elements in the connlimit list, from Yi-Hung Wei.
      
      13) Move locking to nf_conncount, from Yi-Hung Wei.
      
      14) Series of patches to add lockless tree traversal in nf_conncount,
          from Yi-Hung Wei.
      
      15) Resolve clash in matching conntracks when race happens, from
          Martynas Pumputis.
      
      16) If connection entry times out, remove template entry from the
          ip_vs_conn_tab table to improve behaviour under flood, from
          Julian Anastasov.
      
      17) Remove useless parameter from nf_ct_helper_ext_add(), from Gao feng.
      
      18) Call abort from 2-phase commit protocol before requesting modules,
          make sure this is done under the mutex, from Florian Westphal.
      
      19) Grab module reference when starting transaction, also from Florian.
      
      20) Dynamically allocate expression info array for pre-parsing, from
          Florian.
      
      21) Add per netns mutex for nf_tables, from Florian Westphal.
      
      22) A couple of patches to simplify and refactor nf_osf code to prepare
          for nft_osf support.
      
      23) Break evaluation on missing socket, from Mate Eckl.
      
      24) Allow to match socket mark from nft_socket, from Mate Eckl.
      
      25) Remove dependency on nf_defrag_ipv6, now that IPv6 tracker is
          built into nf_conntrack. From Florian Westphal.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux · c4c5551d
      David S. Miller authored
      All conflicts were trivial overlapping changes, so reasonably
      easy to resolve.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 20 Jul, 2018 26 commits
  3. 19 Jul, 2018 2 commits
    • Merge tag 'pci-v4.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · fb7d1bcf
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - Fix crashes that happen when PHY drivers are left disabled in the V3
         Semiconductor, MediaTek, Faraday, Aardvark, DesignWare, Versatile,
         and X-Gene host controller drivers (Sergei Shtylyov)
      
       - Fix a NULL pointer dereference in the endpoint library configfs
         support (Kishon Vijay Abraham I)
      
       - Fix a race condition in Hyper-V IRQ handling (Dexuan Cui)
      
      * tag 'pci-v4.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: v3-semi: Fix I/O space page leak
        PCI: mediatek: Fix I/O space page leak
        PCI: faraday: Fix I/O space page leak
        PCI: aardvark: Fix I/O space page leak
        PCI: designware: Fix I/O space page leak
        PCI: versatile: Fix I/O space page leak
        PCI: xgene: Fix I/O space page leak
        PCI: OF: Fix I/O space page leak
        PCI: endpoint: Fix NULL pointer dereference error when CONFIGFS is disabled
        PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
    • ARCv2: [plat-hsdk]: Save accl reg pair by default · af1fc5ba
      Vineet Gupta authored
      This manifested as strace segfaulting on HSDK because gcc was targeting
      the accumulator registers as GPRs, which the kernel was not saving/restoring
      by default.
      
      Cc: stable@vger.kernel.org   #4.14+
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>