1. 31 Oct, 2017 4 commits
    • userns: Simplify the user and group mapping functions · 3edf652f
      Eric W. Biederman authored
      Consolidate reading the number of extents and computing the return
      value in the map_id_down, map_id_range_down and map_id_range.
      
      This removal of one read of extents makes one smp_rmb unnecessary
      and makes the code safe if it is executed during the map write.  Reading
      the number of extents twice and depending on the result being the same
      is not safe, as it could be 0 the first time and > 5 the second time,
      which would lead to misinterpreting the union fields.
      
      The consolidation of the return value just removes a duplicate
      calculation which should make it easier to understand and maintain
      the code.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      3edf652f
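The single-read idea described in this commit can be sketched in userspace C. This is only an approximation under stated assumptions: `map_id_down_sketch` and `find_extent` are hypothetical names, a C11 acquire load stands in for the kernel's plain read followed by `smp_rmb()`, and the large-map path uses a linear scan to keep the sketch short.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define UID_GID_MAP_MAX_BASE_EXTENTS 5

struct uid_gid_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

struct uid_gid_map {
	_Atomic uint32_t nr_extents;
	union {
		struct uid_gid_extent extent[UID_GID_MAP_MAX_BASE_EXTENTS];
		struct {
			struct uid_gid_extent *forward;
			struct uid_gid_extent *reverse;
		};
	};
};

/* Linear scan; returns the extent containing id, or NULL. */
static const struct uid_gid_extent *
find_extent(const struct uid_gid_extent *arr, uint32_t n, uint32_t id)
{
	for (uint32_t i = 0; i < n; i++) {
		if (id >= arr[i].first && id - arr[i].first < arr[i].count)
			return &arr[i];
	}
	return NULL;
}

/* Hypothetical userspace model of the lookup, not the kernel function. */
static uint32_t map_id_down_sketch(struct uid_gid_map *map, uint32_t id)
{
	/* Read the count exactly once; every later decision uses this copy. */
	uint32_t extents = atomic_load_explicit(&map->nr_extents,
						memory_order_acquire);
	const struct uid_gid_extent *arr, *extent;

	/* The same value selects the union member and bounds the scan. */
	arr = extents <= UID_GID_MAP_MAX_BASE_EXTENTS ? map->extent
						       : map->forward;
	extent = find_extent(arr, extents, id);

	/* One place computes the result: the mapped id or (uint32_t)-1. */
	return extent ? (id - extent->first) + extent->lower_first
		      : (uint32_t)-1;
}
```

Because `extents` is a local copy, a concurrent writer cannot make the function treat the inline array as a pair of pointers halfway through a lookup.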
    • userns: Don't special case a count of 0 · 11a8b927
      Eric W. Biederman authored
      We can always use a count of 1 so there is no reason to have
      a special case of a count of 0.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      11a8b927
    • userns: bump idmap limits to 340 · 6397fac4
      Christian Brauner authored
      There are quite a few use cases where users run into the current limit for
      {g,u}id mappings. Consider a user requesting us to map everything but 999 and
      1001 for a given range of 1000000000 with a sub{g,u}id layout of:
      
      some-user:100000:1000000000
      some-user:999:1
      some-user:1000:1
      some-user:1001:1
      some-user:1002:1
      
      This translates to:
      
      MAPPING-TYPE | CONTAINER |    HOST |     RANGE |
      -------------|-----------|---------|-----------|
               uid |       999 |     999 |         1 |
               uid |      1001 |    1001 |         1 |
               uid |         0 | 1000000 |       999 |
               uid |      1000 | 1001000 |         1 |
               uid |      1002 | 1001002 | 999998998 |
      ------------------------------------------------
               gid |       999 |     999 |         1 |
               gid |      1001 |    1001 |         1 |
               gid |         0 | 1000000 |       999 |
               gid |      1000 | 1001000 |         1 |
               gid |      1002 | 1001002 | 999998998 |
      
      which is already the current limit.
      
      As discussed at LPC simply bumping the limit is not going to work
      since this would mean that struct uid_gid_map won't fit into a single cache-line
      anymore thereby regressing performance for the base-cases. The same problem
      seems to arise when using a single pointer. So the idea is to use
      
      struct uid_gid_extent {
      	u32 first;
      	u32 lower_first;
      	u32 count;
      };
      
      struct uid_gid_map { /* 64 bytes -- 1 cache line */
      	u32 nr_extents;
      	union {
      		struct uid_gid_extent extent[UID_GID_MAP_MAX_BASE_EXTENTS];
      		struct {
      			struct uid_gid_extent *forward;
      			struct uid_gid_extent *reverse;
      		};
      	};
      };
      
      For the base cases we will only use the struct uid_gid_extent extent member. If
      we go over UID_GID_MAP_MAX_BASE_EXTENTS mappings we perform a single 4k
      kmalloc() which means we can have a maximum of 340 mappings
      (340 * sizeof(struct uid_gid_extent) = 4080). For the latter case we use two
      pointers "forward" and "reverse". The forward pointer points to an array sorted
      by "first" and the reverse pointer points to an array sorted by "lower_first".
      We can then perform binary search on those arrays.
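
The forward lookup described above can be sketched with the C library's `qsort()` and `bsearch()`. This is a userspace illustration, not the kernel code: the helper names `sort_forward` and `map_id_down_bsearch` are hypothetical, and the extents are assumed not to overlap.

```c
#include <stdint.h>
#include <stdlib.h>

struct uid_gid_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* 340 extents of 12 bytes each take 4080 bytes, within one 4k allocation. */
_Static_assert(sizeof(struct uid_gid_extent) == 12, "three u32 fields");
_Static_assert(340 * sizeof(struct uid_gid_extent) <= 4096, "fits in 4k");

/* qsort comparator: order extents by "first". */
static int cmp_first(const void *a, const void *b)
{
	const struct uid_gid_extent *x = a, *y = b;

	return (x->first > y->first) - (x->first < y->first);
}

/* bsearch comparator: the key is an id; 0 means it lies inside the extent. */
static int cmp_id_in_extent(const void *key, const void *elem)
{
	uint32_t id = *(const uint32_t *)key;
	const struct uid_gid_extent *e = elem;

	if (id < e->first)
		return -1;
	if (id - e->first >= e->count)
		return 1;
	return 0;
}

/* Build the "forward" ordering (sorted by first). */
static void sort_forward(struct uid_gid_extent *forward, size_t n)
{
	qsort(forward, n, sizeof(*forward), cmp_first);
}

/* Map id to the lower namespace, or return (uint32_t)-1 if unmapped. */
static uint32_t map_id_down_bsearch(const struct uid_gid_extent *forward,
				    size_t n, uint32_t id)
{
	const struct uid_gid_extent *e =
		bsearch(&id, forward, n, sizeof(*forward), cmp_id_in_extent);

	return e ? (id - e->first) + e->lower_first : (uint32_t)-1;
}
```

The reverse direction would work the same way on a second array sorted by `lower_first`, with the comparator checking `lower_first` instead of `first`.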
      
      Performance Testing:
      When Eric introduced the extent-based struct uid_gid_map approach he measured
      the performance impact of his idmap changes:
      
      > My benchmark consisted of going to single user mode where nothing else was
      > running. On an ext4 filesystem opening 1,000,000 files and looping through all
      > of the files 1000 times and calling fstat on the individual files. This was
      > to ensure I was benchmarking stat times where the inodes were in the kernel's
      > cache, but the inode values were not in the processor's cache. My results:
      
      > v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
      > v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
      > v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
      
      I used an identical approach on my laptop. Here's a thorough description of what
      I did. I built a 4.14.0-rc4 mainline kernel with my new idmap patches applied. I
      booted into single user mode and used an ext4 filesystem to open/create
      1,000,000 files. Then I looped through all of the files calling fstat() on each
      of them 1000 times and calculated the mean fstat() time for a single file. (The
      test program can be found below.)
      
      Here are the results. For fun, I compared the first version of my patch which
      scaled linearly with the new version of the patch:
      
      |   # MAPPINGS |   PATCH-V1 | PATCH-NEW |
      |--------------|------------|-----------|
      |   0 mappings |     158 ns |   158 ns  |
      |   1 mappings |     164 ns |   157 ns  |
      |   2 mappings |     170 ns |   158 ns  |
      |   3 mappings |     175 ns |   161 ns  |
      |   5 mappings |     187 ns |   165 ns  |
      |  10 mappings |     218 ns |   199 ns  |
      |  50 mappings |     528 ns |   218 ns  |
      | 100 mappings |     980 ns |   229 ns  |
      | 200 mappings |    1880 ns |   239 ns  |
      | 300 mappings |    2760 ns |   240 ns  |
      | 340 mappings | not tested |   248 ns  |
      
      Here's the test program I used. I asked Eric what he did and this is a more
      "advanced" implementation of the idea. It's pretty straight-forward:
      
       #define _GNU_SOURCE
       #define __STDC_FORMAT_MACROS
       #include <errno.h>
       #include <dirent.h>
       #include <fcntl.h>
       #include <inttypes.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <unistd.h>
       #include <sys/stat.h>
       #include <sys/time.h>
       #include <sys/types.h>
      
       int main(int argc, char *argv[])
       {
       	int ret;
       	size_t i, k;
       	int fd[1000000];
       	int times[1000];
       	char pathname[4096];
       	struct stat st;
       	struct timeval t1, t2;
       	uint64_t time_in_mcs;
       	uint64_t sum = 0;
      
       	if (argc != 2) {
       		fprintf(stderr, "Please specify a directory where to create "
       				"the test files\n");
       		exit(EXIT_FAILURE);
       	}
      
       	for (i = 0; i < sizeof(fd) / sizeof(fd[0]); i++) {
       		snprintf(pathname, sizeof(pathname), "%s/idmap_test_%zu", argv[1], i);
       		fd[i]= open(pathname, O_RDWR | O_CREAT, S_IXUSR | S_IXGRP | S_IXOTH);
       		if (fd[i] < 0) {
       			ssize_t j;
       			/* fd[i] is invalid here; close only the descriptors opened so far */
       			for (j = (ssize_t)i - 1; j >= 0; j--)
       				close(fd[j]);
       			exit(EXIT_FAILURE);
       		}
       	}
      
       	for (k = 0; k < 1000; k++) {
       		ret = gettimeofday(&t1, NULL);
       		if (ret < 0)
       			goto close_all;
      
       		for (i = 0; i < sizeof(fd) / sizeof(fd[0]); i++) {
       			ret = fstat(fd[i], &st);
       			if (ret < 0)
       				goto close_all;
       		}
      
       		ret = gettimeofday(&t2, NULL);
       		if (ret < 0)
       			goto close_all;
      
       		time_in_mcs = (1000000 * t2.tv_sec + t2.tv_usec) -
       			      (1000000 * t1.tv_sec + t1.tv_usec);
       		printf("Total time in micro seconds:       %" PRIu64 "\n",
       		       time_in_mcs);
       		printf("Total time in nanoseconds:         %" PRIu64 "\n",
       		       time_in_mcs * 1000);
       		printf("Time per file in nanoseconds:      %" PRIu64 "\n",
       		       (time_in_mcs * 1000) / 1000000);
       		times[k] = (time_in_mcs * 1000) / 1000000;
       	}
      
       close_all:
       	for (i = 0; i < sizeof(fd) / sizeof(fd[0]); i++)
       		close(fd[i]);
      
       	if (ret < 0)
       		exit(EXIT_FAILURE);
      
       	for (k = 0; k < 1000; k++) {
       		sum += times[k];
       	}
      
       	printf("Mean time per file in nanoseconds: %" PRIu64 "\n", sum / 1000);
      
       	exit(EXIT_SUCCESS);
       }
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      CC: Serge Hallyn <serge@hallyn.com>
      CC: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      6397fac4
    • userns: use union in {g,u}idmap struct · aa4bf44d
      Christian Brauner authored
      - Add a struct containing two pointers to extents and wrap both the static extent
        array and the struct into a union. This is done in preparation for bumping the
        {g,u}idmap limits for user namespaces.
      - Add brackets around anonymous union when using designated initializers to
        initialize members in order to please gcc <= 4.4.
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      aa4bf44d
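The second point of this commit, braces around the anonymous union when using designated initializers, can be illustrated as follows. This is a minimal userspace sketch assuming the struct layout quoted in the idmap-limit commit message; the variable name `identity_map` is hypothetical.

```c
#include <stdint.h>

#define UID_GID_MAP_MAX_BASE_EXTENTS 5

struct uid_gid_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

struct uid_gid_map {
	uint32_t nr_extents;
	union {
		struct uid_gid_extent extent[UID_GID_MAP_MAX_BASE_EXTENTS];
		struct {
			struct uid_gid_extent *forward;
			struct uid_gid_extent *reverse;
		};
	};
};

/* Hypothetical identity map covering every id except (uint32_t)-1. */
static const struct uid_gid_map identity_map = {
	.nr_extents = 1,
	{
		/*
		 * The extra braces wrap the anonymous union, so the
		 * designator for its member appears inside them instead
		 * of at the top level of the initializer.
		 */
		.extent[0] = {
			.first = 0,
			.lower_first = 0,
			.count = 4294967295U,
		},
	},
};
```

Newer compilers also accept `.extent[0] = ...` directly at the top level; the braced form is the one the commit message describes as needed for gcc <= 4.4.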
  2. 24 Sep, 2017 14 commits
  3. 23 Sep, 2017 19 commits
    • Merge branch 'parisc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · cd4175b1
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
      
       - Unbreak parisc bootloader by avoiding a gcc-7 optimization to convert
         multiple byte-accesses into one word-access.
      
       - Add missing HWPOISON page fault handler code. I completely missed
         that when I added HWPOISON support during this merge window and it
         only showed up now with the madvise07 LTP test case.
      
       - Fix backtrace unwinding to stop when stack start has been reached.
      
       - Issue warning if initrd has been loaded into memory regions with
         broken RAM modules.
      
       - Fix HPMC handler (parisc hardware fault handler) to comply with
         architecture specification.
      
       - Avoid compiler warnings about too large frame sizes.
      
       - Minor init-section fixes.
      
      * 'parisc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Unbreak bootloader due to gcc-7 optimizations
        parisc: Reintroduce option to gzip-compress the kernel
        parisc: Add HWPOISON page fault handler code
        parisc: Move init_per_cpu() into init section
        parisc: Check if initrd was loaded into broken RAM
        parisc: Add PDCE_CHECK instruction to HPMC handler
        parisc: Add wrapper for pdc_instr() firmware function
        parisc: Move start_parisc() into init section
        parisc: Stop unwinding at start of stack
        parisc: Fix too large frame size warnings
      cd4175b1
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · ded85032
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
      
       - Smattering of miscellaneous fixes
      
       - A five patch series for i40iw that had a patch (5/5) that was larger
         than I would like, but I took it because it's needed for large scale
         users
      
       - An 8 patch series for bnxt_re that landed right as I was leaving on
         PTO and so had to wait until now...they are all appropriate fixes for
         -rc IMO
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (22 commits)
        bnxt_re: Don't issue cmd to delete GID for QP1 GID entry before the QP is destroyed
        bnxt_re: Fix memory leak in FRMR path
        bnxt_re: Remove RTNL lock dependency in bnxt_re_query_port
        bnxt_re: Fix race between the netdev register and unregister events
        bnxt_re: Free up devices in module_exit path
        bnxt_re: Fix compare and swap atomic operands
        bnxt_re: Stop issuing further cmds to FW once a cmd times out
        bnxt_re: Fix update of qplib_qp.mtu when modified
        i40iw: Add support for port reuse on active side connections
        i40iw: Add missing VLAN priority
        i40iw: Call i40iw_cm_disconn on modify QP to disconnect
        i40iw: Prevent multiple netdev event notifier registrations
        i40iw: Fail open if there are no available MSI-X vectors
        RDMA/vmw_pvrdma: Fix reporting correct opcodes for completion
        IB/bnxt_re: Fix frame stack compilation warning
        IB/mlx5: fix debugfs cleanup
        IB/ocrdma: fix incorrect fall-through on switch statement
        IB/ipoib: Suppress the retry related completion errors
        iw_cxgb4: remove the stid on listen create failure
        iw_cxgb4: drop listen destroy replies if no ep found
        ...
      ded85032
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 71aa60f6
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix NAPI poll list corruption in enic driver, from Christian
          Lamparter.
      
       2) Fix route use after free, from Eric Dumazet.
      
       3) Fix regression in reuseaddr handling, from Josef Bacik.
      
       4) Assert the size of control messages in compat handling since we copy
          it in from userspace twice. From Meng Xu.
      
       5) SMC layer bug fixes (missing RCU locking, bad refcounting, etc.)
          from Ursula Braun.
      
       6) Fix races in AF_PACKET fanout handling, from Willem de Bruijn.
      
       7) Don't use ARRAY_SIZE on spinlock array which might have zero
          entries, from Geert Uytterhoeven.
      
       8) Fix miscomputation of checksum in ipv6 udp code, from Subash Abhinov
          Kasiviswanathan.
      
       9) Push the ipv6 header properly in ipv6 GRE tunnel driver, from Xin
          Long.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
        inet: fix improper empty comparison
        net: use inet6_rcv_saddr to compare sockets
        net: set tb->fast_sk_family
        net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
        MAINTAINERS: update git tree locations for ieee802154 subsystem
        net: prevent dst uses after free
        net: phy: Fix truncation of large IRQ numbers in phy_attached_print()
        net/smc: no close wait in case of process shut down
        net/smc: introduce a delay
        net/smc: terminate link group if out-of-sync is received
        net/smc: longer delay for client link group removal
        net/smc: adapt send request completion notification
        net/smc: adjust net_device refcount
        net/smc: take RCU read lock for routing cache lookup
        net/smc: add receive timeout check
        net/smc: add missing dev_put
        net: stmmac: Cocci spatch "of_table"
        lan78xx: Use default values loaded from EEPROM/OTP after reset
        lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
        lan78xx: Fix for eeprom read/write when device auto suspend
        ...
      71aa60f6
    • Merge tag 'apparmor-pr-2017-09-22' of... · 79444df4
      Linus Torvalds authored
      Merge tag 'apparmor-pr-2017-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
      
      Pull apparmor updates from John Johansen:
       "This is the apparmor pull request, similar to SELinux and seccomp.
      
        It's the same series that was sent to James' security tree + one
        regression fix that was found after the series was sent to James and
        would have been sent for v4.14-rc2.
      
        Features:
        - in preparation for secid mapping add support for absolute root view
          based labels
        - add base infrastructure for socket mediation
        - add mount mediation
        - add signal mediation
      
        minor cleanups and changes:
        - be defensive, ensure unconfined profiles have dfas initialized
        - add more debug asserts to apparmorfs
        - enable policy unpacking to audit different reasons for failure
        - cleanup conditional check for label in label_print
        - Redundant condition: prev_ns. in [label.c:1498]
      
        Bug Fixes:
        - fix regression in apparmorfs DAC access permissions
        - fix build failure on sparc caused by undeclared signals
        - fix sparse report of incorrect type assignment when freeing label proxies
        - fix race condition in null profile creation
        - Fix an error code in aafs_create()
        - Fix logical error in verify_header()
        - Fix shadowed local variable in unpack_trans_table()"
      
      * tag 'apparmor-pr-2017-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
        apparmor: fix apparmorfs DAC access permissions
        apparmor: fix build failure on sparc caused by undeclared signals
        apparmor: fix incorrect type assignment when freeing proxies
        apparmor: ensure unconfined profiles have dfas initialized
        apparmor: fix race condition in null profile creation
        apparmor: move new_null_profile to after profile lookup fns()
        apparmor: add base infastructure for socket mediation
        apparmor: add more debug asserts to apparmorfs
        apparmor: make policy_unpack able to audit different info messages
        apparmor: add support for absolute root view based labels
        apparmor: cleanup conditional check for label in label_print
        apparmor: add mount mediation
        apparmor: add the ability to mediate signals
        apparmor: Redundant condition: prev_ns. in [label.c:1498]
        apparmor: Fix an error code in aafs_create()
        apparmor: Fix logical error in verify_header()
        apparmor: Fix shadowed local variable in unpack_trans_table()
      79444df4
    • x86/asm: Fix inline asm call constraints for Clang · f5caf621
      Josh Poimboeuf authored
      For inline asm statements which have a CALL instruction, we list the
      stack pointer as a constraint to convince GCC to ensure the frame
      pointer is set up first:
      
        static inline void foo()
        {
      	register void *__sp asm(_ASM_SP);
      	asm("call bar" : "+r" (__sp));
        }
      
      Unfortunately, that pattern causes Clang to corrupt the stack pointer.
      
      The fix is easy: convert the stack pointer register variable to a global
      variable.
      
      It should be noted that the end result is different based on the GCC
      version.  With GCC 6.4, this patch has exactly the same result as
      before:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9820389		9491555		8816046		8516940
       after	9820389		9491555		8816046		8516940
      
      With GCC 7.2, however, GCC's behavior has changed.  It now changes its
      behavior based on the conversion of the register variable to a global.
      That somehow convinces it to *always* set up the frame pointer before
      inserting *any* inline asm.  (Therefore, listing the variable as an
      output constraint is a no-op and is no longer necessary.)  It's a bit
      overkill, but the performance impact should be negligible.  And in fact,
      there's a nice improvement with frame pointers disabled:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9796316		9468236		9076191		8790305
       after	9796957		9464267		9076381		8785949
      
      So in summary, while listing the stack pointer as an output constraint
      is no longer necessary for newer versions of GCC, it's still needed for
      older versions.
      Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f5caf621
    • objtool: Handle another GCC stack pointer adjustment bug · 0d0970ee
      Josh Poimboeuf authored
      The kbuild bot reported the following warning with GCC 4.4 and a
      randconfig:
      
        net/socket.o: warning: objtool: compat_sock_ioctl()+0x1083: stack state mismatch: cfa1=7+160 cfa2=-1+0
      
      This is caused by another GCC non-optimization, where it backs up and
      restores the stack pointer for no apparent reason:
      
          2f91:       48 89 e0                mov    %rsp,%rax
          2f94:       4c 89 e7                mov    %r12,%rdi
          2f97:       4c 89 f6                mov    %r14,%rsi
          2f9a:       ba 20 00 00 00          mov    $0x20,%edx
          2f9f:       48 89 c4                mov    %rax,%rsp
      
      This issue would have been happily ignored before the following commit:
      
        dd88a0a0 ("objtool: Handle GCC stack pointer adjustment bug")
      
      But now that objtool is paying attention to such stack pointer writes
      to/from a register, it needs to understand them properly.  In this case
      that means recognizing that the "mov %rsp, %rax" instruction is
      potentially a backup of the stack pointer.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: dd88a0a0 ("objtool: Handle GCC stack pointer adjustment bug")
      Link: http://lkml.kernel.org/r/8c7aa8e9a36fbbb6655d9d8e7cea58958c912da8.1505942196.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0d0970ee
    • Merge tag 'acpi-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · c65da8e2
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix the initialization of resources in the ACPI WDAT watchdog
        driver, a recent regression in the ACPI device properties handling, a
        recent change in behavior causing the ACPI_HANDLE() macro to only work
        for GPL code and create a MAINTAINERS entry for ACPI PMIC drivers in
        order to specify the official reviewers for that code.
      
        Specifics:
      
         - Fix the initialization of resources in the ACPI WDAT watchdog
           driver that uses uninitialized memory which causes compiler
           warnings to be triggered (Arnd Bergmann).
      
         - Fix a recent regression in the ACPI device properties handling that
           causes some device properties data to be skipped during enumeration
           (Sakari Ailus).
      
         - Fix a recent change in behavior that caused the ACPI_HANDLE() macro
           to stop working for non-GPL code which is a problem for the NVidia
           binary graphics driver, for example (John Hubbard).
      
         - Add a MAINTAINERS entry for the ACPI PMIC drivers to specify the
           official reviewers for that code (Rafael Wysocki)"
      
      * tag 'acpi-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: properties: Return _DSD hierarchical extension (data) sub-nodes correctly
        ACPI / bus: Make ACPI_HANDLE() work for non-GPL code again
        ACPI / watchdog: properly initialize resources
        ACPI / PMIC: Add code reviewers to MAINTAINERS
      c65da8e2
    • Merge branch 'net-fix-reuseaddr-regression' · 4e683f49
      David S. Miller authored
      Josef Bacik says:
      
      ====================
      net: fix reuseaddr regression
      
      I introduced a regression when reworking the fastreuse port stuff that allows
      bind conflicts to occur once a reuseaddr successfully opens on an existing tb.
      The root cause is I reversed an if statement which caused us to set the tb as if
      there were no owners on the socket if there were, which obviously is not
      correct.
      
      Dave could you please queue these changes up for -stable, I've run them through
      the net tests and added another test to check for this problem specifically.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e683f49
    • inet: fix improper empty comparison · fbed24bc
      Josef Bacik authored
      When doing my reuseport rework I screwed up and changed a
      
      if (hlist_empty(&tb->owners))
      
      to
      
      if (!hlist_empty(&tb->owners))
      
      This is obviously bad as all of the reuseport/reuse logic was reversed,
      which caused weird problems like allowing an ipv4 bind conflict if we
      opened an ipv4 only socket on a port followed by an ipv6 only socket on
      the same port.
      
      Fixes: b9470c27 ("inet: kill smallest_size and smallest_port")
      Reported-by: Cole Robinson <crobinso@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fbed24bc
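The effect of the inverted check can be shown with a reduced userspace model. Everything here is a hypothetical simplification: `bind_bucket_sketch` and `update_fastreuse` are illustrative names, and the real bucket tracks more state than a single `fastreuse` flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

static bool hlist_empty(const struct hlist_head *h)
{
	return h->first == NULL;
}

/* Hypothetical, reduced model of a bind bucket. */
struct bind_bucket_sketch {
	struct hlist_head owners;
	int fastreuse;
};

static void update_fastreuse(struct bind_bucket_sketch *tb, int reuse)
{
	if (hlist_empty(&tb->owners)) {
		/* No socket owns the bucket yet: seed the cached state. */
		tb->fastreuse = reuse;
	} else if (!reuse) {
		/* Existing owners: the cached state may only be cleared. */
		tb->fastreuse = 0;
	}
}
```

With the condition negated, a bucket that already has owners would have its cached state overwritten by the newcomer's `reuse` value, which is the kind of reversal the commit message describes.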
    • net: use inet6_rcv_saddr to compare sockets · 7a56673b
      Josef Bacik authored
      In ipv6_rcv_saddr_equal() we need to use inet6_rcv_saddr(sk) for the
      ipv6 compare with the fast socket information to make sure we're doing
      the proper comparisons.
      
      Fixes: 637bc8bb ("inet: reset tb->fastreuseport when adding a reuseport sk")
      Reported-and-tested-by: Cole Robinson <crobinso@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a56673b
    • net: set tb->fast_sk_family · cbb2fb5c
      Josef Bacik authored
      We need to set the tb->fast_sk_family properly so we can use the proper
      comparison function for all subsequent reuseport bind requests.
      
      Fixes: 637bc8bb ("inet: reset tb->fastreuseport when adding a reuseport sk")
      Reported-and-tested-by: Cole Robinson <crobinso@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbb2fb5c
    • net: orphan frags on stand-alone ptype in dev_queue_xmit_nit · 581fe0ea
      Willem de Bruijn authored
      Zerocopy skbs frags are copied when the skb is looped to a local sock.
      Commit 1080e512 ("net: orphan frags on receive") introduced calls
      to skb_orphan_frags to deliver_skb and __netif_receive_skb for this.
      
      With msg_zerocopy, these skbs can also exist in the tx path and thus
      loop from dev_queue_xmit_nit. This already calls deliver_skb in its
      loop. But it does not orphan before a separate pt_prev->func().
      
      Add the missing skb_orphan_frags_rx.
      
      Changes
        v1->v2: handle skb_orphan_frags_rx failure
      
      Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      581fe0ea
    • Merge tag 'pm-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 6876eb37
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix a cpufreq regression introduced by recent changes related to
        the generic DT driver, an initialization time memory leak in cpuidle
        on ARM, a PM core bug that may cause system suspend/resume to fail on
        some systems, a request type validation issue in the PM QoS framework
        and two documentation-related issues.
      
        Specifics:
      
         - Fix a regression in cpufreq on systems using DT as the source of
           CPU configuration information where two different code paths
           attempt to create the cpufreq-dt device object (there can be only
           one) and fix up the "compatible" matching for some TI platforms on
           top of that (Viresh Kumar, Dave Gerlach).
      
         - Fix an initialization time memory leak in cpuidle on ARM which
           occurs if the cpuidle driver initialization fails (Stefan Wahren).
      
         - Fix a PM core function that checks whether or not there are any
           system suspend/resume callbacks for a device, but forgets to check
           legacy callbacks which then may be skipped incorrectly and the
           system may crash and/or the device may become unusable after a
           suspend-resume cycle (Rafael Wysocki).
      
         - Fix request type validation for latency tolerance PM QoS requests
           which may lead to unexpected behavior (Jan Schönherr).
      
         - Fix a broken link to PM documentation from a header file and a typo
           in a PM document (Geert Uytterhoeven, Rafael Wysocki)"
      
      * tag 'pm-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: ti-cpufreq: Support additional am43xx platforms
        ARM: cpuidle: Avoid memleak if init fail
        cpufreq: dt-platdev: Add some missing platforms to the blacklist
        PM: core: Fix device_pm_check_callbacks()
        PM: docs: Drop an excess character from devices.rst
        PM / QoS: Use the correct variable to check the QoS request type
        driver core: Fix link to device power management documentation
      6876eb37
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · d32e5f44
      Linus Torvalds authored
      Pull input fixes from Dmitry Torokhov:
      
       - fixes for two long standing issues (lock up and a crash) in force
         feedback handling in uinput driver
      
       - tweak to firmware update timing in Elan I2C touchpad driver.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: elan_i2c - extend Flash-Write delay
        Input: uinput - avoid crash when sending FF request to device going away
        Input: uinput - avoid FF flush when destroying device
      d32e5f44
    • Merge tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · c0a3a64e
      Linus Torvalds authored
      Pull seccomp updates from Kees Cook:
       "Major additions:
      
         - sysctl and seccomp operation to discover available actions
           (tyhicks)
      
         - new per-filter configurable logging infrastructure and sysctl
           (tyhicks)
      
         - SECCOMP_RET_LOG to log allowed syscalls (tyhicks)
      
         - SECCOMP_RET_KILL_PROCESS as the new strictest possible action
      
         - self-tests for new behaviors"
      
      [ This is the seccomp part of the security pull request during the merge
        window that was nixed due to unrelated problems - Linus ]
      
      * tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        samples: Unrename SECCOMP_RET_KILL
        selftests/seccomp: Test thread vs process killing
        seccomp: Implement SECCOMP_RET_KILL_PROCESS action
        seccomp: Introduce SECCOMP_RET_KILL_PROCESS
        seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD
        seccomp: Action to log before allowing
        seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
        seccomp: Selftest for detection of filter flag support
        seccomp: Sysctl to configure actions that are allowed to be logged
        seccomp: Operation for checking if an action is available
        seccomp: Sysctl to display available actions
        seccomp: Provide matching filter for introspection
        selftests/seccomp: Refactor RET_ERRNO tests
        selftests/seccomp: Add simple seccomp overhead benchmark
        selftests/seccomp: Add tests for basic ptrace actions
      c0a3a64e
    • Linus Torvalds's avatar
      Merge tag '4.14-smb3-fixes-from-recent-test-events-for-stable' of... · 69c902f5
      Linus Torvalds authored
      Merge tag '4.14-smb3-fixes-from-recent-test-events-for-stable' of git://git.samba.org/sfrench/cifs-2.6
      
      Pull cifs fixes from Steve French:
       "Various SMB3 fixes for stable and security improvements from the
        recently completed SMB3/Samba test events"
      
      * tag '4.14-smb3-fixes-from-recent-test-events-for-stable' of git://git.samba.org/sfrench/cifs-2.6:
        SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags
        SMB3: handle new statx fields
        SMB: Validate negotiate (to protect against downgrade) even if signing off
        cifs: release auth_key.response for reconnect.
        cifs: release cifs root_cred after exit_cifs
        CIFS: make arrays static const, reduces object code size
        [SMB3] Update session and share information displayed for debugging SMB2/SMB3
        cifs: show 'soft' in the mount options for hard mounts
        SMB3: Warn user if trying to sign connection that authenticated as guest
        SMB3: Fix endian warning
        Fix SMB3.1.1 guest authentication to Samba
      69c902f5
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-4.14-rc2' of git://github.com/ceph/ceph-client · b03fcfae
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two small but important fixes: RADOS semantic change in upcoming v12.2.1
        release and a rare NULL dereference in create_session_open_msg()"
      
      * tag 'ceph-for-4.14-rc2' of git://github.com/ceph/ceph-client:
        ceph: avoid panic in create_session_open_msg() if utsname() returns NULL
        libceph: don't allow bidirectional swap of pg-upmap-items
      b03fcfae
    • Stefan Schmidt's avatar
      MAINTAINERS: update git tree locations for ieee802154 subsystem · b9b95da9
      Stefan Schmidt authored
      Patches for ieee802154 will go through my new trees towards netdev from
      now on. The 6LoWPAN subsystem will stay as is (shared between ieee802154
      and bluetooth) and go through the bluetooth tree as usual.
      Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9b95da9
    • Steve French's avatar
      SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags · 1013e760
      Steve French authored
      Signed-off-by: Steve French <smfrench@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      1013e760
  4. 22 Sep, 2017 3 commits