1. 19 Dec, 2018 3 commits
    • Christian Borntraeger's avatar
      genwqe: Fix size check · fdd66968
      Christian Borntraeger authored
      Calling the test program genwqe_cksum with the default buffer size of
      2MB triggers the following kernel warning on s390:
      
      WARNING: CPU: 30 PID: 9311 at mm/page_alloc.c:3189 __alloc_pages_nodemask+0x45c/0xbe0
      CPU: 30 PID: 9311 Comm: genwqe_cksum Kdump: loaded Not tainted 3.10.0-957.el7.s390x #1
      task: 00000005e5d13980 ti: 00000005e7c6c000 task.ti: 00000005e7c6c000
      Krnl PSW : 0704c00180000000 00000000002780ac (__alloc_pages_nodemask+0x45c/0xbe0)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
      Krnl GPRS: 00000000002932b8 0000000000b73d7c 0000000000000010 0000000000000009
                 0000000000000041 00000005e7c6f9b8 0000000000000001 00000000000080d0
                 0000000000000000 0000000000b70500 0000000000000001 0000000000000000
                 0000000000b70528 00000000007682c0 0000000000277df2 00000005e7c6f9a0
      Krnl Code: 000000000027809e: de7195001000	ed	1280(114,%r9),0(%r1)
      	   00000000002780a4: a774fead		brc	7,277dfe
      	  #00000000002780a8: a7f40001		brc	15,2780aa
      	  >00000000002780ac: 92011000		mvi	0(%r1),1
      	   00000000002780b0: a7f4fea7		brc	15,277dfe
      	   00000000002780b4: 9101c6b6		tm	1718(%r12),1
      	   00000000002780b8: a784ff3a		brc	8,277f2c
      	   00000000002780bc: a7f4fe2e		brc	15,277d18
      Call Trace:
      ([<0000000000277df2>] __alloc_pages_nodemask+0x1a2/0xbe0)
       [<000000000013afae>] s390_dma_alloc+0xfe/0x310
       [<000003ff8065f362>] __genwqe_alloc_consistent+0xfa/0x148 [genwqe_card]
       [<000003ff80658f7a>] genwqe_mmap+0xca/0x248 [genwqe_card]
       [<00000000002b2712>] mmap_region+0x4e2/0x778
       [<00000000002b2c54>] do_mmap+0x2ac/0x3e0
       [<0000000000292d7e>] vm_mmap_pgoff+0xd6/0x118
       [<00000000002b081c>] SyS_mmap_pgoff+0xdc/0x268
       [<00000000002b0a34>] SyS_old_mmap+0x8c/0xb0
       [<000000000074e518>] sysc_tracego+0x14/0x1e
       [<000003ffacf87dc6>] 0x3ffacf87dc6
      
      turns out the check in __genwqe_alloc_consistent uses "> MAX_ORDER"
      while the mm code uses ">= MAX_ORDER". Fix genwqe.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarFrank Haverkamp <haver@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdd66968
    • Christian Brauner's avatar
      binder: implement binderfs · 3ad20fe3
      Christian Brauner authored
      As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the
      implementation of binderfs.
      
      /* Abstract */
      binderfs is a backwards-compatible filesystem for Android's binder ipc
      mechanism. Each ipc namespace will mount a new binderfs instance. Mounting
      binderfs multiple times at different locations in the same ipc namespace
      will not cause a new super block to be allocated and hence it will be the
      same filesystem instance.
      Each new binderfs mount will have its own set of binder devices only
      visible in the ipc namespace it has been mounted in. All devices in a new
      binderfs mount will follow the scheme binder%d and numbering will always
      start at 0.
      
      /* Backwards compatibility */
      Devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES for the
      initial ipc namespace will work as before. They will be registered via
      misc_register() and appear in the devtmpfs mount. Specifically, the
      standard devices binder, hwbinder, and vndbinder will all appear in their
      standard locations in /dev. Mounting or unmounting the binderfs mount in
      the initial ipc namespace will have no effect on these devices, i.e. they
      will neither show up in the binderfs mount nor will they disappear when the
      binderfs mount is gone.
      
      /* binder-control */
      Each new binderfs instance comes with a binder-control device. No other
      devices will be present at first. The binder-control device can be used to
      dynamically allocate binder devices. All requests operate on the binderfs
      mount the binder-control device resides in.
      Assuming a new instance of binderfs has been mounted at /dev/binderfs
      via mount -t binderfs binderfs /dev/binderfs. Then a request to create a
      new binder device can be made as illustrated in [2].
      Binderfs devices can simply be removed via unlink().
      
      /* Implementation details */
      - dynamic major number allocation:
        When binderfs is registered as a new filesystem it will dynamically
        allocate a new major number. The allocated major number will be returned
        in struct binderfs_device when a new binder device is allocated.
      - global minor number tracking:
        Minor are tracked in a global idr struct that is capped at
        BINDERFS_MAX_MINOR. The minor number tracker is protected by a global
        mutex. This is the only point of contention between binderfs mounts.
      - struct binderfs_info:
        Each binderfs super block has its own struct binderfs_info that tracks
        specific details about a binderfs instance:
        - ipc namespace
        - dentry of the binder-control device
        - root uid and root gid of the user namespace the binderfs instance
          was mounted in
      - mountable by user namespace root:
        binderfs can be mounted by user namespace root in a non-initial user
        namespace. The devices will be owned by user namespace root.
      - binderfs binder devices without misc infrastructure:
        New binder devices associated with a binderfs mount do not use the
        full misc_register() infrastructure.
        The misc_register() infrastructure can only create new devices in the
        host's devtmpfs mount. binderfs does however only make devices appear
        under its own mountpoint and thus allocates new character device nodes
        from the inode of the root dentry of the super block. This will have
        the side-effect that binderfs specific device nodes do not appear in
        sysfs. This behavior is similar to devpts allocated pts devices and
        has no effect on the functionality of the ipc mechanism itself.
      
      [1]: https://goo.gl/JL2tfX
      [2]: program to allocate a new binderfs binder device:
      
           #define _GNU_SOURCE
           #include <errno.h>
           #include <fcntl.h>
           #include <stdio.h>
           #include <stdlib.h>
           #include <string.h>
           #include <sys/ioctl.h>
           #include <sys/stat.h>
           #include <sys/types.h>
           #include <unistd.h>
           #include <linux/android/binder_ctl.h>
      
           int main(int argc, char *argv[])
           {
                   int fd, ret, saved_errno;
                   size_t len;
                   struct binderfs_device device = { 0 };
      
                   if (argc < 2)
                           exit(EXIT_FAILURE);
      
                   len = strlen(argv[1]);
                   if (len > BINDERFS_MAX_NAME)
                           exit(EXIT_FAILURE);
      
                   memcpy(device.name, argv[1], len);
      
                   fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC);
                   if (fd < 0) {
                           printf("%s - Failed to open binder-control device\n",
                                  strerror(errno));
                           exit(EXIT_FAILURE);
                   }
      
                   ret = ioctl(fd, BINDER_CTL_ADD, &device);
                   saved_errno = errno;
                   close(fd);
                   errno = saved_errno;
                   if (ret < 0) {
                           printf("%s - Failed to allocate new binder device\n",
                                  strerror(errno));
                           exit(EXIT_FAILURE);
                   }
      
                   printf("Allocated new binder device with major %d, minor %d, and "
                          "name %s\n", device.major, device.minor,
                          device.name);
      
                   exit(EXIT_SUCCESS);
           }
      
      Cc: Martijn Coenen <maco@android.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Acked-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ad20fe3
    • Todd Kjos's avatar
      binder: fix use-after-free due to ksys_close() during fdget() · 80cd7956
      Todd Kjos authored
      44d8047f ("binder: use standard functions to allocate fds")
      exposed a pre-existing issue in the binder driver.
      
      fdget() is used in ksys_ioctl() as a performance optimization.
      One of the rules associated with fdget() is that ksys_close() must
      not be called between the fdget() and the fdput(). There is a case
      where this requirement is not met in the binder driver which results
      in the reference count dropping to 0 when the device is still in
      use. This can result in use-after-free or other issues.
      
      If userpace has passed a file-descriptor for the binder driver using
      a BINDER_TYPE_FDA object, then kys_close() is called on it when
      handling a binder_ioctl(BC_FREE_BUFFER) command. This violates
      the assumptions for using fdget().
      
      The problem is fixed by deferring the close using task_work_add(). A
      new variant of __close_fd() was created that returns a struct file
      with a reference. The fput() is deferred instead of using ksys_close().
      
      Fixes: 44d8047f ("binder: use standard functions to allocate fds")
      Suggested-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80cd7956
  2. 11 Dec, 2018 1 commit
  3. 10 Dec, 2018 2 commits
  4. 09 Dec, 2018 20 commits
    • Linus Torvalds's avatar
      Linux 4.20-rc6 · 40e020c1
      Linus Torvalds authored
      40e020c1
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · d48f782e
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "A decent batch of fixes here. I'd say about half are for problems that
        have existed for a while, and half are for new regressions added in
        the 4.20 merge window.
      
         1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach.
      
         2) Revert bogus emac driver change, from Benjamin Herrenschmidt.
      
         3) Handle BPF exported data structure with pointers when building
            32-bit userland, from Daniel Borkmann.
      
         4) Memory leak fix in act_police, from Davide Caratti.
      
         5) Check RX checksum offload in RX descriptors properly in aquantia
            driver, from Dmitry Bogdanov.
      
         6) SKB unlink fix in various spots, from Edward Cree.
      
         7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from
            Eric Dumazet.
      
         8) Fix FID leak in mlxsw driver, from Ido Schimmel.
      
         9) IOTLB locking fix in vhost, from Jean-Philippe Brucker.
      
        10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory
            limits otherwise namespace exit can hang. From Jiri Wiesner.
      
        11) Address block parsing length fixes in x25 from Martin Schiller.
      
        12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan.
      
        13) For tun interfaces, only iface delete works with rtnl ops, enforce
            this by disallowing add. From Nicolas Dichtel.
      
        14) Use after free in liquidio, from Pan Bian.
      
        15) Fix SKB use after passing to netif_receive_skb(), from Prashant
            Bhole.
      
        16) Static key accounting and other fixes in XPS from Sabrina Dubroca.
      
        17) Partially initialized flow key passed to ip6_route_output(), from
            Shmulik Ladkani.
      
        18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas
            Falcon.
      
        19) Several small TCP fixes (off-by-one on window probe abort, NULL
            deref in tail loss probe, SNMP mis-estimations) from Yuchung
            Cheng"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
        net/sched: cls_flower: Reject duplicated rules also under skip_sw
        bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips.
        bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips.
        bnxt_en: Keep track of reserved IRQs.
        bnxt_en: Fix CNP CoS queue regression.
        net/mlx4_core: Correctly set PFC param if global pause is turned off.
        Revert "net/ibm/emac: wrong bit is used for STA control"
        neighbour: Avoid writing before skb->head in neigh_hh_output()
        ipv6: Check available headroom in ip6_xmit() even without options
        tcp: lack of available data can also cause TSO defer
        ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
        mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl
        mlxsw: spectrum_router: Relax GRE decap matching check
        mlxsw: spectrum_switchdev: Avoid leaking FID's reference count
        mlxsw: spectrum_nve: Remove easily triggerable warnings
        ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
        sctp: frag_point sanity check
        tcp: fix NULL ref in tail loss probe
        tcp: Do not underestimate rwnd_limited
        net: use skb_list_del_init() to remove from RX sublists
        ...
      d48f782e
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8586ca8a
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Three fixes: a boot parameter re-(re-)fix, a retpoline build artifact
        fix and an LLVM workaround"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/vdso: Drop implicit common-page-size linker flag
        x86/build: Fix compiler support check for CONFIG_RETPOLINE
        x86/boot: Clear RSDP address in boot_params for broken loaders
      8586ca8a
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ebbd3000
      Linus Torvalds authored
      Pull kprobes fixes from Ingo Molnar:
       "Two kprobes fixes: a blacklist fix and an instruction patching related
        corruption fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        kprobes/x86: Blacklist non-attachable interrupt functions
        kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction
      ebbd3000
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4b04e73a
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Two fixes: a large-system fix and an earlyprintk fix with certain
        resolutions"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/earlyprintk/efi: Fix infinite loop on some screen widths
        x86/efi: Allocate e820 buffer before calling efi_exit_boot_service
      4b04e73a
    • Or Gerlitz's avatar
      net/sched: cls_flower: Reject duplicated rules also under skip_sw · 35cc3cef
      Or Gerlitz authored
      Currently, duplicated rules are rejected only for skip_hw or "none",
      hence allowing users to push duplicates into HW for no reason.
      
      Use the flower tables to protect for that.
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarPaul Blakey <paulb@mellanox.com>
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35cc3cef
    • David S. Miller's avatar
      Merge branch 'bnxt_en-Bug-fixes' · d4b60e94
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      The first patch fixes a regression on CoS queue setup, introduced
      recently by the 57500 new chip support patches.  The rest are
      fixes related to ring and resource accounting on the new 57500 chips.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4b60e94
    • Michael Chan's avatar
      bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips. · e30fbc33
      Michael Chan authored
      The CP rings are accounted differently on the new 57500 chips.  There
      must be enough CP rings for the sum of RX and TX rings on the new
      chips.  The current logic may be over-estimating the RX and TX rings.
      
      The output parameter max_cp should be the maximum NQs capped by
      MSIX vectors available for networking in the context of 57500 chips.
      The existing code which uses CMPL rings capped by the MSIX vectors
      works most of the time but is not always correct.
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e30fbc33
    • Michael Chan's avatar
      bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips. · c0b8cda0
      Michael Chan authored
      The new 57500 chips have introduced the NQ structure in addition to
      the existing CP rings in all chips.  We need to introduce a new
      bnxt_nq_rings_in_use().  On legacy chips, the 2 functions are the
      same and one will just call the other.  On the new chips, they
      refer to the 2 separate ring structures.  The new function is now
      called to determine the resource (NQ or CP rings) associated with
      MSIX that are in use.
      
      On 57500 chips, the RDMA driver does not use the CP rings so
      we don't need to do the subtraction adjustment.
      
      Fixes: 41e8d798 ("bnxt_en: Modify the ring reservation functions for 57500 series chips.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c0b8cda0
    • Michael Chan's avatar
      bnxt_en: Keep track of reserved IRQs. · 75720e63
      Michael Chan authored
      The new 57500 chips use 1 NQ per MSIX vector, whereas legacy chips use
      1 CP ring per MSIX vector.  To better unify this, add a resv_irqs
      field to struct bnxt_hw_resc.  On legacy chips, we initialize resv_irqs
      with resv_cp_rings.  On new chips, we initialize it with the allocated
      MSIX resources.
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      75720e63
    • Michael Chan's avatar
      bnxt_en: Fix CNP CoS queue regression. · 804fba4e
      Michael Chan authored
      Recent changes to support the 57500 devices have created this
      regression.  The bnxt_hwrm_queue_qportcfg() call was moved to be
      called earlier before the RDMA support was determined, causing
      the CoS queues configuration to be set before knowing whether RDMA
      was supported or not.  Fix it by moving it to the right place right
      after RDMA support is determined.
      
      Fixes: 98f04cf0 ("bnxt_en: Check context memory requirements from firmware.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      804fba4e
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 0844895a
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small driver fixes for 4.20-rc6.
      
        There is a hyperv fix that for some reaon took forever to get into a
        shape that could be applied to the tree properly, but resolves a much
        reported issue. The others are some gnss patches, one a bugfix and the
        two others updates to the MAINTAINERS file to properly match the gnss
        files in the tree.
      
        All have been in linux-next for a while with no reported issues"
      
      * tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching
        MAINTAINERS: add gnss scm tree
        gnss: sirf: fix activation retry handling
        Drivers: hv: vmbus: Offload the handling of channels to two workqueues
      0844895a
    • Linus Torvalds's avatar
      Merge tag 'staging-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 47dcb080
      Linus Torvalds authored
      Pull staging fixes from Greg KH:
       "Here are two staging driver bugfixes for 4.20-rc6.
      
        One is a revert of a previously incorrect patch that was merged a
        while ago, and the other resolves a possible buffer overrun that was
        found by code inspection.
      
        Both of these have been in the linux-next tree with no reported
        issues"
      
      * tag 'staging-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        Revert commit ef9209b6 "staging: rtl8723bs: Fix indenting errors and an off-by-one mistake in core/rtw_mlme_ext.c"
        staging: rtl8712: Fix possible buffer overrun
      47dcb080
    • Linus Torvalds's avatar
      Merge tag 'tty-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 822b7683
      Linus Torvalds authored
      Pull tty driver fixes from Greg KH:
       "Here are three small tty driver fixes for 4.20-rc6
      
        Nothing major, just some bug fixes for reported issues. Full details
        are in the shortlog.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'tty-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        kgdboc: fix KASAN global-out-of-bounds bug in param_set_kgdboc_var()
        tty: serial: 8250_mtk: always resume the device in probe.
        tty: do not set TTY_IO_ERROR flag if console port
      822b7683
    • Linus Torvalds's avatar
      Merge tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 50a5528a
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 4.20-rc6
      
        The "largest" here are some xhci fixes for reported issues. Also here
        is a USB core fix, some quirk additions, and a usb-serial fix which
        required the export of one of the tty layer's functions to prevent
        code duplication. The tty maintainer agreed with this change.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        xhci: Prevent U1/U2 link pm states if exit latency is too long
        xhci: workaround CSS timeout on AMD SNPS 3.0 xHC
        USB: check usb_get_extra_descriptor for proper size
        USB: serial: console: fix reported terminal settings
        usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device
        USB: Fix invalid-free bug in port_over_current_notify()
        usb: appledisplay: Add 27" Apple Cinema Display
      50a5528a
    • Linus Torvalds's avatar
      Merge tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · bc4caf18
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three small fixes: a fix for smb3 direct i/o, a fix for CIFS DFS for
        stable and a minor cifs Kconfig fix"
      
      * tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: Avoid returning EBUSY to upper layer VFS
        cifs: Fix separator when building path from dentry
        cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
      bc4caf18
    • Linus Torvalds's avatar
      Merge tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · fa82dcbf
      Linus Torvalds authored
      Pull dax fixes from Dan Williams:
       "The last of the known regression fixes and fallout from the Xarray
        conversion of the filesystem-dax implementation.
      
        On the path to debugging why the dax memory-failure injection test
        started failing after the Xarray conversion a couple more fixes for
        the dax_lock_mapping_entry(), now called dax_lock_page(), surfaced.
        Those plus the bug that started the hunt are now addressed. These
        patches have appeared in a -next release with no issues reported.
      
        Note the touches to mm/memory-failure.c are just the conversion to the
        new function signature for dax_lock_page().
      
        Summary:
      
         - Fix the Xarray conversion of fsdax to properly handle
           dax_lock_mapping_entry() in the presense of pmd entries
      
         - Fix inode destruction racing a new lock request"
      
      * tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        dax: Fix unlock mismatch with updated API
        dax: Don't access a freed inode
        dax: Check page->mapping isn't NULL
      fa82dcbf
    • Linus Torvalds's avatar
      Merge tag 'libnvdimm-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · bd799eb6
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "A regression fix for the Address Range Scrub implementation, yes
        another one, and support for platforms that misalign persistent memory
        relative to the Linux memory hotplug section constraint. Longer term,
        support for sub-section memory hotplug would alleviate alignment
        waste, but until then this hack allows a 'struct page' memmap to be
        established for these misaligned memory regions.
      
        These have all appeared in a -next release, and thanks to Patrick for
        reporting and testing the alignment padding fix.
      
        Summary:
      
         - Unless and until the core mm handles memory hotplug units smaller
           than a section (128M), persistent memory namespaces must be padded
           to section alignment.
      
           The libnvdimm core already handled section collision with "System
           RAM", but some configurations overlap independent "Persistent
           Memory" ranges within a section, so additional padding injection is
           added for that case.
      
         - The recent reworks of the ARS (address range scrub) state machine
           to reduce the number of state flags inadvertantly missed a
           conversion of acpi_nfit_ars_rescan() call sites. Fix the regression
           whereby user-requested ARS results in a "short" scrub rather than a
           "long" scrub.
      
         - Fixup the unit tests to handle / test the 128M section alignment of
           mocked test resources.
      
      * tag 'libnvdimm-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short"
        libnvdimm, pfn: Pad pfn namespaces relative to other regions
        tools/testing/nvdimm: Align test resources to 128M
      bd799eb6
    • Tarick Bedeir's avatar
      net/mlx4_core: Correctly set PFC param if global pause is turned off. · bd5122cd
      Tarick Bedeir authored
      rx_ppp and tx_ppp can be set between 0 and 255, so don't clamp to 1.
      
      Fixes: 6e8814ce ("net/mlx4_en: Fix mixed PFC and Global pause user control requests")
      Signed-off-by: default avatarTarick Bedeir <tarick@google.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bd5122cd
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal · 6ec067e3
      Linus Torvalds authored
      Pull thermal SoC fixes from Eduardo Valentin:
       "Fixes for armada and broadcom thermal drivers"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
        thermal: broadcom: constify thermal_zone_of_device_ops structure
        thermal: armada: constify thermal_zone_of_device_ops structure
        thermal: bcm2835: Switch to SPDX identifier
        thermal: armada: fix legacy resource fixup
        thermal: armada: fix legacy validity test sense
      6ec067e3
  5. 08 Dec, 2018 9 commits
    • Linus Torvalds's avatar
      Merge tag 'asm-generic-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic · 8214bdf7
      Linus Torvalds authored
      Pull asm-generic fix from Arnd Bergmann:
       "Multiple people reported a bug I introduced in asm-generic/unistd.h in
        4.20, this is the obvious bugfix to get glibc and others to correctly
        build again on new architectures that no longer provide the old
        fstatat64() family of system calls"
      
      * tag 'asm-generic-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
        asm-generic: unistd.h: fixup broken macro include.
      8214bdf7
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 570c9139
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "A few clk driver fixes this time:
      
         - Introduce protected-clock DT binding to fix breakage on qcom
           sdm845-mtp boards where the qspi clks introduced this merge window
           cause the firmware on those boards to take down the system if we
           try to read the clk registers
      
         - Fix a couple off-by-one errors found by Dan Carpenter
      
         - Handle failure in zynq fixed factor clk driver to avoid using
           uninitialized data"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: zynqmp: Off by one in zynqmp_is_valid_clock()
        clk: mmp: Off by one in mmp_clk_add()
        clk: mvebu: Off by one bugs in cp110_of_clk_get()
        arm64: dts: qcom: sdm845-mtp: Mark protected gcc clocks
        clk: qcom: Support 'protected-clocks' property
        dt-bindings: clk: Introduce 'protected-clocks' property
        clk: zynqmp: handle fixed factor param query error
      570c9139
    • Linus Torvalds's avatar
      Merge tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f896adc4
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Here are hopefully the last set of fixes for 4.20.
      
        There's a fix for a longstanding statfs reporting problem with project
        quotas, a correction for page cache invalidation behaviors when
        fallocating near EOF, and a fix for a broken metadata verifier return
        code.
      
        Finally, the most important fix is to the pipe splicing code (aka the
        generic copy_file_range fallback) to avoid pointless short directio
        reads by only asking the filesystem for as much data as there are
        available pages in the pipe buffer. Our previous fix (simulated short
        directio reads because the number of pages didn't match the length of
        the read requested) caused subtle problems on overlayfs, so that part
        is reverted.
      
        Anyhow, this series passes fstests -g all on xfs and overlay+xfs, and
        has passed 17 billion fsx operations problem-free since I started
        testing
      
        Summary:
      
         - Fix broken project quota inode counts
      
         - Fix incorrect PAGE_MASK/PAGE_SIZE usage
      
         - Fix incorrect return value in btree verifier
      
         - Fix WARN_ON remap flags false positive
      
         - Fix splice read overflows"
      
      * tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: partially revert 4721a601 (simulated directio short read on EFAULT)
        splice: don't read more than available pipe space
        vfs: allow some remap flags to be passed to vfs_clone_file_range
        xfs: fix inverted return from xfs_btree_sblock_verify_crc
        xfs: fix PAGE_MASK usage in xfs_free_file_space
        fs/xfs: fix f_ffree value for statfs when project quota is set
      f896adc4
    • David Rientjes's avatar
      Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask" · 356ff8a9
      David Rientjes authored
      This reverts commit 89c83fb5.
      
      This should have been done as part of 2f0799a0 ("mm, thp: restore
      node-local hugepage allocations").  The movement of the thp allocation
      policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was
      intended to only set __GFP_THISNODE for mempolicies that are not
      MPOL_BIND whereas the revert could set this regardless of mempolicy.
      
      While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask()
      and alloc_pages_vma() was racy, that has since been removed since the
      revert.  What is left is the possibility to use __GFP_THISNODE in
      policy_node() when it is unexpected because the special handling for
      hugepages in alloc_pages_vma()  was removed as part of the consolidation.
      
      Secondly, prior to 89c83fb5, alloc_pages_vma() implemented a somewhat
      different policy for hugepage allocations, which were allocated through
      alloc_hugepage_vma().  For hugepage allocations, if the allocating
      process's node is in the set of allowed nodes, allocate with
      __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with
      __GFP_THISNODE instead).  This was changed for shmem_alloc_hugepage() to
      allow fallback to other nodes in 89c83fb5 as it did for new_page() in
      mm/mempolicy.c which is functionally different behavior and removes the
      requirement to only allocate hugepages locally.
      
      So this commit does a full revert of 89c83fb5 instead of the partial
      revert that was done in 2f0799a0.  The result is the same thp
      allocation policy for 4.20 that was in 4.19.
      
      Fixes: 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
      Fixes: 2f0799a0 ("mm, thp: restore node-local hugepage allocations")
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      356ff8a9
    • Benjamin Herrenschmidt's avatar
      Revert "net/ibm/emac: wrong bit is used for STA control" · 5b3279e2
      Benjamin Herrenschmidt authored
      This reverts commit 624ca9c3.
      
      This commit is completely bogus. The STACR register has two formats, old
      and new, depending on the version of the IP block used. There's a pair of
      device-tree properties that can be used to specify the format used:
      
      	has-inverted-stacr-oc
      	has-new-stacr-staopc
      
      What this commit did was to change the bit definition used with the old
      parts to match the new parts. This of course breaks the driver on all
      the old ones.
      
      Instead, the author should have set the appropriate properties in the
      device-tree for the variant used on his board.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b3279e2
    • David S. Miller's avatar
      Merge branch 'skb-headroom-slab-out-of-bounds' · 8b78903b
      David S. Miller authored
      Stefano Brivio says:
      
      ====================
      Fix slab out-of-bounds on insufficient headroom for IPv6 packets
      
      Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over
      IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX.
      
      Patch 2/2 makes sure we avoid writing before the allocated buffer in
      neigh_hh_output() in case the headroom is enough for the unaligned hardware
      header size, but not enough for the aligned one, and that we warn if we hit
      this condition.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8b78903b
    • Stefano Brivio's avatar
      neighbour: Avoid writing before skb->head in neigh_hh_output() · e6ac64d4
      Stefano Brivio authored
      While skb_push() makes the kernel panic if the skb headroom is less than
      the unaligned hardware header size, it will proceed normally in case we
      copy more than that because of alignment, and we'll silently corrupt
      adjacent slabs.
      
      In the case fixed by the previous patch,
      "ipv6: Check available headroom in ip6_xmit() even without options", we
      end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
      header and write 16 bytes, starting 2 bytes before the allocated buffer.
      
      Always check we're not writing before skb->head and, if the headroom is
      not enough, warn and drop the packet.
      
      v2:
       - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
         (Eric Dumazet)
       - if we avoid the panic, though, we need to explicitly check the headroom
         before the memcpy(), otherwise we'll have corrupted slabs on a running
         kernel, after we warn
       - use __skb_push() instead of skb_push(), as the headroom check is
         already implemented here explicitly (Eric Dumazet)
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e6ac64d4
    • Stefano Brivio's avatar
      ipv6: Check available headroom in ip6_xmit() even without options · 66033f47
      Stefano Brivio authored
      Even if we send an IPv6 packet without options, MAX_HEADER might not be
      enough to account for the additional headroom required by alignment of
      hardware headers.
      
      On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
      sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
      100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
      bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().
      
      Those would be enough to append our 14 bytes header, but we're going to
      align that to 16 bytes, and write 2 bytes out of the allocated slab in
      neigh_hh_output().
      
      KASan says:
      
      [  264.967848] ==================================================================
      [  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
      [  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
      [  264.967870]
      [  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
      [  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      [  264.967887] Call Trace:
      [  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
      [  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
      [  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
      [  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
      [  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
      [  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
      [  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
      [  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
      [  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
      [  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
      [  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
      [  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
      [  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
      [  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
      [  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
      [  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
      [  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
      [  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
      [  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
      [  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
      [  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
      [  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
      [  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
      [  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
      [  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
      [  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
      [  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
      [  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
      [  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0
      
      [...]
      
      Just like ip_finish_output2() does for IPv4, check that we have enough
      headroom in ip6_xmit(), and reallocate it if we don't.
      
      This issue is older than git history.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66033f47
    • Eric Dumazet's avatar
      tcp: lack of available data can also cause TSO defer · f9bfe4e6
      Eric Dumazet authored
      tcp_tso_should_defer() can return true in three different cases :
      
       1) We are cwnd-limited
       2) We are rwnd-limited
       3) We are application limited.
      
      Neal pointed out that my recent fix went too far, since
      it assumed that if we were not in 1) case, we must be rwnd-limited
      
      Fix this by properly populating the is_cwnd_limited and
      is_rwnd_limited booleans.
      
      After this change, we can finally move the silly check for FIN
      flag only for the application-limited case.
      
      The same move for EOR bit will be handled in net-next,
      since commit 1c09f7d0 ("tcp: do not try to defer skbs
      with eor mark (MSG_EOR)") is scheduled for linux-4.21
      
      Tested by running 200 concurrent netperf -t TCP_RR -- -r 60000,100
      and checking none of them was rwnd_limited in the chrono_stat
      output from "ss -ti" command.
      
      Fixes: 41727549 ("tcp: Do not underestimate rwnd_limited")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Suggested-by: default avatarNeal Cardwell <ncardwell@google.com>
      Reviewed-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9bfe4e6
  6. 07 Dec, 2018 5 commits