1. 22 Dec, 2017 8 commits
    • Merge tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 9ad95bda
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "This contains two fixes for running under Xen:
      
         - a fix avoiding resource conflicts between adding mmio areas and
           memory hotplug
      
         - a fix setting NX bits in page table entries copied from Xen when
           running a PV guest"
      
      * tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/balloon: Mark unallocated host memory as UNUSABLE
        x86-64/Xen: eliminate W+X mappings
    • Merge tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · fca0e39b
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Here are some XFS fixes for 4.15-rc5. Apologies for the unusually
        large number of patches this late, but I wanted to make sure the
        corruption fixes were really ready to go.
      
        Changes since last update:
      
         - Fix a locking problem during xattr block conversion that could
           cause the log checkpointing thread to try to write an incomplete
           buffer to disk, which leads to a corruption shutdown
      
         - Fix a null pointer dereference when removing delayed allocation
           extents
      
         - Remove post-eof speculative allocations when reflinking a block
           past current inode size so that we don't just leave them there and
           assert on inode reclaim
      
         - Relax an assert which didn't accurately reflect the way locking
           works and would trigger under heavy io load
      
         - Avoid infinite loop when cancelling copy on write extents after a
           writeback failure
      
         - Try to avoid copy on write transaction reservation overflows when
           remapping after a successful write
      
         - Fix various problems with the copy-on-write reservation automatic
           garbage collection not being cleaned up properly during a ro
           remount
      
         - Fix problems with rmap log items being processed in the wrong
           order, leading to corruption shutdowns
      
         - Fix problems with EFI recovery wherein the "remove any rmapping if
           present" mechanism wasn't actually doing anything, which would lead
           to corruption problems later when the extent is reallocated,
           leading to multiple rmaps for the same extent"
      
      * tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: only skip rmap owner checks for unknown-owner rmap removal
        xfs: always honor OWN_UNKNOWN rmap removal requests
        xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order
        xfs: set cowblocks tag for direct cow writes too
        xfs: remove leftover CoW reservations when remounting ro
        xfs: don't be so eager to clear the cowblocks tag on truncate
        xfs: track cowblocks separately in i_flags
        xfs: allow CoW remap transactions to use reserve blocks
        xfs: avoid infinite loop when cancelling CoW blocks after writeback failure
        xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mapping
        xfs: remove dest file's post-eof preallocations before reflinking
        xfs: move xfs_iext_insert tracepoint to report useful information
        xfs: account for null transactions in bunmapi
        xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute
        xfs: add the ability to join a held buffer to a defer_ops
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 0fc0f18b
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "This fixes the following issues:
      
         - fix chacha20 crash on zero-length input due to unset IV
      
         - fix potential race conditions in mcryptd with spinlock
      
         - only wait once at top of algif recvmsg to avoid inconsistencies
      
         - fix potential use-after-free in algif_aead/algif_skcipher"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: af_alg - fix race accessing cipher request
        crypto: mcryptd - protect the per-CPU queue with a lock
        crypto: af_alg - wait for data at beginning of recvmsg
        crypto: skcipher - set walk.iv for zero-length inputs
    • Merge tag 'pinctrl-v4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 6ed16756
      Linus Torvalds authored
      Pull pin control fix from Linus Walleij:
       "A single pin control fix for Intel machines, affecting a bunch of
        Chromebooks. Nothing else collected up amazingly"
      
      * tag 'pinctrl-v4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
    • Merge tag 'drm-fixes-for-v4.15-rc5' of git://people.freedesktop.org/~airlied/linux · e7ae59cb
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "I've got most of two weeks worth of fixes here due to being on
        holidays last week.
      
        The main things are:
      
         - Core:
           * Syncobj fd reference count fix
           * Leasing ioctl misuse fix
      
         - nouveau regression fixes
      
         - further amdgpu DC fixes
      
         - sun4i regression fixes
      
        I'm not sure I'll see many fixes over next couple of weeks, we'll see
        how we go"
      
      * tag 'drm-fixes-for-v4.15-rc5' of git://people.freedesktop.org/~airlied/linux: (27 commits)
        drm/syncobj: Stop reusing the same struct file for all syncobj -> fd
        drm: move lease init after validation in drm_lease_create
        drm/plane: Make framebuffer refcounting the responsibility of setplane_internal callers
        drm/sun4i: hdmi: Move the mode_valid callback to the encoder
        drm/nouveau: fix obvious memory leak
        drm/i915: Protect DDI port to DPLL map from theoretical race.
        drm/i915/lpe: Remove double-encapsulation of info string
        drm/sun4i: Fix error path handling
        drm/nouveau: use alternate memory type for system-memory buffers with kind != 0
        drm/nouveau: avoid GPU page sizes > PAGE_SIZE for buffer objects in host memory
        drm/nouveau/mmu/gp10b: use correct implementation
        drm/nouveau/pci: do a msi rearm on init
        drm/nouveau/imem/nv50: fix refcount_t warning
        drm/nouveau/bios/dp: support DP Info Table 2.0
        drm/nouveau/fbcon: fix NULL pointer access in nouveau_fbcon_destroy
        drm/amd/display: Fix rehook MST display not light back on
        drm/amd/display: fix missing pixel clock adjustment for dongle
        drm/amd/display: set chroma taps to 1 when not scaling
        drm/amd/display: add pipe locking before front end programing
        drm/sun4i: validate modes for HDMI
        ...
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 7edc3f20
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Here's a trio of fixes:
      
         - The runtime PM clk patches that landed this merge window forgot to
           runtime resume devices that may be off while recalculating and
           setting rates of child clks of whatever clk is changing rates.
      
         - We had a NULL pointer deref in an old clk tracepoint when
           clk_set_parent() is called with a NULL parent pointer. This
           shouldn't really happen, but it's best to avoid this regardless.
      
         - The sun9i-mmc clk driver didn't provide 'reset' support, just
           'assert' and 'deassert' so the MMC driver stopped probing when the
           probe was changed to do a reset instead of assert/deassert pair.
           This implements the reset so things work again"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
        clk: fix a panic error caused by accessing NULL pointer
        clk: Manage proper runtime PM state in clk_change_rate()
    • drm/syncobj: Stop reusing the same struct file for all syncobj -> fd · e7cdf5c8
      Chris Wilson authored
      The vk cts test:
      dEQP-VK.api.external.semaphore.opaque_fd.export_multiple_times_temporary
      
      triggers a lot of
      VFS: Close: file count is 0
      
      Dave pointed out that clearing the syncobj->file from
      drm_syncobj_file_release() was sufficient to silence the test, but that
      opens a can of worms since we assumed that the syncobj->file was never
      unset. Stop trying to reuse the same struct file for every fd pointing
      to the drm_syncobj, and allocate one file for each fd instead.
      
      v2: Fixup return handling of drm_syncobj_fd_to_handle
      v2.1: [airlied: fix possible syncobj ref race]
      Reported-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Tested-by: Dave Airlie <airlied@redhat.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • Merge tag 'drm-misc-fixes-2017-12-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 12e412d7
      Dave Airlie authored
      drm-misc-fixes before holidays:
      
      - fixup for the lease fixup (Keith)
      - fb leak in the ww mutex fallback code (Maarten)
      - sun4i fixes (Maxime, Hans)
      
      * tag 'drm-misc-fixes-2017-12-21' of git://anongit.freedesktop.org/drm/drm-misc:
        drm: move lease init after validation in drm_lease_create
        drm/plane: Make framebuffer refcounting the responsibility of setplane_internal callers
        drm/sun4i: hdmi: Move the mode_valid callback to the encoder
        drm/sun4i: Fix error path handling
        drm/sun4i: validate modes for HDMI
  2. 21 Dec, 2017 32 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ead68f21
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "What's a holiday weekend without some networking bug fixes? [1]
      
         1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function
            calls, from Daniel Borkmann.
      
         2) Fix regression from errata limiting change to marvell PHY driver,
            from Zhao Qiang.
      
         3) Fix u16 overflow in SCTP, from Xin Long.
      
         4) Fix potential memory leak during bridge newlink, from Nikolay
            Aleksandrov.
      
         5) Fix BPF selftest build on s390, from Hendrik Brueckner.
      
         6) Don't append to cfg80211 automatically generated certs file,
            always write new ones from scratch. From Thierry Reding.
      
         7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai.
      
         8) Fix hang on tg3 MTU change with certain chips, from Brian King.
      
         9) Add stall detection to arc emac driver and reset chip when this
            happens, from Alexander Kochetkov.
      
        10) Fix MTU limiting in GRE tunnel drivers, from Xin Long.
      
        11) Fix stmmac timestamping bug due to mis-shifting of field. From
            Fredrik Hallenberg.
      
        12) Fix metrics match when deleting an ipv4 route. The kernel sets
            some internal metrics bits which the user isn't going to set when
            it makes the delete request. From Phil Sutter.
      
        13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix
            from Yelena Krivosheev.
      
        14) Fix double free and memory corruption in get_net_ns_by_id, from
            Eric W. Biederman.
      
        15) Flush ipv4 FIB tables in the reverse order. Some tables can share
            their actual backing data, in particular this happens for the MAIN
            and LOCAL tables. We have to kill the LOCAL table first, because
            it uses MAIN's backing memory. Fix from Ido Schimmel.
      
        16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann
            Horn, and Alexei Starovoitov.
      
        17) Make changes to ipv6 autoflowlabel sysctl really propagate to
            sockets, unless the socket has set the per-socket value
            explicitly. From Shaohua Li.
      
        18) Fix leaks and double callback invocations of zerocopy SKBs, from
            Willem de Bruijn"
      
      [1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
        skbuff: skb_copy_ubufs must release uarg even without user frags
        skbuff: orphan frags before zerocopy clone
        net: reevalulate autoflowlabel setting after sysctl setting
        openvswitch: Fix pop_vlan action for double tagged frames
        ipv6: Honor specified parameters in fibmatch lookup
        bpf: do not allow root to mangle valid pointers
        selftests/bpf: add tests for recent bugfixes
        bpf: fix integer overflows
        bpf: don't prune branches when a scalar is replaced with a pointer
        bpf: force strict alignment checks for stack pointers
        bpf: fix missing error return in check_stack_boundary()
        bpf: fix 32-bit ALU op verification
        bpf: fix incorrect tracking of register size truncation
        bpf: fix incorrect sign extension in check_alu_op()
        bpf/verifier: fix bounds calculation on BPF_RSH
        ipv4: Fix use-after-free when flushing FIB tables
        s390/qeth: fix error handling in checksum cmd callback
        tipc: remove joining group member from congested list
        selftests: net: Adding config fragment CONFIG_NUMA=y
        nfp: bpf: keep track of the offloaded program
        ...
    • Merge branch 'net-zerocopy-fixes' · c50b7c47
      David S. Miller authored
      Saeed Mahameed says:
      
      ===================
      Mellanox, mlx5 fixes 2017-12-19
      
      The following series includes some fixes for the mlx5 core and ethernet
      driver.
      
      Please pull and let me know if there is any problem.
      
      This series doesn't introduce any conflict with the ongoing mlx5 for-next
      submission.
      
      For -stable:
      
      kernels >= v4.7.y
          ("net/mlx5e: Fix possible deadlock of VXLAN lock")
          ("net/mlx5e: Add refcount to VXLAN structure")
          ("net/mlx5e: Prevent possible races in VXLAN control flow")
          ("net/mlx5e: Fix features check of IPv6 traffic")
      
      kernels >= v4.9.y
          ("net/mlx5: Fix error flow in CREATE_QP command")
          ("net/mlx5: Fix rate limit packet pacing naming and struct")
      
      kernels >= v4.13.y
          ("net/mlx5: FPGA, return -EINVAL if size is zero")
      
      kernels >= v4.14.y
          ("Revert "mlx5: move affinity hints assignments to generic code"")
      
      All above patches apply and compile with no issues on corresponding -stable.
      ===================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: skb_copy_ubufs must release uarg even without user frags · b90ddd56
      Willem de Bruijn authored
      skb_copy_ubufs creates a private copy of frags[] to release its hold
      on user frags, then calls uarg->callback to notify the owner.
      
      Call uarg->callback even when no frags exist. This edge case can
      happen when zerocopy_sg_from_iter finds enough room in skb_headlen
      to copy all the data.
      
      Fixes: 3ece7826 ("sock: skb_copy_ubufs support for compound pages")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: orphan frags before zerocopy clone · 268b7906
      Willem de Bruijn authored
      Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
      calls to skb_uarg(skb)->callback for the same data.
      
      skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
      with each segment. This is only safe for uargs that do refcounting,
      which is those that pass skb_orphan_frags without dropping their
      shared frags. For others, skb_orphan_frags drops the user frags and
      sets the uarg to NULL, after which sock_zerocopy_clone has no effect.
      
      Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
      calls for the same data causing the vhost_net_ubuf_ref->refcount to
      drop below zero.
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
      Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY")
      Reported-by: Andreas Hartmann <andihartmann@01019freenet.de>
      Reported-by: David Hill <dhill@redhat.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 9035a896
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "It's been a few weeks, so here's a small collection of fixes that
        should go into the current series.
      
        This contains:
      
         - NVMe pull request from Christoph, with a few important fixes.
      
         - kyber hang fix from Omar.
      
         - A blk-throttl fix from Shaohua, fixing a case where we double
           charge a bio.
      
         - Two call_single_data alignment fixes from me, fixing up some
           unfortunate changes that went into 4.14 without being properly
           reviewed on the block side (since nobody was CC'ed on the
           patch...).
      
         - A bounce buffer fix in two parts, one from me and one from Ming.
      
         - Revert bdi debug error handling patch. It's causing boot issues for
           some folks, and a week down the line, we're still no closer to a
           fix. Revert this patch for now until it's figured out, then we can
           retry for 4.16"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        Revert "bdi: add error handle for bdi_debug_register"
        null_blk: unalign call_single_data
        block: unalign call_single_data in struct request
        block-throttle: avoid double charge
        block: fix blk_rq_append_bio
        block: don't let passthrough IO go into .make_request_fn()
        nvme: setup streams after initializing namespace head
        nvme: check hw sectors before setting chunk sectors
        nvme: call blk_integrity_unregister after queue is cleaned up
        nvme-fc: remove double put reference if admin connect fails
        nvme: set discard_alignment to zero
        kyber: fix another domain token wait queue hang
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 409232a4
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "ARM fixes:
         - A bug in handling of SPE state for non-vhe systems
         - A fix for a crash on system shutdown
         - Three timer fixes, introduced by the timer optimizations for v4.15
      
        x86 fixes:
         - fix for a WARN that was introduced in 4.15
         - fix for SMM when guest uses PCID
         - fixes for several bugs found by syzkaller
      
        ... and a dozen papercut fixes for the kvm_stat tool"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
        tools/kvm_stat: sort '-f help' output
        kvm: x86: fix RSM when PCID is non-zero
        KVM: Fix stack-out-of-bounds read in write_mmio
        KVM: arm/arm64: Fix timer enable flow
        KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state
        KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC
        KVM: arm/arm64: Fix HYP unmapping going off limits
        arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu
        KVM/x86: Check input paging mode when cs.l is set
        tools/kvm_stat: add line for totals
        tools/kvm_stat: stop ignoring unhandled arguments
        tools/kvm_stat: suppress usage information on command line errors
        tools/kvm_stat: handle invalid regular expressions
        tools/kvm_stat: add hint on '-f help' to man page
        tools/kvm_stat: fix child trace events accounting
        tools/kvm_stat: fix extra handling of 'help' with fields filter
        tools/kvm_stat: fix missing field update after filter change
        tools/kvm_stat: fix drilldown in events-by-guests mode
        tools/kvm_stat: fix command line option '-g'
        kvm: x86: fix WARN due to uninitialized guest FPU state
        ...
    • net: reevalulate autoflowlabel setting after sysctl setting · 513674b5
      Shaohua Li authored
      sysctl.ipv6.auto_flowlabels is 1 by default. On our hosts, we set it to 2.
      If a socket doesn't set autoflowlabel via sockopt, outgoing packets from
      the hosts are supposed to not include a flowlabel. This is true for
      normal packets, but not for reset packets.
      
      The reason is that ipv6_pinfo.autoflowlabel is set at sock creation. If we
      later change sysctl.ipv6.auto_flowlabels, ipv6_pinfo.autoflowlabel isn't
      changed, so the sock keeps its old autoflowlabel behavior. Reset packets
      suffer from this problem because they are sent from a special control
      socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels
      is 1 by default, the control socket will always have its
      ipv6_pinfo.autoflowlabel set, even after the user sets
      sysctl.ipv6.auto_flowlabels to 2, so reset packets will always have a
      flowlabel. A normal sock created before the sysctl change suffers from
      the same issue. We can't even turn off autoflowlabel unless we kill all
      socks on the hosts.
      
      To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
      autoflowlabel setting from user, otherwise we always call
      ip6_default_np_autolabel() which has the new settings of sysctl.
      
      Note, this changes behavior a little bit. Before commit 42240901
      (ipv6: Implement different admin modes for automatic flow labels), the
      autoflowlabel behavior of a sock wasn't sticky, e.g., if the sysctl
      changed, an existing connection would change its autoflowlabel behavior.
      After that commit, autoflowlabel behavior is sticky for the whole life
      of the sock. With this patch, the behavior is no longer sticky.
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
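The decision described in the autoflowlabel commit above can be sketched as a small lookup. This is a toy model, not the kernel code: `struct toy_np`, `toy_default_autolabel()` and `toy_autoflowlabel()` are hypothetical names standing in for `ipv6_pinfo` and `ip6_default_np_autolabel()`, and only the sysctl values 1 and 2 discussed in the message are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-socket IPv6 state (hypothetical, not ipv6_pinfo). */
struct toy_np {
    bool autoflowlabel;     /* value chosen through the IPV6_AUTOFLOWLABEL sockopt */
    bool autoflowlabel_set; /* true only if the sockopt was actually used */
};

/* Default derived from the current sysctl value. Per the commit text, 1 turns
 * labels on by default and 2 leaves them off unless a socket opts in. Other
 * sysctl values are not modeled in this sketch. */
static bool toy_default_autolabel(int sysctl_auto_flowlabels)
{
    return sysctl_auto_flowlabels == 1;
}

/* An explicit sockopt wins; otherwise the sysctl is consulted at use time, so
 * a later sysctl change also affects sockets that already exist. */
static bool toy_autoflowlabel(const struct toy_np *np, int sysctl_auto_flowlabels)
{
    assert(np != NULL);
    if (np->autoflowlabel_set)
        return np->autoflowlabel;
    return toy_default_autolabel(sysctl_auto_flowlabels);
}
```

Under this model, a control socket created while the sysctl was 1 stops labelling as soon as the sysctl becomes 2, because nothing is cached at creation time.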
    • openvswitch: Fix pop_vlan action for double tagged frames · c48e7473
      Eric Garver authored
      skb_vlan_pop() expects skb->protocol to be a valid TPID for double
      tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
      shift the true ethertype into position for us.
      
      Fixes: 5108bbad ("openvswitch: add processing of L3 packets")
      Signed-off-by: Eric Garver <e@erig.me>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "bdi: add error handle for bdi_debug_register" · 6d0e4827
      Jens Axboe authored
      This reverts commit a0747a85.
      
      It breaks booting for some users, and more than a week
      into this, there's still no good fix. Revert this commit
      for now until a solution has been found.
      Reported-by: Laura Abbott <labbott@redhat.com>
      Reported-by: Bruno Wolff III <bruno@wolff.to>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • ipv6: Honor specified parameters in fibmatch lookup · 58acfd71
      Ido Schimmel authored
      Currently, parameters such as oif and source address are not taken into
      account during fibmatch lookup. Example (IPv4 for reference) before
      patch:
      
      $ ip -4 route show
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
      
      $ ip -6 route show
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
      fe80::/64 dev dummy0 proto kernel metric 256 pref medium
      fe80::/64 dev dummy1 proto kernel metric 256 pref medium
      
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy0
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy1
      RTNETLINK answers: No route to host
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      
      After:
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      RTNETLINK answers: Network is unreachable
      
      The problem stems from the fact that the necessary route lookup flags
      are not set based on these parameters.
      
      Instead of duplicating the same logic for fibmatch, we can simply
      resolve the original route from its copy and dump it instead.
      
      Fixes: 18c3a61c ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfs: only skip rmap owner checks for unknown-owner rmap removal · 68c58e9b
      Darrick J. Wong authored
      For rmap removal, refactor the rmap owner checks into a separate
      function, then skip the checks if we are performing an unknown-owner
      removal.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: always honor OWN_UNKNOWN rmap removal requests · 33df3a9c
      Darrick J. Wong authored
      Calling xfs_rmap_free with an unknown owner is supposed to remove any
      rmaps covering that range regardless of owner.  This is used by the EFI
      recovery code to say "we're freeing this, it mustn't be owned by
      anything anymore", but for whatever reason xfs_free_ag_extent filters
      them out.
      
      Therefore, remove the filter and make xfs_rmap_unmap actually treat it
      as a wildcard owner -- free anything that's already there, and if
      there's no owner at all then that's fine too.
      
      There are two existing callers of bmap_add_free that take care of the rmap
      deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
      cleanup; convert these to use OWN_NULL (via helpers), and now we really
      require that an RUI (if any) gets added to the defer ops before any EFI.
      
      Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
      growfs will have to consult directly with the rmap to ensure that there
      aren't any rmaps in the grown region.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order · 0525e952
      Darrick J. Wong authored
      Under the deferred rmap operation scheme, there's a certain order in
      which the rmap deferred ops have to be queued to maintain integrity
      during log replay.  For alloc/map operations that order is cui -> rui;
      for free/unmap operations that order is cui -> rui -> efi.  However, the
      initial refcount code got the ordering wrong on the free side of things
      because it queued a refcount free op and an EFI, and the refcount free op
      queued an rmap free op, resulting in the order cui -> efi -> rui.
      
      If we fail before the efd finishes, the efi recovery will try to do a
      wildcard rmap removal and the subsequent rui will fail to find the rmap
      and blow up.  This never happened due to other screw-ups in handling
      unknown owner rmap removals, but those other screw-ups broke recovery in
      other ways, so fix the ordering to follow the intended rules.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
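The ordering rule stated in the deferred rmap commit above (cui -> rui for alloc/map, cui -> rui -> efi for free/unmap) can be illustrated with a small checker. This is only a sketch: `enum toy_intent` and `toy_order_ok()` are hypothetical names, and real XFS intent items carry far more state than a kind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Intent kinds named in the commit message, numbered in their required order. */
enum toy_intent { TOY_CUI = 0, TOY_RUI = 1, TOY_EFI = 2 };

/* Returns true if the queue never steps backwards, i.e. every CUI is queued
 * before every RUI, and every RUI before every EFI. */
static bool toy_order_ok(const enum toy_intent *q, size_t n)
{
    assert(q != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (q[i] < q[i - 1])
            return false;
    }
    return true;
}
```

With this check, the sequence cui -> efi -> rui produced by the original refcount code is rejected, while cui -> rui and cui -> rui -> efi both pass.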
    • xfs: set cowblocks tag for direct cow writes too · 86d692bf
      Darrick J. Wong authored
      If a user performs a direct CoW write, we end up loading the CoW fork
      with preallocated extents.  Therefore, we must set the cowblocks tag so
      that they can be cleared out if we run low on space.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: remove leftover CoW reservations when remounting ro · 10ddf64e
      Darrick J. Wong authored
      When we're remounting the filesystem readonly, remove all CoW
      preallocations prior to going ro.  If the fs goes down after the ro
      remount, we never clean up the staging extents, which means xfs_check
      will trip over them on a subsequent run.  Practically speaking, the next
      mount will clean them up too, so this is unlikely to be seen.  Since we
      shut down the cowblocks cleaner on remount-ro, we also have to make sure
      we start it back up if/when we remount-rw.
      
      Found by adding clonerange to fsstress and running xfs/017.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: don't be so eager to clear the cowblocks tag on truncate · 363e59ba
      Darrick J. Wong authored
      Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents
      is zero.  This is wrong, since i_cnextents only tracks real extents in
      the CoW fork, which means that we could have some delayed CoW
      reservations still in there that will now never get cleaned.
      
      Fix a further bug where we /don't/ clear the reflink iflag if there are
      any attribute blocks -- really, it's only safe to clear the reflink flag
      if there are no data fork extents and no cow fork extents.
      
      Found by adding clonerange to fsstress in xfs/017.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      363e59ba
    • Stefan Raspl's avatar
      tools/kvm_stat: sort '-f help' output · aa12f594
      Stefan Raspl authored
      Sort the fields returned by specifying '-f help' on the command line.
      While at it, simplify the code a bit, indent the output and eliminate an
      extra blank line at the beginning.
      Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      aa12f594
    • Paolo Bonzini's avatar
      kvm: x86: fix RSM when PCID is non-zero · fae1a3e7
      Paolo Bonzini authored
      rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then
      CR4 & ~PCIDE, then CR0, then CR4.
      
      However, setting CR4.PCIDE fails if CR3[11:0] != 0.  It's probably easier
      in the long run to replace rsm_enter_protected_mode() with an emulator
      callback that sets all the special registers (like KVM_SET_SREGS would
      do).  For now, set the PCID field of CR3 only after CR4.PCIDE is 1.
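The ordering constraint described above can be sketched with a toy register model. This is only an illustration in Python, not KVM's emulator code: `ToyCpu`, `restore_naive`, and `restore_fixed` are hypothetical names, and the model checks nothing except the rule that CR4.PCIDE cannot be enabled while CR3[11:0] is non-zero.

```python
CR4_PCIDE = 1 << 17   # PCID-enable bit in CR4
PCID_MASK = 0xFFF     # CR3[11:0] holds the PCID when PCIDE is set


class ToyCpu:
    """Hypothetical model that enforces only the CR3/CR4.PCIDE rule."""

    def __init__(self):
        self.cr3 = 0
        self.cr4 = 0

    def set_cr3(self, value):
        self.cr3 = value

    def set_cr4(self, value):
        enabling = bool(value & CR4_PCIDE) and not (self.cr4 & CR4_PCIDE)
        if enabling and (self.cr3 & PCID_MASK):
            raise ValueError("cannot set CR4.PCIDE while CR3[11:0] != 0")
        self.cr4 = value


def restore_naive(cpu, cr3, cr4):
    # Loads the full CR3 first, so enabling PCIDE can fail afterwards.
    cpu.set_cr3(cr3)
    cpu.set_cr4(cr4)


def restore_fixed(cpu, cr3, cr4):
    # Load CR3 without the PCID, enable PCIDE, then set the PCID field.
    cpu.set_cr3(cr3 & ~PCID_MASK)
    cpu.set_cr4(cr4)
    cpu.set_cr3(cr3)
```

With a saved CR3 of `0x1005` and PCIDE set in the saved CR4, `restore_naive` raises in this model, while `restore_fixed` ends with both registers restored.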
      Reported-by: Laszlo Ersek <lersek@redhat.com>
      Tested-by: Laszlo Ersek <lersek@redhat.com>
      Fixes: 660a5d51
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fae1a3e7
    • Keith Packard's avatar
      drm: move lease init after validation in drm_lease_create · d2a48e52
      Keith Packard authored
      Patch bd36d3ba fixed a deadlock in the
      failure path of drm_lease_create. This made the partially initialized
      lease object visible for a short window of time.
      
      To avoid having the lessee state appear transiently, I've rearranged
      the code so that the lessor fields are not filled in until the
      parameters are all validated and the function will succeed.
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171221065424.1304-1-keithp@keithp.com
      d2a48e52
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 8b6ca2bf
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2017-12-21
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix multiple security issues in the BPF verifier mostly related
         to the value and min/max bounds tracking rework in 4.14. Issues
         range from incorrect bounds calculation in some BPF_RSH cases,
         to improper sign extension and reg size handling on 32 bit
         ALU ops, missing strict alignment checks on stack pointers, and
         several others that got fixed, from Jann, Alexei and Edward.
      
      2) Fix various build failures in BPF selftests on sparc64. More
         specifically, librt needed to be added to the libs to link
         against, plus a few format string fixups for sizeof, from David.
      
      3) Fix one last remaining issue from the BPF selftest build that was
         still occurring on s390x, where the asm/bpf_perf_event.h include
         could not find the asm/ptrace.h copy, from Hendrik.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b6ca2bf
    • Alexei Starovoitov's avatar
      bpf: do not allow root to mangle valid pointers · 82abbf8d
      Alexei Starovoitov authored
      Do not allow root to convert valid pointers into unknown scalars.
      In particular disallow:
       ptr &= reg
       ptr <<= reg
       ptr += ptr
      and explicitly allow:
       ptr -= ptr
      since pkt_end - pkt == length
      
      1. This minimizes the number of address leaks root can cause.
         In the future we may need to further tighten the leaks with
         kptr_restrict.
      
      2. If a program has such pointer math, it's likely a user mistake.
         When the verifier complains about it right away, instead of many
         instructions later on an invalid memory access, it's easier for
         users to fix their programs.
      
      3. When a register holding a pointer cannot change to a scalar, JITs
         can optimize better. For example, 32-bit archs could use a single
         register for pointers instead of the pair required to hold 64-bit
         scalars.
      
      4. It reduces architecture-dependent behavior, since code such as:
           r1 = r10;
           r1 &= 0xff;
           if (r1 ...)
         will behave differently on arm64 vs x64 and offloaded vs native.
      
      A significant chunk of ptr mangling was allowed by
      commit f1174f77 ("bpf/verifier: rework value tracking")
      yet some of it was allowed even earlier.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      82abbf8d
    • Daniel Borkmann's avatar
      Merge branch 'bpf-verifier-sec-fixes' · 3db9128f
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      This patch set addresses a set of security vulnerabilities
      in bpf verifier logic discovered by Jann Horn.
      All of the patches are candidates for 4.14 stable.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      3db9128f
    • Jann Horn's avatar
      selftests/bpf: add tests for recent bugfixes · 2255f8d5
      Jann Horn authored
      These tests should cover the following cases:
      
       - MOV with both zero-extended and sign-extended immediates
       - implicit truncation of register contents via ALU32/MOV32
       - implicit 32-bit truncation of ALU32 output
       - oversized register source operand for ALU32 shift
       - right-shift of a number that could be positive or negative
       - map access where adding the operation size to the offset causes signed
         32-bit overflow
       - direct stack access at a ~4GiB offset
      
      Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests
      that should fail independent of what flags userspace passes.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2255f8d5
    • Alexei Starovoitov's avatar
      bpf: fix integer overflows · bb7f0f98
      Alexei Starovoitov authored
      There were various issues related to the limited size of integers used in
      the verifier:
       - `off + size` overflow in __check_map_access()
       - `off + reg->off` overflow in check_mem_access()
       - `off + reg->var_off.value` overflow or 32-bit truncation of
         `reg->var_off.value` in check_mem_access()
       - 32-bit truncation in check_stack_boundary()
      
      Make sure that any integer math cannot overflow by not allowing
      pointer math with large values.
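As a rough illustration of the `off + size` case, the sketch below models 32-bit two's-complement wraparound in Python. It is not the verifier's code: the function names are hypothetical, and `ILLUSTRATIVE_LIMIT` is an assumed value chosen only to show the idea of rejecting large operands before doing any arithmetic on them.

```python
def to_s32(value):
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    value &= 0xFFFF_FFFF
    return value - (1 << 32) if value & 0x8000_0000 else value


def access_ok_wrapping(off, size, value_size):
    # Models a check done in 32-bit signed arithmetic: off + size may wrap
    # to a negative number and then pass the upper-bound comparison.
    end = to_s32(off + size)
    return off >= 0 and end <= value_size


# Assumed cap for illustration only; not taken from the patch.
ILLUSTRATIVE_LIMIT = 1 << 29


def access_ok_bounded(off, size, value_size):
    # Reject large operands first, so the addition below cannot wrap
    # even in 32-bit arithmetic.
    if not (0 <= off < ILLUSTRATIVE_LIMIT and 0 < size < ILLUSTRATIVE_LIMIT):
        return False
    return off + size <= value_size
```

For an offset of `0x7FFFFFFC` and an access size of 8 into a 64-byte value, the wrapping check accepts the access while the bounded check rejects it.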
      
      Also reduce the scope of "scalar op scalar" tracking.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      bb7f0f98
    • Jann Horn's avatar
      bpf: don't prune branches when a scalar is replaced with a pointer · 179d1c56
      Jann Horn authored
      This could be made safe by passing through a reference to env and checking
      for env->allow_ptr_leaks, but it would only work one way and is probably
      not worth the hassle - not doing it will not directly lead to program
      rejection.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      179d1c56
    • Jann Horn's avatar
      bpf: force strict alignment checks for stack pointers · a5ec6ae1
      Jann Horn authored
      Force strict alignment checks for stack pointers because the tracking of
      stack spills relies on it; unaligned stack accesses can lead to corruption
      of spilled registers, which is exploitable.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      a5ec6ae1
    • Jann Horn's avatar
      bpf: fix missing error return in check_stack_boundary() · ea25f914
      Jann Horn authored
      Prevent indirect stack accesses at non-constant addresses, which would
      permit reading and corrupting spilled pointers.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      ea25f914
    • Jann Horn's avatar
      bpf: fix 32-bit ALU op verification · 468f6eaf
      Jann Horn authored
      32-bit ALU ops operate on 32-bit values and have 32-bit outputs.
      Adjust the verifier accordingly.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      468f6eaf
    • Jann Horn's avatar
      bpf: fix incorrect tracking of register size truncation · 0c17d1d2
      Jann Horn authored
      Properly handle register truncation to a smaller size.
      
      The old code first mirrors the clearing of the high 32 bits in the bitwise
      tristate representation, which is correct. But then, it computes the new
      arithmetic bounds as the intersection between the old arithmetic bounds and
      the bounds resulting from the bitwise tristate representation. Therefore,
      when coerce_reg_to_32() is called on a number with bounds
      [0xffff'fff8, 0x1'0000'0007], the verifier computes
      [0xffff'fff8, 0xffff'ffff] as bounds of the truncated number.
      This is incorrect: The truncated number could also be in the range [0, 7],
      and no meaningful arithmetic bounds can be computed in that case apart from
      the obvious [0, 0xffff'ffff].
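The bounds example above can be checked by brute force. The Python sketch below enumerates the range and truncates each value to 32 bits; `sound_truncated_bounds` is a hypothetical helper showing one conservative way to derive bounds, not the kernel's implementation.

```python
MASK32 = 0xFFFF_FFFF


def truncated_values(umin, umax):
    """All 32-bit results of truncating each value in [umin, umax]."""
    return {v & MASK32 for v in range(umin, umax + 1)}


def sound_truncated_bounds(umin, umax):
    # If both ends share the same upper 32 bits, the range does not cross
    # a 2**32 boundary and the low halves are valid bounds. Otherwise
    # fall back to the full 32-bit range.
    if (umin >> 32) == (umax >> 32):
        return umin & MASK32, umax & MASK32
    return 0, MASK32


values = truncated_values(0xFFFF_FFF8, 0x1_0000_0007)
# The set contains 0..7 as well as 0xfffffff8..0xffffffff, so bounds of
# [0xfffffff8, 0xffffffff] would miss half of the possible values.
```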
      
      Starting with v4.14, this is exploitable by unprivileged users as long as
      the unprivileged_bpf_disabled sysctl isn't set.
      
      Debian assigned CVE-2017-16996 for this issue.
      
      v2:
       - flip the mask during arithmetic bounds calculation (Ben Hutchings)
      v3:
       - add CVE number (Ben Hutchings)
      
      Fixes: b03c9f9f ("bpf/verifier: track signed and unsigned min/max values")
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      0c17d1d2
    • Jann Horn's avatar
      bpf: fix incorrect sign extension in check_alu_op() · 95a762e2
      Jann Horn authored
      Distinguish between
      BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
      and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
      only perform sign extension in the first case.
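The difference between the two encodings can be sketched in Python as follows. The helper names are hypothetical and model only the resulting 64-bit register value, not the verifier's state tracking.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def mov64_imm(imm32):
    """Model of BPF_ALU64|BPF_MOV|BPF_K: sign-extend the 32-bit immediate."""
    imm32 &= MASK32
    if imm32 & 0x8000_0000:
        return imm32 | (MASK64 & ~MASK32)  # fill the upper 32 bits with ones
    return imm32


def mov32_imm(imm32):
    """Model of BPF_ALU|BPF_MOV|BPF_K: upper 32 bits are zero."""
    return imm32 & MASK32
```

For the immediate `0xffffffff`, `mov64_imm` yields `0xffffffffffffffff` while `mov32_imm` yields `0x00000000ffffffff`. A verifier that treats both the same will disagree with the runtime about the register's value.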
      
      Starting with v4.14, this is exploitable by unprivileged users as long as
      the unprivileged_bpf_disabled sysctl isn't set.
      
      Debian assigned CVE-2017-16995 for this issue.
      
      v3:
       - add CVE number (Ben Hutchings)
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      95a762e2
    • Edward Cree's avatar
      bpf/verifier: fix bounds calculation on BPF_RSH · 4374f256
      Edward Cree authored
      Incorrect signed bounds were being computed.
      If the old upper signed bound was positive and the old lower signed bound was
      negative, this could cause the new upper signed bound to be too low,
      leading to security issues.
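To see why a range that spans zero is problematic, the Python sketch below applies a 64-bit logical right shift to every value in a small signed range. The "naive" bound it compares against (shifting the old upper bound) is an assumption made for illustration. It is not a reconstruction of the exact computation the old code performed.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret the low 64 bits of value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def rsh64(value, shift):
    """Logical (unsigned) right shift of a 64-bit register value."""
    return to_signed64((value & MASK64) >> shift)


smin, smax, shift = -2, 2, 1
results = [rsh64(v, shift) for v in range(smin, smax + 1)]

# Assumed naive bound: just shift the old signed upper bound.
naive_smax = smax >> shift

# Negative inputs become huge positive numbers after a logical shift,
# so the true maximum is far above the naive bound.
true_smax = max(results)
```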
      
      Fixes: b03c9f9f ("bpf/verifier: track signed and unsigned min/max values")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      [jannh@google.com: changed description to reflect bug impact]
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      4374f256
    • Darrick J. Wong's avatar
      xfs: track cowblocks separately in i_flags · 91aae6be
      Darrick J. Wong authored
      The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them
      with separate i_flags.  Right now we're abusing IEOFBLOCKS for both,
      which is totally bogus because we won't tag the inode with COWBLOCKS if
      IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS.
      Found by wiring up clonerange to fsstress in xfs/017.
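The effect of sharing one flag between the two tags can be sketched with a toy model in Python. Everything here is hypothetical (`ToyInode`, the tagging helpers, and the `ICOWBLOCKS` flag name), and it models only the "already tagged" short-circuit described above, not the XFS inode cache.

```python
EOFBLOCKS = "eofblocks"
COWBLOCKS = "cowblocks"


class ToyInode:
    """Hypothetical inode with a set of flags and a set of applied tags."""

    def __init__(self):
        self.flags = set()
        self.tags = set()


def tag_with_shared_flag(inode, tag):
    # One flag guards both tags: once it is set, any later tagging
    # request is treated as already done and skipped.
    if "IEOFBLOCKS" in inode.flags:
        return
    inode.flags.add("IEOFBLOCKS")
    inode.tags.add(tag)


def tag_with_separate_flags(inode, tag):
    # Each tag has its own flag, so one tag cannot mask the other.
    flag = "IEOFBLOCKS" if tag == EOFBLOCKS else "ICOWBLOCKS"
    if flag in inode.flags:
        return
    inode.flags.add(flag)
    inode.tags.add(tag)
```

Tagging an inode with `EOFBLOCKS` and then `COWBLOCKS` through `tag_with_shared_flag` leaves only the first tag applied. With `tag_with_separate_flags`, both tags end up set.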
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      91aae6be