1. 04 Jun, 2020 11 commits
    • net: phy: fixed_phy: Remove unused seqcount · 79cbb6bc
      Ahmed S. Darwish authored
      Commit bf7afb29 ("phy: improve safety of fixed-phy MII register
      reading") protected the fixed PHY status with a sequence counter.
      
      Two years later, commit d2b97793 ("net: phy: fixed-phy: remove
      fixed_phy_update_state()") removed the sequence counter's write side
      critical section -- neutralizing its read side retry loop.
      
      Remove the unused seqcount.
      Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: device_rename: Use rwsem instead of a seqcount · 11d6011c
      Ahmed S. Darwish authored
      Sequence counter write paths are critical sections that must never be
      preempted; blocking inside them is not allowed, even with CONFIG_PREEMPTION=n.
      
      Commit 5dbe7c17 ("net: fix kernel deadlock with interface rename and
      netdev name retrieval.") handled a deadlock, observed with
      CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
      infinitely spinning: it got scheduled after the seqcount write side
      blocked inside its own critical section.
      
      To fix that deadlock, among other issues, the commit added a
      cond_resched() inside the read side section. While this will get the
      non-preemptible kernel eventually unstuck, the seqcount reader is fully
      exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
      
      The fix is also still broken: if the seqcount reader belongs to a
      real-time scheduling policy, it can spin forever and the kernel will
      livelock.
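
      The reader-side spin described above can be modelled in userspace. The
      sketch below is only an illustration built on C11 atomics, not the kernel's
      seqcount_t API; every demo_* name is hypothetical. It shows that while a
      writer sits inside its critical section (odd sequence), no read attempt can
      succeed, so a reader that is never descheduled retries without end.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a sequence counter (not seqcount_t). */
struct demo_seqcount {
	atomic_uint sequence;	/* odd while a writer is inside its section */
	atomic_int value;	/* data protected by the counter */
};

static void demo_write_begin(struct demo_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);	/* sequence becomes odd */
}

static void demo_write_end(struct demo_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);	/* sequence becomes even again */
}

/* One read attempt: succeeds only if no writer was active or intervened. */
static bool demo_read_once(struct demo_seqcount *s, int *out)
{
	unsigned int start = atomic_load(&s->sequence);

	if (start & 1)
		return false;	/* writer in progress: the caller must retry */

	int v = atomic_load(&s->value);

	if (atomic_load(&s->sequence) != start)
		return false;	/* a writer ran meanwhile: the caller must retry */

	*out = v;
	return true;
}

/* Retry loop with an artificial bound so that the demo terminates. */
static bool demo_read_bounded(struct demo_seqcount *s, int *out, int max_tries)
{
	for (int i = 0; i < max_tries; i++)
		if (demo_read_once(s, out))
			return true;
	return false;
}
```

      If the writer blocks between demo_write_begin() and demo_write_end(), an
      unbounded version of this loop only ends once the writer runs again, which
      is the situation a sleeping lock such as a rwsem avoids.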
      
      Disabling preemption over the seqcount write side critical section will
      not work: inside it are a number of GFP_KERNEL allocations and mutex
      locking through the drivers/base/ :: device_rename() call chain.
      
      Given all of the above, replace the seqcount with a rwsem.
      
      Fixes: 5dbe7c17 (net: fix kernel deadlock with interface rename and netdev name retrieval.)
      Fixes: 30e6c9fa (net: devnet_rename_seq should be a seqcount)
      Fixes: c91f6df2 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
      Cc: <stable@vger.kernel.org>
      Reported-by: kbuild test robot <lkp@intel.com> [ v1 missing up_read() on error exit ]
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com> [ v1 missing up_read() on error exit ]
      Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: qca8k: Fix "Unexpected gfp" kernel exception · 67122a79
      Michal Vokáč authored
      Commit 7e99e347 ("net: dsa: remove dsa_switch_alloc helper")
      replaced the dsa_switch_alloc helper with devm_kzalloc in all DSA
      drivers. Unfortunately, it introduced a typo in the qca8k.c driver, so
      the wrong argument is passed to the devm_kzalloc function.
      
      This fix mitigates the following kernel exception:
      
        Unexpected gfp: 0x6 (__GFP_HIGHMEM|GFP_DMA32). Fixing up to gfp: 0x101 (GFP_DMA|__GFP_ZERO). Fix your code!
        CPU: 1 PID: 44 Comm: kworker/1:1 Not tainted 5.5.9-yocto-ua #1
        Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
        Workqueue: events deferred_probe_work_func
        [<c0014924>] (unwind_backtrace) from [<c00123bc>] (show_stack+0x10/0x14)
        [<c00123bc>] (show_stack) from [<c04c8fb4>] (dump_stack+0x90/0xa4)
        [<c04c8fb4>] (dump_stack) from [<c00e1b10>] (new_slab+0x20c/0x214)
        [<c00e1b10>] (new_slab) from [<c00e1cd0>] (___slab_alloc.constprop.0+0x1b8/0x540)
        [<c00e1cd0>] (___slab_alloc.constprop.0) from [<c00e2074>] (__slab_alloc.constprop.0+0x1c/0x24)
        [<c00e2074>] (__slab_alloc.constprop.0) from [<c00e4538>] (__kmalloc_track_caller+0x1b0/0x298)
        [<c00e4538>] (__kmalloc_track_caller) from [<c02cccac>] (devm_kmalloc+0x24/0x70)
        [<c02cccac>] (devm_kmalloc) from [<c030d888>] (qca8k_sw_probe+0x94/0x1ac)
        [<c030d888>] (qca8k_sw_probe) from [<c0304788>] (mdio_probe+0x30/0x54)
        [<c0304788>] (mdio_probe) from [<c02c93bc>] (really_probe+0x1e0/0x348)
        [<c02c93bc>] (really_probe) from [<c02c9884>] (driver_probe_device+0x60/0x16c)
        [<c02c9884>] (driver_probe_device) from [<c02c7fb0>] (bus_for_each_drv+0x70/0x94)
        [<c02c7fb0>] (bus_for_each_drv) from [<c02c9708>] (__device_attach+0xb4/0x11c)
        [<c02c9708>] (__device_attach) from [<c02c8148>] (bus_probe_device+0x84/0x8c)
        [<c02c8148>] (bus_probe_device) from [<c02c8cec>] (deferred_probe_work_func+0x64/0x90)
        [<c02c8cec>] (deferred_probe_work_func) from [<c0033c14>] (process_one_work+0x1d4/0x41c)
        [<c0033c14>] (process_one_work) from [<c00340a4>] (worker_thread+0x248/0x528)
        [<c00340a4>] (worker_thread) from [<c0039148>] (kthread+0x124/0x150)
        [<c0039148>] (kthread) from [<c00090d8>] (ret_from_fork+0x14/0x3c)
        Exception stack(0xee1b5fb0 to 0xee1b5ff8)
        5fa0:                                     00000000 00000000 00000000 00000000
        5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
        qca8k 2188000.ethernet-1:0a: Using legacy PHYLIB callbacks. Please migrate to PHYLINK!
        qca8k 2188000.ethernet-1:0a eth2 (uninitialized): PHY [2188000.ethernet-1:01] driver [Generic PHY]
        qca8k 2188000.ethernet-1:0a eth1 (uninitialized): PHY [2188000.ethernet-1:02] driver [Generic PHY]
      
      Fixes: 7e99e347 ("net: dsa: remove dsa_switch_alloc helper")
      Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • geneve: change from tx_error to tx_dropped on missing metadata · 9d149045
      Jiri Benc authored
      If the geneve interface is in collect_md (external) mode, it can't send any
      packets submitted directly to its net interface, as such packets won't have
      metadata attached. This is expected.
      
      However, the kernel itself sends some packets to the interface, most
      notably, IPv6 DAD, IPv6 multicast listener reports, etc. This is not wrong,
      as tunnel metadata can be specified in routing table (although technically,
      that has never worked for IPv6, but hopefully will be fixed eventually) and
      then the interface must correctly participate in IPv6 housekeeping.
      
      The problem is that any such attempt increases the tx_error counter. Just
      bringing up a geneve interface with IPv6 enabled is enough to see a number
      of tx_errors. That causes confusion among users, prompting them to find
      a network error where there is none.
      
      Change the counter used to tx_dropped. That better conveys the meaning
      (there's nothing wrong going on, just some packets are getting dropped) and
      hopefully will make admins panic less.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ena-xdp-fixes' · a9a7d129
      David S. Miller authored
      Sameeh Jubran says:
      
      ====================
      Fix xdp in ena driver
      
      This patchset includes 2 XDP related bug fixes
      
      Difference from v1:
      * Fixed "Fixes" tag
      ====================
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: xdp: update napi budget for DROP and ABORTED · 3921a81c
      Sameeh Jubran authored
      This patch fixes two issues with XDP:
      
      1. If the XDP verdict is XDP_ABORTED we break the loop, which results in
         us handling one buffer per napi cycle instead of the total budget
         (usually 64). To overcome this simply change the xdp_verdict check to
         != XDP_PASS. When the verdict is XDP_PASS, the skb is not expected to
         be NULL.
      
      2. Update the residual budget for XDP_DROP and XDP_ABORTED, since
         packets are handled in these cases.
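
      The two points above can be sketched as a simplified poll loop. This is
      a userspace model, not the ena driver code: the demo_* names and the
      verdict enum are hypothetical stand-ins. Every handled buffer is charged
      against the budget, and a verdict other than PASS moves on to the next
      buffer instead of leaving the loop.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the XDP verdicts named in the message. */
enum demo_verdict { DEMO_PASS, DEMO_TX, DEMO_DROP, DEMO_ABORTED };

struct demo_poll_result {
	int work_done;	/* buffers charged against the budget */
	int passed;	/* buffers that would be handed up the stack */
};

static struct demo_poll_result demo_poll(const enum demo_verdict *verdicts,
					 size_t count, int budget)
{
	struct demo_poll_result res = { 0, 0 };

	for (size_t i = 0; i < count && res.work_done < budget; i++) {
		/* Every handled buffer consumes budget, whatever the verdict. */
		res.work_done++;

		/* Not PASS: nothing to deliver, but keep polling. */
		if (verdicts[i] != DEMO_PASS)
			continue;

		res.passed++;
	}

	return res;
}
```

      With a break in place of the continue, a single ABORTED verdict would
      end the cycle after one buffer, which is the behaviour the first point
      describes.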
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: xdp: XDP_TX: fix memory leak · cd07eccc
      Sameeh Jubran authored
      When sending at a very high packet rate, the XDP tx queues can get full
      and start dropping packets. In this case we don't free the pages, which
      results in the ena driver draining the system memory.
      
      Fix:
      Simply free the pages when necessary.
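
      The ownership rule behind this fix can be illustrated with a small
      userspace model. It is a sketch under assumptions, not the driver's code:
      the ring, its size and the demo_* names are hypothetical, and malloc/free
      stand in for page allocation. The transmit helper takes ownership of the
      page and releases it itself on the drop path.

```c
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_RING_SIZE 4	/* hypothetical queue depth */

struct demo_tx_ring {
	void *slots[DEMO_RING_SIZE];
	int used;
};

/*
 * Takes ownership of 'page': it is either queued for transmission or freed.
 * Returns true when the page was queued, false when it was dropped.
 */
static bool demo_xmit_page(struct demo_tx_ring *ring, void *page)
{
	if (ring->used == DEMO_RING_SIZE) {
		free(page);	/* drop path: release the page or it leaks */
		return false;
	}

	ring->slots[ring->used++] = page;
	return true;
}

/* Completion path: release every page that was queued. */
static void demo_ring_complete(struct demo_tx_ring *ring)
{
	while (ring->used > 0)
		free(ring->slots[--ring->used]);
}
```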
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • seg6: fix seg6_validate_srh() to avoid slab-out-of-bounds · bb986a50
      Ahmed Abdelsalam authored
      seg6_validate_srh() is used to validate the SRH in three cases:
      
      case1: SRH of data-plane SRv6 packets to be processed by the Linux kernel.
      case2: SRH of the netlink message received from user-space (iproute2).
      case3: SRH injected into packets through setsockopt.
      
      In case1, the SRH can be encoded in the Reduced way (i.e., first SID is
      carried in DA only and not represented as SID in the SRH) and the
      seg6_validate_srh() now handles this case correctly.
      
      In case2 and case3, the SRH shouldn’t be encoded in the Reduced way
      otherwise we lose the first segment (i.e., the first hop).
      
      The current implementation of seg6_validate_srh() allows the SRH of case2
      and case3 to be encoded in the Reduced way. This leads to a
      slab-out-of-bounds problem.
      
      This patch verifies the SRH of case1, case2 and case3, allowing case1 to
      be reduced while preventing the SRH of case2 and case3 from being reduced.
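
      A simplified model of the distinction might look as follows. This is a
      sketch, not the kernel function: struct demo_srh, demo_validate_srh() and
      the 'reduced' parameter are illustrative assumptions that keep only the
      three SRH fields needed for the segment-count check. In the reduced case
      the first SID lives only in the destination address, so Segments Left may
      exceed the last list index by one; otherwise it may not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a Segment Routing Header. */
struct demo_srh {
	uint8_t hdrlen;		/* length in 8-octet units, first 8 octets excluded */
	uint8_t segments_left;	/* segments still to be visited */
	uint8_t first_segment;	/* index of the last entry in the segment list */
};

static bool demo_validate_srh(const struct demo_srh *srh, bool reduced)
{
	/* Each 16-byte SID occupies two 8-octet units. */
	int max_last_entry = srh->hdrlen / 2 - 1;

	if (srh->first_segment > max_last_entry)
		return false;

	if (reduced)
		/* The first SID is carried in the destination address only. */
		return srh->segments_left <= srh->first_segment + 1;

	/* Not reduced: every remaining segment must be present in the list. */
	return srh->segments_left <= srh->first_segment;
}
```

      Under this model, a header with two SIDs and Segments Left set to 2 is
      accepted only when the caller allows the reduced encoding.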
      
      Reported-by: syzbot+e8c028b62439eac42073@syzkaller.appspotmail.com
      Reported-by: YueHaibing <yuehaibing@huawei.com>
      Fixes: 0cb7498f ("seg6: fix SRH processing to comply with RFC8754")
      Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix NULL pointer dereference in streaming · 5e9eeccc
      Tuong Lien authored
      syzbot found the following crash:
      
      general protection fault, probably for non-canonical address 0xdffffc0000000019: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf]
      CPU: 1 PID: 7060 Comm: syz-executor394 Not tainted 5.7.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__tipc_sendstream+0xbde/0x11f0 net/tipc/socket.c:1591
      Code: 00 00 00 00 48 39 5c 24 28 48 0f 44 d8 e8 fa 3e db f9 48 b8 00 00 00 00 00 fc ff df 48 8d bb c8 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 e2 04 00 00 48 8b 9b c8 00 00 00 48 b8 00 00 00
      RSP: 0018:ffffc90003ef7818 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8797fd9d
      RDX: 0000000000000019 RSI: ffffffff8797fde6 RDI: 00000000000000c8
      RBP: ffff888099848040 R08: ffff88809a5f6440 R09: fffffbfff1860b4c
      R10: ffffffff8c305a5f R11: fffffbfff1860b4b R12: ffff88809984857e
      R13: 0000000000000000 R14: ffff888086aa4000 R15: 0000000000000000
      FS:  00000000009b4880(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000140 CR3: 00000000a7fdf000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tipc_sendstream+0x4c/0x70 net/tipc/socket.c:1533
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x32f/0x810 net/socket.c:2352
       ___sys_sendmsg+0x100/0x170 net/socket.c:2406
       __sys_sendmmsg+0x195/0x480 net/socket.c:2496
       __do_sys_sendmmsg net/socket.c:2525 [inline]
       __se_sys_sendmmsg net/socket.c:2522 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2522
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x440199
      ...
      
      This bug was bisected to commit 0a3e060f ("tipc: add test for Nagle
      algorithm effectiveness"). However, that commit is not the root cause:
      the problem comes from the base code. When sending a message with zero
      data length in Nagle mode, we would unexpectedly end up with an empty
      'txq' queue after 'tipc_msg_append()'.
      
      A similar crash can be generated even without the bisected patch but at
      the link layer when it accesses the empty queue.
      
      We solve the issues by building at least one buffer to go with socket's
      header and an optional data section that may be empty like what we had
      with the 'tipc_msg_build()'.
      
      Note: the previous commit 4c21daae ("tipc: Fix NULL pointer
      dereference in __tipc_sendstream()") is obsoleted by this one, since the
      'txq' will never be empty and the 'skb != NULL' check becomes
      unnecessary, though it is harmless to keep.
      
      Reported-by: syzbot+8eac6d030e7807c21d32@syzkaller.appspotmail.com
      Fixes: c0bceb97 ("tipc: add smart nagle feature")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: fix memory leaks in genl_family_rcv_msg_dumpit() · c36f0555
      Cong Wang authored
      There are two kinds of memory leaks in genl_family_rcv_msg_dumpit():
      
      1. Before we call ops->start(), whenever an error happens, we forget
         to free the memory allocated in genl_family_rcv_msg_dumpit().
      
      2. When ops->start() fails, the 'info' has been already installed on
         the per socket control block, so we should not free it here. More
         importantly, nlk->cb_running is still false at this point, so
         netlink_sock_destruct() cannot free it either.
      
      The first kind of memory leak is easy to resolve, but the second
      one requires deeper thought.
      
      After reviewing how netfilter handles this, the most elegant solution
      I find is just to use a similar way to allocate the memory, that is,
      moving memory allocations from caller into ops->start(). With this,
      we can solve both kinds of memory leaks: for 1), no memory allocation
      happens before ops->start(); for 2), ops->start() handles its own
      failures and 'info' is installed on the socket control block only
      on success. The only ugliness here is that we have to pass all local
      variables on the stack via a struct, but this is not hard to understand.
      
      Alternatively, we could introduce an ops->free() to solve this too,
      but that is overkill, as only genetlink has this problem so far.
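
      The allocation pattern described above can be sketched in userspace. The
      code below is an illustration, not the genetlink implementation: all
      demo_* names, the context struct and the failure switches are
      hypothetical. The start callback allocates 'info' itself, cleans up after
      its own failures, and installs the pointer on the control block only once
      everything has succeeded.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_dump_info {
	int family_id;
};

/* Hypothetical stand-in for the per-socket control block. */
struct demo_cb {
	struct demo_dump_info *info;
};

/* Caller's local state, passed to start() via a single struct. */
struct demo_start_ctx {
	int family_id;
	int fail_before_alloc;	/* simulate an early error, e.g. parsing */
	int fail_after_alloc;	/* simulate a late error inside start() */
};

static int demo_start(struct demo_cb *cb, const struct demo_start_ctx *ctx)
{
	struct demo_dump_info *info;

	if (ctx->fail_before_alloc)
		return -EINVAL;	/* nothing allocated yet, nothing to leak */

	info = malloc(sizeof(*info));
	if (!info)
		return -ENOMEM;
	info->family_id = ctx->family_id;

	if (ctx->fail_after_alloc) {
		free(info);	/* start() handles its own failure */
		return -EIO;
	}

	cb->info = info;	/* installed only on success */
	return 0;
}

/* Counterpart that runs when the dump finishes. */
static void demo_done(struct demo_cb *cb)
{
	free(cb->info);
	cb->info = NULL;
}
```

      Because the pointer reaches the control block only on the success path,
      no error path leaves an allocation behind for a destructor to find.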
      
      Fixes: 1927f41a ("net: genetlink: introduce dump info struct to be available during dumpit op")
      Reported-by: syzbot+21f04f481f449c8db840@syzkaller.appspotmail.com
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Cc: Shaochun Chen <cscnull@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • crypto/chcr: error seen if CONFIG_CHELSIO_TLS_DEVICE isn't set · ef1c7559
      Rohit Maheshwari authored
      cxgb4_uld_in_use() is used only by cxgb4_ktls_det_feature() which
      is under CONFIG_CHELSIO_TLS_DEVICE macro.
      
      Fixes: a3ac249a ("cxgb4/chcr: Enable ktls settings at run time")
      Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 03 Jun, 2020 26 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · cb8e59cc
      Linus Torvalds authored
      Pull networking updates from David Miller:
      
       1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
          Augusto von Dentz.
      
       2) Add GSO partial support to igc, from Sasha Neftin.
      
       3) Several cleanups and improvements to r8169 from Heiner Kallweit.
      
       4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
          device self-test. From Andrew Lunn.
      
       5) Start moving away from custom driver versions, use the globally
          defined kernel version instead, from Leon Romanovsky.
      
       6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.
      
       7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.
      
       8) Add sriov and vf support to hinic, from Luo bin.
      
       9) Support Media Redundancy Protocol (MRP) in the bridging code, from
          Horatiu Vultur.
      
      10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.
      
      11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
          Dubroca. Also add ipv6 support for espintcp.
      
      12) Lots of ReST conversions of the networking documentation, from Mauro
          Carvalho Chehab.
      
      13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
          from Doug Berger.
      
      14) Allow to dump cgroup id and filter by it in inet_diag code, from
          Dmitry Yakunin.
      
      15) Add infrastructure to export netlink attribute policies to
          userspace, from Johannes Berg.
      
      16) Several optimizations to sch_fq scheduler, from Eric Dumazet.
      
      17) Fallback to the default qdisc if qdisc init fails because otherwise
          a packet scheduler init failure will make a device inoperative. From
          Jesper Dangaard Brouer.
      
      18) Several RISCV bpf jit optimizations, from Luke Nelson.
      
      19) Correct the return type of the ->ndo_start_xmit() method in several
          drivers, it's netdev_tx_t but many drivers were using
          'int'. From Yunjian Wang.
      
      20) Add an ethtool interface for PHY master/slave config, from Oleksij
          Rempel.
      
      21) Add BPF iterators, from Yonghong Song.
      
      22) Add cable test infrastructure, including ethtool interfaces, from
          Andrew Lunn. Marvell PHY driver is the first to support this
          facility.
      
      23) Remove zero-length arrays all over, from Gustavo A. R. Silva.
      
      24) Calculate and maintain an explicit frame size in XDP, from Jesper
          Dangaard Brouer.
      
      25) Add CAP_BPF, from Alexei Starovoitov.
      
      26) Support terse dumps in the packet scheduler, from Vlad Buslov.
      
      27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.
      
      28) Add devm_register_netdev(), from Bartosz Golaszewski.
      
      29) Minimize qdisc resets, from Cong Wang.
      
      30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
          eliminate set_fs/get_fs calls. From Christoph Hellwig.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
        selftests: net: ip_defrag: ignore EPERM
        net_failover: fixed rollback in net_failover_open()
        Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
        Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
        vmxnet3: allow rx flow hash ops only when rss is enabled
        hinic: add set_channels ethtool_ops support
        selftests/bpf: Add a default $(CXX) value
        tools/bpf: Don't use $(COMPILE.c)
        bpf, selftests: Use bpf_probe_read_kernel
        s390/bpf: Use bcr 0,%0 as tail call nop filler
        s390/bpf: Maintain 8-byte stack alignment
        selftests/bpf: Fix verifier test
        selftests/bpf: Fix sample_cnt shared between two threads
        bpf, selftests: Adapt cls_redirect to call csum_level helper
        bpf: Add csum_level helper for fixing up csum levels
        bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
        sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
        crypto/chtls: IPv6 support for inline TLS
        Crypto/chcr: Fixes a coccinile check error
        Crypto/chcr: Fixes compilations warnings
        ...
    • Merge branch 'uaccess.comedi' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 2e63f6ce
      Linus Torvalds authored
      Pull comedi uaccess cleanups from Al Viro:
       "Comedi compat ioctls done saner - killing the single biggest pile of
        __get_user/__put_user outside of arch/* in the process"
      
      * 'uaccess.comedi' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        comedi: get rid of compat_alloc_user_space() mess in COMEDI_CMD{,TEST} compat
        comedi: do_cmd_ioctl(): lift copyin/copyout into the caller
        comedi: do_cmdtest_ioctl(): lift copyin/copyout into the caller
        comedi: lift copy_from_user() into callers of __comedi_get_user_cmd()
        comedi: get rid of compat_alloc_user_space() mess in COMEDI_INSNLIST compat
        comedi: get rid of compat_alloc_user_space() mess in COMEDI_INSN compat
        comedi: get rid of compat_alloc_user_space() mess in COMEDI_RANGEINFO compat
        comedi: get rid of compat_alloc_user_space() mess in COMEDI_CHANINFO compat
        comedi: get rid of indirection via translated_ioctl()
        comedi: move compat ioctl handling to native fops
    • Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ae03c53d
      Linus Torvalds authored
      Pull splice updates from Al Viro:
       "Christoph's assorted splice cleanups"
      
      * 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: rename pipe_buf ->steal to ->try_steal
        fs: make the pipe_buf_operations ->confirm operation optional
        fs: make the pipe_buf_operations ->steal operation optional
        trace: remove tracing_pipe_buf_ops
        pipe: merge anon_pipe_buf*_ops
        fs: simplify do_splice_from
        fs: simplify do_splice_to
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 039aeb9d
      Linus Torvalds authored
      Pull kvm updates from Paolo Bonzini:
       "ARM:
         - Move the arch-specific code into arch/arm64/kvm
      
         - Start the post-32bit cleanup
      
         - Cherry-pick a few non-invasive pre-NV patches
      
        x86:
         - Rework of TLB flushing
      
         - Rework of event injection, especially with respect to nested
           virtualization
      
         - Nested AMD event injection facelift, building on the rework of
           generic code and fixing a lot of corner cases
      
         - Nested AMD live migration support
      
         - Optimization for TSC deadline MSR writes and IPIs
      
         - Various cleanups
      
         - Asynchronous page fault cleanups (from tglx, common topic branch
           with tip tree)
      
         - Interrupt-based delivery of asynchronous "page ready" events (host
           side)
      
         - Hyper-V MSRs and hypercalls for guest debugging
      
         - VMX preemption timer fixes
      
        s390:
         - Cleanups
      
        Generic:
         - switch vCPU thread wakeup from swait to rcuwait
      
        The other architectures, and the guest side of the asynchronous page
        fault work, will come next week"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
        KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
        KVM: check userspace_addr for all memslots
        KVM: selftests: update hyperv_cpuid with SynDBG tests
        x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
        x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
        x86/kvm/hyper-v: Add support for synthetic debugger interface
        x86/hyper-v: Add synthetic debugger definitions
        KVM: selftests: VMX preemption timer migration test
        KVM: nVMX: Fix VMX preemption timer migration
        x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
        KVM: x86/pmu: Support full width counting
        KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
        KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
        KVM: x86: acknowledgment mechanism for async pf page ready notifications
        KVM: x86: interrupt based APF 'page ready' event delivery
        KVM: introduce kvm_read_guest_offset_cached()
        KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
        KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
        Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
        KVM: VMX: Replace zero-length array with flexible-array
        ...
    • Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux · 6b2591c2
      Linus Torvalds authored
      Pull hyper-v updates from Wei Liu:
      
       - a series from Andrea to support channel reassignment
      
       - a series from Vitaly to clean up Vmbus message handling
      
       - a series from Michael to clean up and augment hyperv-tlfs.h
      
       - patches from Andy to clean up GUID usage in Hyper-V code
      
       - a few other misc patches
      
      * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (29 commits)
        Drivers: hv: vmbus: Resolve more races involving init_vp_index()
        Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug
        vmbus: Replace zero-length array with flexible-array
        Driver: hv: vmbus: drop a no long applicable comment
        hyper-v: Switch to use UUID types directly
        hyper-v: Replace open-coded variant of %*phN specifier
        hyper-v: Supply GUID pointer to printf() like functions
        hyper-v: Use UUID API for exporting the GUID (part 2)
        asm-generic/hyperv: Add definitions for Get/SetVpRegister hypercalls
        x86/hyperv: Split hyperv-tlfs.h into arch dependent and independent files
        x86/hyperv: Remove HV_PROCESSOR_POWER_STATE #defines
        KVM: x86: hyperv: Remove duplicate definitions of Reference TSC Page
        drivers: hv: remove redundant assignment to pointer primary_channel
        scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
        Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type
        Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
        Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic
        PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality
        Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal
        hv_utils: Always execute the fcopy and vss callbacks in a tasklet
        ...
    • Merge tag 'kgdb-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux · f1e45535
      Linus Torvalds authored
      Pull kgdb updates from Daniel Thompson:
       "By far the biggest change in this cycle are the changes that allow
        much earlier debug of systems that are hooked up via UART by taking
        advantage of the earlycon framework to implement the kgdb I/O hooks
        before handing over to the regular polling I/O drivers once they are
        available. When discussing Doug's work we also found and fixed a
        broken raw_smp_processor_id() sequence in in_dbg_master().
      
        Also included are a collection of much smaller fixes and tweaks: a
        couple of tweaks to get rid of doc gen or coccicheck warnings, future
        proof some internal calculations that made implicit power-of-2
        assumptions and eliminate some rather weird handling of magic
        environment variables in kdb"
      
      * tag 'kgdb-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
        kdb: Remove the misfeature 'KDBFLAGS'
        kdb: Cleanup math with KDB_CMD_HISTORY_COUNT
        serial: amba-pl011: Support kgdboc_earlycon
        serial: 8250_early: Support kgdboc_earlycon
        serial: qcom_geni_serial: Support kgdboc_earlycon
        serial: kgdboc: Allow earlycon initialization to be deferred
        Documentation: kgdboc: Document new kgdboc_earlycon parameter
        kgdb: Don't call the deinit under spinlock
        kgdboc: Disable all the early code when kgdboc is a module
        kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles
        kgdboc: Remove useless #ifdef CONFIG_KGDB_SERIAL_CONSOLE in kgdboc
        kgdb: Prevent infinite recursive entries to the debugger
        kgdb: Delay "kgdbwait" to dbg_late_init() by default
        kgdboc: Use a platform device to handle tty drivers showing up late
        Revert "kgdboc: disable the console lock when in kgdb"
        kgdb: Disable WARN_CONSOLE_UNLOCKED for all kgdb
        kgdb: Return true in kgdb_nmi_poll_knock()
        kgdb: Drop malformed kernel doc comment
        kgdb: Fix spurious true from in_dbg_master()
    • Merge tag 'xtensa-20200603' of git://github.com/jcmvbkbc/linux-xtensa · 38696e33
      Linus Torvalds authored
      Pull Xtensa updates from Max Filippov:
      
       - fix __user annotations in asm/uaccess.h
      
       - fix comments in entry.S
      
      * tag 'xtensa-20200603' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: Fix spelling/grammar in comment
        xtensa: add missing __user annotations to asm/uaccess.h
        xtensa: fix error paths in __get_user_{check,size}
        xtensa: fix type conversion in __get_user_size
        xtensa: add missing __user annotations to __{get,put}_user_check
    • Merge branch 'parisc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 44e40e96
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
       "Enable the sysctl file interface for panic_on_stackoverflow for
        parisc, a warning fix and a bunch of documentation updates since the
        parisc website is now at https://parisc.wiki.kernel.org"
      
      * 'parisc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: MAINTAINERS: Update references to parisc website
        parisc: module: Update references to parisc website
        parisc: hardware: Update references to parisc website
        parisc: firmware: Update references to parisc website
        parisc: Kconfig: Update references to parisc website
        parisc: add sysctl file interface panic_on_stackoverflow
        parisc: use -fno-strict-aliasing for decompressor
        parisc: suppress error messages for 'make clean'
      44e40e96
    • Merge tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 8226f113
      Linus Torvalds authored
      Pull MIPS updates from Thomas Bogendoerfer:
      
       - added support for MIPSr5 and P5600 cores
      
       - converted Loongson PCI driver into a PCI host driver using the
         generic PCI framework
      
       - added emulation of CPUCFG command for Loongson64 CPUs
      
       - removed LASAT, PMC MSP71xx and NEC MARKEINS/EMMA
      
       - ioremap cleanup
      
       - fix for a race between two threads faulting the same page
      
       - various cleanups and fixes
      
      * tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (143 commits)
        MIPS: ralink: drop ralink_clk_init for mt7621
        MIPS: ralink: bootrom: mark a function as __init to save some memory
        MIPS: Loongson64: Reorder CPUCFG model match arms
        MIPS: Expose Loongson CPUCFG availability via HWCAP
        MIPS: Loongson64: Guard against future cores without CPUCFG
        MIPS: Fix build warning about "PTR_STR" redefinition
        MIPS: Loongson64: Remove not used pci.c
        MIPS: Loongson64: Define PCI_IOBASE
        MIPS: CPU_LOONGSON2EF need software to maintain cache consistency
        MIPS: DTS: Fix build errors used with various configs
        MIPS: Loongson64: select NO_EXCEPT_FILL
        MIPS: Fix IRQ tracing when call handle_fpe() and handle_msa_fpe()
        MIPS: mm: add page valid judgement in function pte_modify
        mm/memory.c: Add memory read privilege on page fault handling
        mm/memory.c: Update local TLB if PTE entry exists
        MIPS: Do not flush tlb page when updating PTE entry
        MIPS: ingenic: Default to a generic board
        MIPS: ingenic: Add support for GCW Zero prototype
        MIPS: ingenic: DTS: Add memory info of GCW Zero
        MIPS: Loongson64: Switch to generic PCI driver
        ...
      8226f113
    • Merge branch 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · e8f4abf8
      Linus Torvalds authored
      Pull ia64 build regression fix from Al Viro:
       "Fix a braino in ia64 uaccess csum changes"
      
      * 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix a braino in ia64 uaccess csum changes
      e8f4abf8
    • fix a braino in ia64 uaccess csum changes · 174e1ea8
      Al Viro authored
      Fixes: cc03f19c (ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user())
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      174e1ea8
    • Merge tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · e7c93cbf
      Linus Torvalds authored
      Pull thread updates from Christian Brauner:
       "We have been discussing using pidfds to attach to namespaces for quite
        a while and the patches have in one form or another already existed
        for about a year. But I wanted to wait to see how the general api
        would be received and adopted.
      
        This contains the changes to make it possible to use pidfds to attach
        to the namespaces of a process, i.e. they can be passed as the first
        argument to the setns() syscall.
      
        When only a single namespace type is specified the semantics are
        equivalent to passing an nsfd. That means setns(nsfd, CLONE_NEWNET)
        equals setns(pidfd, CLONE_NEWNET).
      
        However, when a pidfd is passed, multiple namespace flags can be
        specified in the second setns() argument and setns() will attach the
        caller to all the specified namespaces all at once or to none of them.
      
        Specifying 0 is not valid together with a pidfd. Here are just two
        obvious examples:
      
          setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET);
          setns(pidfd, CLONE_NEWUSER);
      
        Allowing attachment to subsets of namespaces also supports various
        use-cases where callers setns to a subset of namespaces to retain
        privilege, perform an action and then re-attach another subset of
        namespaces.
      
        Apart from significantly reducing the number of syscalls needed to
        attach to all currently supported namespaces (eight "open+setns"
        sequences vs just a single "setns()"), this also allows atomic setns
        to a set of namespaces, i.e. either attaching to all namespaces
        succeeds or we fail without having changed anything.
      
        This is centered around a new internal struct nsset which holds all
        information necessary for a task to switch to a new set of namespaces
        atomically. Fwiw, with this change a pidfd becomes the only token
        needed to interact with a container. I'm expecting this to be
        picked-up by util-linux for nsenter rather soon.
      
        Associated with this change is a shiny new test-suite dedicated to
        setns() (for pidfds and nsfds alike)"
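
The all-or-nothing attach described above could look roughly like this from user space. This is a minimal sketch, not code from the series: `open_pidfd()` and `attach_to_namespaces()` are hypothetical helper names, the pidfd is obtained through the raw `pidfd_open` syscall, and a kernel with this change (5.8 or later) plus sufficient privileges are assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: obtain a pidfd for `pid` via the raw syscall. */
int open_pidfd(pid_t pid)
{
#ifdef SYS_pidfd_open
	return (int)syscall(SYS_pidfd_open, pid, 0);
#else
	errno = ENOSYS;	/* headers too old to know the syscall number */
	return -1;
#endif
}

/*
 * Hypothetical helper: attach the caller to the namespaces of `target`
 * selected by `nstypes` (an OR of CLONE_NEW* flags). Per the message
 * above, the kernel either switches all requested namespaces or none.
 * Returns 0 on success, -1 with errno set on failure.
 */
int attach_to_namespaces(pid_t target, int nstypes)
{
	int pidfd, ret, saved_errno;

	/* Specifying 0 is not valid together with a pidfd. */
	assert(nstypes != 0);

	pidfd = open_pidfd(target);
	if (pidfd < 0)
		return -1;

	ret = setns(pidfd, nstypes);
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;
	return ret;
}
```

A caller might use `attach_to_namespaces(pid, CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWNET)` in place of one open+setns pair per namespace file under /proc/<pid>/ns; on failure nothing has been changed, so no rollback is needed.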
      
      * tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        selftests/pidfd: add pidfd setns tests
        nsproxy: attach to namespaces via pidfds
        nsproxy: add struct nsset
      e7c93cbf
    • Merge tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d479c5a1
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "The changes in this cycle are:
      
         - Optimize the task wakeup CPU selection logic, to improve
           scalability and reduce wakeup latency spikes
      
         - PELT enhancements
      
         - CFS bandwidth handling fixes
      
         - Optimize the wakeup path by removing rq->wake_list and replacing
           it with ->ttwu_pending
      
         - Optimize IPI cross-calls by making flush_smp_call_function_queue()
           process sync callbacks first.
      
         - Misc fixes and enhancements"
      
      * tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
        irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
        sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
        sched: Replace rq::wake_list
        sched: Add rq::ttwu_pending
        irq_work, smp: Allow irq_work on call_single_queue
        smp: Optimize send_call_function_single_ipi()
        smp: Move irq_work_run() out of flush_smp_call_function_queue()
        smp: Optimize flush_smp_call_function_queue()
        sched: Fix smp_call_function_single_async() usage for ILB
        sched/core: Offload wakee task activation if it the wakee is descheduling
        sched/core: Optimize ttwu() spinning on p->on_cpu
        sched: Defend cfs and rt bandwidth quota against overflow
        sched/cpuacct: Fix charge cpuacct.usage_sys
        sched/fair: Replace zero-length array with flexible-array
        sched/pelt: Sync util/runnable_sum with PELT window when propagating
        sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
        sched/fair: Optimize enqueue_task_fair()
        sched: Make scheduler_ipi inline
        sched: Clean up scheduler_ipi()
        sched/core: Simplify sched_init()
        ...
      d479c5a1
    • Merge tag 'x86-timers-2020-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f6aee505
      Linus Torvalds authored
      Pull x86 timer updates from Thomas Gleixner:
       "X86 timer specific updates:
      
         - Add TPAUSE based delay which allows the CPU to enter an optimized
           power state while waiting for the delay to pass. The delay is based
           on TSC cycles.
      
         - Add tsc_early_khz command line parameter to work around the
           problem that overclocked CPUs can report the wrong frequency via
           CPUID.16h, which causes the refined calibration to fail because
           the delta to the initial frequency value is too big. With the
           parameter, users can provide a halfway accurate initial value"
      
      * tag 'x86-timers-2020-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tsc: Add tsc_early_khz command line parameter
        x86/delay: Introduce TPAUSE delay
        x86/delay: Refactor delay_mwaitx() for TPAUSE support
        x86/delay: Preparatory code cleanup
      f6aee505
    • Merge tag 'timers-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dabc4df2
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "The truly boring timer and clocksource updates for 5.8:
      
         - Not a single new clocksource or clockevent driver!
      
         - Device tree updates for various chips
      
         - Fixes and improvements and cleanups all over the place"
      
      * tag 'timers-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
        dt-bindings: timer: Add renesas,em-sti bindings
        clocksource/drivers/timer-versatile: Clear OF_POPULATED flag
        clocksource: mips-gic-timer: Mark GIC timer as unstable if ref clock changes
        clocksource: mips-gic-timer: Register as sched_clock
        clocksource: dw_apb_timer_of: Fix missing clockevent timers
        clocksource: dw_apb_timer: Affiliate of-based timer with any CPU
        clocksource: dw_apb_timer: Make CPU-affiliation being optional
        dt-bindings: timer: Move snps,dw-apb-timer DT schema from rtc
        dt-bindings: rtc: Convert snps,dw-apb-timer to DT schema
        clocksource/drivers/timer-ti-dm: Do one override clock parent in prepare()
        clocksource/drivers/timer-ti-dm: Fix spelling mistake "detectt" -> "detect"
        clocksource/drivers/timer-ti-dm: Fix warning for set but not used
        clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support
        clocksource/drivers/timer-ti-32k: Add support for initializing directly
        drivers/clocksource/arm_arch_timer: Remove duplicate error message
        clocksource/drivers/arc_timer: Remove duplicate error message
        clocksource/drivers/rda: drop redundant Kconfig dependency
        clocksource/drivers/timer-ti-dm: Fix warning for set but not used
        clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support
        clocksource/drivers/timer-ti-32k: Add support for initializing directly
        ...
      dabc4df2
    • Merge tag 'irq-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f6606d0c
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "The generic interrupt department provides:
      
         - Cleanup of the irq_domain API
      
         - Overhaul of the interrupt chip simulator
      
         - The usual pile of new interrupt chip drivers
      
         - Cleanups, improvements and fixes all over the place"
      
      * tag 'irq-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
        irqchip: Fix "Loongson HyperTransport Vector support" driver build on all non-MIPS platforms
        dt-bindings: interrupt-controller: Add Loongson PCH MSI
        irqchip: Add Loongson PCH MSI controller
        dt-bindings: interrupt-controller: Add Loongson PCH PIC
        irqchip: Add Loongson PCH PIC controller
        dt-bindings: interrupt-controller: Add Loongson HTVEC
        irqchip: Add Loongson HyperTransport Vector support
        genirq: Check irq_data_get_irq_chip() return value before use
        irqchip/sifive-plic: Improve boot prints for multiple PLIC instances
        irqchip/sifive-plic: Setup cpuhp once after boot CPU handler is present
        irqchip/sifive-plic: Set default irq affinity in plic_irqdomain_map()
        irqchip/gic-v2, v3: Drop extra IRQ_NOAUTOEN setting for (E)PPIs
        irqdomain: Allow software nodes for IRQ domain creation
        irqdomain: Get rid of special treatment for ACPI in __irq_domain_add()
        irqdomain: Make __irq_domain_add() less OF-dependent
        iio: dummy_evgen: Fix use after free on error in iio_dummy_evgen_create()
        irqchip/gic-v3-its: Balance initial LPI affinity across CPUs
        irqchip/gic-v3-its: Track LPI distribution on a per CPU basis
        genirq/irq_sim: Simplify the API
        irqdomain: Make irq_domain_reset_irq_data() available to  non-hierarchical users
        ...
      f6606d0c
    • Merge tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · d6f9469a
      Linus Torvalds authored
      Pull erofs updates from Gao Xiang:
       "The most interesting part is the new mount api conversion, which is
        actually an old patch that has been pending for several cycles. The
        others are recent trivial cleanups.
      
        Summary:
      
         - Convert to use the new mount apis
      
         - Some random cleanup patches"
      
      * tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: suppress false positive last_block warning
        erofs: convert to use the new mount fs_context api
        erofs: code cleanup by removing ifdef macro surrounding
      d6f9469a
    • Merge tag 'jfs-5.8' of git://github.com/kleikamp/linux-shaggy · cadf3223
      Linus Torvalds authored
      Pull JFS update from David Kleikamp:
       "Replace zero-length array in JFS"
      
      * tag 'jfs-5.8' of git://github.com/kleikamp/linux-shaggy:
        jfs: Replace zero-length array with flexible-array member
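
For readers unfamiliar with the conversion named in the commit title, the general pattern is sketched below. This is an illustration only, not the JFS code: `struct record` and `record_new()` are made-up names. The old GNU idiom declared the trailing array as `data[0]`; the C99 flexible-array member drops the size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative structure with a variable-length tail. */
struct record {
	size_t len;
	unsigned char data[];	/* flexible-array member; formerly data[0] */
};

/* Allocate a record holding a copy of `len` bytes from `src`. */
struct record *record_new(const void *src, size_t len)
{
	struct record *r;

	assert(src != NULL || len == 0);

	/* sizeof(*r) covers the header only; the tail is added explicitly. */
	r = malloc(sizeof(*r) + len);
	if (!r)
		return NULL;

	r->len = len;
	if (len)
		memcpy(r->data, src, len);
	return r;
}
```

The flexible-array form must be the last member of the structure, which lets the compiler reject layouts where something is accidentally placed after the variable-length tail.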
      cadf3223
    • Merge tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · f3cdc8ae
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "Highlights:
      
         - speed up dead root detection during orphan cleanup, e.g. when there
           are many deleted subvolumes waiting to be cleaned, the trees are
           now looked up in a radix tree instead of an O(N^2) search
      
         - snapshot creation with inherited qgroup will mark the qgroup
           inconsistent, requires a rescan
      
         - send will emit file capabilities after chown, this produces a
           stream that does not need postprocessing to set the capabilities
           again
      
         - direct io ported to iomap infrastructure, cleaned up and simplified
           code, notably removing last use of struct buffer_head in btrfs code
      
        Core changes:
      
         - factor out backreference iteration, to be used by ordinary
           backreferences and relocation code
      
         - improved global block reserve utilization
            * better logic to serialize requests
            * increased maximum available for unlink
            * improved handling on large pages (64K)
      
         - direct io cleanups and fixes
            * simplify layering, where cloned bios were unnecessarily created
              for some cases
            * error handling fixes (submit, endio)
            * remove repair worker thread, used to avoid deadlocks during
              repair
      
         - refactored block group reading code, preparatory work for new type
           of block group storage that should improve mount time on large
           filesystems
      
        Cleanups:
      
         - cleaned up (and slightly sped up) set/get helpers for metadata data
           structure members
      
         - root bit REF_COWS got renamed to SHAREABLE to reflect that the
           blocks of the tree get shared either among subvolumes or with the
           relocation trees
      
        Fixes:
      
         - when subvolume deletion fails due to ENOSPC, the filesystem is not
           turned read-only
      
         - device scan deals with devices from other filesystems that changed
           ownership due to overwrite (mkfs)
      
         - fix a race between scrub and block group removal/allocation
      
         - fix long standing bug of a runaway balance operation, printing the
           same line to the syslog, caused by a stale status bit on a reloc
           tree that prevented progress
      
         - fix corrupt log due to concurrent fsync of inodes with shared
           extents
      
         - fix space underflow for NODATACOW and buffered writes when it for
           some reason needs to fall back to COW mode"
      
      * tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
        btrfs: fix space_info bytes_may_use underflow during space cache writeout
        btrfs: fix space_info bytes_may_use underflow after nocow buffered write
        btrfs: fix wrong file range cleanup after an error filling dealloc range
        btrfs: remove redundant local variable in read_block_for_search
        btrfs: open code key_search
        btrfs: split btrfs_direct_IO to read and write part
        btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
        fs: remove dio_end_io()
        btrfs: switch to iomap_dio_rw() for dio
        iomap: remove lockdep_assert_held()
        iomap: add a filesystem hook for direct I/O bio submission
        fs: export generic_file_buffered_read()
        btrfs: turn space cache writeout failure messages into debug messages
        btrfs: include error on messages about failure to write space/inode caches
        btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
        btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
        btrfs: make checksum item extension more efficient
        btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
        btrfs: unexport btrfs_compress_set_level()
        btrfs: simplify iget helpers
        ...
      f3cdc8ae
    • Merge tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 8eeae5ba
      Linus Torvalds authored
      Pull DAX updates part two from Darrick Wong:
       "This time around, we're hoisting the DONTCACHE flag from XFS into the
        VFS so that we can make the incore DAX mode changes become effective
        sooner.
      
        We can't change the file data access mode on a live inode because we
        don't have a safe way to change the file ops pointers. The incore
        state change becomes effective at inode loading time, which can happen
        if the inode is evicted. Therefore, we're making it so that
        filesystems can ask the VFS to evict the inode as soon as the last
        holder drops.
      
        The per-fs changes to make this call will be in subsequent pull
        requests from Ted and myself.
      
        Summary:
      
         - Introduce DONTCACHE flags for dentries and inodes. This hint will
           cause the VFS to drop the associated objects immediately after the
           last put, so that we can change the file access mode (DAX or page
           cache) on the fly"
      
      * tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        fs: Introduce DCACHE_DONTCACHE
        fs: Lift XFS_IDONTCACHE to the VFS layer
      8eeae5ba
    • Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 96ed320d
      Linus Torvalds authored
      Pull DAX updates part one from Darrick Wong:
       "After many years of LKML-wrangling about how to enable programs to
        query and influence the file data access mode (DAX) when a filesystem
        resides on storage devices such as persistent memory, Ira Weiny has
        emerged with a proposed set of standard behaviors that has not been
        shot down by anyone! We're more or less standardizing on the current
        XFS behavior and adapting ext4 to do the same.
      
        This is the first of a handful of pull requests that will make ext4 and
        XFS present a consistent interface for user programs that care about
        DAX. We add a statx attribute that programs can check to see if DAX is
        enabled on a particular file. Then, we update the DAX documentation to
        spell out the user-visible behaviors that filesystems will guarantee
        (until the next storage industry shakeup). The on-disk inode flag has
        been in XFS for a few years now.
      
        Summary:
      
         - Clean up io_is_direct.
      
         - Add a new statx flag to indicate when file data access is being
           done via DAX (as opposed to the page cache).
      
         - Update the documentation for how system administrators and
           application programmers can take advantage of the (still
           experimental) DAX feature"
      
      Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/
      
      * tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        Documentation/dax: Update Usage section
        fs/stat: Define DAX statx attribute
        fs: Remove unneeded IS_DAX() check in io_is_direct()
      96ed320d
    • Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 16d91548
      Linus Torvalds authored
      Pull xfs updates from Darrick Wong:
       "Most of the changes this cycle are refactoring of existing code in
        preparation for things landing in the future.
      
        We also fixed various problems and deficiencies in the quota
        implementation, and (I hope) the last of the stale read vectors by
        forcing write allocations to go through the unwritten state until the
        write completes.
      
        Summary:
      
         - Various cleanups to remove dead code, unnecessary conditionals,
           asserts, etc.
      
         - Fix a linker warning caused by xfs stuffing '-g' into CFLAGS
           redundantly.
      
         - Tighten up our dmesg logging to ensure that everything is prefixed
           with 'XFS' for easier grepping.
      
         - Kill a bunch of typedefs.
      
         - Refactor the deferred ops code to reduce indirect function calls.
      
         - Increase type-safety with the deferred ops code.
      
         - Make the DAX mount options a tri-state.
      
         - Fix some error handling problems in the inode flush code and clean
           up other inode flush warts.
      
         - Refactor log recovery so that each log item's recovery functions
           now live with the other log item processing code.
      
         - Fix some SPDX forms.
      
         - Fix quota counter corruption if the fs crashes after running
           quotacheck but before any dquots get logged.
      
         - Don't fail metadata verification on zero-entry attr leaf blocks,
           since they're just part of the disk format now due to a historic
           lack of log atomicity.
      
         - Don't allow SWAPEXT between files with different [ugp]id when
           quotas are enabled.
      
         - Refactor inode fork reading and verification to run directly from
           the inode-from-disk function. This means that we now actually
           guarantee that _iget'ted inodes are totally verified and ready to
           go.
      
         - Move the incore inode fork format and extent counts to the ifork
           structure.
      
         - Scalability improvements by reducing cacheline pingponging in
           struct xfs_mount.
      
         - More scalability improvements by removing m_active_trans from the
           hot path.
      
         - Fix inode counter update sanity checking to run /only/ on debug
           kernels.
      
         - Fix longstanding inconsistency in what error code we return when a
           program hits project quota limits (ENOSPC).
      
         - Fix group quota returning the wrong error code when a program hits
           group quota limits.
      
         - Fix per-type quota limits and grace periods for group and project
           quotas so that they actually work.
      
         - Allow extension of individual grace periods.
      
         - Refactor the non-reclaim inode radix tree walking code to remove a
           bunch of stupid little functions and straighten out the
           inconsistent naming schemes.
      
         - Fix a bug in speculative preallocation where we measured a new
           allocation based on the last extent mapping in the file instead of
           looking farther for the last contiguous space allocation.
      
         - Force delalloc writes to unwritten extents. This closes a stale
           disk contents exposure vector if the system goes down before the
           write completes.
      
         - More lockdep whackamole"
      
      * tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (129 commits)
        xfs: more lockdep whackamole with kmem_alloc*
        xfs: force writes to delalloc regions to unwritten
        xfs: refactor xfs_iomap_prealloc_size
        xfs: measure all contiguous previous extents for prealloc size
        xfs: don't fail unwritten extent conversion on writeback due to edquot
        xfs: rearrange xfs_inode_walk_ag parameters
        xfs: straighten out all the naming around incore inode tree walks
        xfs: move xfs_inode_ag_iterator to be closer to the perag walking code
        xfs: use bool for done in xfs_inode_ag_walk
        xfs: fix inode ag walk predicate function return values
        xfs: refactor eofb matching into a single helper
        xfs: remove __xfs_icache_free_eofblocks
        xfs: remove flags argument from xfs_inode_ag_walk
        xfs: remove xfs_inode_ag_iterator_flags
        xfs: remove unused xfs_inode_ag_iterator function
        xfs: replace open-coded XFS_ICI_NO_TAG
        xfs: move eofblocks conversion function to xfs_ioctl.c
        xfs: allow individual quota grace period extension
        xfs: per-type quota timers and warn limits
        xfs: switch xfs_get_defquota to take explicit type
        ...
      16d91548
    • Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · d9afbb35
      Linus Torvalds authored
      Pull lockdown update from James Morris:
       "An update for the security subsystem to allow unprivileged users
        to see the status of the lockdown feature. From Jeremy Cline"
      
      Also an added comment to describe CAP_SETFCAP.
      
      * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        capabilities: add description for CAP_SETFCAP
        lockdown: Allow unprivileged users to see lockdown status
      d9afbb35
    • Merge tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · f41030a2
      Linus Torvalds authored
      Pull SELinux updates from Paul Moore:
       "The highlights:
      
         - A number of improvements to various SELinux internal data
           structures to help improve performance. We move the role
           transitions into a hash table. In the context structure we shift
           from hashing the context string (aka SELinux label) to the
           structure itself, when it is valid. This last change not only
           offers a speedup, but it helps us simplify the code some as well.
      
         - Add a new SELinux policy version which allows for a more space
           efficient way of storing the filename transitions in the binary
           policy. Given the default Fedora SELinux policy with the unconfined
           module enabled, this change drops the policy size from ~7.6MB to
           ~3.3MB. The kernel policy load time dropped as well.
      
         - Some fixes to the error handling code in the policy parser to
           properly return error codes when things go wrong"
      
      * tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: netlabel: Remove unused inline function
        selinux: do not allocate hashtabs dynamically
        selinux: fix return value on error in policydb_read()
        selinux: simplify range_write()
        selinux: fix error return code in policydb_read()
        selinux: don't produce incorrect filename_trans_count
        selinux: implement new format of filename transitions
        selinux: move context hashing under sidtab
        selinux: hash context structure directly
        selinux: store role transitions in a hash table
        selinux: drop unnecessary smp_load_acquire() call
        selinux: fix warning Comparison to bool
      f41030a2
    • Merge tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 9d99b164
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "Summary of the significant patches:
      
         - Record information about binds/unbinds to the audit multicast
           socket. This helps identify which processes have/had access to the
           information in the audit stream.
      
         - Cleanup and add some additional information to the netfilter
           configuration events collected by audit.
      
         - Fix some of the audit error handling code so we don't leak network
           namespace references"
      
      * tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: add subj creds to NETFILTER_CFG record to
        audit: Replace zero-length array with flexible-array
        audit: make symbol 'audit_nfcfgs' static
        netfilter: add audit table unregister actions
        audit: tidy and extend netfilter_cfg x_tables
        audit: log audit netlink multicast bind and unbind
        audit: fix a net reference leak in audit_list_rules_send()
        audit: fix a net reference leak in audit_send_reply()
      9d99b164
    • Merge tag 'tomoyo-pr-20200601' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 · 91681e84
      Linus Torvalds authored
      Pull tomoyo update from Tetsuo Handa:
       "One patch for suppressing coccicheck's warning"
      
      * tag 'tomoyo-pr-20200601' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
        tomoyo: use true for bool variable
      91681e84
  3. 02 Jun, 2020 3 commits
    • capabilities: add description for CAP_SETFCAP · 56f2e3b7
      Stefan Hajnoczi authored
      Document the purpose of CAP_SETFCAP.  For some reason this capability
      had no description while the others did.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
    • selftests: net: ip_defrag: ignore EPERM · 065fcfd4
      Thadeu Lima de Souza Cascardo authored
      When running with conntrack rules, the dropped overlap fragments may cause
      EPERM to be returned to sendto. Instead of completely failing, just ignore
      those errors and continue. If this causes packets with overlap fragments to
      be dropped as expected, that is okay. And if it causes packets that are
      expected to be received to be dropped, which should not happen, it will be
      detected as failure.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block · 1ee08de1
      Linus Torvalds authored
      Pull io_uring updates from Jens Axboe:
       "A relatively quiet round, mostly just fixes and code improvements. In
        particular:
      
         - Make statx just use the generic statx handler, instead of open
           coding it. We don't need that anymore, as we always call it async
           safe (Bijan)
      
         - Enable closing of the ring itself. Also fixes O_PATH closure (me)
      
         - Properly name completion members (me)
      
         - Batch reap of dead file registrations (me)
      
         - Allow IORING_OP_POLL with double waitqueues (me)
      
         - Add tee(2) support (Pavel)
      
         - Remove double off read (Pavel)
      
         - Fix overflow cancellations (Pavel)
      
         - Improve CQ timeouts (Pavel)
      
         - Async defer drain fixes (Pavel)
      
         - Add support for enabling/disabling notifications on a registered
           eventfd (Stefano)
      
         - Remove dead state parameter (Xiaoguang)
      
         - Disable SQPOLL submit on dying ctx (Xiaoguang)
      
         - Various code cleanups"
      
      * tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
        io_uring: fix overflowed reqs cancellation
        io_uring: off timeouts based only on completions
        io_uring: move timeouts flushing to a helper
        statx: hide interfaces no longer used by io_uring
        io_uring: call statx directly
        statx: allow system call to be invoked from io_uring
        io_uring: add io_statx structure
        io_uring: get rid of manual punting in io_close
        io_uring: separate DRAIN flushing into a cold path
        io_uring: don't re-read sqe->off in timeout_prep()
        io_uring: simplify io_timeout locking
        io_uring: fix flush req->refs underflow
        io_uring: don't submit sqes when ctx->refs is dying
        io_uring: async task poll trigger cleanup
        io_uring: add tee(2) support
        splice: export do_tee()
        io_uring: don't repeat valid flag list
        io_uring: rename io_file_put()
        io_uring: remove req->needs_fixed_files
        io_uring: cleanup io_poll_remove_one() logic
        ...