1. 03 May, 2019 1 commit
  2. 02 May, 2019 7 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ea986679
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Out of bounds access in xfrm IPSEC policy unlink, from Yue Haibing.
      
       2) Missing length check for esp4 UDP encap, from Sabrina Dubroca.
      
       3) Fix byte order of RX STBC access in mac80211, from Johannes Berg.
      
       4) Infinite loop in bpftool map create, from Alban Crequy.
      
       5) Register mark fix in ebpf verifier after pkt/null checks, from Paul
          Chaignon.
      
       6) Properly use rcu_dereference_sk_user_data in L2TP code, from Eric
          Dumazet.
      
       7) Buffer overrun in marvell phy driver, from Andrew Lunn.
      
       8) Several crash and statistics handling fixes to bnxt_en driver, from
          Michael Chan and Vasundhara Volam.
      
       9) Several fixes to the TLS layer from Jakub Kicinski (copying negative
          amounts of data in reencrypt, reencrypt frag copying, blind nskb->sk
          NULL deref, etc).
      
      10) Several UDP GRO fixes, from Paolo Abeni and Eric Dumazet.
      
      11) PID/UID checks on ipv6 flow labels are inverted, from Willem de
          Bruijn.
      
      12) Use after free in l2tp, from Eric Dumazet.
      
      13) IPV6 route destroy races, also from Eric Dumazet.
      
      14) SCTP state machine can erroneously run recursively, fix from Xin
          Long.
      
      15) Adjust AF_PACKET msg_name length checks, add padding bytes if
          necessary. From Willem de Bruijn.
      
      16) Preserve skb_iif, so that forwarded packets have consistent values
          even if fragmentation is involved. From Shmulik Ladkani.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
        udp: fix GRO packet of death
        ipv6: A few fixes on dereferencing rt->from
        rds: ib: force endiannes annotation
        selftests: fib_rule_tests: print the result and return 1 if any tests failed
        ipv4: ip_do_fragment: Preserve skb_iif during fragmentation
        net/tls: avoid NULL pointer deref on nskb->sk in fallback
        selftests: fib_rule_tests: Fix icmp proto with ipv6
        packet: validate msg_namelen in send directly
        packet: in recvmsg msg_name return at least sizeof sockaddr_ll
        sctp: avoid running the sctp state machine recursively
        stmmac: pci: Fix typo in IOT2000 comment
        Documentation: fix netdev-FAQ.rst markup warning
        ipv6: fix races in ip6_dst_destroy()
        l2ip: fix possible use-after-free
        appletalk: Set error code if register_snap_client failed
        net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc
        rxrpc: Fix net namespace cleanup
        ipv6/flowlabel: wait rcu grace period before put_pid()
        vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach
        tcp: add sanity tests in tcp_add_backlog()
        ...
      ea986679
    • Merge tag 'for-linus-20190502' of git://git.kernel.dk/linux-block · 5ce3307b
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "This is mostly io_uring fixes/tweaks. Most of these were actually done
        in time for the last -rc, but I wanted to ensure that everything
        tested out great before including them. The code delta looks larger
        than it really is, as it's mostly just comment additions/changes.
      
        Outside of the comment additions/changes, this is mostly removal of
        unnecessary barriers. In all, this pull request contains:
      
         - Tweak to how we handle errors at submission time. We now post a
           completion event if the error occurs on behalf of an sqe, instead
           of returning it through the system call. If the error happens
           outside of a specific sqe, we return the error through the system
           call. This makes it nicer to use and makes the "normal" use case
           behave the same as the offload cases. (me)
      
         - Fix for a missing req reference drop from async context (me)
      
         - If an sqe is submitted with RWF_NOWAIT, don't punt it to async
           context. Return -EAGAIN directly, instead of using it as a hint to
           do async punt. (Stefan)
      
         - Fix notes on barriers (Stefan)
      
         - Remove unnecessary barriers (Stefan)
      
         - Fix potential double free of memory in setup error (Mark)
      
         - Further improve sq poll CPU validation (Mark)
      
         - Fix page allocation warning and leak on buffer registration error
           (Mark)
      
         - Fix iov_iter_type() for new no-ref flag (Ming)
      
         - Fix a case where dio doesn't honor bio no-page-ref (Ming)"
      
      * tag 'for-linus-20190502' of git://git.kernel.dk/linux-block:
        io_uring: avoid page allocation warnings
        iov_iter: fix iov_iter_type
        block: fix handling for BIO_NO_PAGE_REF
        io_uring: drop req submit reference always in async punt
        io_uring: free allocated io_memory once
        io_uring: fix SQPOLL cpu validation
        io_uring: have submission side sqe errors post a cqe
        io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP
        io_uring: remove unnecessary barrier after incrementing dropped counter
        io_uring: remove unnecessary barrier before reading SQ tail
        io_uring: remove unnecessary barrier after updating SQ head
        io_uring: remove unnecessary barrier before reading cq head
        io_uring: remove unnecessary barrier before wq_has_sleeper
        io_uring: fix notes on barriers
        io_uring: fix handling SQEs requesting NOWAIT
      5ce3307b
    • Merge tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b7a5b22b
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "I apologize for sending these so late in the cycle. We went back and
        forth about how to deal with the unexpected logging of intentional
        link state changes and finally decided to just config them off by
        default.
      
        PCI fixes:
      
         - Stop ignoring "pci=disable_acs_redir" parameter (Logan Gunthorpe)
      
         - Use shared MSI/MSI-X vector for Link Bandwidth Management (Alex
           Williamson)
      
         - Add Kconfig option for Link Bandwidth notification messages (Keith
           Busch)"
      
      * tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/LINK: Add Kconfig option (default off)
        PCI/portdrv: Use shared MSI/MSI-X vector for Bandwidth Management
        PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored
      b7a5b22b
    • Merge tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · e2a4b102
      Linus Torvalds authored
      Pull MTD fix from Richard Weinberger:
       "A single regression fix for the marvell nand driver"
      
      * tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: rawnand: marvell: Clean the controller state before each operation
      e2a4b102
    • PCI/LINK: Add Kconfig option (default off) · 2078e1e7
      Keith Busch authored
      e8303bb7 ("PCI/LINK: Report degraded links via link bandwidth
      notification") added dmesg logging whenever a link changes speed or width
      to a state that is considered degraded.  Unfortunately, it cannot
      differentiate signal integrity-related link changes from those
      intentionally initiated by an endpoint driver, including drivers that may
      live in userspace or VMs when making use of vfio-pci.  Some GPU drivers
      actively manage the link state to save power, which generates a stream of
      messages like this:
      
        vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
      
      Since we can't distinguish the intentional changes from the signal
      integrity issues, leave the reporting turned off by default.  Add a Kconfig
      option to turn it on if desired.
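
The figures in the message quoted above follow from the link rate, the lane count, and the 8b/10b encoding used at 2.5 GT/s and 5 GT/s. A minimal sketch of that arithmetic, not code from the patch; the function name is hypothetical:

```c
#include <assert.h>

/*
 * Usable bandwidth in Mb/s of a PCIe link that uses 8b/10b encoding
 * (the 2.5 GT/s and 5 GT/s rates): 8 payload bits per 10 bits on the wire.
 * Hypothetical helper for illustration only; it is not a kernel function.
 */
unsigned int pcie_8b10b_mbps(unsigned int mega_transfers_per_sec,
			     unsigned int lanes)
{
	assert(lanes > 0);
	return mega_transfers_per_sec * lanes * 8 / 10;
}
```

Calling pcie_8b10b_mbps(2500, 16) gives 32000 Mb/s (the "32.000 Gb/s available" in the log line), and pcie_8b10b_mbps(5000, 16) gives 64000 Mb/s (the "capable of 64.000 Gb/s" part).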
      
      Fixes: e8303bb7 ("PCI/LINK: Report degraded links via link bandwidth notification")
      Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      2078e1e7
    • net: ll_temac: Fix typo bug for 32-bit · 26f146ed
      Esben Haabendal authored
      Fixes: d84aec42 ("net: ll_temac: Fix support for 64-bit platforms")
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26f146ed
    • udp: fix GRO packet of death · 4dd2b82d
      Eric Dumazet authored
      syzbot was able to crash the host by sending UDP packets with a 0 payload.
      
      TCP does not have this issue since we do not aggregate packets without
      payload.
      
      Since dev_gro_receive() sets gso_size based on skb_gro_len(skb)
      it seems not worth trying to cope with padded packets.
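
The reasoning above suggests two conditions under which a datagram should not be aggregated: it carries no payload, or its UDP length does not match the bytes actually present (a padded packet). A userspace sketch of such a check, assuming this is the shape of the test; the helper name and parameters are hypothetical and this is not the code from the patch:

```c
#include <stdbool.h>
#include <stddef.h>

#define UDP_HEADER_LEN 8u /* a UDP header is 8 bytes */

/*
 * Hypothetical helper: decide whether a datagram may be considered for
 * aggregation. udp_len is the length field from the UDP header;
 * bytes_present is the number of bytes actually available for this packet.
 */
bool udp_may_aggregate(size_t udp_len, size_t bytes_present)
{
	if (udp_len <= UDP_HEADER_LEN)
		return false;	/* header only: zero payload */
	if (udp_len != bytes_present)
		return false;	/* padded or truncated packet */
	return true;
}
```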
      
      BUG: KASAN: slab-out-of-bounds in skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
      Read of size 16 at addr ffff88808893fff0 by task syz-executor612/7889
      
      CPU: 0 PID: 7889 Comm: syz-executor612 Not tainted 5.1.0-rc7+ #96
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       __asan_report_load16_noabort+0x14/0x20 mm/kasan/generic_report.c:133
       skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
       udp_gro_receive_segment net/ipv4/udp_offload.c:382 [inline]
       call_gro_receive include/linux/netdevice.h:2349 [inline]
       udp_gro_receive+0xb61/0xfd0 net/ipv4/udp_offload.c:414
       udp4_gro_receive+0x763/0xeb0 net/ipv4/udp_offload.c:478
       inet_gro_receive+0xe72/0x1110 net/ipv4/af_inet.c:1510
       dev_gro_receive+0x1cd0/0x23c0 net/core/dev.c:5581
       napi_gro_frags+0x36b/0xd10 net/core/dev.c:5843
       tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
       call_write_iter include/linux/fs.h:1866 [inline]
       do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
       do_iter_write fs/read_write.c:957 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:938
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
       do_writev+0x15e/0x370 fs/read_write.c:1037
       __do_sys_writev fs/read_write.c:1110 [inline]
       __se_sys_writev fs/read_write.c:1107 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441cc0
      Code: 05 48 3d 01 f0 ff ff 0f 83 9d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d 51 93 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 74 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
      RSP: 002b:00007ffe8c716118 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 00007ffe8c716150 RCX: 0000000000441cc0
      RDX: 0000000000000001 RSI: 00007ffe8c716170 RDI: 00000000000000f0
      RBP: 0000000000000000 R08: 000000000000ffff R09: 0000000000a64668
      R10: 0000000020000040 R11: 0000000000000246 R12: 000000000000c2d9
      R13: 0000000000402b50 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 5143:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc mm/kasan/common.c:497 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
       slab_post_alloc_hook mm/slab.h:437 [inline]
       slab_alloc mm/slab.c:3393 [inline]
       kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
       mm_alloc+0x1d/0xd0 kernel/fork.c:1030
       bprm_mm_init fs/exec.c:363 [inline]
       __do_execve_file.isra.0+0xaa3/0x23f0 fs/exec.c:1791
       do_execveat_common fs/exec.c:1865 [inline]
       do_execve fs/exec.c:1882 [inline]
       __do_sys_execve fs/exec.c:1958 [inline]
       __se_sys_execve fs/exec.c:1953 [inline]
       __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5351:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
       __cache_free mm/slab.c:3499 [inline]
       kmem_cache_free+0x86/0x260 mm/slab.c:3765
       __mmdrop+0x238/0x320 kernel/fork.c:677
       mmdrop include/linux/sched/mm.h:49 [inline]
       finish_task_switch+0x47b/0x780 kernel/sched/core.c:2746
       context_switch kernel/sched/core.c:2880 [inline]
       __schedule+0x81b/0x1cc0 kernel/sched/core.c:3518
       preempt_schedule_irq+0xb5/0x140 kernel/sched/core.c:3745
       retint_kernel+0x1b/0x2d
       arch_local_irq_restore arch/x86/include/asm/paravirt.h:767 [inline]
       kmem_cache_free+0xab/0x260 mm/slab.c:3766
       anon_vma_chain_free mm/rmap.c:134 [inline]
       unlink_anon_vmas+0x2ba/0x870 mm/rmap.c:401
       free_pgtables+0x1af/0x2f0 mm/memory.c:394
       exit_mmap+0x2d1/0x530 mm/mmap.c:3144
       __mmput kernel/fork.c:1046 [inline]
       mmput+0x15f/0x4c0 kernel/fork.c:1067
       exec_mmap fs/exec.c:1046 [inline]
       flush_old_exec+0x8d9/0x1c20 fs/exec.c:1279
       load_elf_binary+0x9bc/0x53f0 fs/binfmt_elf.c:864
       search_binary_handler fs/exec.c:1656 [inline]
       search_binary_handler+0x17f/0x570 fs/exec.c:1634
       exec_binprm fs/exec.c:1698 [inline]
       __do_execve_file.isra.0+0x1394/0x23f0 fs/exec.c:1818
       do_execveat_common fs/exec.c:1865 [inline]
       do_execve fs/exec.c:1882 [inline]
       __do_sys_execve fs/exec.c:1958 [inline]
       __se_sys_execve fs/exec.c:1953 [inline]
       __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff88808893f7c0
       which belongs to the cache mm_struct of size 1496
      The buggy address is located 600 bytes to the right of
       1496-byte region [ffff88808893f7c0, ffff88808893fd98)
      The buggy address belongs to the page:
      page:ffffea0002224f80 count:1 mapcount:0 mapping:ffff88821bc40ac0 index:0xffff88808893f7c0 compound_mapcount: 0
      flags: 0x1fffc0000010200(slab|head)
      raw: 01fffc0000010200 ffffea00025b4f08 ffffea00027b9d08 ffff88821bc40ac0
      raw: ffff88808893f7c0 ffff88808893e440 0000000100000001 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88808893fe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88808893ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88808893ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                                   ^
       ffff888088940000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff888088940080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Fixes: e20cf8d3 ("udp: implement GRO for plain UDP sockets.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4dd2b82d
  3. 01 May, 2019 32 commits
    • Merge tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · 600d7258
      Linus Torvalds authored
      Pull power supply fixes from Sebastian Reichel:
       "Two more fixes for the 5.1 cycle.
      
        One division by zero fix in a specific driver and one core workaround
        for bad userspace behaviour from systemd regarding uevents. IMHO this
        can be considered to be a userspace bug, but the debug messages are
        useless anyways
      
         - cpcap-battery: fix a division by zero
      
         - core: fix systemd issue due to log messages produced by uevent"
      
      * tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
        power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
        power: supply: cpcap-battery: Fix division by zero
      600d7258
    • ipv6: A few fixes on dereferencing rt->from · 886b7a50
      Martin KaFai Lau authored
      It is a followup after the fix in
      commit 9c69a132 ("route: Avoid crash from dereferencing NULL rt->from")
      
      rt6_do_redirect():
      1. NULL checking is needed on rt->from because a parallel
         fib6_info delete could happen that sets rt->from to NULL.
         (e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).
      
      2. fib6_info_hold() is not enough.  Same reason as (1).
         Meaning, holding dst->__refcnt cannot ensure
         rt->from is not NULL or rt->from->fib6_ref is not 0.
      
         Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
         is already doing, this patch chooses to extend the rcu section
         to keep "from" dereference-able after checking for NULL.
      
      inet6_rtm_getroute():
      1. NULL checking is also needed on rt->from for a similar reason.
         Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      886b7a50
    • rds: ib: force endiannes annotation · f3505745
      Nicholas Mc Guire authored
      While the endianness is being handled correctly, as indicated by the comment
      above the offending line, sparse was unhappy with the missing annotation
      as be64_to_cpu() expects a __be64 argument. To address this, all involved
      variables are changed to a consistent __le64 and the conversion to
      uint64_t is delayed to the call to rds_cong_map_updated().
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3505745
    • Merge branch 'net-mvpp2-cls-Add-classification' · f76c4b57
      David S. Miller authored
      Maxime Chevallier says:
      
      ====================
      net: mvpp2: cls: Add classification
      
      This series is a rework of the previously standalone patch adding
      classification support for mvpp2 :
      
      https://lore.kernel.org/netdev/20190423075031.26074-1-maxime.chevallier@bootlin.com/
      
      This patch has been reworked according to Saeed's review, to make sure
      that the location of the rule is always respected and serves as a way to
      prioritize rules between each other. This is the 3rd iteration of this
      submission, but since it's now a series, I reset the revision numbering.
      
      This series implements that in a limited configuration for now, since we
      limit the total number of rules per port to 4.
      
      The main factors for this limitation are that :
       - We share the classification tables between all ports (4 max, although
         one is only used for internal loopback), hence we have to perform a
         logical separation between rules, which is done today by dedicated
         ranges for each port in each table
      
       - The "Flow table", which dictates which lookup operations are
         performed for an ingress packet, is subdivided into 22 "sub flows",
         each corresponding to a traffic type based on the L3 proto, L4
         proto, the presence or not of a VLAN tag and the L3 fragmentation.
      
         This means that when adding a rule, it has to be added into each
         of these subflows, introducing duplications of entries and limiting
         our max number of entries.
      
      These limitations can be overcome in several ways, but for readability's
      sake, I'd rather submit basic classification offload support for now,
      and improve it gradually.
      
      This series also adds a small cosmetic cleanup patch (1), and also adds
      support for the "Drop" action compared to the first submission of this
      feature. It is simple enough to be added with this basic support.
      
      Compared to the first submissions, the NETIF_F_NTUPLE flag was also
      removed, following Saeed's comment.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f76c4b57
    • net: mvpp2: cls: Allow dropping packets with classification offload · bec2d46d
      Maxime Chevallier authored
      This commit introduces support for the "Drop" action in classification
      offload. This corresponds to the "-1" action with ethtool -N.
      
      This is achieved using the color marking actions available in the C2
      engine, which associate a color to a packet. These colors can be either
      Green, Yellow or Red, Red meaning that the packet should be dropped.
      
      Green and Yellow colors are interpreted by the Policer, which isn't
      supported yet.
      
      This method of dropping using the Classifier is different than the
      already existing early-drop features, such as VLAN filtering and MAC
      UC/MC filtering, which are performed during the Parsing step, and
      therefore take precedence over classification actions.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bec2d46d
    • net: mvpp2: cls: Add Classification offload support · 90b509b3
      Maxime Chevallier authored
      This commit introduces basic classification offloading support for the
      PPv2 controller.
      
      The PPv2 classifier has many classification engines; for now we only use
      the C2 TCAM match engine.
      
      This engine allows ternary lookups on 64-bit keys (called
      Header Extracted Key), that are built by extracting fields from the packet
      header and concatenating them. At most 4 fields can be extracted for a
      single lookup.
      
      This basic implementation builds the HEK from the following
      fields :
       - L4 source and destination ports (for UDP and TCP)
      
      More fields are to be added in the future.
      
      Classification flows are added through the ethtool interface, using the
      newly introduced flow_rule infrastructure as an internal rule
      representation, allowing to more easily implement tc flower rules if
      need be.
      
      The internal design for now allocates one range of 4 rules per port
      due to the internal design of the flow table, which uses 22 sub-flows.
      
      When inserting a classification rule, the rule is created in every
      relevant sub-flow.
      
      This low rule-count is a very simple design which quickly reaches the
      limitations of the flow table ordering, but guarantees that the rule
      ordering will always be respected.
      
      This commit only introduces support for the "steer to rxq" action.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90b509b3
    • net: mvpp2: cls: Use a bitfield to represent the flow_type · 84e90b0b
      Maxime Chevallier authored
      As of today, the classification code is used only for RSS. We split the
      incoming traffic into multiple flows, that correspond to the ethtool
      flow_type parameter.
      
      We don't want to use the ethtool flow definitions such as TCP_V4_FLOW,
      for several reasons:
      
       - We want to decorrelate the driver code from ethtool as much as
         possible, so that we can easily use other interfaces such as tc flower,
      
       - We want the flow_type to be a bitfield, so that we can match flows
         embedded into each other, such as TCP4 which is a subset of IP4.
      
      This commit does the conversion to the newer type.
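
To illustrate why a bitfield helps here, the sketch below encodes each flow type as a set of protocol bits, so that "TCP4 is a subset of IP4" becomes a simple mask test. All names and bit values are hypothetical and chosen for illustration; they are not the driver's actual definitions:

```c
#include <stdbool.h>

/* Hypothetical protocol bits, one per layer property. */
enum {
	FLOW_BIT_IP4 = 1u << 0,
	FLOW_BIT_IP6 = 1u << 1,
	FLOW_BIT_TCP = 1u << 2,
	FLOW_BIT_UDP = 1u << 3,
};

/* Composite flow types are built by OR-ing the bits together. */
#define FLOW_TCP4 (FLOW_BIT_IP4 | FLOW_BIT_TCP)
#define FLOW_UDP4 (FLOW_BIT_IP4 | FLOW_BIT_UDP)
#define FLOW_TCP6 (FLOW_BIT_IP6 | FLOW_BIT_TCP)

/* True when 'flow' carries every bit of 'wanted', i.e. is embedded in it. */
bool flow_is_within(unsigned int flow, unsigned int wanted)
{
	return (flow & wanted) == wanted;
}
```

With this encoding, flow_is_within(FLOW_TCP4, FLOW_BIT_IP4) is true while flow_is_within(FLOW_TCP6, FLOW_BIT_IP4) is false, which an enumeration of unrelated constants cannot express directly.
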
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84e90b0b
    • net: mvpp2: cls: Remove extra whitespace in mvpp2_cls_flow_write · 6f16a465
      Maxime Chevallier authored
      Cosmetic patch removing extra whitespaces when writing the flow_table
      entries
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f16a465
    • Merge tag 'arc-5.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 65beea4c
      Linus Torvalds authored
      Pull ARC fixes from Vineet Gupta:
       "A few minor fixes for ARC.
      
         - regression in memset if line size !64
      
         - avoid panic if PAE and IOC"
      
      * tag 'arc-5.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: memset: fix build with L1_CACHE_SHIFT != 6
        ARC: [hsdk] Make it easier to add PAE40 region to DTB
        ARC: PAE40: don't panic and instead turn off hw ioc
      65beea4c
    • PCI/portdrv: Use shared MSI/MSI-X vector for Bandwidth Management · 15d2aba7
      Alex Williamson authored
      The Interrupt Message Number in the PCIe Capabilities register (PCIe r4.0,
      sec 7.5.3.2) indicates which MSI/MSI-X vector is shared by interrupts
      related to the PCIe Capability, including Link Bandwidth Management and
      Link Autonomous Bandwidth Interrupts (Link Control, 7.5.3.7), Command
      Completed and Hot-Plug Interrupts (Slot Control, 7.5.3.10), and the PME
      Interrupt (Root Control, 7.5.3.12).
      
      pcie_message_numbers() checked whether we want to enable PME or Hot-Plug
      interrupts but neglected to check for Link Bandwidth Management, so if we
      only wanted the Bandwidth Management interrupts, it decided we didn't need
      any vectors at all.  Then pcie_port_enable_irq_vec() tried to reallocate
      zero vectors, which failed, resulting in fallback to INTx.
      
      On some systems, e.g., an X79-based workstation, that INTx seems broken or
      not handled correctly, so we got spurious IRQ16 interrupts for Bandwidth
      Management events.
      
      Change pcie_message_numbers() so that if we want Link Bandwidth Management
      interrupts, we use the shared MSI/MSI-X vector from the PCIe Capabilities
      register.
      
      Fixes: e8303bb7 ("PCI/LINK: Report degraded links via link bandwidth notification")
      Link: https://lore.kernel.org/lkml/155597243666.19387.1205950870601742062.stgit@gimli.home
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      [bhelgaas: changelog]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      15d2aba7
    • Merge tag 'acpi-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fb0af61d
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Revert a recent ACPICA change that caused initialization to fail on
        systems with Thunderbolt docking stations connected at the init time"
      
      * tag 'acpi-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Revert "ACPICA: Clear status of GPEs before enabling them"
      fb0af61d
    • gcc-9: don't warn about uninitialized btrfs extent_type variable · 7e74e235
      Linus Torvalds authored
      The 'extent_type' variable does seem to be reliably initialized, but
      it's _very_ non-obvious, since there's a "goto next" case that jumps
      over the normal initialization.  That will then always trigger the
      "start >= extent_end" test, which will end up never falling through to
      the use of that variable.
      
      But the code is certainly not obvious, and the compiler warning looks
      reasonable.  Make 'extent_type' an int, and initialize it to an invalid
      negative value, which seems to be the common pattern in other places.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e74e235
    • Merge branch 'net-ll_temac-x86_64-support' · 2a369ae0
      David S. Miller authored
      Esben Haabendal says:
      
      ====================
      net: ll_temac: x86_64 support
      
      This patch series adds support for use of ll_temac driver with
      platform_data configuration and fixes endianness and 64-bit problems so
      that it can be used on x86_64 platform.
      
      A few bugfixes are also included.
      
      Changes since v2:
        - Fixed lp->indirect_mutex initialization regression for OF
          platforms introduced in v2
      
      Changes since v1:
        - Make indirect_mutex specification mandatory when using platform_data
        - Move header to include/linux/platform_data
        - Enable COMPILE_TEST for XILINX_LL_TEMAC
        - Rebased to v5.1-rc7
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a369ae0
    • net: ll_temac: Enable DMA when ready, not before · 73f7375d
      Esben Haabendal authored
      As soon as TAILDESCR_PTR is written, DMA transfers might start.
      Let's ensure we are ready to receive DMA IRQ's before doing that.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73f7375d
    • net: ll_temac: Allow configuration of IRQ coalescing · 7e97a194
      Esben Haabendal authored
      This allows custom setup of IRQ coalescing for platforms using legacy
      platform_device. The irq timeout and count parameters can be used for
      tuning cpu load vs. latency.
      
      I have maintained the 0x00000400 bit in TX_CHNL_CTRL.  It is specified as
      unused in the documentation I have available.  It does not make any
      difference in the hardware I have available, so it is left in to not risk
      breaking other platforms where it might be used.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e97a194
    • Esben Haabendal's avatar
      net: ll_temac: Replace bad usage of msleep() with usleep_range() · 901d14ab
      Esben Haabendal authored
      Use usleep_range() to avoid problems with msleep() actually sleeping
      much longer than expected.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      901d14ab
    • Esben Haabendal's avatar
      net: ll_temac: Fix bug causing buffer descriptor overrun · 2c9938e7
      Esben Haabendal authored
      As we are actually using a BD for both the skb and each frag contained in
      it, the oldest TX BD would be overwritten when there was exactly one BD
      less than needed.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c9938e7
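
The off-by-one described in commit 2c9938e7 can be illustrated with a small user-space sketch. It is not the driver's code: the `demo_*` names are hypothetical, and the "buggy" check is only one plausible way such an error can arise, assuming a packet occupies one buffer descriptor (BD) for its linear part plus one per fragment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: BDs consumed by one packet.
 * One BD for the skb's linear data plus one per fragment. */
static unsigned int demo_bds_needed(unsigned int nr_frags)
{
    return 1u + nr_frags;
}

/* Assumed buggy check: counts only the fragments, so it passes when
 * the ring has exactly one BD fewer than the packet really needs. */
static bool demo_has_room_buggy(unsigned int free_bds, unsigned int nr_frags)
{
    return free_bds >= nr_frags;
}

/* Corrected check: compares against the full BD count. */
static bool demo_has_room_fixed(unsigned int free_bds, unsigned int nr_frags)
{
    return free_bds >= demo_bds_needed(nr_frags);
}
```

With two free BDs and a packet carrying two fragments, the buggy check accepts the packet although three BDs are required, which is the "exactly one BD less than needed" case from the commit message.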
    • Esben Haabendal's avatar
      net: ll_temac: Fix iommu/swiotlb leak · a8c9bd3b
      Esben Haabendal authored
      Unmap the actual buffer length, not the amount of data received, avoiding
      resource exhaustion of swiotlb (seen on x86_64 platform).
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8c9bd3b
    • Esben Haabendal's avatar
      net: ll_temac: Support indirect_mutex share within TEMAC IP · f14f5c11
      Esben Haabendal authored
      Indirect register access goes through a DCR bus bridge, which
      allows only one outstanding transaction.  And to make matters
      worse, each TEMAC IP block contains two Ethernet interfaces, and
      although they seem to have separate registers for indirect access,
      they actually share the registers.  Or to be more specific, MSW, LSW
      and CTL registers are physically shared between Ethernet interfaces
      in the same TEMAC IP, with the RDY register being (almost) specific to
      the Ethernet interface.  The 0x10000 bit in RDY reflects combined
      bus ready state though.
      
      So we need to take care to synchronize not only within a single
      device, but also between devices in the same TEMAC IP.
      
      This commit makes that possible with legacy platform devices.
      
      For OF devices, the xlnx,compound parent of the temac node should be
      used to find siblings, and setup a shared indirect_mutex between them.
      I will leave this work to somebody else, as I don't have hardware to
      test that.  No regression is introduced by that, as before this commit
      using two Ethernet interfaces in same TEMAC block is simply broken.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f14f5c11
    • Esben Haabendal's avatar
      net: ll_temac: Allow use on x86 platforms · 2c02c37e
      Esben Haabendal authored
      With little-endian and 64-bit support in place, the ll_temac driver can
      now be used on x86 and x86_64 platforms.
      
      And while at it, enable COMPILE_TEST also.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c02c37e
    • Esben Haabendal's avatar
      net: ll_temac: Fix support for little-endian platforms · fdd7454e
      Esben Haabendal authored
      Both TEMAC and SDMA are big-endian, so make sure that all values in SDMA
      buffer descriptors (cdmac_bd) are handled as big-endian, independent of the
      host endianness. With all currently supported platforms being big-endian,
      this does not change behavior for any of them.
      
      Note, when using app3 and app4 for piggybacking skb pointers there is no
      need to care about endianness, as neither TEMAC nor SDMA access app3 and
      app4 in TX buffer descriptors.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fdd7454e
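
The idea in commit fdd7454e, storing descriptor fields as big-endian regardless of host byte order, can be sketched in portable user-space C. The `demo_*` helpers below are hypothetical stand-ins for the kernel's `cpu_to_be32()` and `be32_to_cpu()`, and the one-field descriptor is an assumption made for illustration, not the real `cdmac_bd` layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpu_to_be32(): lay the value out in memory
 * most significant byte first, whatever the host byte order is. */
static uint32_t demo_cpu_to_be32(uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16),
        (uint8_t)(v >> 8),  (uint8_t)v
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Hypothetical stand-in for be32_to_cpu(): the inverse conversion. */
static uint32_t demo_be32_to_cpu(uint32_t be)
{
    uint8_t b[4];

    memcpy(b, &be, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Illustrative descriptor with a single device-visible field. */
struct demo_bd {
    uint32_t len; /* stored big-endian, as the device expects */
};

static void demo_bd_set_len(struct demo_bd *bd, uint32_t len)
{
    bd->len = demo_cpu_to_be32(len);
}

static uint32_t demo_bd_get_len(const struct demo_bd *bd)
{
    return demo_be32_to_cpu(bd->len);
}
```

On a big-endian host both conversions are effectively no-ops, which matches the commit's remark that existing platforms are unaffected.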
    • Esben Haabendal's avatar
      net: ll_temac: Add support for non-native register endianness · a3246dc4
      Esben Haabendal authored
      Replace the powerpc specific MMIO register access functions with the
      generic big-endian mmio access functions, and add support for
      little-endian access depending on configuration.
      
      Big-endian access is maintained as the default, but little-endian can
      be configured in device-tree binding or in platform data.
      
      The temac_ior()/temac_iow() functions are replaced with macro wrappers
      to avoid modifying existing code more than necessary.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3246dc4
    • Esben Haabendal's avatar
      net: ll_temac: Fix support for 64-bit platforms · d84aec42
      Esben Haabendal authored
      The use of buffer descriptor APP4 field (32-bit) for storing skb pointer
      obviously does not work on 64-bit platforms.
      As APP3 is also unused, we can use that to store the other half of 64-bit
      pointer values.
      
      Contrary to what is hinted at in commit message of commit 15bfe05c
      ("net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit")
      there are no other pointers stored in cdmac_bd.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d84aec42
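
Splitting a pointer across two 32-bit fields, as commit d84aec42 describes for APP3 and APP4, might look roughly like the sketch below. The struct and function names are hypothetical, and which field holds the upper half is an assumption here; the commit message only says that APP3 stores "the other half".

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative descriptor holding only the two spare 32-bit fields. */
struct demo_cdmac_bd {
    uint32_t app3; /* assumed: upper 32 bits of the pointer */
    uint32_t app4; /* assumed: lower 32 bits of the pointer */
};

/* Store a pointer across both fields. On a 32-bit host the upper
 * half is simply zero. */
static void demo_bd_store_ptr(struct demo_cdmac_bd *bd, const void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p;

    bd->app3 = (uint32_t)(v >> 32);
    bd->app4 = (uint32_t)v;
}

/* Reassemble the pointer from the two halves. */
static void *demo_bd_load_ptr(const struct demo_cdmac_bd *bd)
{
    uint64_t v = ((uint64_t)bd->app3 << 32) | bd->app4;

    return (void *)(uintptr_t)v;
}
```

Because the device never reads these two fields in TX descriptors, the halves can stay in host byte order.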
    • Esben Haabendal's avatar
      net: ll_temac: Extend support to non-device-tree platforms · 8425c41d
      Esben Haabendal authored
      Support initialization with platdata, so the driver can be used on
      non-device-tree platforms.
      
      For currently supported device-tree platforms, the driver should behave
      as before.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8425c41d
    • Esben Haabendal's avatar
      net: ll_temac: Fix and simplify error handling by using devres functions · a63625d2
      Esben Haabendal authored
      As a side effect, a few error cases are fixed.
      
      If of_iomap() of sdma_regs failed, no error code was returned.  Fixed to
      return -ENOMEM similar to of_iomap() fail of regs.
      
      If sysfs_create_group() or register_netdev() failed, lp->phy_node was not
      released.
      
      Finally, the order in remove function is corrected to be reverse order
      of what is done in probe, i.e. calling temac_mdio_teardown() last, so we
      unregister the netdev that most likely is using the mdio_bus first.
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a63625d2
    • Hangbin Liu's avatar
      selftests: fib_rule_tests: print the result and return 1 if any tests failed · f68d7c44
      Hangbin Liu authored
      Fixes: 65b2b493 ("selftests: net: initial fib rule tests")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f68d7c44
    • YueHaibing's avatar
      net: ethernet: ti: cpsw: Fix inconsistent IS_ERR and PTR_ERR in cpsw_probe() · ac97a359
      YueHaibing authored
      Fix inconsistent IS_ERR and PTR_ERR in cpsw_probe().
      The proper pointer to use is clk instead of mode.
      
      This issue was detected with the help of Coccinelle.
      
      Fixes: 83a8471b ("net: ethernet: ti: cpsw: refactor probe to group common hw initialization")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac97a359
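
The class of bug fixed in commit ac97a359 is testing one pointer with IS_ERR() and decoding a different one with PTR_ERR(). The user-space sketch below imitates the kernel's error-pointer convention with hypothetical `demo_*` helpers (the kernel's own macros live in `include/linux/err.h`); the function at the end is an illustration of the corrected pattern, not the cpsw driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Decode the errno value carried by an error pointer. */
static long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* Error pointers occupy the top DEMO_MAX_ERRNO addresses. */
static int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-(intptr_t)DEMO_MAX_ERRNO);
}

/* Corrected pattern: check and decode the same pointer. */
static long demo_check_clk(const void *clk, const void *mode)
{
    (void)mode; /* a different pointer; decoding it here would be the bug */

    if (demo_is_err(clk))
        return demo_ptr_err(clk);
    return 0;
}
```

Decoding a valid pointer with the PTR_ERR() equivalent would return a meaningless value instead of the real error code, which is why tools such as Coccinelle flag the mismatch.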
    • Linus Torvalds's avatar
      gcc-9: properly declare the {pv,hv}clock_page storage · 459e3a21
      Linus Torvalds authored
      The pvclock_page and hvclock_page variables are (as the names imply)
      addresses to pages, created by the linker script.
      
      But we declared them as just "extern u8" variables, which _works_, but
      now that gcc does some more bounds checking, it causes warnings like
      
          warning: array subscript 1 is outside array bounds of ‘u8[1]’
      
      when we then access more than one byte from those variables.
      
      Fix this by simply making the declaration of the variables match
      reality, which makes the compiler happy too.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      459e3a21
    • Linus Torvalds's avatar
      gcc-9: don't warn about uninitialized variable · cf676908
      Linus Torvalds authored
      I'm not sure what made gcc warn about this code now.  The 'ret' variable
      does end up initialized in all cases, but it's definitely not obvious,
      so the compiler is quite reasonable to warn about this.
      
      So just add initialization to make it all much more obvious both to
      compilers and to humans.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf676908
    • Linus Torvalds's avatar
      gcc-9: silence 'address-of-packed-member' warning · 6f303d60
      Linus Torvalds authored
      We already did this for clang, but now gcc has that warning too.  Yes,
      yes, the address may be unaligned.  And that's kind of the point.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6f303d60
    • Shmulik Ladkani's avatar
      ipv4: ip_do_fragment: Preserve skb_iif during fragmentation · d2f0c961
      Shmulik Ladkani authored
      Previously, during fragmentation after forwarding, skb->skb_iif was not
      preserved, i.e. 'ip_copy_metadata' did not copy skb_iif from the given
      'from' skb.
      
      As a result, ip_do_fragment created fragments with zero skb_iif,
      leading to inconsistent behavior.
      
      Assume for example an eBPF program attached at tc egress (post
      forwarding) that examines __sk_buff->ingress_ifindex:
       - the correct iif is observed if the forwarding path does not involve
         fragmentation/refragmentation
       - a bogus iif is observed if the forwarding path involves
         fragmentation/refragmentation
      
      Fix, by preserving skb_iif during 'ip_copy_metadata'.
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2f0c961
    • Mark Rutland's avatar
      io_uring: avoid page allocation warnings · d4ef6475
      Mark Rutland authored
      In io_sqe_buffer_register() we allocate a number of arrays based on the
      iov_len from the user-provided iov. While we limit iov_len to SZ_1G,
      we can still attempt to allocate arrays exceeding MAX_ORDER.
      
      On a 64-bit system with 4KiB pages, for an iov where iov_base = 0x10 and
      iov_len = SZ_1G, we'll calculate that nr_pages = 262145. We then try to
      allocate a corresponding array of (16-byte) bio_vecs, requiring 4194320
      bytes, which is greater than 4MiB. This results in a SLUB warning that
      we're trying to allocate more than MAX_ORDER, and the allocation
      fails.
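
The numbers in the paragraph above can be reproduced with a short user-space calculation. This is a sketch under the stated assumptions (4KiB pages, 16-byte bio_vec entries); the `demo_*` names are hypothetical and the page-count formula is the usual round-up over the byte range, not a copy of the io_uring source.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)
#define DEMO_BVEC_SIZE  UINT64_C(16) /* assumed sizeof(struct bio_vec) on 64-bit */

/* Number of pages touched by the byte range [iov_base, iov_base + iov_len). */
static uint64_t demo_nr_pages(uint64_t iov_base, uint64_t iov_len)
{
    uint64_t start = iov_base >> DEMO_PAGE_SHIFT;
    uint64_t end = (iov_base + iov_len + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;

    return end - start;
}

/* Bytes needed for one bio_vec per page. */
static uint64_t demo_bvec_bytes(uint64_t nr_pages)
{
    return nr_pages * DEMO_BVEC_SIZE;
}
```

Because iov_base is not page-aligned, a 1GiB range spills into one extra page, giving 262145 pages and 262145 * 16 = 4194320 bytes, which is 16 bytes more than 4MiB (4194304 bytes).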
      
      Avoid this by using kvmalloc() for allocations dependent on the
      user-provided iov_len. At the same time, fix a leak of imu->bvec when
      registration fails.
      
      Full splat from before this patch:
      
      WARNING: CPU: 1 PID: 2314 at mm/page_alloc.c:4595 __alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 2314 Comm: syz-executor326 Not tainted 5.1.0-rc7-dirty #4
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193
       show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x190 lib/dump_stack.c:113
       panic+0x384/0x68c kernel/panic.c:214
       __warn+0x2bc/0x2c0 kernel/panic.c:571
       report_bug+0x228/0x2d8 lib/bug.c:186
       bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956
       call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline]
       brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316
       do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831
       el1_dbg+0x18/0x8c
       __alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595
       alloc_pages_current+0x164/0x278 mm/mempolicy.c:2132
       alloc_pages include/linux/gfp.h:509 [inline]
       kmalloc_order+0x20/0x50 mm/slab_common.c:1231
       kmalloc_order_trace+0x30/0x2b0 mm/slab_common.c:1243
       kmalloc_large include/linux/slab.h:480 [inline]
       __kmalloc+0x3dc/0x4f0 mm/slub.c:3791
       kmalloc_array include/linux/slab.h:670 [inline]
       io_sqe_buffer_register fs/io_uring.c:2472 [inline]
       __io_uring_register fs/io_uring.c:2962 [inline]
       __do_sys_io_uring_register fs/io_uring.c:3008 [inline]
       __se_sys_io_uring_register fs/io_uring.c:2990 [inline]
       __arm64_sys_io_uring_register+0x9e0/0x1bc8 fs/io_uring.c:2990
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
       el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83
       el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129
       el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948
      SMP: stopping secondary CPUs
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      CPU features: 0x002,23000438
      Memory Limit: none
      Rebooting in 1 seconds..
      
      Fixes: edafccee ("io_uring: add support for pre-mapped user IO buffers")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d4ef6475