1. 19 Jun, 2020 17 commits
    • David S. Miller's avatar
      Merge branch 'mptcp-cope-with-syncookie-on-MP_JOINs' · f3c7a6e0
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      mptcp: cope with syncookie on MP_JOINs
      
      Currently syncookies on MP_JOIN connections are not handled correctly: the
      connections fallback to TCP and are kept alive instead of resetting them at
      fallback time.
      
      The first patch propagates the required information up to syn_recv_sock time,
      and the 2nd patch addresses the unifying the error path for all MP_JOIN
      requests.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f3c7a6e0
    • Paolo Abeni's avatar
      mptcp: drop MP_JOIN request sock on syn cookies · 9e365ff5
      Paolo Abeni authored
      Currently any MPTCP socket using syn cookies will fallback to
      TCP at 3rd ack time. In case of MP_JOIN requests, the RFC mandate
      closing the child and sockets, but the existing error paths
      do not handle the syncookie scenario correctly.
      
      Address the issue always forcing the child shutdown in case of
      MP_JOIN fallback.
      
      Fixes: ae2dd716 ("mptcp: handle tcp fallback when using syn cookies")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9e365ff5
    • Paolo Abeni's avatar
      mptcp: cache msk on MP_JOIN init_req · 8fd4de12
      Paolo Abeni authored
      The msk ownership is transferred to the child socket at
      3rd ack time, so that we avoid more lookups later. If the
      request does not reach the 3rd ack, the MSK reference is
      dropped at request sock release time.
      
      As a side effect, fallback is now tracked by a NULL msk
      reference instead of zeroed 'mp_join' field. This will
      simplify the next patch.
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8fd4de12
    • guodeqing's avatar
      net: Fix the arp error in some cases · 5eea3a63
      guodeqing authored
      ie.,
      $ ifconfig eth0 6.6.6.6 netmask 255.255.255.0
      
      $ ip rule add from 6.6.6.6 table 6666
      
      $ ip route add 9.9.9.9 via 6.6.6.6
      
      $ ping -I 6.6.6.6 9.9.9.9
      PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.
      
      3 packets transmitted, 0 received, 100% packet loss, time 2079ms
      
      $ arp
      Address     HWtype  HWaddress           Flags Mask            Iface
      6.6.6.6             (incomplete)                              eth0
      
      The arp request address is error, this is because fib_table_lookup in
      fib_check_nh lookup the destnation 9.9.9.9 nexthop, the scope of
      the fib result is RT_SCOPE_LINK,the correct scope is RT_SCOPE_HOST.
      Here I add a check of whether this is RT_TABLE_MAIN to solve this problem.
      
      Fixes: 3bfd8472 ("net: Use passed in table for nexthop lookups")
      Signed-off-by: default avatarguodeqing <geffrey.guo@huawei.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5eea3a63
    • David S. Miller's avatar
      Merge branch 'sja1105-fixes' · ad103e03
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix VLAN checks for SJA1105 DSA tc-flower filters
      
      This fixes a ridiculous situation where the driver, in VLAN-unaware
      mode, would refuse accepting any tc filter:
      
      tc filter replace dev sw1p3 ingress flower skip_sw \
      	dst_mac 42:be:24:9b:76:20 \
      	action gate (...)
      Error: sja1105: Can only gate based on {DMAC, VID, PCP}.
      
      tc filter replace dev sw1p3 ingress protocol 802.1Q flower skip_sw \
      	vlan_id 1 vlan_prio 0 dst_mac 42:be:24:9b:76:20 \
      	action gate (...)
      Error: sja1105: Can only gate based on DMAC.
      
      So, without changing the VLAN awareness state, it says it doesn't want
      VLAN-aware rules, and it doesn't want VLAN-unaware rules either. One
      would say it's in Schrodinger's state...
      
      Now, the situation has been made worse by commit 7f14937f ("net:
      dsa: sja1105: keep the VLAN awareness state in a driver variable"),
      which made VLAN awareness a ternary attribute, but after inspecting the
      code from before that patch with a truth table, it looks like the
      logical bug was there even before.
      
      While attempting to fix this, I also noticed some leftover debugging
      code in one of the places that needed to be fixed. It would have
      appeared in the context of patch 3/3 anyway, so I decided to create a
      patch that removes it.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad103e03
    • Vladimir Oltean's avatar
      net: dsa: sja1105: fix checks for VLAN state in gate action · 5182a622
      Vladimir Oltean authored
      This action requires the VLAN awareness state of the switch to be of the
      same type as the key that's being added:
      
      - If the switch is unaware of VLAN, then the tc filter key must only
        contain the destination MAC address.
      - If the switch is VLAN-aware, the key must also contain the VLAN ID and
        PCP.
      
      But this check doesn't work unless we verify the VLAN awareness state on
      both the "if" and the "else" branches.
      
      Fixes: 834f8933 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5182a622
    • Vladimir Oltean's avatar
      net: dsa: sja1105: fix checks for VLAN state in redirect action · c6ae970b
      Vladimir Oltean authored
      This action requires the VLAN awareness state of the switch to be of the
      same type as the key that's being added:
      
      - If the switch is unaware of VLAN, then the tc filter key must only
        contain the destination MAC address.
      - If the switch is VLAN-aware, the key must also contain the VLAN ID and
        PCP.
      
      But this check doesn't work unless we verify the VLAN awareness state on
      both the "if" and the "else" branches.
      
      Fixes: dfacc5a2 ("net: dsa: sja1105: support flow-based redirection via virtual links")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c6ae970b
    • Vladimir Oltean's avatar
      net: dsa: sja1105: remove debugging code in sja1105_vl_gate · 5b3b396c
      Vladimir Oltean authored
      This shouldn't be there.
      
      Fixes: 834f8933 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b3b396c
    • David S. Miller's avatar
      Merge branch 'act_gate-fixes' · b64ee485
      David S. Miller authored
      Davide Caratti says:
      
      ====================
      two fixes for 'act_gate' control plane
      
      - patch 1/2 attempts to fix the error path of tcf_gate_init() when users
        try to configure 'act_gate' rules with wrong parameters
      - patch 2/2 is a follow-up of a recent fix for NULL dereference in
        the error path of tcf_gate_init()
      
      further work will introduce a tdc test for 'act_gate'.
      
      changes since v2:
        - fix undefined behavior in patch 1/2
        - improve comment in patch 2/2
      changes since v1:
        coding style fixes in patch 1/2 and 2/2
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b64ee485
    • Davide Caratti's avatar
      net/sched: act_gate: fix configuration of the periodic timer · c362a06e
      Davide Caratti authored
      assigning a dummy value of 'clock_id' to avoid cancellation of the cycle
      timer before its initialization was a temporary solution, and we still
      need to handle the case where act_gate timer parameters are changed by
      commands like the following one:
      
       # tc action replace action gate <parameters>
      
      the fix consists in the following items:
      
      1) remove the workaround assignment of 'clock_id', and init the list of
         entries before the first error path after IDR atomic check/allocation
      2) validate 'clock_id' earlier: there is no need to do IDR atomic
         check/allocation if we know that 'clock_id' is a bad value
      3) use a dedicated function, 'gate_setup_timer()', to ensure that the
         timer is cancelled and re-initialized on action overwrite, and also
         ensure we initialize the timer in the error path of tcf_gate_init()
      
      v3: improve comment in the error path of tcf_gate_init() (thanks to
          Vladimir Oltean)
      v2: avoid 'goto' in gate_setup_timer (thanks to Cong Wang)
      
      CC: Ivan Vecera <ivecera@redhat.com>
      Fixes: a01c2454 ("net/sched: fix a couple of splats in the error path of tfc_gate_init()")
      Fixes: a51c328d ("net: qos: introduce a gate control flow action")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c362a06e
    • Davide Caratti's avatar
      net/sched: act_gate: fix NULL dereference in tcf_gate_init() · 7024339a
      Davide Caratti authored
      it is possible to see a KASAN use-after-free, immediately followed by a
      NULL dereference crash, with the following command:
      
       # tc action add action gate index 3 cycle-time 100000000ns \
       > cycle-time-ext 100000000ns clockid CLOCK_TAI
      
       BUG: KASAN: use-after-free in tcf_action_init_1+0x8eb/0x960
       Write of size 1 at addr ffff88810a5908bc by task tc/883
      
       CPU: 0 PID: 883 Comm: tc Not tainted 5.7.0+ #188
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       Call Trace:
        dump_stack+0x75/0xa0
        print_address_description.constprop.6+0x1a/0x220
        kasan_report.cold.9+0x37/0x7c
        tcf_action_init_1+0x8eb/0x960
        tcf_action_init+0x157/0x2a0
        tcf_action_add+0xd9/0x2f0
        tc_ctl_action+0x2a3/0x39d
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x120/0x380
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [...]
      
       KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
       CPU: 0 PID: 883 Comm: tc Tainted: G    B             5.7.0+ #188
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:tcf_action_fill_size+0xa3/0xf0
       [....]
       RSP: 0018:ffff88813a48f250 EFLAGS: 00010212
       RAX: dffffc0000000000 RBX: 0000000000000094 RCX: ffffffffa47c3eb6
       RDX: 000000000000000e RSI: 0000000000000008 RDI: 0000000000000070
       RBP: ffff88810a590800 R08: 0000000000000004 R09: ffffed1027491e03
       R10: 0000000000000003 R11: ffffed1027491e03 R12: 0000000000000000
       R13: 0000000000000000 R14: dffffc0000000000 R15: ffff88810a590800
       FS:  00007f62cae8ce40(0000) GS:ffff888147c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f62c9d20a10 CR3: 000000013a52a000 CR4: 0000000000340ef0
       Call Trace:
        tcf_action_init+0x172/0x2a0
        tcf_action_add+0xd9/0x2f0
        tc_ctl_action+0x2a3/0x39d
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x120/0x380
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      this is caused by the test on 'cycletime_ext', that is still unassigned
      when the action is newly created. This makes the action .init() return 0
      without calling tcf_idr_insert(), hence the UAF + crash.
      
      rework the logic that prevents zero values of cycle-time, as follows:
      
      1) 'tcfg_cycletime_ext' seems to be unused in the action software path,
         and it was already possible by other means to obtain non-zero
         cycletime and zero cycletime-ext. So, removing that test should not
         cause any damage.
      2) while at it, we must prevent overwriting configuration data with wrong
         ones: use a temporary variable for 'tcfg_cycletime', and validate it
         preserving the original semantic (that allowed computing the cycle
         time as the sum of all intervals, when not specified by
         TCA_GATE_CYCLE_TIME).
      3) remove the test on 'tcfg_cycletime', no more useful, and avoid
         returning -EFAULT, which did not seem an appropriate return value for
         a wrong netlink attribute.
      
      v3: fix uninitialized 'cycletime' (thanks to Vladimir Oltean)
      v2: remove useless 'return;' at the end of void gate_get_start_time()
      
      Fixes: a51c328d ("net: qos: introduce a gate control flow action")
      CC: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7024339a
    • Taehee Yoo's avatar
      ip_tunnel: fix use-after-free in ip_tunnel_lookup() · ba61539c
      Taehee Yoo authored
      In the datapath, the ip_tunnel_lookup() is used and it internally uses
      fallback tunnel device pointer, which is fb_tunnel_dev.
      This pointer variable should be set to NULL when a fb interface is deleted.
      But there is no routine to set fb_tunnel_dev pointer to NULL.
      So, this pointer will be still used after interface is deleted and
      it eventually results in the use-after-free problem.
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
      	    remote 10.0.0.2
          ip netns exec A ip link set gre1 up
          ip netns exec A ip a a 10.0.100.1/24 dev gre1
          ip netns exec A ip a a 10.0.0.1/24 dev eth0
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
      	    remote 10.0.0.1
          ip netns exec B ip link set gre1 up
          ip netns exec B ip a a 10.0.100.2/24 dev gre1
          ip netns exec B ip a a 10.0.0.2/24 dev eth1
          ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
          ip netns del B
      
      Splat looks like:
      [   77.793450][    C3] ==================================================================
      [   77.794702][    C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0xcc4/0xf30
      [   77.795573][    C3] Read of size 4 at addr ffff888060bd9c84 by task hping3/2905
      [   77.796398][    C3]
      [   77.796664][    C3] CPU: 3 PID: 2905 Comm: hping3 Not tainted 5.8.0-rc1+ #616
      [   77.797474][    C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   77.798453][    C3] Call Trace:
      [   77.798815][    C3]  <IRQ>
      [   77.799142][    C3]  dump_stack+0x9d/0xdb
      [   77.799605][    C3]  print_address_description.constprop.7+0x2cc/0x450
      [   77.800365][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.800908][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.801517][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.802145][    C3]  kasan_report+0x154/0x190
      [   77.802821][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.803503][    C3]  ip_tunnel_lookup+0xcc4/0xf30
      [   77.804165][    C3]  __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
      [   77.804862][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.805621][    C3]  gre_rcv+0x304/0x1910 [ip_gre]
      [   77.806293][    C3]  ? lock_acquire+0x1a9/0x870
      [   77.806925][    C3]  ? gre_rcv+0xfe/0x354 [gre]
      [   77.807559][    C3]  ? erspan_xmit+0x2e60/0x2e60 [ip_gre]
      [   77.808305][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.809032][    C3]  ? rcu_read_lock_held+0x90/0xa0
      [   77.809713][    C3]  gre_rcv+0x1b8/0x354 [gre]
      [ ... ]
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ba61539c
    • Taehee Yoo's avatar
      ip6_gre: fix use-after-free in ip6gre_tunnel_lookup() · dafabb65
      Taehee Yoo authored
      In the datapath, the ip6gre_tunnel_lookup() is used and it internally uses
      fallback tunnel device pointer, which is fb_tunnel_dev.
      This pointer variable should be set to NULL when a fb interface is deleted.
      But there is no routine to set fb_tunnel_dev pointer to NULL.
      So, this pointer will be still used after interface is deleted and
      it eventually results in the use-after-free problem.
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add ip6gre1 type ip6gre local fc:0::1 \
      	    remote fc:0::2
          ip netns exec A ip -6 a a fc:100::1/64 dev ip6gre1
          ip netns exec A ip link set ip6gre1 up
          ip netns exec A ip -6 a a fc:0::1/64 dev eth0
          ip netns exec A ip link set ip6gre0 up
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add ip6gre1 type ip6gre local fc:0::2 \
      	    remote fc:0::1
          ip netns exec B ip -6 a a fc:100::2/64 dev ip6gre1
          ip netns exec B ip link set ip6gre1 up
          ip netns exec B ip -6 a a fc:0::2/64 dev eth1
          ip netns exec B ip link set ip6gre0 up
          ip netns exec A ping fc:100::2 -s 60000 &
          ip netns del B
      
      Splat looks like:
      [   73.087285][    C1] BUG: KASAN: use-after-free in ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.088361][    C1] Read of size 4 at addr ffff888040559218 by task ping/1429
      [   73.089317][    C1]
      [   73.089638][    C1] CPU: 1 PID: 1429 Comm: ping Not tainted 5.7.0+ #602
      [   73.090531][    C1] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   73.091725][    C1] Call Trace:
      [   73.092160][    C1]  <IRQ>
      [   73.092556][    C1]  dump_stack+0x96/0xdb
      [   73.093122][    C1]  print_address_description.constprop.6+0x2cc/0x450
      [   73.094016][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.094894][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.095767][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.096619][    C1]  kasan_report+0x154/0x190
      [   73.097209][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.097989][    C1]  ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.098750][    C1]  ? gre_del_protocol+0x60/0x60 [gre]
      [   73.099500][    C1]  gre_rcv+0x1c5/0x1450 [ip6_gre]
      [   73.100199][    C1]  ? ip6gre_header+0xf00/0xf00 [ip6_gre]
      [   73.100985][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.101830][    C1]  ? ip6_input_finish+0x5/0xf0
      [   73.102483][    C1]  ip6_protocol_deliver_rcu+0xcbb/0x1510
      [   73.103296][    C1]  ip6_input_finish+0x5b/0xf0
      [   73.103920][    C1]  ip6_input+0xcd/0x2c0
      [   73.104473][    C1]  ? ip6_input_finish+0xf0/0xf0
      [   73.105115][    C1]  ? rcu_read_lock_held+0x90/0xa0
      [   73.105783][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.106548][    C1]  ipv6_rcv+0x1f1/0x300
      [ ... ]
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dafabb65
    • Taehee Yoo's avatar
      net: core: reduce recursion limit value · fb7861d1
      Taehee Yoo authored
      In the current code, ->ndo_start_xmit() can be executed recursively only
      10 times because of stack memory.
      But, in the case of the vxlan, 10 recursion limit value results in
      a stack overflow.
      In the current code, the nested interface is limited by 8 depth.
      There is no critical reason that the recursion limitation value should
      be 10.
      So, it would be good to be the same value with the limitation value of
      nesting interface depth.
      
      Test commands:
          ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789
          ip link set vxlan10 up
          ip a a 192.168.10.1/24 dev vxlan10
          ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent
      
          for i in {9..0}
          do
              let A=$i+1
      	ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789
      	ip link set vxlan$i up
      	ip a a 192.168.$i.1/24 dev vxlan$i
      	ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent
      	bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self
          done
          hping3 192.168.10.2 -2 -d 60000
      
      Splat looks like:
      [  103.814237][ T1127] =============================================================================
      [  103.871955][ T1127] BUG kmalloc-2k (Tainted: G    B            ): Padding overwritten. 0x00000000897a2e4f-0x000
      [  103.873187][ T1127] -----------------------------------------------------------------------------
      [  103.873187][ T1127]
      [  103.874252][ T1127] INFO: Slab 0x000000005cccc724 objects=5 used=5 fp=0x0000000000000000 flags=0x10000000001020
      [  103.881323][ T1127] CPU: 3 PID: 1127 Comm: hping3 Tainted: G    B             5.7.0+ #575
      [  103.882131][ T1127] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  103.883006][ T1127] Call Trace:
      [  103.883324][ T1127]  dump_stack+0x96/0xdb
      [  103.883716][ T1127]  slab_err+0xad/0xd0
      [  103.884106][ T1127]  ? _raw_spin_unlock+0x1f/0x30
      [  103.884620][ T1127]  ? get_partial_node.isra.78+0x140/0x360
      [  103.885214][ T1127]  slab_pad_check.part.53+0xf7/0x160
      [  103.885769][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.886316][ T1127]  check_slab+0x97/0xb0
      [  103.886763][ T1127]  alloc_debug_processing+0x84/0x1a0
      [  103.887308][ T1127]  ___slab_alloc+0x5a5/0x630
      [  103.887765][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.888265][ T1127]  ? lock_downgrade+0x730/0x730
      [  103.888762][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.889244][ T1127]  ? __slab_alloc+0x3e/0x80
      [  103.889675][ T1127]  __slab_alloc+0x3e/0x80
      [  103.890108][ T1127]  __kmalloc_node_track_caller+0xc7/0x420
      [ ... ]
      
      Fixes: 11a766ce ("net: Increase xmit RECURSION_LIMIT to 10.")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb7861d1
    • Yang Yingliang's avatar
      net: fix memleak in register_netdevice() · 814152a8
      Yang Yingliang authored
      I got a memleak report when doing some fuzz test:
      
      unreferenced object 0xffff888112584000 (size 13599):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          74 61 70 30 00 00 00 00 00 00 00 00 00 00 00 00  tap0............
          00 ee d9 19 81 88 ff ff 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002f60ba65>] __kmalloc_node+0x309/0x3a0
          [<0000000075b211ec>] kvmalloc_node+0x7f/0xc0
          [<00000000d3a97396>] alloc_netdev_mqs+0x76/0xfc0
          [<00000000609c3655>] __tun_chr_ioctl+0x1456/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff888111845cc0 (size 8):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 8 bytes):
          74 61 70 30 00 88 ff ff                          tap0....
        backtrace:
          [<000000004c159777>] kstrdup+0x35/0x70
          [<00000000d8b496ad>] kstrdup_const+0x3d/0x50
          [<00000000494e884a>] kvasprintf_const+0xf1/0x180
          [<0000000097880a2b>] kobject_set_name_vargs+0x56/0x140
          [<000000008fbdfc7b>] dev_set_name+0xab/0xe0
          [<000000005b99e3b4>] netdev_register_kobject+0xc0/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff88811886d800 (size 512):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff c0 66 3d a3 ff ff ff ff  .........f=.....
        backtrace:
          [<0000000050315800>] device_add+0x61e/0x1950
          [<0000000021008dfb>] netdev_register_kobject+0x17e/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      If call_netdevice_notifiers() failed, then rollback_registered()
      calls netdev_unregister_kobject() which holds the kobject. The
      reference cannot be put because the netdev won't be add to todo
      list, so it will leads a memleak, we need put the reference to
      avoid memleak.
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      814152a8
    • Sascha Hauer's avatar
      net: ethernet: mvneta: Add 2500BaseX support for SoCs without comphy · 1a642ca7
      Sascha Hauer authored
      The older SoCs like Armada XP support a 2500BaseX mode in the datasheets
      referred to as DR-SGMII (Double rated SGMII) or HS-SGMII (High Speed
      SGMII). This is an upclocked 1000BaseX mode, thus
      PHY_INTERFACE_MODE_2500BASEX is the appropriate mode define for it.
      adding support for it merely means writing the correct magic value into
      the MVNETA_SERDES_CFG register.
      Signed-off-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a642ca7
    • Sascha Hauer's avatar
      net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy · b4748553
      Sascha Hauer authored
      The MVNETA_SERDES_CFG register is only available on older SoCs like the
      Armada XP. On newer SoCs like the Armada 38x the fields are moved to
      comphy. This patch moves the writes to this register next to the comphy
      initialization, so that depending on the SoC either comphy or
      MVNETA_SERDES_CFG is configured.
      With this we no longer write to the MVNETA_SERDES_CFG on SoCs where it
      doesn't exist.
      Suggested-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4748553
  2. 17 Jun, 2020 16 commits
    • Shannon Nelson's avatar
      ionic: export features for vlans to use · ef7232da
      Shannon Nelson authored
      Set up vlan_features for use by any vlans above us.
      
      Fixes: beead698 ("ionic: Add the basic NDO callbacks for netdev support")
      Signed-off-by: default avatarShannon Nelson <snelson@pensando.io>
      Acked-by: default avatarJonathan Toppins <jtoppins@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef7232da
    • Shannon Nelson's avatar
      ionic: no link check while resetting queues · 3103b6fe
      Shannon Nelson authored
      If the driver is busy resetting queues after a change in
      MTU or queue parameters, don't bother checking the link,
      wait until the next watchdog cycle.
      
      Fixes: 987c0871 ("ionic: check for linkup in watchdog")
      Signed-off-by: default avatarShannon Nelson <snelson@pensando.io>
      Acked-by: default avatarJonathan Toppins <jtoppins@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3103b6fe
    • Jeremy Kerr's avatar
      net: usb: ax88179_178a: fix packet alignment padding · e869e7a1
      Jeremy Kerr authored
      Using a AX88179 device (0b95:1790), I see two bytes of appended data on
      every RX packet. For example, this 48-byte ping, using 0xff as a
      payload byte:
      
        04:20:22.528472 IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 2447, seq 1, length 64
      	0x0000:  000a cd35 ea50 000a cd35 ea4f 0800 4500
      	0x0010:  0054 c116 4000 4001 f63e c0a8 0101 c0a8
      	0x0020:  0102 0800 b633 098f 0001 87ea cd5e 0000
      	0x0030:  0000 dcf2 0600 0000 0000 ffff ffff ffff
      	0x0040:  ffff ffff ffff ffff ffff ffff ffff ffff
      	0x0050:  ffff ffff ffff ffff ffff ffff ffff ffff
      	0x0060:  ffff 961f
      
      Those last two bytes - 96 1f - aren't part of the original packet.
      
      In the ax88179 RX path, the usbnet rx_fixup function trims a 2-byte
      'alignment pseudo header' from the start of the packet, and sets the
      length from a per-packet field populated by hardware. It looks like that
      length field *includes* the 2-byte header; the current driver assumes
      that it's excluded.
      
      This change trims the 2-byte alignment header after we've set the packet
      length, so the resulting packet length is correct. While we're moving
      the comment around, this also fixes the spelling of 'pseudo'.
      Signed-off-by: default avatarJeremy Kerr <jk@ozlabs.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e869e7a1
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · b9d37bbb
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-06-17
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 10 non-merge commits during the last 2 day(s) which contain
      a total of 14 files changed, 158 insertions(+), 59 deletions(-).
      
      The main changes are:
      
      1) Important fix for bpf_probe_read_kernel_str() return value, from Andrii.
      
      2) [gs]etsockopt fix for large optlen, from Stanislav.
      
      3) devmap allocation fix, from Toke.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b9d37bbb
    • Stanislav Fomichev's avatar
      bpf: Document optval > PAGE_SIZE behavior for sockopt hooks · 8030e250
      Stanislav Fomichev authored
      Extend existing doc with more details about requiring ctx->optlen = 0
      for handling optval > PAGE_SIZE.
      Signed-off-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200617010416.93086-3-sdf@google.com
      8030e250
    • Stanislav Fomichev's avatar
      selftests/bpf: Make sure optvals > PAGE_SIZE are bypassed · a0cb12b0
      Stanislav Fomichev authored
      We are relying on the fact, that we can pass > sizeof(int) optvals
      to the SOL_IP+IP_FREEBIND option (the kernel will take first 4 bytes).
      In the BPF program we check that we can only touch PAGE_SIZE bytes,
      but the real optlen is PAGE_SIZE * 2. In both cases, we override it to
      some predefined value and trim the optlen.
      
      Also, let's modify exiting IP_TOS usecase to test optlen=0 case
      where BPF program just bypasses the data as is.
      Signed-off-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200617010416.93086-2-sdf@google.com
      a0cb12b0
    • Stanislav Fomichev's avatar
      bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE · d8fe449a
      Stanislav Fomichev authored
      Attaching to these hooks can break iptables because its optval is
      usually quite big, or at least bigger than the current PAGE_SIZE limit.
      David also mentioned some SCTP options can be big (around 256k).
      
      For such optvals we expose only the first PAGE_SIZE bytes to
      the BPF program. BPF program has two options:
      1. Set ctx->optlen to 0 to indicate that the BPF's optval
         should be ignored and the kernel should use original userspace
         value.
      2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.
      
      v5:
      * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
      * update the docs accordingly
      
      v4:
      * use temporary buffer to avoid optval == optval_end == NULL;
        this removes the corner case in the verifier that might assume
        non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.
      
      v3:
      * don't increase the limit, bypass the argument
      
      v2:
      * proper comments formatting (Jakub Kicinski)
      
      Fixes: 0d01da6a ("bpf: implement getsockopt and setsockopt hooks")
      Signed-off-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Link: https://lore.kernel.org/bpf/20200617010416.93086-1-sdf@google.com
      d8fe449a
    • Toke Høiland-Jørgensen's avatar
      devmap: Use bpf_map_area_alloc() for allocating hash buckets · 99c51064
      Toke Høiland-Jørgensen authored
      Syzkaller discovered that creating a hash of type devmap_hash with a large
      number of entries can hit the memory allocator limit for allocating
      contiguous memory regions. There's really no reason to use kmalloc_array()
      directly in the devmap code, so just switch it to the existing
      bpf_map_area_alloc() function that is used elsewhere.
      
      Fixes: 6f9d451a ("xdp: Add devmap_hash map type for looking up devices by hashed index")
      Reported-by: default avatarXiumei Mu <xmu@redhat.com>
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200616142829.114173-1-toke@redhat.com
      99c51064
    • Hangbin Liu's avatar
      xdp: Handle frame_sz in xdp_convert_zc_to_xdp_frame() · 3ff23516
      Hangbin Liu authored
      In commit 34cc0b33 we only handled the frame_sz in convert_to_xdp_frame().
      This patch will also handle frame_sz in xdp_convert_zc_to_xdp_frame().
      
      Fixes: 34cc0b33 ("xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200616103518.2963410-1-liuhangbin@gmail.com
      3ff23516
    • Tobias Klauser's avatar
      tools, bpftool: Add ringbuf map type to map command docs · 1c7fb20d
      Tobias Klauser authored
      Commit c34a06c5 ("tools/bpftool: Add ringbuf map to a list of known
      map types") added the symbolic "ringbuf" name. Document it in the bpftool
      map command docs and usage as well.
      Signed-off-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200616113303.8123-1-tklauser@distanz.ch
      1c7fb20d
    • Andrii Nakryiko's avatar
      bpf: bpf_probe_read_kernel_str() has to return amount of data read on success · 02553b91
      Andrii Nakryiko authored
      During recent refactorings, bpf_probe_read_kernel_str() started returning 0 on
      success, instead of amount of data successfully read. This majorly breaks
      applications relying on bpf_probe_read_kernel_str() and bpf_probe_read_str()
      and their results. Fix this by returning actual number of bytes read.
      
      Fixes: 8d92db5c ("bpf: rework the compat kernel probe handling")
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200616050432.1902042-1-andriin@fb.com
      02553b91
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 69119673
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Don't get per-cpu pointer with preemption enabled in nft_set_pipapo,
          fix from Stefano Brivio.
      
       2) Fix memory leak in ctnetlink, from Pablo Neira Ayuso.
      
       3) Multiple definitions of MPTCP_PM_MAX_ADDR, from Geliang Tang.
      
       4) Accidently disabling NAPI in non-error paths of macb_open(), from
          Charles Keepax.
      
       5) Fix races between alx_stop and alx_remove, from Zekun Shen.
      
       6) We forget to re-enable SRIOV during resume in bnxt_en driver, from
          Michael Chan.
      
       7) Fix memory leak in ipv6_mc_destroy_dev(), from Wang Hai.
      
       8) rxtx stats use wrong index in mvpp2 driver, from Sven Auhagen.
      
       9) Fix memory leak in mptcp_subflow_create_socket error path, from Wei
          Yongjun.
      
      10) We should not adjust the TCP window advertised when sending dup acks
          in non-SACK mode, because it won't be counted as a dup by the sender
          if the window size changes. From Eric Dumazet.
      
      11) Destroy the right number of queues during remove in mvpp2 driver,
          from Sven Auhagen.
      
      12) Various WOL and PM fixes to e1000 driver, from Chen Yu, Vaibhav
          Gupta, and Arnd Bergmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
        e1000e: fix unused-function warning
        e1000: use generic power management
        e1000e: Do not wake up the system via WOL if device wakeup is disabled
        lan743x: add MODULE_DEVICE_TABLE for module loading alias
        mlxsw: spectrum: Adjust headroom buffers for 8x ports
        bareudp: Fixed configuration to avoid having garbage values
        mvpp2: remove module bugfix
        tcp: grow window for OOO packets only for SACK flows
        mptcp: fix memory leak in mptcp_subflow_create_socket()
        netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inline
        net/sched: act_ct: Make tcf_ct_flow_table_restore_skb inline
        net: dsa: sja1105: fix PTP timestamping with large tc-taprio cycles
        mvpp2: ethtool rxtx stats fix
        MAINTAINERS: switch to my private email for Renesas Ethernet drivers
        rocker: fix incorrect error handling in dma_rings_init
        test_objagg: Fix potential memory leak in error handling
        net: ethernet: mtk-star-emac: simplify interrupt handling
        mld: fix memory leak in ipv6_mc_destroy_dev()
        bnxt_en: Return from timer if interface is not in open state.
        bnxt_en: Fix AER reset logic on 57500 chips.
        ...
      69119673
    • Linus Torvalds's avatar
      Merge tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 26c20ffc
      Linus Torvalds authored
      Pull AFS fixes from David Howells:
       "I've managed to get xfstests kind of working with afs. Here are a set
        of patches that fix most of the bugs found.
      
        There are a number of primary issues:
      
         - Incorrect handling of mtime and non-handling of ctime. It might be
           argued, that the latter isn't a bug since the AFS protocol doesn't
           support ctime, but I should probably still update it locally.
      
         - Shared-write mmap, truncate and writeback bugs. This includes not
           changing i_size under the callback lock, overwriting local i_size
           with the reply from the server after a partial writeback, not
           limiting the writeback from an mmapped page to EOF.
      
         - Checks for an abort code indicating that the primary vnode in an
           operation was deleted by a third-party are done in the wrong place.
      
         - Silly rename bugs. This includes an incomplete conversion to the
           new operation handling, duplicate nlink handling, nlink changing
           not being done inside the callback lock and insufficient handling
           of third-party conflicting directory changes.
      
        And some secondary ones:
      
         - The UAEOVERFLOW abort code should map to EOVERFLOW not EREMOTEIO.
      
         - Remove a couple of unused or incompletely used bits.
      
         - Remove a couple of redundant success checks.
      
        These seem to fix all the data-corruption bugs found by
      
      	./check -afs -g quick
      
        along with the obvious silly rename bugs and time bugs.
      
        There are still some test failures, but they seem to fall into two
        classes: firstly, the authentication/security model is different to
        the standard UNIX model and permission is arbitrated by the server and
        cached locally; and secondly, there are a number of features that AFS
        does not support (such as mknod). But in these cases, the tests
        themselves need to be adapted or skipped.
      
        Using the in-kernel afs client with xfstests also found a bug in the
        AuriStor AFS server that has been fixed for a future release"
      
      * tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix silly rename
        afs: afs_vnode_commit_status() doesn't need to check the RPC error
        afs: Fix use of afs_check_for_remote_deletion()
        afs: Remove afs_operation::abort_code
        afs: Fix yfs_fs_fetch_status() to honour vnode selector
        afs: Remove yfs_fs_fetch_file_status() as it's not used
        afs: Fix the mapping of the UAEOVERFLOW abort code
        afs: Fix truncation issues and mmap writeback size
        afs: Concoct ctimes
        afs: Fix EOF corruption
        afs: afs_write_end() should change i_size under the right lock
        afs: Fix non-setting of mtime when writing into mmap
      26c20ffc
    • Randy Dunlap's avatar
      Documentation: remove SH-5 index entries · f17957f7
      Randy Dunlap authored
      Remove SH-5 documentation index entries following the removal
      of SH-5 source code.
      
      Error: Cannot open file ../arch/sh/mm/tlb-sh5.c
      Error: Cannot open file ../arch/sh/mm/tlb-sh5.c
      Error: Cannot open file ../arch/sh/include/asm/tlb_64.h
      Error: Cannot open file ../arch/sh/include/asm/tlb_64.h
      
      Fixes: 3b69e8b4 ("Merge tag 'sh-for-5.8' of git://git.libc.org/linux-sh")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: ysato@users.sourceforge.jp
      Cc: linux-sh@vger.kernel.org
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f17957f7
    • Linus Torvalds's avatar
      Merge tag 'flex-array-conversions-5.8-rc2' of... · ffbc9376
      Linus Torvalds authored
      Merge tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull flexible-array member conversions from Gustavo A. R. Silva:
       "Replace zero-length arrays with flexible-array members.
      
        Notice that all of these patches have been baking in linux-next for
        two development cycles now.
      
        There is a regular need in the kernel to provide a way to declare
        having a dynamically sized set of trailing elements in a structure.
        Kernel code should always use “flexible array members”[1] for these
        cases. The older style of one-element or zero-length arrays should no
        longer be used[2].
      
        C99 introduced “flexible array members”, which lacks a numeric size
        for the array declaration entirely:
      
              struct something {
                      size_t count;
                      struct foo items[];
              };
      
        This is the way the kernel expects dynamically sized trailing elements
        to be declared. It allows the compiler to generate errors when the
        flexible array does not occur last in the structure, which helps to
        prevent some kind of undefined behavior[3] bugs from being
        inadvertently introduced to the codebase.
      
        It also allows the compiler to correctly analyze array sizes (via
        sizeof(), CONFIG_FORTIFY_SOURCE, and CONFIG_UBSAN_BOUNDS). For
        instance, there is no mechanism that warns us that the following
        application of the sizeof() operator to a zero-length array always
        results in zero:
      
              struct something {
                      size_t count;
                      struct foo items[0];
              };
      
              struct something *instance;
      
              instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
              instance->count = count;
      
              size = sizeof(instance->items) * instance->count;
              memcpy(instance->items, source, size);
      
        At the last line of code above, size turns out to be zero, when one
        might have thought it represents the total size in bytes of the
        dynamic memory recently allocated for the trailing array items. Here
        are a couple examples of this issue[4][5].
      
        Instead, flexible array members have incomplete type, and so the
        sizeof() operator may not be applied[6], so any misuse of such
        operators will be immediately noticed at build time.
      
        The cleanest and least error-prone way to implement this is through
        the use of a flexible array member:
      
              struct something {
                      size_t count;
                      struct foo items[];
              };
      
              struct something *instance;
      
              instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
              instance->count = count;
      
              size = sizeof(instance->items[0]) * instance->count;
              memcpy(instance->items, source, size);
      
        instead"
      
      [1] https://en.wikipedia.org/wiki/Flexible_array_member
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      [4] commit f2cd32a4 ("rndis_wlan: Remove logically dead code")
      [5] commit ab91c2a8 ("tpm: eventlog: Replace zero-length array with flexible-array member")
      [6] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      
      * tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (41 commits)
        w1: Replace zero-length array with flexible-array
        tracing/probe: Replace zero-length array with flexible-array
        soc: ti: Replace zero-length array with flexible-array
        tifm: Replace zero-length array with flexible-array
        dmaengine: tegra-apb: Replace zero-length array with flexible-array
        stm class: Replace zero-length array with flexible-array
        Squashfs: Replace zero-length array with flexible-array
        ASoC: SOF: Replace zero-length array with flexible-array
        ima: Replace zero-length array with flexible-array
        sctp: Replace zero-length array with flexible-array
        phy: samsung: Replace zero-length array with flexible-array
        RxRPC: Replace zero-length array with flexible-array
        rapidio: Replace zero-length array with flexible-array
        media: pwc: Replace zero-length array with flexible-array
        firmware: pcdp: Replace zero-length array with flexible-array
        oprofile: Replace zero-length array with flexible-array
        block: Replace zero-length array with flexible-array
        tools/testing/nvdimm: Replace zero-length array with flexible-array
        libata: Replace zero-length array with flexible-array
        kprobes: Replace zero-length array with flexible-array
        ...
      ffbc9376
    • Arvind Sankar's avatar
      x86/purgatory: Add -fno-stack-protector · ff58155c
      Arvind Sankar authored
      The purgatory Makefile removes -fstack-protector options if they were
      configured in, but does not currently add -fno-stack-protector.
      
      If gcc was configured with the --enable-default-ssp configure option,
      this results in the stack protector still being enabled for the
      purgatory (absent distro-specific specs files that might disable it
      again for freestanding compilations), if the main kernel is being
      compiled with stack protection enabled (if it's disabled for the main
      kernel, the top-level Makefile will add -fno-stack-protector).
      
      This will break the build since commit
        e4160b2e ("x86/purgatory: Fail the build if purgatory.ro has missing symbols")
      and prior to that would have caused runtime failure when trying to use
      kexec.
      
      Explicitly add -fno-stack-protector to avoid this, as done in other
      Makefiles that need to disable the stack protector.
      Reported-by: default avatarGabriel C <nix.or.die@googlemail.com>
      Signed-off-by: default avatarArvind Sankar <nivedita@alum.mit.edu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ff58155c
  3. 16 Jun, 2020 7 commits
    • David S. Miller's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · c9f66b43
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2020-06-16
      
      This series contains fixes to e1000 and e1000e.
      
      Chen fixes an e1000e issue where systems could be waken via WoL, even
      though the user has disabled the wakeup bit via sysfs.
      
      Vaibhav Gupta updates the e1000 driver to clean up the legacy Power
      Management hooks.
      
      Arnd Bergmann cleans up the inconsistent use CONFIG_PM_SLEEP
      preprocessor tags, which also resolves the compiler warnings about the
      possibility of unused structure.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c9f66b43
    • Arnd Bergmann's avatar
      e1000e: fix unused-function warning · 880e6269
      Arnd Bergmann authored
      The CONFIG_PM_SLEEP #ifdef checks in this file are inconsistent,
      leading to a warning about sometimes unused function:
      
      drivers/net/ethernet/intel/e1000e/netdev.c:137:13: error: unused function 'e1000e_check_me' [-Werror,-Wunused-function]
      
      Rather than adding more #ifdefs, just remove them completely
      and mark the PM functions as __maybe_unused to let the compiler
      work it out on it own.
      
      Fixes: e086ba2f ("e1000e: disable s0ix entry and exit flows for ME systems")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      880e6269
    • Vaibhav Gupta's avatar
      e1000: use generic power management · eb6779d4
      Vaibhav Gupta authored
      With legacy PM hooks, it was the responsibility of a driver to manage PCI
      states and also the device's power state. The generic approach is to let PCI
      core handle the work.
      
      e1000_suspend() calls __e1000_shutdown() to perform intermediate tasks.
      __e1000_shutdown() modifies the value of "wake" (device should be wakeup
      enabled or not), responsible for controlling the flow of legacy PM.
      
      Since, PCI core has no idea about the value of "wake", new code for generic
      PM may produce unexpected results. Thus, use "device_set_wakeup_enable()"
      to wakeup-enable the device accordingly.
      Signed-off-by: default avatarVaibhav Gupta <vaibhavgupta40@gmail.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      eb6779d4
    • Chen Yu's avatar
      e1000e: Do not wake up the system via WOL if device wakeup is disabled · 6bf6be11
      Chen Yu authored
      Currently the system will be woken up via WOL(Wake On LAN) even if the
      device wakeup ability has been disabled via sysfs:
       cat /sys/devices/pci0000:00/0000:00:1f.6/power/wakeup
       disabled
      
      The system should not be woken up if the user has explicitly
      disabled the wake up ability for this device.
      
      This patch clears the WOL ability of this network device if the
      user has disabled the wake up ability in sysfs.
      
      Fixes: bc7f75fa ("[E1000E]: New pci-express e1000 driver")
      Reported-by: default avatar"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarChen Yu <yu.c.chen@intel.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      6bf6be11
    • Tim Harvey's avatar
      lan743x: add MODULE_DEVICE_TABLE for module loading alias · ea12fe9d
      Tim Harvey authored
      Without a MODULE_DEVICE_TABLE the attributes are missing that create
      an alias for auto-loading the module in userspace via hotplug.
      Signed-off-by: default avatarTim Harvey <tharvey@gateworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ea12fe9d
    • David Howells's avatar
      afs: Fix silly rename · b6489a49
      David Howells authored
      Fix AFS's silly rename by the following means:
      
       (1) Set the destination directory in afs_do_silly_rename() so as to avoid
           misbehaviour and indicate that the directory data version will
           increment by 1 so as to avoid warnings about unexpected changes in the
           DV.  Also indicate that the ctime should be updated to avoid xfstest
           grumbling.
      
       (2) Note when the server indicates that a directory changed more than we
           expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a
           third party change, checking on successful completion of unlink and
           rename.
      
           The problem is that the FS.RemoveFile RPC op doesn't report the status
           of the unlinked file, though YFS.RemoveFile2 does.  This can be
           mitigated by the assumption that if the directory DV cranked by
           exactly 1, we can be sure we removed one link from the file; further,
           ordinarily in AFS, files cannot be hardlinked across directories, so
           if we reduce nlink to 0, the file is deleted.
      
           However, if the directory DV jumps by more than 1, we cannot know if a
           third party intervened by adding or removing a link on the file we
           just removed a link from.
      
           The same also goes for any vnode that is at the destination of the
           FS.Rename RPC op.
      
       (3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock
           section along with the other attribute updates if ->op_unlinked is set
           on the descriptor for the appropriate vnode.
      
       (4) Issue a follow up status fetch to the unlinked file in the event of a
           third party conflict that makes it impossible for us to know if we
           actually deleted the file or not.
      
       (5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to
           the user about the nlink of a silly deleted file so that it appears as
           0, not 1.
      
      Found with the generic/035 and generic/084 xfstests.
      
      Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      b6489a49
    • Ido Schimmel's avatar
      mlxsw: spectrum: Adjust headroom buffers for 8x ports · 60833d54
      Ido Schimmel authored
      The port's headroom buffers are used to store packets while they
      traverse the device's pipeline and also to store packets that are egress
      mirrored.
      
      On Spectrum-3, ports with eight lanes use two headroom buffers between
      which the configured headroom size is split.
      
      In order to prevent packet loss, multiply the calculated headroom size
      by two for 8x ports.
      
      Fixes: da382875 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      60833d54