1. 18 Feb, 2020 17 commits
  2. 17 Feb, 2020 23 commits
    • net: mscc: fix in frame extraction · a8154104
      Horatiu Vultur authored
      Each extracted frame on Ocelot has an IFH. The frame and IFH are extracted
      by reading chunks of 4 bytes from a register.
      
      If the IFH and frame were read correctly, the driver tries to read the
      next frame. If there are no more frames in the queue, it checks whether
      any previous error occurred and, if so, flushes the queue. However, this
      check also succeeds when there were no errors: when extracting the IFH,
      err is compared against 4 (the number of bytes read), and afterwards err
      is overwritten only if extracting the frame fails. So in the error-free
      case err is still 4. A frame can therefore arrive in the queue after the
      check that the queue is empty, and because err was never reset, the
      driver flushes the queue and that frame is lost.
      
      The fix consists of resetting err after reading the IFH.
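      A simplified userspace sketch of the control flow described above. The
      queue model and the helpers read_ifh(), read_frame() and flush_queue()
      are hypothetical stand-ins, not the ocelot driver's functions; the only
      point is that err still holds the IFH byte count (4) unless it is reset.

```c
#include <stdbool.h>

/* Hypothetical simulated extraction queue. */
struct xtr_queue {
	int pending;   /* frames currently waiting in the queue */
	int flushed;   /* frames dropped by flush_queue() */
	int delivered; /* frames successfully extracted */
};

/* Reading the IFH returns the number of bytes read (4 on success). */
static int read_ifh(struct xtr_queue *q)
{
	(void)q;
	return 4;
}

/* Returns 0 on success, negative on failure (always succeeds here). */
static int read_frame(struct xtr_queue *q)
{
	q->pending--;
	q->delivered++;
	return 0;
}

static void flush_queue(struct xtr_queue *q)
{
	q->flushed += q->pending;
	q->pending = 0;
}

/*
 * One run of the extraction loop. 'late_arrivals' frames land in the
 * queue after the "queue is empty" check, modelling the race above.
 */
static void extract(struct xtr_queue *q, int late_arrivals, bool fixed)
{
	int err = 0;

	while (q->pending > 0) {
		err = read_ifh(q);
		if (err != 4)
			break;
		if (fixed)
			err = 0; /* the fix: IFH read fine, so clear the byte count */
		if (read_frame(q) < 0) {
			err = -1; /* err is only overwritten on a frame failure */
			break;
		}
	}

	q->pending += late_arrivals; /* frames arriving after the check */
	if (err)
		flush_queue(q);      /* without the fix, err is still 4 here */
}
```

      Without the reset, a frame that arrives late is flushed even though no
      error occurred; with it, the frame stays queued for the next run.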
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: conntrack: allow insertion of clashing entries · 6a757c07
      Florian Westphal authored
      This patch further relaxes the need to drop an skb due to a clash with
      an existing conntrack entry.
      
      Current clash resolution handles the case where the clash occurs between
      two identical entries (distinct nf_conn objects with same tuples), i.e.:
      
                          Original                        Reply
      existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      
      ... the existing handling discards the unconfirmed clashing entry and
      makes skb->_nfct point to the existing one.  The skb can then be
      processed normally, as if the clash had never occurred.
      
      For other clashes, the skb needs to be dropped.
      This frequently happens with DNS resolvers that send A and AAAA queries
      back-to-back when NAT rules apply different DNAT transformations to the
      two packets, for example:

      -m statistic --mode random ... -j DNAT --to-destination 10.0.0.6:5353
      -m statistic --mode random ... -j DNAT --to-destination 10.0.0.7:5353
      
      In this case the A or AAAA query is dropped which incurs a costly
      delay during name resolution.
      
      This patch also allows this collision type:
                             Original                   Reply
      existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.7:5353
      
      In this case, the clash is in the original direction; the reply
      direction is still unique.
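      A toy model of the clash outcomes described so far. Tuples are plain
      strings, and the names toy_ct, clash_action and resolve_clash() are
      hypothetical; the real code compares nf_conntrack tuples, and per this
      patch the new reply-only case is limited to the UDP tracker and the
      clash resolution path, which the sketch does not model.

```c
#include <string.h>

/* Hypothetical simplified conntrack entry: one tuple per direction. */
struct toy_ct {
	const char *orig;  /* e.g. "10.2.3.4:42->10.8.8.8:53" */
	const char *reply; /* e.g. "10.0.0.6:5353->10.2.3.4:42" */
};

enum clash_action {
	CLASH_NONE,       /* no collision: insert both directions */
	CLASH_REUSE,      /* identical tuples: use the existing entry */
	CLASH_REPLY_ONLY, /* new case: tag NAT_CLASH, insert reply dir only */
	CLASH_DROP,       /* any other clash: the skb is dropped */
};

static enum clash_action resolve_clash(const struct toy_ct *existing,
				       const struct toy_ct *incoming)
{
	int orig_eq = strcmp(existing->orig, incoming->orig) == 0;
	int reply_eq = strcmp(existing->reply, incoming->reply) == 0;

	if (!orig_eq && !reply_eq)
		return CLASH_NONE;
	if (orig_eq && reply_eq)
		return CLASH_REUSE;
	if (orig_eq)
		return CLASH_REPLY_ONLY; /* reply direction is still unique */
	return CLASH_DROP;               /* reply-direction clash */
}
```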
      
      With this change, when the second colliding packet is received, the
      clashing conntrack is tagged with the new IPS_NAT_CLASH_BIT, gets a
      fixed 1-second timeout, and is inserted in the reply direction only.
      
      The entry is hidden from 'conntrack -L'; it times out quickly and can
      be early-dropped because it never progresses to the ASSURED state.
      
      To avoid special-casing the ORIGINAL hlist_nulls node in the delete
      code path, a new helper, "hlist_nulls_add_fake", is added so that
      hlist_nulls_del() still works.
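      A userspace sketch of the idea behind the fake add, modeled on the
      nulls-list helpers in include/linux/list_nulls.h but simplified: no
      RCU, no WRITE_ONCE(), and pprev is cleared to NULL instead of being
      poisoned. A node that is on no list gets pprev pointing at its own next
      field and next set to a nulls marker, so deleting it only rewrites the
      node itself.

```c
#include <stddef.h>
#include <stdint.h>

struct hlist_nulls_node {
	struct hlist_nulls_node *next, **pprev;
};

/* A "nulls" end marker has its lowest bit set instead of being NULL. */
#define NULLS_MARKER(value) ((void *)(1UL + (((uintptr_t)(value)) << 1)))

static int is_a_nulls(const struct hlist_nulls_node *ptr)
{
	return ((uintptr_t)ptr & 1) != 0;
}

/* Make an unlinked node look linked, so a later delete is harmless. */
static void hlist_nulls_add_fake(struct hlist_nulls_node *n)
{
	n->pprev = &n->next;          /* pprev points back into the node */
	n->next = NULLS_MARKER(NULL); /* list "ends" right after it */
}

static void hlist_nulls_del(struct hlist_nulls_node *n)
{
	struct hlist_nulls_node *next = n->next;
	struct hlist_nulls_node **pprev = n->pprev;

	*pprev = next;          /* for a fake node this is n->next = n->next */
	if (!is_a_nulls(next))  /* a nulls marker has no pprev to fix up */
		next->pprev = pprev;
	n->pprev = NULL;        /* sketch only: the kernel poisons this */
}
```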
      
      Example:
      
            CPU A:                               CPU B:
      1.  10.2.3.4:42 -> 10.8.8.8:53 (A)
      2.                                         10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
      3.  Apply DNAT, reply changed to 10.0.0.6
      4.                                         10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
      5.                                         Apply DNAT, reply changed to 10.0.0.7
      6. confirm/commit to conntrack table, no collisions
      7.                                         commit clashing entry
      
      Reply comes in:
      
      10.2.3.4:42 <- 10.0.0.6:5353 (A)
       -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
      10.2.3.4:42 <- 10.0.0.7:5353 (AAAA)
       -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
          The conntrack entry is deleted from the table, as it has the
          NAT_CLASH bit set.
      
      If a packet is retransmitted in the ORIGINAL direction, all further
      packets get the DNAT transformation to 10.0.0.6.
      
      I tried to come up with other solutions but they all have worse
      problems.
      
      Alternatives considered were:
      1.  Confirm ct entries at allocation time, not in postrouting.
       a. will cause unnecessary work when the skb that creates the
          conntrack is dropped by the ruleset.
       b. if NAT is applied, the ct entry would need to be moved in
          the table, which requires taking another spinlock pair.
       c. breaks the 'unconfirmed entry is private to cpu' assumption:
          we would need to guard all nfct->ext allocation requests with
          the ct->lock spinlock.
      
      2. Make the unconfirmed list a hash table instead of a pcpu list.
         Shares drawback c) of the first alternative.
      
      3. Document this is expected and force users to rearrange their
         ruleset (e.g. by using "-m cluster" instead of "-m statistic").
         nft has the 'jhash' expression which can be used instead of 'numgen'.

         Major drawback: it doesn't fix what I consider a bug, it is not very
         realistic, and I believe it's reasonable to expect existing rulesets
         to 'just work'.
      
      4. Document this is expected and force users to steer problematic
         packets to the same CPU -- this would serialize the "allocate new
         conntrack entry/nat table evaluation/perform nat/confirm entry", so
         no race can occur.  Similar drawback to 3.
      
      Another advantage of this patch compared to 1) and 2) is that there are
      no changes to the hot path; things are handled in the udp tracker and
      the clash resolution path.
      
      Cc: rcu@vger.kernel.org
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: ethernet: dm9000: Handle -EPROBE_DEFER in dm9000_parse_dt() · 9a6a0dea
      Paul Cercueil authored
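      The call to of_get_mac_address() can return -EPROBE_DEFER, for instance
      when the MAC address is read from an NVMEM driver that has not probed
      yet. In that case dm9000_parse_dt() should propagate the error so that
      the probe can be retried later.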
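      A userspace sketch of the error-pointer handling this implies.
      ERR_PTR(), IS_ERR() and PTR_ERR() are re-created here in simplified
      form, get_mac_stub() is a hypothetical stand-in for
      of_get_mac_address(), and treating every other error as "keep the
      fallback address" is an assumption, not a quote of the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal errno, absent from userspace */
#endif
#define MAX_ERRNO 4095

/* Simplified re-creations of the kernel's error-pointer helpers. */
static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for of_get_mac_address(); 'mode' picks the result. */
static const unsigned char *get_mac_stub(int mode)
{
	static const unsigned char mac[6] = { 0x02, 0, 0, 0, 0, 1 };

	if (mode == 1)
		return ERR_PTR(-EPROBE_DEFER); /* NVMEM provider not probed yet */
	if (mode == 2)
		return ERR_PTR(-ENODEV);       /* no address available */
	return mac;
}

/* Returns 0 or a negative errno; only -EPROBE_DEFER is propagated. */
static int parse_mac(int mode, unsigned char dev_addr[6])
{
	const unsigned char *mac = get_mac_stub(mode);

	if (!IS_ERR(mac)) {
		memcpy(dev_addr, mac, 6);
		return 0;
	}
	if (PTR_ERR(mac) == -EPROBE_DEFER)
		return -EPROBE_DEFER; /* caller fails probe; core retries later */
	return 0;                     /* assumed: other errors keep a fallback */
}
```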
      
      Cc: H. Nikolaus Schaller <hns@goldelico.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Paul Cercueil <paul@crapouillou.net>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff.h: fix all kernel-doc warnings · d2f273f0
      Randy Dunlap authored
      Fix all kernel-doc warnings in <linux/skbuff.h>.
      Fixes these warnings:
      
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'list' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'skb_mstamp_ns' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__pkt_vlan_present_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'vlan_present' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
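      For reference, kernel-doc expects one "@member: description" line per
      structure member in the comment above the structure; a member without
      one produces a warning like those listed above. The structure below is
      hypothetical and only illustrates the format, not the text actually
      added to <linux/skbuff.h>.

```c
/**
 * struct toy_buff - hypothetical structure used to illustrate kernel-doc
 * @len: length of the data in bytes
 * @cloned: set if the header is shared with another buffer
 * @csum_level: number of consecutive checksums already validated
 *
 * Every member needs an "@name: description" line; leaving one out makes
 * scripts/kernel-doc report "Function parameter or member ... not described".
 */
struct toy_buff {
	unsigned int len;
	unsigned int cloned:1;
	unsigned int csum_level:2;
};
```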
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: remove stale bit mask comments · 8955b435
      Randy Dunlap authored
      Remove stale comments, since this flag is no longer a bit mask
      but a bit field.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sock.h: fix all kernel-doc warnings · 66256e0b
      Randy Dunlap authored
      Fix all kernel-doc warnings for <net/sock.h>.
      Fixes these warnings:
      
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
      
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_rx_skb_cache' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_tx_skb_cache' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_txtime_report_errors' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_bpf_storage' not described in 'sock'
      
      ../include/net/sock.h:2024: warning: No description found for return value of 'sk_wmem_alloc_get'
      ../include/net/sock.h:2035: warning: No description found for return value of 'sk_rmem_alloc_get'
      ../include/net/sock.h:2046: warning: No description found for return value of 'sk_has_allocations'
      ../include/net/sock.h:2082: warning: No description found for return value of 'skwq_has_sleeper'
      ../include/net/sock.h:2244: warning: No description found for return value of 'sk_page_frag'
      ../include/net/sock.h:2444: warning: Function parameter or member 'tcp_rx_skb_cache_key' not described in 'DECLARE_STATIC_KEY_FALSE'
      ../include/net/sock.h:2444: warning: Excess function parameter 'sk' description in 'DECLARE_STATIC_KEY_FALSE'
      ../include/net/sock.h:2444: warning: Excess function parameter 'skb' description in 'DECLARE_STATIC_KEY_FALSE'
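      The "No description found for return value" warnings are addressed with
      a "Return:" section in the function's kernel-doc comment. The helper
      below is hypothetical and only shows the layout.

```c
/**
 * toy_wmem_alloc_get - hypothetical helper used to illustrate kernel-doc
 * @alloc: raw allocation counter, which starts at an offset of one
 *
 * Return: the counter minus its initial offset of one
 */
static int toy_wmem_alloc_get(int alloc)
{
	return alloc - 1;
}
```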
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851-ml: Fix 16-bit IO operation · 58292104
      Marek Vasut authored
      The Micrel KSZ8851-16MLLI datasheet DS00002357B page 12 states that
      BE[3:0] signals are active high. This contradicts the measurements
      of the behavior of the actual chip, where these signals behave as
      active low. For example, to read the CIDER register, the bus must
      expose 0xc0c0 during the address phase, which means BE[3:0]=4'b1100.
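      A sketch of how the address-phase word could be built with active-low
      byte enables. It assumes BE3..BE0 occupy bits 15..12 of that word
      (consistent with the 0xc0c0 example) and that bit 1 of the offset
      selects the upper 16-bit half; ks_addr_word() is a hypothetical name,
      not the driver's.

```c
#include <stdint.h>

/* Assumed byte-enable bit positions in the address-phase word. */
#define BE0 (1u << 12)
#define BE1 (1u << 13)
#define BE2 (1u << 14)
#define BE3 (1u << 15)

/*
 * Build the address-phase word for a 16-bit access. The byte enables
 * are active low: lanes being accessed are driven 0, idle lanes 1.
 */
static uint16_t ks_addr_word(uint16_t offset)
{
	uint16_t be_idle;

	if (offset & 0x2)
		be_idle = BE1 | BE0; /* lanes 2-3 enabled, lanes 0-1 idle */
	else
		be_idle = BE3 | BE2; /* lanes 0-1 enabled, lanes 2-3 idle */

	return (uint16_t)(offset | be_idle);
}
```

      For CIDER at offset 0xc0 this yields 0xc0c0, i.e. BE[3:0]=4'b1100.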
      Signed-off-by: Marek Vasut <marex@denx.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Lukas Wunner <lukas@wunner.de>
      Cc: Petr Stetiar <ynezz@true.cz>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851-ml: Fix 16-bit data access · edacb098
      Marek Vasut authored
      The packet data written to and read from the Micrel KSZ8851-16MLLI must
      be byte-swapped in 16-bit mode; add this byte-swapping.
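      A minimal userspace sketch of the byte swap over a buffer of 16-bit
      words. The kernel provides helpers such as swab16() for a single value;
      swab16_buf() here is a hypothetical stand-alone loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of every 16-bit word in place. */
static void swab16_buf(uint16_t *buf, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		buf[i] = (uint16_t)((buf[i] << 8) | (buf[i] >> 8));
}
```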
      Signed-off-by: Marek Vasut <marex@denx.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Lukas Wunner <lukas@wunner.de>
      Cc: Petr Stetiar <ynezz@true.cz>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851-ml: Remove 8-bit bus accessors · 69233bba
      Marek Vasut authored
      This driver mixes 8-bit and 16-bit bus accessors for unknown reasons;
      presumably this was some sort of attempt to support the 8-bit bus mode.
      
      According to the KS8851-16MLL documentation, both registers accessed via
      the 8-bit accessors are internally 16-bit registers, so reading them with
      16-bit accessors is fine. The KS_CCR read can be converted to a 16-bit
      read outright, as it is already a concatenation of two 8-bit reads of
      that register. The KS_RXQCR accesses are 8-bit only; however, writing
      the top 8 bits of the register is fine as well, since the driver caches
      the entire 16-bit register value anyway.
      
      Finally, no hardware in the kernel uses this driver right now. The only
      hardware available to me has a 16-bit bus, so I have no way to test the
      8-bit bus mode; however, it is unlikely that it ever really worked. If
      the 8-bit bus mode is ever required, it can easily be added by adjusting
      the 16-bit accessors to do two consecutive accesses, which is how this
      should have been done from the beginning.
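      A sketch of the approach suggested above: 16-bit accessors built from
      two consecutive 8-bit bus cycles. The register window is simulated, all
      names are hypothetical, and the low-byte-first ordering is an
      assumption rather than something taken from the datasheet.

```c
#include <stdint.h>

/* Hypothetical simulated register window, byte addressed. */
static uint8_t sim_regs[256];

static uint8_t bus_read8(uint8_t offset) { return sim_regs[offset]; }
static void bus_write8(uint8_t offset, uint8_t v) { sim_regs[offset] = v; }

/* 16-bit read from two 8-bit cycles, low byte first (assumed layout). */
static uint16_t read16_via8(uint8_t offset)
{
	uint16_t lo = bus_read8(offset);
	uint16_t hi = bus_read8((uint8_t)(offset + 1));

	return (uint16_t)(lo | (hi << 8));
}

/* 16-bit write as two 8-bit cycles, low byte first (assumed layout). */
static void write16_via8(uint8_t offset, uint16_t v)
{
	bus_write8(offset, (uint8_t)(v & 0xff));
	bus_write8((uint8_t)(offset + 1), (uint8_t)(v >> 8));
}
```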
      Signed-off-by: Marek Vasut <marex@denx.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Lukas Wunner <lukas@wunner.de>
      Cc: Petr Stetiar <ynezz@true.cz>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: select CRYPTO · 357b41ca
      Matthieu Baerts authored
      Without this modification, if CRYPTO is not selected, we get this
      warning:
      
        WARNING: unmet direct dependencies detected for CRYPTO_LIB_SHA256
          Depends on [n]: CRYPTO [=n]
          Selected by [y]:
          - MPTCP [=y] && NET [=y] && INET [=y]
      
      MPTCP selects CRYPTO_LIB_SHA256, which seems to depend on CRYPTO. CRYPTO
      is now selected to avoid this issue.

      Even though the config system prints that warning, it looks like
      sha256.c is compiled and linked even without CONFIG_CRYPTO. Since MPTCP
      will need CONFIG_CRYPTO anyway in future commits (currently in
      preparation for net-next), we propose adding it now to fix the
      warning.
      
      The dependency in the config system comes from the fact that
      CRYPTO_LIB_SHA256 is defined in "lib/crypto/Kconfig" which is sourced
      from "crypto/Kconfig" only if CRYPTO is selected.
      
      Fixes: 65492c5a ("mptcp: move from sha1 (v0) to sha256 (v1)")
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bonding-fix-bonding-interface-bugs' · c230978f
      David S. Miller authored
      Taehee Yoo says:
      
      ====================
      bonding: fix bonding interface bugs
      
      This patchset fixes lockdep problems in the bonding interface.

      1. The first patch adds the missing netdev_update_lockdep_key().
      After bond_release(), netdev_update_lockdep_key() should be called,
      but neither the ioctl path nor the attribute path calls it.
      This patch adds the missing netdev_update_lockdep_key().

      2. The second patch exports the netdev_next_lower_dev_rcu() symbol.
      netdev_next_lower_dev_rcu() is useful for implementing a function
      that walks all lower interfaces.
      This patch prepares for the third patch.

      3. The last patch fixes a lockdep warning in bond_get_stats().
      The stats_lock uses a dynamic lockdep key, so after a "nomaster"
      operation the dynamic lockdep key needs to be updated, but it is not.
      So a lockdep warning occurs.
      
      Change log:
      v1 -> v2:
       - Update headline from "fix bonding interface bugs"
         to "bonding: fix bonding interface bugs"
       - Drop a patch("bonding: do not collect slave's stats")
       - Add new patches
         - ("net: export netdev_next_lower_dev_rcu()")
         - ("bonding: fix lockdep warning in bond_get_stats()")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix lockdep warning in bond_get_stats() · b3e80d44
      Taehee Yoo authored
      "struct bonding" contains stats_lock, which protects "bond_stats" in
      the same structure. bond_stats is updated in bond_get_stats(), which
      can be executed concurrently, so the lock is needed.

      Bonding interfaces can be nested. So either stats_lock should use a
      dynamic lockdep class key, or it should be taken with
      spin_lock_nested(). The current code uses a dynamic lockdep class key,
      but there is no routine that updates stats_lock_key, so a lockdep
      warning occurs.
      
      Test commands:
          ip link add bond0 type bond
          ip link add bond1 type bond
          ip link set bond0 master bond1
          ip link set bond0 nomaster
          ip link set bond1 master bond0
      
      Splat looks like:
      [   38.420603][  T957] 5.5.0+ #394 Not tainted
      [   38.421074][  T957] ------------------------------------------------------
      [   38.421837][  T957] ip/957 is trying to acquire lock:
      [   38.422399][  T957] ffff888063262cd8 (&bond->stats_lock_key#2){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding]
      [   38.423528][  T957]
      [   38.423528][  T957] but task is already holding lock:
      [   38.424526][  T957] ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding]
      [   38.426075][  T957]
      [   38.426075][  T957] which lock already depends on the new lock.
      [   38.426075][  T957]
      [   38.428536][  T957]
      [   38.428536][  T957] the existing dependency chain (in reverse order) is:
      [   38.429475][  T957]
      [   38.429475][  T957] -> #1 (&bond->stats_lock_key){+.+.}:
      [   38.430273][  T957]        _raw_spin_lock+0x30/0x70
      [   38.430812][  T957]        bond_get_stats+0x90/0x4d0 [bonding]
      [   38.431451][  T957]        dev_get_stats+0x1ec/0x270
      [   38.432088][  T957]        bond_get_stats+0x1a5/0x4d0 [bonding]
      [   38.432767][  T957]        dev_get_stats+0x1ec/0x270
      [   38.433322][  T957]        rtnl_fill_stats+0x44/0xbe0
      [   38.433866][  T957]        rtnl_fill_ifinfo+0xeb2/0x3720
      [   38.434474][  T957]        rtmsg_ifinfo_build_skb+0xca/0x170
      [   38.435081][  T957]        rtmsg_ifinfo_event.part.33+0x1b/0xb0
      [   38.436848][  T957]        rtnetlink_event+0xcd/0x120
      [   38.437455][  T957]        notifier_call_chain+0x90/0x160
      [   38.438067][  T957]        netdev_change_features+0x74/0xa0
      [   38.438708][  T957]        bond_compute_features.isra.45+0x4e6/0x6f0 [bonding]
      [   38.439522][  T957]        bond_enslave+0x3639/0x47b0 [bonding]
      [   38.440225][  T957]        do_setlink+0xaab/0x2ef0
      [   38.440786][  T957]        __rtnl_newlink+0x9c5/0x1270
      [   38.441463][  T957]        rtnl_newlink+0x65/0x90
      [   38.442075][  T957]        rtnetlink_rcv_msg+0x4a8/0x890
      [   38.442774][  T957]        netlink_rcv_skb+0x121/0x350
      [   38.443451][  T957]        netlink_unicast+0x42e/0x610
      [   38.444282][  T957]        netlink_sendmsg+0x65a/0xb90
      [   38.444992][  T957]        ____sys_sendmsg+0x5ce/0x7a0
      [   38.445679][  T957]        ___sys_sendmsg+0x10f/0x1b0
      [   38.446365][  T957]        __sys_sendmsg+0xc6/0x150
      [   38.447007][  T957]        do_syscall_64+0x99/0x4f0
      [   38.447668][  T957]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   38.448538][  T957]
      [   38.448538][  T957] -> #0 (&bond->stats_lock_key#2){+.+.}:
      [   38.449554][  T957]        __lock_acquire+0x2d8d/0x3de0
      [   38.450148][  T957]        lock_acquire+0x164/0x3b0
      [   38.450711][  T957]        _raw_spin_lock+0x30/0x70
      [   38.451292][  T957]        bond_get_stats+0x90/0x4d0 [bonding]
      [   38.451950][  T957]        dev_get_stats+0x1ec/0x270
      [   38.452425][  T957]        bond_get_stats+0x1a5/0x4d0 [bonding]
      [   38.453362][  T957]        dev_get_stats+0x1ec/0x270
      [   38.453825][  T957]        rtnl_fill_stats+0x44/0xbe0
      [   38.454390][  T957]        rtnl_fill_ifinfo+0xeb2/0x3720
      [   38.456257][  T957]        rtmsg_ifinfo_build_skb+0xca/0x170
      [   38.456998][  T957]        rtmsg_ifinfo_event.part.33+0x1b/0xb0
      [   38.459351][  T957]        rtnetlink_event+0xcd/0x120
      [   38.460086][  T957]        notifier_call_chain+0x90/0x160
      [   38.460829][  T957]        netdev_change_features+0x74/0xa0
      [   38.461752][  T957]        bond_compute_features.isra.45+0x4e6/0x6f0 [bonding]
      [   38.462705][  T957]        bond_enslave+0x3639/0x47b0 [bonding]
      [   38.463476][  T957]        do_setlink+0xaab/0x2ef0
      [   38.464141][  T957]        __rtnl_newlink+0x9c5/0x1270
      [   38.464897][  T957]        rtnl_newlink+0x65/0x90
      [   38.465522][  T957]        rtnetlink_rcv_msg+0x4a8/0x890
      [   38.466215][  T957]        netlink_rcv_skb+0x121/0x350
      [   38.466895][  T957]        netlink_unicast+0x42e/0x610
      [   38.467583][  T957]        netlink_sendmsg+0x65a/0xb90
      [   38.468285][  T957]        ____sys_sendmsg+0x5ce/0x7a0
      [   38.469202][  T957]        ___sys_sendmsg+0x10f/0x1b0
      [   38.469884][  T957]        __sys_sendmsg+0xc6/0x150
      [   38.470587][  T957]        do_syscall_64+0x99/0x4f0
      [   38.471245][  T957]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   38.472093][  T957]
      [   38.472093][  T957] other info that might help us debug this:
      [   38.472093][  T957]
      [   38.473438][  T957]  Possible unsafe locking scenario:
      [   38.473438][  T957]
      [   38.474898][  T957]        CPU0                    CPU1
      [   38.476234][  T957]        ----                    ----
      [   38.480171][  T957]   lock(&bond->stats_lock_key);
      [   38.480808][  T957]                                lock(&bond->stats_lock_key#2);
      [   38.481791][  T957]                                lock(&bond->stats_lock_key);
      [   38.482754][  T957]   lock(&bond->stats_lock_key#2);
      [   38.483416][  T957]
      [   38.483416][  T957]  *** DEADLOCK ***
      [   38.483416][  T957]
      [   38.484505][  T957] 3 locks held by ip/957:
      [   38.485048][  T957]  #0: ffffffffbccf6230 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890
      [   38.486198][  T957]  #1: ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding]
      [   38.487625][  T957]  #2: ffffffffbc9254c0 (rcu_read_lock){....}, at: bond_get_stats+0x5/0x4d0 [bonding]
      [   38.488897][  T957]
      [   38.488897][  T957] stack backtrace:
      [   38.489646][  T957] CPU: 1 PID: 957 Comm: ip Not tainted 5.5.0+ #394
      [   38.490497][  T957] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   38.492810][  T957] Call Trace:
      [   38.493219][  T957]  dump_stack+0x96/0xdb
      [   38.493709][  T957]  check_noncircular+0x371/0x450
      [   38.494344][  T957]  ? lookup_address+0x60/0x60
      [   38.494923][  T957]  ? print_circular_bug.isra.35+0x310/0x310
      [   38.495699][  T957]  ? hlock_class+0x130/0x130
      [   38.496334][  T957]  ? __lock_acquire+0x2d8d/0x3de0
      [   38.496979][  T957]  __lock_acquire+0x2d8d/0x3de0
      [   38.497607][  T957]  ? register_lock_class+0x14d0/0x14d0
      [   38.498333][  T957]  ? check_chain_key+0x236/0x5d0
      [   38.499003][  T957]  lock_acquire+0x164/0x3b0
      [   38.499800][  T957]  ? bond_get_stats+0x90/0x4d0 [bonding]
      [   38.500706][  T957]  _raw_spin_lock+0x30/0x70
      [   38.501435][  T957]  ? bond_get_stats+0x90/0x4d0 [bonding]
      [   38.502311][  T957]  bond_get_stats+0x90/0x4d0 [bonding]
      [ ... ]
      
      But there is another problem. The dynamic lockdep class key is
      protected by RTNL, but bond_get_stats() can be called outside of RTNL,
      so it may use an invalid dynamic lockdep class key.

      To fix this, stats_lock is taken with spin_lock_nested() instead of
      using a dynamic lockdep key. bond_get_stats() calls
      bond_get_lowest_level_rcu() to get the correct nest level, which is
      passed to spin_lock_nested(). "dev->lower_level" holds the lower nest
      level, but this value is invalid outside of RTNL, so
      bond_get_lowest_level_rcu() computes a valid lower nest level inside the
      RCU critical section. bond_get_lowest_level_rcu() works only when
      LOCKDEP is enabled.
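      A simplified, recursive stand-in for a nest-level computation. The
      toy_dev structure and toy_nest_level() are hypothetical, and
      bond_get_lowest_level_rcu() itself is not reproduced here; the point is
      only that an upper bond and a nested lower bond get different values,
      which can then serve as the subclass argument of spin_lock_nested().

```c
#include <stddef.h>

#define TOY_MAX_LOWER 4

/* Hypothetical device with up to TOY_MAX_LOWER lower (slave) devices. */
struct toy_dev {
	const char *name;
	struct toy_dev *lower[TOY_MAX_LOWER];
	size_t n_lower;
};

/* Number of device levels below 'dev'; 0 for a device with no lowers. */
static int toy_nest_level(const struct toy_dev *dev)
{
	int max = 0;

	for (size_t i = 0; i < dev->n_lower; i++) {
		int lvl = 1 + toy_nest_level(dev->lower[i]);

		if (lvl > max)
			max = lvl;
	}
	return max;
}
```

      With bond1 stacked on bond0, the two stats locks would be taken with
      subclasses 2 and 1, so lockdep no longer sees them as the same class.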
      
      Fixes: 089bca2c ("bonding: use dynamic lockdep key instead of subclass")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: export netdev_next_lower_dev_rcu() · 7151affe
      Taehee Yoo authored
      netdev_next_lower_dev_rcu() will be used to implement a function that
      walks all lower interfaces. There are already functions that walk
      lower interfaces (netdev_walk_all_lower_dev_rcu(),
      netdev_walk_all_lower_dev()), but there may be cases that the existing
      netdev_walk_all_lower_dev_{rcu}() functions cannot cover, so some
      modules may want to implement their own function to walk all lower
      interfaces.

      netdev_next_lower_dev_rcu() is used in the next patch.
      In addition, this patch removes two unused prototypes in netdevice.h.
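      A hypothetical sketch of the pattern this enables: a "next lower
      device" iterator and a custom walk built on top of it. toy_next_lower()
      only mimics the role of netdev_next_lower_dev_rcu(); it does not
      reproduce its signature or its RCU requirements.

```c
#include <stddef.h>

#define TOY_MAX_LOWER 4

/* Hypothetical device with up to TOY_MAX_LOWER lower devices. */
struct toy_dev {
	const char *name;
	struct toy_dev *lower[TOY_MAX_LOWER];
	size_t n_lower;
};

/* Return the lower device at *iter and advance, or NULL when done. */
static struct toy_dev *toy_next_lower(struct toy_dev *dev, size_t *iter)
{
	if (*iter >= dev->n_lower)
		return NULL;
	return dev->lower[(*iter)++];
}

/* Custom walk: count every device below 'dev', depth first. */
static int toy_count_all_lower(struct toy_dev *dev)
{
	struct toy_dev *ldev;
	size_t iter = 0;
	int count = 0;

	while ((ldev = toy_next_lower(dev, &iter)) != NULL)
		count += 1 + toy_count_all_lower(ldev);
	return count;
}
```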
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add missing netdev_update_lockdep_key() · 064ff66e
      Taehee Yoo authored
      After bond_release(), netdev_update_lockdep_key() should be called,
      but neither the ioctl path nor the attribute path calls it.
      This patch adds the missing netdev_update_lockdep_key() calls.
      
      Test commands:
          ip link add bond0 type bond
          ip link add bond1 type bond
          ifenslave bond0 bond1
          ifenslave -d bond0 bond1
          ifenslave bond1 bond0
      
      Splat looks like:
      [   29.501182][ T1046] WARNING: possible circular locking dependency detected
      [   29.501945][ T1039] hardirqs last disabled at (1962): [<ffffffffac6c807f>] handle_mm_fault+0x13f/0x700
      [   29.503442][ T1046] 5.5.0+ #322 Not tainted
      [   29.503447][ T1046] ------------------------------------------------------
      [   29.504277][ T1039] softirqs last  enabled at (1180): [<ffffffffade00678>] __do_softirq+0x678/0x981
      [   29.505443][ T1046] ifenslave/1046 is trying to acquire lock:
      [   29.505886][ T1039] softirqs last disabled at (1169): [<ffffffffac19c18a>] irq_exit+0x17a/0x1a0
      [   29.509997][ T1046] ffff88805d5da280 (&dev->addr_list_lock_key#3){+...}, at: dev_mc_sync_multiple+0x95/0x120
      [   29.511243][ T1046]
      [   29.511243][ T1046] but task is already holding lock:
      [   29.512192][ T1046] ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding]
      [   29.514124][ T1046]
      [   29.514124][ T1046] which lock already depends on the new lock.
      [   29.514124][ T1046]
      [   29.517297][ T1046]
      [   29.517297][ T1046] the existing dependency chain (in reverse order) is:
      [   29.518231][ T1046]
      [   29.518231][ T1046] -> #1 (&dev->addr_list_lock_key#4){+...}:
      [   29.519076][ T1046]        _raw_spin_lock+0x30/0x70
      [   29.519588][ T1046]        dev_mc_sync_multiple+0x95/0x120
      [   29.520208][ T1046]        bond_enslave+0x448d/0x47b0 [bonding]
      [   29.520862][ T1046]        bond_option_slaves_set+0x1a3/0x370 [bonding]
      [   29.521640][ T1046]        __bond_opt_set+0x1ff/0xbb0 [bonding]
      [   29.522438][ T1046]        __bond_opt_set_notify+0x2b/0xf0 [bonding]
      [   29.523251][ T1046]        bond_opt_tryset_rtnl+0x92/0xf0 [bonding]
      [   29.524082][ T1046]        bonding_sysfs_store_option+0x8a/0xf0 [bonding]
      [   29.524959][ T1046]        kernfs_fop_write+0x276/0x410
      [   29.525620][ T1046]        vfs_write+0x197/0x4a0
      [   29.526218][ T1046]        ksys_write+0x141/0x1d0
      [   29.526818][ T1046]        do_syscall_64+0x99/0x4f0
      [   29.527430][ T1046]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   29.528265][ T1046]
      [   29.528265][ T1046] -> #0 (&dev->addr_list_lock_key#3){+...}:
      [   29.529272][ T1046]        __lock_acquire+0x2d8d/0x3de0
      [   29.529935][ T1046]        lock_acquire+0x164/0x3b0
      [   29.530638][ T1046]        _raw_spin_lock+0x30/0x70
      [   29.531187][ T1046]        dev_mc_sync_multiple+0x95/0x120
      [   29.531790][ T1046]        bond_enslave+0x448d/0x47b0 [bonding]
      [   29.532451][ T1046]        bond_option_slaves_set+0x1a3/0x370 [bonding]
      [   29.533163][ T1046]        __bond_opt_set+0x1ff/0xbb0 [bonding]
      [   29.533789][ T1046]        __bond_opt_set_notify+0x2b/0xf0 [bonding]
      [   29.534595][ T1046]        bond_opt_tryset_rtnl+0x92/0xf0 [bonding]
      [   29.535500][ T1046]        bonding_sysfs_store_option+0x8a/0xf0 [bonding]
      [   29.536379][ T1046]        kernfs_fop_write+0x276/0x410
      [   29.537057][ T1046]        vfs_write+0x197/0x4a0
      [   29.537640][ T1046]        ksys_write+0x141/0x1d0
      [   29.538251][ T1046]        do_syscall_64+0x99/0x4f0
      [   29.538870][ T1046]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   29.539659][ T1046]
      [   29.539659][ T1046] other info that might help us debug this:
      [   29.539659][ T1046]
      [   29.540953][ T1046]  Possible unsafe locking scenario:
      [   29.540953][ T1046]
      [   29.541883][ T1046]        CPU0                    CPU1
      [   29.542540][ T1046]        ----                    ----
      [   29.543209][ T1046]   lock(&dev->addr_list_lock_key#4);
      [   29.543880][ T1046]                                lock(&dev->addr_list_lock_key#3);
      [   29.544873][ T1046]                                lock(&dev->addr_list_lock_key#4);
      [   29.545863][ T1046]   lock(&dev->addr_list_lock_key#3);
      [   29.546525][ T1046]
      [   29.546525][ T1046]  *** DEADLOCK ***
      [   29.546525][ T1046]
      [   29.547542][ T1046] 5 locks held by ifenslave/1046:
      [   29.548196][ T1046]  #0: ffff88806044c478 (sb_writers#5){.+.+}, at: vfs_write+0x3bb/0x4a0
      [   29.549248][ T1046]  #1: ffff88805af00890 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cf/0x410
      [   29.550343][ T1046]  #2: ffff88805b8b54b0 (kn->count#157){.+.+}, at: kernfs_fop_write+0x1f2/0x410
      [   29.551575][ T1046]  #3: ffffffffaecf4cf0 (rtnl_mutex){+.+.}, at: bond_opt_tryset_rtnl+0x5f/0xf0 [bonding]
      [   29.552819][ T1046]  #4: ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding]
      [   29.554175][ T1046]
      [   29.554175][ T1046] stack backtrace:
      [   29.554907][ T1046] CPU: 0 PID: 1046 Comm: ifenslave Not tainted 5.5.0+ #322
      [   29.555854][ T1046] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   29.557064][ T1046] Call Trace:
      [   29.557504][ T1046]  dump_stack+0x96/0xdb
      [   29.558054][ T1046]  check_noncircular+0x371/0x450
      [   29.558723][ T1046]  ? print_circular_bug.isra.35+0x310/0x310
      [   29.559486][ T1046]  ? hlock_class+0x130/0x130
      [   29.560100][ T1046]  ? __lock_acquire+0x2d8d/0x3de0
      [   29.560761][ T1046]  __lock_acquire+0x2d8d/0x3de0
      [   29.561366][ T1046]  ? register_lock_class+0x14d0/0x14d0
      [   29.562045][ T1046]  ? find_held_lock+0x39/0x1d0
      [   29.562641][ T1046]  lock_acquire+0x164/0x3b0
      [   29.563199][ T1046]  ? dev_mc_sync_multiple+0x95/0x120
      [   29.563872][ T1046]  _raw_spin_lock+0x30/0x70
      [   29.564464][ T1046]  ? dev_mc_sync_multiple+0x95/0x120
      [   29.565146][ T1046]  dev_mc_sync_multiple+0x95/0x120
      [   29.565793][ T1046]  bond_enslave+0x448d/0x47b0 [bonding]
      [   29.566487][ T1046]  ? bond_update_slave_arr+0x940/0x940 [bonding]
      [   29.567279][ T1046]  ? bstr_printf+0xc20/0xc20
      [   29.567857][ T1046]  ? stack_trace_consume_entry+0x160/0x160
      [   29.568614][ T1046]  ? deactivate_slab.isra.77+0x2c5/0x800
      [   29.569320][ T1046]  ? check_chain_key+0x236/0x5d0
      [   29.569939][ T1046]  ? sscanf+0x93/0xc0
      [   29.570442][ T1046]  ? vsscanf+0x1e20/0x1e20
      [   29.571003][ T1046]  bond_option_slaves_set+0x1a3/0x370 [bonding]
      [ ... ]
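
      The "Possible unsafe locking scenario" above is an AB-BA inversion. With
      the roles of bond0 and bond1 swapped by the three ifenslave commands, one
      device's addr_list_lock is held in bond_enslave() while the other's is
      taken in dev_mc_sync_multiple(), first in one order and later in the
      opposite order. Below is a minimal sketch of how an order-tracking
      validator can flag this. It assumes a toy model that records
      "held -> acquired" edges and reports a cycle, and it assumes the
      master's lock is the one held. LockOrderChecker and enslave() are
      hypothetical names. The real lockdep tracks lock classes, not Python
      objects:

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order validator (hypothetical; not the kernel's lockdep).

    Acquiring lock B while holding lock A records the edge A -> B. If B can
    already reach A through recorded edges, the two locks have been taken in
    both orders at some point, which is a potential deadlock.
    """

    def __init__(self):
        self.edges = defaultdict(set)  # lock -> locks acquired while it was held
        self.held = []                 # locks currently held, in order

    def _reachable(self, src, dst):
        # Iterative depth-first search over the recorded dependency edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges[node])
        return False

    def acquire(self, lock):
        for held in self.held:
            if self._reachable(lock, held):
                raise RuntimeError(
                    f"possible circular locking dependency: {held} -> {lock}")
        for held in self.held:
            self.edges[held].add(lock)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


def enslave(checker, master, slave):
    # Hypothetical model of bond_enslave(): the master's address-list lock is
    # held while the slave's is taken (as in dev_mc_sync_multiple()).
    checker.acquire(f"{master}.addr_list_lock")
    checker.acquire(f"{slave}.addr_list_lock")
    checker.release(f"{slave}.addr_list_lock")
    checker.release(f"{master}.addr_list_lock")


checker = LockOrderChecker()
enslave(checker, "bond0", "bond1")      # records bond0 -> bond1
try:
    enslave(checker, "bond1", "bond0")  # would add bond1 -> bond0: a cycle
except RuntimeError as err:
    print(err)
```

      The intermediate "ifenslave -d" step does not matter in this model.
      Recorded dependencies persist after the locks are released, so the
      inversion is reported even though no deadlock actually happens.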
      
      Fixes: ab92d68f ("net: core: add generic lockdep keys")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Christophe JAILLET
      NFC: pn544: Fix a typo in a debug message · c4c10784
      Christophe JAILLET authored
      The ending character of the string should be \n, not \b.
      
      Fixes: 17936b43 ("NFC: Standardize logging style")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Michal Kubecek
      ethtool: fix application of verbose no_mask bitset · 66991703
      Michal Kubecek authored
      A bitset without mask in a _SET request means we want exactly the bits in
      the bitset to be set. This works correctly for compact format but when
      verbose format is parsed, ethnl_update_bitset32_verbose() only sets the
      bits present in the request bitset but does not clear the rest. This can
      cause incorrect results like
      
        lion:~ # ethtool eth0 | grep Wake
                Supports Wake-on: pumbg
                Wake-on: g
        lion:~ # ethtool -s eth0 wol u
        lion:~ # ethtool eth0 | grep Wake
                Supports Wake-on: pumbg
                Wake-on: ug
      
      when the second ethtool command issues the request
      
      ETHTOOL_MSG_WOL_SET
          ETHTOOL_A_WOL_HEADER
              ETHTOOL_A_HEADER_DEV_NAME = "eth0"
          ETHTOOL_A_WOL_MODES
              ETHTOOL_A_BITSET_NOMASK
              ETHTOOL_A_BITSET_BITS
                  ETHTOOL_A_BITSET_BITS_BIT
                      ETHTOOL_BITSET_BIT_INDEX = 1
      
      Fix the logic by clearing the whole target bitmap before we start iterating
      through the request bits.
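
      The following is a minimal sketch of the difference. It models the
      target as a Python integer bitmap. apply_verbose_bitset() is a
      hypothetical helper, not the kernel's ethnl_update_bitset32_verbose().
      The bit positions used below (1 for 'u', 5 for 'g') match WAKE_UCAST
      and WAKE_MAGIC in the ethtool UAPI header, consistent with
      ETHTOOL_BITSET_BIT_INDEX = 1 in the request above:

```python
WAKE_UCAST = 1  # bit index of Wake-on 'u'
WAKE_MAGIC = 5  # bit index of Wake-on 'g'


def apply_verbose_bitset(target, request_bits, no_mask, clear_first=True):
    """Apply a verbose-format bitset request to an integer bitmap.

    request_bits maps bit index -> requested value. Without a mask the
    request means "exactly these bits": every listed bit is set, and with
    clear_first=True (the fixed behaviour) all unlisted bits are cleared.
    clear_first=False reproduces the old behaviour.
    """
    if no_mask and clear_first:
        target = 0  # the fix: start from an empty bitmap
    for index, value in request_bits.items():
        if no_mask or value:
            target |= 1 << index
        else:
            target &= ~(1 << index)
    return target


before = 1 << WAKE_MAGIC                  # "Wake-on: g"
request = {WAKE_UCAST: True}              # "ethtool -s eth0 wol u"
old = apply_verbose_bitset(before, request, no_mask=True, clear_first=False)
new = apply_verbose_bitset(before, request, no_mask=True)
print(bin(old), bin(new))  # 0b100010 (u and g) vs 0b10 (u only)
```

      With a mask, unlisted bits are left alone. That is why only the no-mask
      case needs the initial clear.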
      
      Fixes: 10b518d4 ("ethtool: netlink bitset handling")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Florian Fainelli
      net: dsa: b53: Ensure the default VID is untagged · d965a543
      Florian Fainelli authored
      We need to ensure that the default VID is untagged; otherwise the switch
      will be sending tagged frames and the results can be problematic. This
      is especially true with b53 switches that use VID 0 as their default
      VLAN since VID 0 has a special meaning.
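
      To illustrate why the untagged flag matters, here is a toy egress stage
      for a VLAN-aware port. It is hypothetical and is not b53 driver code. A
      frame whose VID is in the port's untagged set leaves unchanged.
      Otherwise a 4-byte IEEE 802.1Q tag (TPID 0x8100, then PCP/DEI/VID) is
      inserted after the MAC addresses. With VID 0 as the default VLAN, a
      missing untagged flag means frames leave carrying VID 0, which 802.1Q
      reserves for priority-tagged frames rather than VLAN membership:

```python
import struct

TPID_8021Q = 0x8100


def egress(frame, vid, untagged_vids, pcp=0):
    """Return the frame as it would leave a VLAN-aware port (toy model).

    `frame` is an untagged Ethernet frame: destination MAC (6 bytes),
    source MAC (6 bytes), EtherType, payload.
    """
    if vid in untagged_vids:
        return frame  # egress untagged: no 802.1Q header added
    tci = (pcp << 13) | (vid & 0x0FFF)  # PCP (3 bits), DEI = 0, VID (12 bits)
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


frame = bytes(6) + bytes(6) + b"\x08\x00" + b"payload"
print(egress(frame, 0, untagged_vids={0}) == frame)   # True
print(egress(frame, 0, untagged_vids=set())[12:16])   # b'\x81\x00\x00\x00'
```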
      
      Fixes: fea83353 ("net: dsa: b53: Fix default VLAN ID")
      Fixes: 061f6a50 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'wireguard-fixes' · 82d81bb0
      David S. Miller authored
      Jason A. Donenfeld says:
      
      ====================
      wireguard fixes for 5.6-rc2
      
      Here are four fixes for wireguard collected since rc1:
      
      1) Some small cleanups to the test suite to help massively parallel
         builds.
      
      2) A change in how we reset our load calculation to avoid a more
         expensive comparison, suggested by Matt Dunwoodie.
      
      3) I've been loading more and more of wireguard's surface into
         syzkaller, trying to get our coverage as complete as possible,
         leading in this case to a fix for mtu=0 devices.
      
      4) A removal of superfluous code, pointed out by Eric Dumazet.
      
      v2 fixes a logical problem in the patch for (3) pointed out by Eric Dumazet. v3
      replaces some non-obvious bitmath in (3) with a more obvious expression, and
      adds patch (4).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jason A. Donenfeld
      wireguard: socket: remove extra call to synchronize_net · 1fbc33b0
      Jason A. Donenfeld authored
      synchronize_net() is a wrapper around synchronize_rcu(), so there's no
      point in having synchronize_net and synchronize_rcu back to back,
      despite the documentation comment suggesting maybe it's somewhat useful,
      "Wait for packets currently being received to be done." This commit
      removes the extra call.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jason A. Donenfeld
      wireguard: send: account for mtu=0 devices · 175f1ca9
      Jason A. Donenfeld authored
      It turns out there's an easy way to get packets queued up while still
      having an MTU of zero, and that's via persistent keepalive. This commit
      makes sure that, under any condition, we don't wind up dividing by
      zero. Note that an MTU of zero for a wireguard interface is something
      quasi-valid, so I don't think the correct fix is to limit it via
      min_mtu. This can be reproduced easily with:
      
      ip link add wg0 type wireguard
      ip link add wg1 type wireguard
      ip link set wg0 up mtu 0
      ip link set wg1 up
      wg set wg0 private-key <(wg genkey)
      wg set wg1 listen-port 1 private-key <(wg genkey) peer $(wg show wg0 public-key)
      wg set wg0 peer $(wg show wg1 public-key) persistent-keepalive 1 endpoint 127.0.0.1:1
      
      However, while min_mtu=0 seems fine, it makes sense to restrict the
      max_mtu. This commit also restricts the maximum MTU to the greatest
      number for which rounding up to the padding multiple won't overflow a
      signed integer. Packets this large were always eventually rejected anyway
      by checks deeper in the code, but it seems more sound not to even let
      the administrator configure something that won't work.
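
      The arithmetic can be sketched as follows. This is a hypothetical model,
      not the driver's code. It assumes a padding multiple of 16 bytes and a
      32-bit signed int. It ignores any header overhead the driver may
      subtract. The branch structure, including the modulo for over-MTU
      packets, is an assumption. With mtu == 0 the sketch never divides by
      the MTU, and the largest safe MTU is INT_MAX rounded down to the
      padding multiple:

```python
PADDING_MULTIPLE = 16   # assumed padding granularity, in bytes
INT_MAX = 2**31 - 1     # largest 32-bit signed integer


def align_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


def padding_for(length, mtu):
    """Number of padding bytes to append to a `length`-byte plaintext."""
    if mtu == 0:
        # e.g. a persistent-keepalive packet on an mtu=0 device:
        # pad to the multiple and never divide by the MTU
        return align_up(length, PADDING_MULTIPLE) - length
    if length > mtu:
        length %= mtu  # safe: mtu is known to be nonzero here
    padded = min(mtu, align_up(length, PADDING_MULTIPLE))
    return padded - length


# Largest MTU whose round-up to the padding multiple still fits in INT_MAX.
MAX_MTU = INT_MAX // PADDING_MULTIPLE * PADDING_MULTIPLE

print(padding_for(0, 0), padding_for(100, 1420), MAX_MTU)  # 0 12 2147483632
```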
      
      We use this opportunity to clean up this function a bit so that it's
      clear which paths we're expecting.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jason A. Donenfeld
      wireguard: receive: reset last_under_load to zero · 2a8a4df3
      Jason A. Donenfeld authored
      This is a small optimization that prevents more expensive comparisons
      from happening when they are no longer necessary, by clearing the
      last_under_load variable whenever we wind up in a state where we were
      under load but we no longer are.
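
      The following is a minimal sketch of the pattern. It assumes a
      one-second hold-down window and a caller that supplies the current time
      as a positive number. LoadTracker is a hypothetical name, and the real
      code uses kernel time helpers. Once the window has passed, clearing
      last_under_load lets later calls skip the time comparison entirely:

```python
class LoadTracker:
    """Report 'under load' until HOLD seconds after the last busy sample."""

    HOLD = 1.0  # assumed hold-down window, in seconds

    def __init__(self):
        self.last_under_load = 0.0  # 0.0 means "not recently under load"

    def under_load(self, busy_now, now):
        # `now` is assumed to be positive, so 0.0 can serve as "unset".
        if busy_now:
            self.last_under_load = now
            return True
        if self.last_under_load:
            # Only reached while the marker is set; this is the comparison
            # that is skipped once load has gone away.
            still_loaded = now - self.last_under_load < self.HOLD
            if not still_loaded:
                self.last_under_load = 0.0  # reset to zero
            return still_loaded
        return False
```
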
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Suggested-by: Matt Dunwoodie <ncon@noconroy.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jason A. Donenfeld
      wireguard: selftests: reduce complexity and fix make races · 04ddf120
      Jason A. Donenfeld authored
      This gives us fewer dependencies and shortens build time, fixes up some
      hash checking race conditions, and also fixes missing directory creation
      that caused issues on massively parallel builds.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Mat Martineau
      mptcp: Protect subflow socket options before connection completes · b6e4a1ae
      Mat Martineau authored
      Userspace should not be able to directly manipulate subflow socket
      options before a connection is established since it is not yet known if
      it will be an MPTCP subflow or a TCP fallback subflow. TCP fallback
      subflows can be more directly controlled by userspace because they are
      regular TCP connections, while MPTCP subflow sockets need to be
      configured for the specific needs of MPTCP. Use the same logic as
      sendmsg/recvmsg to ensure that socket option calls are only passed
      through to known TCP fallback subflows.
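
      The gating rule can be sketched with a toy model. It is hypothetical and
      is not the kernel's MPTCP code, and the -EOPNOTSUPP return value is an
      assumption. Socket options reach the underlying subflow only once the
      connection is known to be a plain TCP fallback, and they are refused
      before then:

```python
import errno


class ToyMptcpSocket:
    """Toy MPTCP socket that forwards options only to a known TCP fallback."""

    def __init__(self):
        self.is_tcp_fallback = False  # unknown until the connection is established
        self.subflow_options = {}     # options applied to the single subflow

    def setsockopt(self, name, value):
        if not self.is_tcp_fallback:
            # Could still become an MPTCP subflow: don't let userspace touch it.
            return -errno.EOPNOTSUPP
        self.subflow_options[name] = value  # plain TCP: pass through
        return 0


sock = ToyMptcpSocket()
print(sock.setsockopt("TCP_NODELAY", 1) == -errno.EOPNOTSUPP)  # True
sock.is_tcp_fallback = True  # e.g. the peer turned out not to support MPTCP
print(sock.setsockopt("TCP_NODELAY", 1))                       # 0
```
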
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>