1. 06 Jan, 2020 2 commits
    • Xin Long's avatar
      sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY · be7a7729
      Xin Long authored
      This patch is to fix a memleak caused by no place to free cmd->obj.chunk
      for the unprocessed SCTP_CMD_REPLY. This issue occurs when failing to
      process a cmd while there're still SCTP_CMD_REPLY cmds on the cmd seq
      with an allocated chunk in cmd->obj.chunk.
      
      So fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on
      the cmd seq when any cmd returns error. While at it, also remove 'nomem'
      label.
      
      Reported-by: syzbot+107c4aff5f392bf1517f@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      be7a7729
    • Ying Xue's avatar
      tipc: eliminate KMSAN: uninit-value in __tipc_nl_compat_dumpit error · a7869e5f
      Ying Xue authored
      syzbot found the following crash on:
      =====================================================
      BUG: KMSAN: uninit-value in __nlmsg_parse include/net/netlink.h:661 [inline]
      BUG: KMSAN: uninit-value in nlmsg_parse_deprecated
      include/net/netlink.h:706 [inline]
      BUG: KMSAN: uninit-value in __tipc_nl_compat_dumpit+0x553/0x11e0
      net/tipc/netlink_compat.c:215
      CPU: 0 PID: 12425 Comm: syz-executor062 Not tainted 5.5.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1c9/0x220 lib/dump_stack.c:118
        kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
        __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245
        __nlmsg_parse include/net/netlink.h:661 [inline]
        nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
        __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
        tipc_nl_compat_dumpit+0x761/0x910 net/tipc/netlink_compat.c:308
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x444179
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffd2d6409c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444179
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
      RBP: 00000000006ce018 R08: 0000000000000000 R09: 00000000004002e0
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401e20
      R13: 0000000000401eb0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
        kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
        kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:86
        slab_alloc_node mm/slub.c:2774 [inline]
        __kmalloc_node_track_caller+0xe47/0x11f0 mm/slub.c:4382
        __kmalloc_reserve net/core/skbuff.c:141 [inline]
        __alloc_skb+0x309/0xa50 net/core/skbuff.c:209
        alloc_skb include/linux/skbuff.h:1049 [inline]
        nlmsg_new include/net/netlink.h:888 [inline]
        tipc_nl_compat_dumpit+0x6e4/0x910 net/tipc/netlink_compat.c:301
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      =====================================================
      
      The complaint above occurred because the memory region pointed by attrbuf
      variable was not initialized. To eliminate this warning, we use kcalloc()
      rather than kmalloc_array() to allocate memory for attrbuf.
      
      Reported-by: syzbot+b1fd2bf2c89d8407e15f@syzkaller.appspotmail.com
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7869e5f
  2. 05 Jan, 2020 4 commits
    • Stephen Boyd's avatar
      macb: Don't unregister clks unconditionally · d89091a4
      Stephen Boyd authored
      The only clk init function in this driver that register a clk is
      fu540_c000_clk_init(), and thus we need to unregister the clk when this
      driver is removed on that platform. Other init functions, for example
      macb_clk_init(), don't register clks and therefore we shouldn't
      unregister the clks when this driver is removed. Convert this
      registration path to devm so it gets auto-unregistered when this driver
      is removed and drop the clk_unregister() calls in driver remove (and
      error paths) so that we don't erroneously remove a clk from the system
      that isn't registered by this driver.
      
      Otherwise we get strange crashes with a use-after-free when the
      devm_clk_get() call in macb_clk_init() calls clk_put() on a clk pointer
      that has become invalid because it is freed in clk_unregister().
      
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Cc: Yash Shah <yash.shah@sifive.com>
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Fixes: c218ad55 ("macb: Add support for SiFive FU540-C000")
      Signed-off-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d89091a4
    • Krzysztof Kozlowski's avatar
      MAINTAINERS: Drop obsolete entries from Samsung sxgbe ethernet driver · 15a821f0
      Krzysztof Kozlowski authored
      The emails to ks.giri@samsung.com and vipul.pandya@samsung.com bounce
      with 550 error code:
      
          host mailin.samsung.com[203.254.224.12] said: 550
          5.1.1 Recipient address rejected: User unknown (in reply to RCPT TO
          command)"
      
      Drop Girish K S and Vipul Pandya from sxgbe maintainers entry.
      
      Cc: Byungho An <bh74.an@samsung.com>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      15a821f0
    • Carl Huang's avatar
      net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue · ce57785b
      Carl Huang authored
      The len used for skb_put_padto is wrong, it need to add len of hdr.
      
      In qrtr_node_enqueue, local variable size_t len is assign with
      skb->len, then skb_push(skb, sizeof(*hdr)) will add skb->len with
      sizeof(*hdr), so local variable size_t len is not same with skb->len
      after skb_push(skb, sizeof(*hdr)).
      
      Then the purpose of skb_put_padto(skb, ALIGN(len, 4)) is to add add
      pad to the end of the skb's data if skb->len is not aligned to 4, but
      unfortunately it use len instead of skb->len, at this line, skb->len
      is 32 bytes(sizeof(*hdr)) more than len, for example, len is 3 bytes,
      then skb->len is 35 bytes(3 + 32), and ALIGN(len, 4) is 4 bytes, so
      __skb_put_padto will do nothing after check size(35) < len(4), the
      correct value should be 36(sizeof(*hdr) + ALIGN(len, 4) = 32 + 4),
      then __skb_put_padto will pass check size(35) < len(36) and add 1 byte
      to the end of skb's data, then logic is correct.
      
      function of skb_push:
      void *skb_push(struct sk_buff *skb, unsigned int len)
      {
      	skb->data -= len;
      	skb->len  += len;
      	if (unlikely(skb->data < skb->head))
      		skb_under_panic(skb, len, __builtin_return_address(0));
      	return skb->data;
      }
      
      function of skb_put_padto
      static inline int skb_put_padto(struct sk_buff *skb, unsigned int len)
      {
      	return __skb_put_padto(skb, len, true);
      }
      
      function of __skb_put_padto
      static inline int __skb_put_padto(struct sk_buff *skb, unsigned int len,
      				  bool free_on_error)
      {
      	unsigned int size = skb->len;
      
      	if (unlikely(size < len)) {
      		len -= size;
      		if (__skb_pad(skb, len, free_on_error))
      			return -ENOMEM;
      		__skb_put(skb, len);
      	}
      	return 0;
      }
      Signed-off-by: default avatarCarl Huang <cjhuang@codeaurora.org>
      Signed-off-by: default avatarWen Gong <wgong@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce57785b
    • Fenghua Yu's avatar
      drivers/net/b44: Change to non-atomic bit operations on pwol_mask · f11421ba
      Fenghua Yu authored
      Atomic operations that span cache lines are super-expensive on x86
      (not just to the current processor, but also to other processes as all
      memory operations are blocked until the operation completes). Upcoming
      x86 processors have a switch to cause such operations to generate a #AC
      trap. It is expected that some real time systems will enable this mode
      in BIOS.
      
      In preparation for this, it is necessary to fix code that may execute
      atomic instructions with operands that cross cachelines because the #AC
      trap will crash the kernel.
      
      Since "pwol_mask" is local and never exposed to concurrency, there is
      no need to set bits in pwol_mask using atomic operations.
      
      Directly operate on the byte which contains the bit instead of using
      __set_bit() to avoid any big endian concern due to type cast to
      unsigned long in __set_bit().
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f11421ba
  3. 03 Jan, 2020 10 commits
  4. 02 Jan, 2020 11 commits
  5. 31 Dec, 2019 11 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 738d2902
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix big endian overflow in nf_flow_table, from Arnd Bergmann.
      
       2) Fix port selection on big endian in nft_tproxy, from Phil Sutter.
      
       3) Fix precision tracking for unbound scalars in bpf verifier, from
          Daniel Borkmann.
      
       4) Fix integer overflow in socket rcvbuf check in UDP, from Antonio
          Messina.
      
       5) Do not perform a neigh confirmation during a pmtu update over a
          tunnel, from Hangbin Liu.
      
       6) Fix DMA mapping leak in dpaa_eth driver, from Madalin Bucur.
      
       7) Various PTP fixes for sja1105 dsa driver, from Vladimir Oltean.
      
       8) Add missing to dummy definition of of_mdiobus_child_is_phy(), from
          Geert Uytterhoeven
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
        hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename()
        net/sched: add delete_empty() to filters and use it in cls_flower
        tcp: Fix highest_sack and highest_sack_seq
        ptp: fix the race between the release of ptp_clock and cdev
        net: dsa: sja1105: Reconcile the meaning of TPID and TPID2 for E/T and P/Q/R/S
        Documentation: net: dsa: sja1105: Remove text about taprio base-time limitation
        net: dsa: sja1105: Remove restriction of zero base-time for taprio offload
        net: dsa: sja1105: Really make the PTP command read-write
        net: dsa: sja1105: Take PTP egress timestamp by port, not mgmt slot
        cxgb4/cxgb4vf: fix flow control display for auto negotiation
        mlxsw: spectrum: Use dedicated policer for VRRP packets
        mlxsw: spectrum_router: Skip loopback RIFs during MAC validation
        net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs
        net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit device
        net_sched: sch_fq: properly set sk->sk_pacing_status
        bnx2x: Fix accounting of vlan resources among the PFs
        bnx2x: Use appropriate define for vlan credit
        of: mdio: Add missing inline to of_mdiobus_child_is_phy() dummy
        net: phy: aquantia: add suspend / resume ops for AQR105
        dpaa_eth: fix DMA mapping leak
        ...
      738d2902
    • Linus Torvalds's avatar
      Merge tag 'tomoyo-fixes-for-5.5' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 · c5c928c6
      Linus Torvalds authored
      Pull tomoyo fixes from Tetsuo Handa:
       "Two bug fixes:
      
         - Suppress RCU warning at list_for_each_entry_rcu()
      
         - Don't use fancy names on sockets"
      
      * tag 'tomoyo-fixes-for-5.5' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
        tomoyo: Suppress RCU warning at list_for_each_entry_rcu().
        tomoyo: Don't use nifty names on sockets.
      c5c928c6
    • Taehee Yoo's avatar
      hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() · 04b69426
      Taehee Yoo authored
      hsr slave interfaces don't have debugfs directory.
      So, hsr_debugfs_rename() shouldn't be called when hsr slave interface name
      is changed.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
          ip link set dummy0 name ap
      
      Splat looks like:
      [21071.899367][T22666] ap: renamed from dummy0
      [21071.914005][T22666] ==================================================================
      [21071.919008][T22666] BUG: KASAN: slab-out-of-bounds in hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.923640][T22666] Read of size 8 at addr ffff88805febcd98 by task ip/22666
      [21071.926941][T22666]
      [21071.927750][T22666] CPU: 0 PID: 22666 Comm: ip Not tainted 5.5.0-rc2+ #240
      [21071.929919][T22666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [21071.935094][T22666] Call Trace:
      [21071.935867][T22666]  dump_stack+0x96/0xdb
      [21071.936687][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.937774][T22666]  print_address_description.constprop.5+0x1be/0x360
      [21071.939019][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940081][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940949][T22666]  __kasan_report+0x12a/0x16f
      [21071.941758][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.942674][T22666]  kasan_report+0xe/0x20
      [21071.943325][T22666]  hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.944187][T22666]  hsr_netdev_notify+0x1fe/0x9b0 [hsr]
      [21071.945052][T22666]  ? __module_text_address+0x13/0x140
      [21071.945897][T22666]  notifier_call_chain+0x90/0x160
      [21071.946743][T22666]  dev_change_name+0x419/0x840
      [21071.947496][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.948600][T22666]  ? netdev_adjacent_rename_links+0x280/0x280
      [21071.949577][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.950672][T22666]  ? lock_downgrade+0x6e0/0x6e0
      [21071.951345][T22666]  ? do_setlink+0x811/0x2ef0
      [21071.951991][T22666]  do_setlink+0x811/0x2ef0
      [21071.952613][T22666]  ? is_bpf_text_address+0x81/0xe0
      [ ... ]
      
      Reported-by: syzbot+9328206518f08318a5fd@syzkaller.appspotmail.com
      Fixes: 4c2d5e33 ("hsr: rename debugfs file when interface name is changed")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04b69426
    • Davide Caratti's avatar
      net/sched: add delete_empty() to filters and use it in cls_flower · a5b72a08
      Davide Caratti authored
      Revert "net/sched: cls_u32: fix refcount leak in the error path of
      u32_change()", and fix the u32 refcount leak in a more generic way that
      preserves the semantic of rule dumping.
      On tc filters that don't support lockless insertion/removal, there is no
      need to guard against concurrent insertion when a removal is in progress.
      Therefore, for most of them we can avoid a full walk() when deleting, and
      just decrease the refcount, like it was done on older Linux kernels.
      This fixes situations where walk() was wrongly detecting a non-empty
      filter, like it happened with cls_u32 in the error path of change(), thus
      leading to failures in the following tdc selftests:
      
       6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
       6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
       74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
      
      On cls_flower, and on (future) lockless filters, this check is necessary:
      move all the check_empty() logic in a callback so that each filter
      can have its own implementation. For cls_flower, it's sufficient to check
      if no IDRs have been allocated.
      
      This reverts commit 275c44aa.
      
      Changes since v1:
       - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
         is used, thanks to Vlad Buslov
       - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
       - squash revert and new fix in a single patch, to be nice with bisect
         tests that run tdc on u32 filter, thanks to Dave Miller
      
      Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Suggested-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Suggested-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Tested-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5b72a08
    • Cambda Zhu's avatar
      tcp: Fix highest_sack and highest_sack_seq · 85369750
      Cambda Zhu authored
      >From commit 50895b9d ("tcp: highest_sack fix"), the logic about
      setting tp->highest_sack to the head of the send queue was removed.
      Of course the logic is error prone, but it is logical. Before we
      remove the pointer to the highest sack skb and use the seq instead,
      we need to set tp->highest_sack to NULL when there is no skb after
      the last sack, and then replace NULL with the real skb when new skb
      inserted into the rtx queue, because the NULL means the highest sack
      seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
      the next ACK with sack option will increase tp->reordering unexpectedly.
      
      This patch sets tp->highest_sack to the tail of the rtx queue if
      it's NULL and new data is sent. The patch keeps the rule that the
      highest_sack can only be maintained by sack processing, except for
      this only case.
      
      Fixes: 50895b9d ("tcp: highest_sack fix")
      Signed-off-by: default avatarCambda Zhu <cambda@linux.alibaba.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85369750
    • Vladis Dronov's avatar
      ptp: fix the race between the release of ptp_clock and cdev · a33121e5
      Vladis Dronov authored
      In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
      device is removed, closing this file leads to a race. This reproduces
      easily in a kvm virtual machine:
      
      ts# cat openptp0.c
      int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
      ts# uname -r
      5.5.0-rc3-46cf053e
      ts# cat /proc/cmdline
      ... slub_debug=FZP
      ts# modprobe ptp_kvm
      ts# ./openptp0 &
      [1] 670
      opened /dev/ptp0, sleeping 10s...
      ts# rmmod ptp_kvm
      ts# ls /dev/ptp*
      ls: cannot access '/dev/ptp*': No such file or directory
      ts# ...woken up
      [   48.010809] general protection fault: 0000 [#1] SMP
      [   48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
      [   48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
      [   48.016270] RIP: 0010:module_put.part.0+0x7/0x80
      [   48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
      [   48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
      [   48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
      [   48.019470] ...                                              ^^^ a slub poison
      [   48.023854] Call Trace:
      [   48.024050]  __fput+0x21f/0x240
      [   48.024288]  task_work_run+0x79/0x90
      [   48.024555]  do_exit+0x2af/0xab0
      [   48.024799]  ? vfs_write+0x16a/0x190
      [   48.025082]  do_group_exit+0x35/0x90
      [   48.025387]  __x64_sys_exit_group+0xf/0x10
      [   48.025737]  do_syscall_64+0x3d/0x130
      [   48.026056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   48.026479] RIP: 0033:0x7f53b12082f6
      [   48.026792] ...
      [   48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
      [   48.045001] Fixing recursive fault but reboot is needed!
      
      This happens in:
      
      static void __fput(struct file *file)
      {   ...
          if (file->f_op->release)
              file->f_op->release(inode, file); <<< cdev is kfree'd here
          if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
                   !(mode & FMODE_PATH))) {
              cdev_put(inode->i_cdev); <<< cdev fields are accessed here
      
      Namely:
      
      __fput()
        posix_clock_release()
          kref_put(&clk->kref, delete_clock) <<< the last reference
            delete_clock()
              delete_ptp_clock()
                kfree(ptp) <<< cdev is embedded in ptp
        cdev_put
          module_put(p->owner) <<< *p is kfree'd, bang!
      
      Here cdev is embedded in posix_clock which is embedded in ptp_clock.
      The race happens because ptp_clock's lifetime is controlled by two
      refcounts: kref and cdev.kobj in posix_clock. This is wrong.
      
      Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
      created especially for such cases. This way the parent device with its
      ptp_clock is not released until all references to the cdev are released.
      This adds a requirement that an initialized but not exposed struct
      device should be provided to posix_clock_register() by a caller instead
      of a simple dev_t.
      
      This approach was adopted from the commit 72139dfa ("watchdog: Fix
      the race between the release of watchdog_core_data and cdev"). See
      details of the implementation in the commit 233ed09d ("chardev: add
      helper function to register char devs with a struct device").
      
      Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#uAnalyzed-by: default avatarStephen Johnston <sjohnsto@redhat.com>
      Analyzed-by: default avatarVern Lovejoy <vlovejoy@redhat.com>
      Signed-off-by: default avatarVladis Dronov <vdronov@redhat.com>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a33121e5
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Reconcile the meaning of TPID and TPID2 for E/T and P/Q/R/S · 54fa49ee
      Vladimir Oltean authored
      For first-generation switches (SJA1105E and SJA1105T):
      - TPID means C-Tag (typically 0x8100)
      - TPID2 means S-Tag (typically 0x88A8)
      
      While for the second generation switches (SJA1105P, SJA1105Q, SJA1105R,
      SJA1105S) it is the other way around:
      - TPID means S-Tag (typically 0x88A8)
      - TPID2 means C-Tag (typically 0x8100)
      
      In other words, E/T tags untagged traffic with TPID, and P/Q/R/S with
      TPID2.
      
      So the patch mentioned below fixed VLAN filtering for P/Q/R/S, but broke
      it for E/T.
      
      We strive for a common code path for all switches in the family, so just
      lie in the static config packing functions that TPID and TPID2 are at
      swapped bit offsets than they actually are, for P/Q/R/S. This will make
      both switches understand TPID to be ETH_P_8021Q and TPID2 to be
      ETH_P_8021AD. The meaning from the original E/T was chosen over P/Q/R/S
      because E/T is actually the one with public documentation available
      (UM10944.pdf).
      
      Fixes: f9a1a764 ("net: dsa: sja1105: Reverse TPID and TPID2")
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      54fa49ee
    • Vladimir Oltean's avatar
      Documentation: net: dsa: sja1105: Remove text about taprio base-time limitation · 3a323ed7
      Vladimir Oltean authored
      Since commit 86db36a3 ("net: dsa: sja1105: Implement state machine
      for TAS with PTP clock source"), this paragraph is no longer true. So
      remove it.
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a323ed7
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Remove restriction of zero base-time for taprio offload · d00bdc0a
      Vladimir Oltean authored
      The check originates from the initial implementation which was not based
      on PTP time but on a standalone clock source. In the meantime we can now
      program the PTPSCHTM register at runtime with the dynamic base time
      (actually with a value that is 200 ns smaller, to avoid writing DELTA=0
      in the Schedule Entry Points Parameters Table). And we also have logic
      for moving the actual base time in the future of the PHC's current time
      base, so the check for zero serves no purpose, since even if the user
      will specify zero, that's not what will end up in the static config
      table where the limitation is.
      
      Fixes: 86db36a3 ("net: dsa: sja1105: Implement state machine for TAS with PTP clock source")
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d00bdc0a
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Really make the PTP command read-write · 5a47f588
      Vladimir Oltean authored
      When activating tc-taprio offload on the switch ports, the TAS state
      machine will try to check whether it is running or not, but will find
      both the STARTED and STOPPED bits as false in the
      sja1105_tas_check_running function. So the function will return -EINVAL
      (an abnormal situation) and the kernel will keep printing this from the
      TAS FSM workqueue:
      
      [   37.691971] sja1105 spi0.1: An operation returned -22
      
      The reason is that the underlying function that gets called,
      sja1105_ptp_commit, does not actually do a SPI_READ, but a SPI_WRITE. So
      the command buffer remains initialized with zeroes instead of retrieving
      the hardware state. Fix that.
      
      Fixes: 41603d78 ("net: dsa: sja1105: Make the PTP command read-write")
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5a47f588
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Take PTP egress timestamp by port, not mgmt slot · 9fcf024d
      Vladimir Oltean authored
      The PTP egress timestamp N must be captured from register PTPEGR_TS[n],
      where n = 2 * PORT + TSREG. There are 10 PTPEGR_TS registers, 2 per
      port. We are only using TSREG=0.
      
      As opposed to the management slots, which are 4 in number
      (SJA1105_NUM_PORTS, minus the CPU port). Any management frame (which
      includes PTP frames) can be sent to any non-CPU port through any
      management slot. When the CPU port is not the last port (#4), there will
      be a mismatch between the slot and the port number.
      
      Luckily, the only mainline occurrence with this switch
      (arch/arm/boot/dts/ls1021a-tsn.dts) does have the CPU port as #4, so the
      issue did not manifest itself thus far.
      
      Fixes: 47ed985e ("net: dsa: sja1105: Add logic for TX timestamping")
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9fcf024d
  6. 30 Dec, 2019 1 commit
  7. 29 Dec, 2019 1 commit