- 15 Jul, 2021 7 commits
-
-
Jia He authored
Liajian reported a bug_on hit on a ThunderX2 arm64 server with FastLinQ QL41000 ethernet controller: BUG: scheduling while atomic: kworker/0:4/531/0x00000200 [qed_probe:488()]hw prepare failed kernel BUG at mm/vmalloc.c:2355! Internal error: Oops - BUG: 0 [#1] SMP CPU: 0 PID: 531 Comm: kworker/0:4 Tainted: G W 5.4.0-77-generic #86-Ubuntu pstate: 00400009 (nzcv daif +PAN -UAO) Call trace: vunmap+0x4c/0x50 iounmap+0x48/0x58 qed_free_pci+0x60/0x80 [qed] qed_probe+0x35c/0x688 [qed] __qede_probe+0x88/0x5c8 [qede] qede_probe+0x60/0xe0 [qede] local_pci_probe+0x48/0xa0 work_for_cpu_fn+0x24/0x38 process_one_work+0x1d0/0x468 worker_thread+0x238/0x4e0 kthread+0xf0/0x118 ret_from_fork+0x10/0x18 In this case, qed_hw_prepare() returns error due to hw/fw error, but in theory work queue should be in process context instead of interrupt. The root cause might be the unpaired spin_{un}lock_bh() in _qed_mcp_cmd_and_union(), which causes botton half is disabled incorrectly. Reported-by: Lijian Zhang <Lijian.Zhang@arm.com> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ziyang Xuan authored
When nr_segs equal to zero in iovec_from_user, the object msg->msg_iter.iov is uninit stack memory in caif_seqpkt_sendmsg which is defined in ___sys_sendmsg. So we cann't just judge msg->msg_iter.iov->base directlly. We can use nr_segs to judge msg in caif_seqpkt_sendmsg whether has data buffers. ===================================================== BUG: KMSAN: uninit-value in caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2343 ___sys_sendmsg net/socket.c:2397 [inline] __sys_sendmmsg+0x808/0xc90 net/socket.c:2480 __compat_sys_sendmmsg net/compat.c:656 [inline] Reported-by: syzbot+09a5d591c1f98cf5efcb@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=1ace85e8fc9b0d5a45c08c2656c3e91762daa9b8 Fixes: bece7b23 ("caif: Rewritten socket implementation") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Qitao Xu authored
Tracepoint trace_qdisc_enqueue() is introduced to trace skb at the entrance of TC layer on TX side. This is similar to trace_qdisc_dequeue(): 1. For both we only trace successful cases. The failure cases can be traced via trace_kfree_skb(). 2. They are called at entrance or exit of TC layer, not for each ->enqueue() or ->dequeue(). This is intentional, because we want to make trace_qdisc_enqueue() symmetric to trace_qdisc_dequeue(), which is easier to use. The return value of qdisc_enqueue() is not interesting here, we have Qdisc's drop packets in ->dequeue(), it is impossible to trace them even if we have the return value, the only way to trace them is tracing kfree_skb(). We only add information we need to trace ring buffer. If any other information is needed, it is easy to extend it without breaking ABI, see commit 3dd344ea ("net: tracepoint: exposing sk_family in all tcp:tracepoints"). Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Qitao Xu <qitao.xu@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Qitao Xu authored
Print format of skbaddr is changed to %px from %p, because we want to use skb address as a quick way to identify a packet. Note, trace ring buffer is only accessible to privileged users, it is safe to use a real kernel address here. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Qitao Xu <qitao.xu@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Qitao Xu authored
The print format of skb adress in tracepoint class net_dev_template is changed to %px from %p, because we want to use skb address as a quick way to identify a packet. Note, trace ring buffer is only accessible to privileged users, it is safe to use a real kernel address here. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Qitao Xu <qitao.xu@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Shifting the u16 integer oct->pcie_port by CN23XX_PKT_INPUT_CTL_MAC_NUM_POS (29) bits will be promoted to a 32 bit signed int and then sign-extended to a u64. In the cases where oct->pcie_port where bit 2 is set (e.g. 3..7) the shifted value will be sign extended and the top 32 bits of the result will be set. Fix this by casting the u16 values to a u64 before the 29 bit left shift. Addresses-Coverity: ("Unintended sign extension") Fixes: 3451b97c ("liquidio: CN23XX register setup") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geert Uytterhoeven authored
Making global2 support mandatory removed the Kconfig symbol NET_DSA_MV88E6XXX_GLOBAL2. This symbol also served as an intermediate symbol to make NET_DSA_MV88E6XXX_PTP depend on NET_DSA_MV88E6XXX. With the symbol removed, the user is always asked about PTP support for Marvell 88E6xxx switches, even if the latter support is not enabled. Fix this by reinstating the dependency. Fixes: 63368a74 ("net: dsa: mv88e6xxx: Make global2 support mandatory") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 14 Jul, 2021 6 commits
-
-
David S. Miller authored
Takashi Iwai says: ==================== r8152: Fix a couple of PM problems it seems that r8152 driver suffers from the deadlock at both runtime and system PM. Formerly, it was seen more often at hibernation resume, but now it's triggered more frequently, as reported in SUSE Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1186194 While debugging the problem, I stumbled on a few obvious bugs and here is the results with two patches for addressing the resume problem. *** However, the story doesn't end here, unfortunately, and those patches don't seem sufficing. The rest major problem is that the driver calls napi_disable() and napi_enable() in the PM suspend callbacks. This makes the system stalling at (runtime-)suspend. If we drop napi_disable() and napi_enable() calls in the PM suspend callbacks, it starts working (that was the result in Bugzilla comment 13): https://bugzilla.suse.com/show_bug.cgi?id=1186194#c13 So, my patches aren't enough and we still need to investigate further. It'd be appreciated if anyone can give a fix or a hint for more debugging. The usage of napi_disable() at PM callbacks is unique in this driver and looks rather suspicious to me; but I'm no expert in this area so I might be wrong... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takashi Iwai authored
r8152 driver sets up the MAC address at reset-resume, while rtl8152_set_mac_address() has the temporary autopm get/put. This may lead to a deadlock as the PM lock has been already taken for the execution of the runtime PM callback. This patch adds the workaround to avoid the superfluous autpm when called from rtl8152_reset_resume(). Link: https://bugzilla.suse.com/show_bug.cgi?id=1186194Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takashi Iwai authored
rtl8152_close() takes the refcount via usb_autopm_get_interface() but it doesn't release when RTL8152_UNPLUG test hits. This may lead to the imbalance of PM refcount. This patch addresses it. Link: https://bugzilla.suse.com/show_bug.cgi?id=1186194Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski. "Including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: - do not renew entry stuck in tcp SYN_SENT state - do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison" * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits) net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() sfc: add logs explaining XDP_TX/REDIRECT is not available sfc: ensure correct number of XDP queues sfc: fix lack of XDP TX queues - error XDP TX failed (-22) net: fddi: fix UAF in fza_probe net: dsa: sja1105: fix address learning getting disabled on the CPU port net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload net: Use nlmsg_unicast() instead of netlink_unicast() octeontx2-pf: Fix uninitialized boolean variable pps ipv6: allocate enough headroom in ip6_finish_output2() net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific net: bridge: multicast: fix MRD advertisement router port marking race net: bridge: multicast: fix PIM hello router port marking race net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340 dsa: fix for_each_child.cocci warnings virtio_net: check virtqueue_add_sgs() return value mptcp: properly account bulk freed memory selftests: mptcp: fix case multiple subflows limited by server mptcp: avoid processing packet if a subflow reset mptcp: fix syncookie process if mptcp can not_accept new subflow ...
-
Christian Brauner authored
Add a simple helper that filesystems can use in their parameter parser to parse the "source" parameter. A few places open-coded this function and that already caused a bug in the cgroup v1 parser that we fixed. Let's make it harder to get this wrong by introducing a helper which performs all necessary checks. Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12 Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christian Brauner authored
The following sequence can be used to trigger a UAF: int fscontext_fd = fsopen("cgroup"); int fd_null = open("/dev/null, O_RDONLY); int fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", fd_null); close_range(3, ~0U, 0); The cgroup v1 specific fs parser expects a string for the "source" parameter. However, it is perfectly legitimate to e.g. specify a file descriptor for the "source" parameter. The fs parser doesn't know what a filesystem allows there. So it's a bug to assume that "source" is always of type fs_value_is_string when it can reasonably also be fs_value_is_file. This assumption in the cgroup code causes a UAF because struct fs_parameter uses a union for the actual value. Access to that union is guarded by the param->type member. Since the cgroup paramter parser didn't check param->type but unconditionally moved param->string into fc->source a close on the fscontext_fd would trigger a UAF during put_fs_context() which frees fc->source thereby freeing the file stashed in param->file causing a UAF during a close of the fd_null. Fix this by verifying that param->type is actually a string and report an error if not. In follow up patches I'll add a new generic helper that can be used here and by other filesystems instead of this error-prone copy-pasta fix. But fixing it in here first makes backporting a it to stable a lot easier. Fixes: 8d2451f4 ("cgroup1: switch to option-by-option parsing") Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@kernel.org> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Jul, 2021 11 commits
-
-
Vladimir Oltean authored
This was not caught because there is no switch driver which implements the .port_bridge_join but not .port_bridge_leave method, but it should nonetheless be fixed, as in certain conditions (driver development) it might lead to NULL pointer dereference. Fixes: f66a6a69 ("net: dsa: permit cross-chip bridging between all trees in the system") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linuxLinus Torvalds authored
Pull vboxsf fixes from Hans de Goede: "This adds support for the atomic_open directory-inode op to vboxsf. Note this is not just an enhancement this also fixes an actual issue which users are hitting, see the commit message of the "boxsf: Add support for the atomic_open directory-inode" patch" * tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux: vboxsf: Add support for the atomic_open directory-inode op vboxsf: Add vboxsf_[create|release]_sf_handle() helpers vboxsf: Make vboxsf_dir_create() return the handle for the created file vboxsf: Honor excl flag to the dir-inode create op
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs zoned mode fixes from David Sterba: - fix deadlock when allocating system chunk - fix wrong mutex unlock on an error path - fix extent map splitting for append operation - update and fix message reporting unusable chunk space - don't block when background zone reclaim runs with balance in parallel * tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree btrfs: don't block if we can't acquire the reclaim lock btrfs: properly split extent_map for REQ_OP_ZONE_APPEND btrfs: rework chunk allocation to avoid exhaustion of the system chunk array btrfs: fix deadlock with concurrent chunk allocations involving system chunks btrfs: zoned: print unusable percentage when reclaiming block groups btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_work
-
David S. Miller authored
Íñigo Huguet says: ==================== sfc: Fix lack of XDP TX queues A change introduced in commit e26ca4b5 ("sfc: reduce the number of requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT because it unintentionally reduced the number of XDP TX queues, letting not enough queues to have one per CPU, which leaded to errors if XDP TX/REDIRECT was done from a high numbered CPU. This patchs make the following changes: - Fix the bug mentioned above - Revert commit 99ba0ea6 ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") which intended to fix a related problem, created by mentioned bug, but it's no longer necessary - Add a new error log message if there are not enough resources to make XDP_TX/REDIRECT work V1 -> V2: keep the calculation of how many tx queues can handle a single event queue, but apply the "max. tx queues per channel" upper limit. V2 -> V3: WARN_ON if the number of initialized XDP TXQs differs from the expected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Íñigo Huguet authored
If it's not possible to allocate enough channels for XDP, XDP_TX and XDP_REDIRECT don't work. However, only a message saying that not enough channels were available was shown, but not saying what are the consequences in that case. The user didn't know if he/she can use XDP or not, if the performance is reduced, or what. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Íñigo Huguet authored
Commit 99ba0ea6 ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") intended to fix a problem caused by a round up when calculating the number of XDP channels and queues. However, this was not the real problem. The real problem was that the number of XDP TX queues had been reduced to half in commit e26ca4b5 ("sfc: reduce the number of requested xdp ev queues"), but the variable xdp_tx_queue_count had remained the same. Once the correct number of XDP TX queues is created again in the previous patch of this series, this also can be reverted since the error doesn't actually exist. Only in the case that there is a bug in the code we can have different values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this, and per Edward Cree's suggestion, I add instead a WARN_ON to catch if it happens again in the future. Note that the number of allocated queues can be higher than the number of used ones due to the round up, as explained in the existing comment in the code. That's why we also have to stop increasing xdp_queue_number beyond efx->xdp_tx_queue_count. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Íñigo Huguet authored
Fixes: e26ca4b5 sfc: reduce the number of requested xdp ev queues The buggy commit intended to allocate less channels for XDP in order to be more unlikely to reach the limit of 32 channels of the driver. The idea was to use each IRQ/eventqeue for more XDP TX queues than before, calculating which is the maximum number of TX queues that one event queue can handle. For example, in EF10 each event queue could handle up to 8 queues, better than the 4 they were handling before the change. This way, it would have to allocate half of channels than before for XDP TX. The problem is that the TX queues are also contained inside the channel structs, and there are only 4 queues per channel. Reducing the number of channels means also reducing the number of queues, resulting in not having the desired number of 1 queue per CPU. This leads to getting errors on XDP_TX and XDP_REDIRECT if they're executed from a high numbered CPU, because there only exist queues for the low half of CPUs, actually. If XDP_TX/REDIRECT is executed in a low numbered CPU, the error doesn't happen. This is the error in the logs (repeated many times, even rate limited): sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) This errors happens in function efx_xdp_tx_buffers, where it expects to have a dedicated XDP TX queue per CPU. Reverting the change makes again more likely to reach the limit of 32 channels in machines with many CPUs. If this happen, no XDP_TX/REDIRECT will be possible at all, and we will have this log error messages: At interface probe: sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32) At every subsequent XDP_TX/REDIRECT failure, rate limited: sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) However, without reverting the change, it makes the user to think that everything is OK at probe time, but later it fails in an unpredictable way, depending on the CPU that handles the packet. It is better to restore the predictable behaviour. If the user sees the error message at probe time, he/she can try to configure the best way it fits his/her needs. At least, he/she will have 2 options: - Accept that XDP_TX/REDIRECT is not available (he/she may not need it) - Load sfc module with modparam 'rss_cpus' with a lower number, thus creating less normal RX queues/channels, letting more free resources for XDP, with some performance penalty. Anyway, let the calculation of maximum TX queues that can be handled by a single event queue, and use it only if it's less than the number of TX queues per channel. This doesn't happen in practice, but could happen if some constant values are tweaked in the future, such us EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE. Related mailing list thread: https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Skripkin authored
fp is netdev private data and it cannot be used after free_netdev() call. Using fp after free_netdev() can cause UAF bug. Fix it by moving free_netdev() after error message. Fixes: 61414f5e ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
In May 2019 when commit 640f763f ("net: dsa: sja1105: Add support for Spanning Tree Protocol") was introduced, the comment that "STP does not get called for the CPU port" was true. This changed after commit 0394a63a ("net: dsa: enable and disable all ports") in August 2019 and went largely unnoticed, because the sja1105_bridge_stp_state_set() method did nothing different compared to the static setup done by sja1105_init_mac_settings(). With the ability to turn address learning off introduced by the blamed commit, there is a new priv->learn_ena port mask in the driver. When sja1105_bridge_stp_state_set() gets called and we are in BR_STATE_LEARNING or later, address learning is enabled or not depending on priv->learn_ena & BIT(port). So what happens is that priv->learn_ena is not being set from anywhere for the CPU port, and the static configuration done by sja1105_init_mac_settings() is being overwritten. To solve this, acknowledge that the static configuration of STP state is no longer necessary because the STP state is being set by the DSA core now, but what is necessary is to set priv->learn_ena for the CPU port. Fixes: 4d942354 ("net: dsa: sja1105: offload bridge port flags to device") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The point with a *dev and a *brport_dev is that when we have a LAG net device that is a bridge port, *dev is an ocelot net device and *brport_dev is the bonding/team net device. The ocelot net device beneath the LAG does not exist from the bridge's perspective, so we need to sync the switchdev objects belonging to the brport_dev and not to the dev. Fixes: e4bd44e8 ("net: ocelot: replay switchdev events when joining bridge") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yajun Deng authored
It has 'if (err >0 )' statement in nlmsg_unicast(), so use nlmsg_unicast() instead of netlink_unicast(), this looks more concise. v2: remove the change in netfilter. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Jul, 2021 6 commits
-
-
Christoph Hellwig authored
No need to give up the original sd minor even with this option, and if we did we'd also need to fix the number of minors for this configuration to actually work. Fixes: 7c3f828b ("block: refactor device number setup in __device_add_disk") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Rewrite copy_huge_page() and move it into mm/util.c so it's always available. Fixes an exposure of uninitialised memory on configurations with HUGETLB and UFFD enabled and MIGRATION disabled. Fixes: 8cc5fcbb ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Many thanks to Kirill for reminding that PageDoubleMap cannot be relied on to warn of pte mappings in the Anon THP case; and a scan of subpages does not seem appropriate here. Note how follow_trans_huge_pmd() does not even mark an Anon THP as mlocked when compound_mapcount != 1: multiple mlocking of Anon THP is avoided, so simply return from page_mlock() in this case. Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/ Fixes: d9770fcc ("mm/rmap: fix old bug: munlocking THP missed other mlocks") Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Yang Shi <shy828301@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Colin Ian King authored
In the case where act->id is FLOW_ACTION_POLICE and also act->police.rate_bytes_ps > 0 or act->police.rate_pkt_ps is not > 0 the boolean variable pps contains an uninitialized value when function otx2_tc_act_set_police is called. Fix this by initializing pps to false. Addresses-Coverity: ("Uninitialized scalar variable)" Fixes: 68fbff68 ("octeontx2-pf: Add police action for TC flower") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vasily Averin authored
When TEE target mirrors traffic to another interface, sk_buff may not have enough headroom to be processed correctly. ip_finish_output2() detect this situation for ipv4 and allocates new skb with enogh headroom. However ipv6 lacks this logic in ip_finish_output2 and it leads to skb_under_panic: skbuff: skb_under_panic: text:ffffffffc0866ad4 len:96 put:24 head:ffff97be85e31800 data:ffff97be85e317f8 tail:0x58 end:0xc0 dev:gre0 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:110! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 393 Comm: kworker/2:2 Tainted: G OE 5.13.0 #13 Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:skb_panic+0x48/0x4a Call Trace: skb_push.cold.111+0x10/0x10 ipgre_header+0x24/0xf0 [ip_gre] neigh_connected_output+0xae/0xf0 ip6_finish_output2+0x1a8/0x5a0 ip6_output+0x5c/0x110 nf_dup_ipv6+0x158/0x1000 [nf_dup_ipv6] tee_tg6+0x2e/0x40 [xt_TEE] ip6t_do_table+0x294/0x470 [ip6_tables] nf_hook_slow+0x44/0xc0 nf_hook.constprop.34+0x72/0xe0 ndisc_send_skb+0x20d/0x2e0 ndisc_send_ns+0xd1/0x210 addrconf_dad_work+0x3c8/0x540 process_one_work+0x1d1/0x370 worker_thread+0x30/0x390 kthread+0x116/0x130 ret_from_fork+0x22/0x30 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Randy Dunlap authored
Rename module_init & module_exit functions that are named "mod_init" and "mod_exit" so that they are unique in both the System.map file and in initcall_debug output instead of showing up as almost anonymous "mod_init". This is helpful for debugging and in determining how long certain module_init calls take to execute. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Krzysztof Halasa <khc@pm.waw.pl> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Martin Schiller <ms@dev.tdt.de> Cc: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Jul, 2021 10 commits
-
-
Linus Torvalds authored
-
Hugh Dickins authored
I know nothing about zone_device pages and !device_private pages; but if try_to_migrate_one() will do nothing for them, then it's better that try_to_migrate() filter them first, than trawl through all their vmas. Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Yang Shi <shy828301@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
In the unlikely race case that page_mlock_one() finds VM_LOCKED has been cleared by the time it got page table lock, page_vma_mapped_walk_done() must be called before returning, either explicitly, or by a final call to page_vma_mapped_walk() - otherwise the page table remains locked. Fixes: cd62734c ("mm/rmap: split try_to_munlock from try_to_unmap") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Yang Shi <shy828301@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
The kernel recovers in due course from missing Mlocked pages: but there was no point in calling page_mlock() (formerly known as try_to_munlock()) on a THP, because nothing got done even when it was found to be mapped in another VM_LOCKED vma. It's true that we need to be careful: Mlocked accounting of pte-mapped THPs is too difficult (so consistently avoided); but Mlocked accounting of only-pmd-mapped THPs is supposed to work, even when multiple mappings are mlocked and munlocked or munmapped. Refine the tests. There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so page_mlock_one() does not even have to worry about that complication. (I said the kernel recovers: but would page reclaim be likely to split THP before rediscovering that it's VM_LOCKED? I've not followed that up) Fixes: 9a73f61b ("thp, mlock: do not mlock PTE-mapped file huge pages") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Parallel developments in mm/rmap.c have left behind some out-of-date comments: try_to_migrate_one() also accepts TTU_SYNC (already commented in try_to_migrate() itself), and try_to_migrate() returns nothing at all. TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it in mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so delete the "recently referenced" comment from try_to_unmap_one() (once upon a time the comment was near the removed codeblock, but they drifted apart). Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Link: https://lore.kernel.org/lkml/563ce5b2-7a44-5b4d-1dfd-59a0e65932a9@google.com/ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Yang Shi <shy828301@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David S. Miller authored
Nikolay Aleksandrov says: ==================== net: bridge: multicast: fix automatic router port marking races While working on per-vlan multicast snooping I found two race conditions when multicast snooping is enabled. They're identical and happen when the router port list is modified without the multicast lock. One requires a PIM hello message to be received on a port and the other an MRD advertisement. To fix them we just need to take the multicast_lock when adding the ports to the router port list (marking them as router ports). Tested on an affected setup by generating the required packets while modifying the port list in parallel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
When an MRD advertisement is received on a bridge port with multicast snooping enabled, we mark it as a router port automatically, that includes adding that port to the router port list. The multicast lock protects that list, but it is not acquired in the MRD advertisement case leading to a race condition, we need to take it to fix the race. Cc: stable@vger.kernel.org Cc: linus.luessing@c0d3.blue Fixes: 4b3087c7 ("bridge: Snoop Multicast Router Advertisements") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
When a PIM hello packet is received on a bridge port with multicast snooping enabled, we mark it as a router port automatically, that includes adding that port the router port list. The multicast lock protects that list, but it is not acquired in the PIM message case leading to a race condition, we need to take it to fix the race. Cc: stable@vger.kernel.org Fixes: 91b02d3d ("bridge: mcast: add router port on PIM hello message") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Ingo Molnar: "Two fixes: - Fix a MIPS IRQ handling RCU bug - Remove a DocBook annotation for a parameter that doesn't exist anymore" * tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mips: Fix RCU violation when using irqdomain lookup on interrupt entry genirq/irqdesc: Drop excess kernel-doc entry @lookup
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Ingo Molnar: "Three fixes: - Fix load tracking bug/inconsistency - Fix a sporadic CFS bandwidth constraints enforcement bug - Fix a uclamp utilization tracking bug for newly woken tasks" * tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/uclamp: Ignore max aggregation if rq is idle sched/fair: Fix CFS bandwidth hrtimer expiry type sched/fair: Sync load_sum with load_avg after dequeue
-