- 25 Jan, 2021 1 commit
-
-
Andrew Price authored
Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 22 Jan, 2021 1 commit
-
-
Zhaoyang Huang authored
As gfs2_quotad_cachep and gfs2_glock_cachep have registered shrinkers, amending SLAB_RECLAIM_ACCOUNT when creating them, which improves slab accounting. Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 31 Dec, 2020 1 commit
-
-
Bob Peterson authored
Function gfs2_log_write_page is only used in lops.c, so make it static. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 22 Dec, 2020 2 commits
-
-
Bob Peterson authored
Before this patch, sister functions gfs2_make_fs_rw and gfs2_make_fs_ro locked (held) the freeze glock by calling gfs2_freeze_lock and gfs2_freeze_unlock. The problem is, not all the callers of gfs2_make_fs_ro should be doing this. The three callers of gfs2_make_fs_ro are: remount (gfs2_reconfigure), signal_our_withdraw, and unmount (gfs2_put_super). But when unmounting the file system we can get into the following circular lock dependency: deactivate_super down_write(&s->s_umount); <-------------------------------------- s_umount deactivate_locked_super gfs2_kill_sb kill_block_super generic_shutdown_super gfs2_put_super gfs2_make_fs_ro gfs2_glock_nq_init sd_freeze_gl freeze_go_sync if (freeze glock in SH) freeze_super (vfs) down_write(&sb->s_umount); <------- s_umount This patch moves the hold of the freeze glock outside the two sister rw/ro functions to their callers, but it doesn't request the glock from gfs2_put_super, thus eliminating the circular dependency. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Many places in the gfs2 code queued and dequeued the freeze glock. Almost all of them acquire it in SHARED mode, and need to specify the same LM_FLAG_NOEXP and GL_EXACT flags. This patch adds common helper functions gfs2_freeze_lock and gfs2_freeze_unlock to make the code more readable, and to prepare for the next patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 20 Dec, 2020 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds authored
Pull gfs2 updates from Andreas Gruenbacher: - Don't wait for unfreeze of the wrong filesystems - Remove an obsolete delete_work_func hack and an incorrect sb_start_write - Minor documentation updates and cosmetic care * tag 'gfs2-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only gfs2: Remove sb_start_write from gfs2_statfs_sync gfs2: remove trailing semicolons from macro definitions Revert "GFS2: Prevent delete work from occurring on glocks used for create" gfs2: Make inode operations static MAINTAINERS: Add gfs2 bug tracker link Documentation: Update filesystems/gfs2.rst
-
Heiko Carstens authored
Commit b0a0c261 ("epoll: wire up syscall epoll_pwait2") wired up the 64 bit syscall instead of the compat variant in a couple of places. Fixes: b0a0c261 ("epoll: wire up syscall epoll_pwait2") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 19 Dec, 2020 32 commits
-
-
Linus Torvalds authored
Merge tag 'close-range-cloexec-unshare-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull close_range fix from Christian Brauner: "syzbot reported a bug when asking close_range() to unshare the file descriptor table and making all fds close-on-exec. If CLOSE_RANGE_UNSHARE the caller will receive a private file descriptor table in case their file descriptor table is currently shared before operating on the requested file descriptor range. For the case where the caller has requested all file descriptors to be actually closed via e.g. close_range(3, ~0U, CLOSE_RANGE_UNSHARE) the kernel knows that the caller does not need any of the file descriptors anymore and will optimize the close operation by only copying all files in the range from 0 to 3 and no others. However, if the caller requested CLOSE_RANGE_CLOEXEC together with CLOSE_RANGE_UNSHARE the caller wants to still make use of the file descriptors so the kernel needs to copy all of them and can't optimize. The original patch didn't account for this and thus could cause oopses as evidenced by the syzbot report because it assumed that all fds had been copied. Fix this by handling the CLOSE_RANGE_CLOEXEC case and copying all fds if the two flags are specified together. This should've been caught in the selftests but the original patch didn't cover this case and I didn't catch it during review. So in addition to the bugfix I'm also adding selftests. They will reliably reproduce the bug on a non-fixed kernel and allows us to catch regressions and verify correct behavior. Note, the kernel selftest tree contained a bunch of changes that made the original selftest fail to compile so there are small fixups in here make them compile without warnings" * tag 'close-range-cloexec-unshare-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests/core: add regression test for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC selftests/core: add test for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC selftests/core: handle missing syscall number for close_range selftests/core: fix close_range_test build after XFAIL removal close_range: unshare all fds for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull more xen updates from Juergen Gross: "Some minor cleanup patches and a small series disentangling some Xen related Kconfig options" * tag 'for-linus-5.11-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Kconfig: remove X86_64 depends from XEN_512GB xen/manage: Fix fall-through warnings for Clang xen-blkfront: Fix fall-through warnings for Clang xen: remove trailing semicolon in macro definition xen: Kconfig: nest Xen guest options xen: Remove Xen PVH/PVHVM dependency on PCI x86/xen: Convert to DEFINE_SHOW_ATTRIBUTE
-
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linuxLinus Torvalds authored
Pull pcmcia updates from Dominik Brodowski: "Besides a few PCMCIA odd fixes, the NEC VRC4173 CARDU driver is removed, as it has not compiled in ages" * 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: omap: Fix error return code in omap_cf_probe() pcmcia: Remove NEC VRC4173 CARDU pcmcia: db1xxx_ss: remove unneeded semicolon pcmcia/electra_cf: Fix some return values in 'electra_cf_probe()' in case of error
-
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linuxLinus Torvalds authored
Pull i3c updates from Boris Brezillon: - Add the HCI driver - Add a missing destroy_workqueue() in an error path - Flag Alexandre Belloni as the new maintainer * tag 'i3c/for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c/master/mipi-i3c-hci: quiet maybe-unused variable warning i3c: Resign from my maintainer role i3c/master: Fix uninitialized variable next_addr i3c/master: introduce the mipi-i3c-hci driver dt-bindings: i3c: MIPI I3C Host Controller Interface i3c master: fix missing destroy_workqueue() on error in i3c_master_register
-
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supplyLinus Torvalds authored
Pull power supply and reset updates from Sebastian Reichel: "Battery/charger driver changes: - collie_battery, generic-adc-battery, s3c-adc-battery: convert to GPIO descriptors (incl ARM board files) - misc cleanup and fixes Reset drivers: - new poweroff driver for force disabling a regulator - use printk format symbol resolver - ocelot: add support for Luton and Jaguar2" * tag 'for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (31 commits) power: supply: Fix a typo in warning message Documentation: DT: binding documentation for regulator-poweroff power: reset: new driver regulator-poweroff power: supply: ab8500: Use dev_err_probe() for IIO channels power: supply: ab8500_fg: Request all IRQs as threaded power: supply: ab8500_charger: Oneshot threaded IRQs power: supply: ab8500: Convert to dev_pm_ops power: supply: ab8500: Use local helper power: supply: wm831x_power: remove unneeded break power: supply: bq24735: Drop unused include power: supply: bq24190_charger: Drop unused include power: supply: generic-adc-battery: Use GPIO descriptors power: supply: collie_battery: Convert to GPIO descriptors power: supply: bq24190_charger: fix reference leak power: supply: s3c-adc-battery: Convert to GPIO descriptors power: reset: Use printk format symbol resolver power: supply: axp20x_usb_power: Use power efficient workqueue for debounce power: supply: axp20x_usb_power: fix typo power: supply: max8997-charger: Improve getting charger status power: supply: max8997-charger: Fix platform data retrieval ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsiLinus Torvalds authored
Pull HSI updates from Sebastian Reichel: "Misc cleanups" * tag 'hsi-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: HSI: core: fix a kernel-doc markup HSI: omap_ssi: Don't jump to free ID in ssi_add_controller()
-
Linus Torvalds authored
Merge tag 'pwm/for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This is a fairly big release cycle from the PWM framework's point of view. There's a large patcheset here which converts drivers to use the new devm_platform_ioremap_resource() helper and a bunch of minor fixes to existing drivers. Some of the existing drivers also add support for more hardware, such as Atmel SAMA 5D2 and Mediatek MT8183. Finally there's a couple of new drivers for Intel Keem Bay and LGM SoCs as well as the DesignWare PWM controller" * tag 'pwm/for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (66 commits) pwm: sun4i: Remove erroneous else branch pwm: sl28cpld: Set driver data before registering the PWM chip pwm: Remove unused function pwmchip_add_inversed() pwm: imx27: Fix overflow for bigger periods pwm: bcm2835: Support apply function for atomic configuration pwm: keembay: Fix build failure with -Os pwm: core: Use octal permission pwm: lpss: Make compilable with COMPILE_TEST pwm: Fix dependencies on HAS_IOMEM pwm: Use -EINVAL for unsupported polarity pwm: sti: Remove unnecessary blank line pwm: sti: Avoid conditional gotos pwm: Add PWM fan controller driver for LGM SoC Add DT bindings YAML schema for PWM fan controller of LGM SoC pwm: Add DesignWare PWM Controller Driver dt-bindings: pwm: mtk-disp: add MT8167 SoC binding pwm: mediatek: Add MT8183 SoC support pwm: mediatek: Always use bus clock dt-bindings: pwm: pwm-mediatek: Add documentation for MT8183 SoC pwm: Add PWM driver for Intel Keem Bay ...
-
Linus Torvalds authored
Merge still more updates from Andrew Morton: "18 patches. Subsystems affected by this patch series: mm (memcg and cleanups) and epoll" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/Kconfig: fix spelling mistake "whats" -> "what's" selftests/filesystems: expand epoll with epoll_pwait2 epoll: wire up syscall epoll_pwait2 epoll: add syscall epoll_pwait2 epoll: convert internal api to timespec64 epoll: eliminate unnecessary lock for zero timeout epoll: replace gotos with a proper loop epoll: pull all code between fetch_events and send_event into the loop epoll: simplify and optimize busy loop logic epoll: move eavail next to the list_empty_careful check epoll: pull fatal signal checks into ep_send_events() epoll: simplify signal handling epoll: check for events when removing a timed out thread from the wait queue mm/memcontrol:rewrite mem_cgroup_page_lruvec() mm, kvm: account kvm_vcpu_mmap to kmemcg mm/memcg: remove unused definitions mm/memcg: warning on !memcg after readahead page charged mm/memcg: bail early from swap accounting if memcg disabled
-
Colin Ian King authored
There is a spelling mistake in the Kconfig help text. Fix it. Link: https://lkml.kernel.org/r/20201217172717.58203-1-colin.king@canonical.comSigned-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Willem de Bruijn authored
Code coverage for the epoll_pwait2 syscall. epoll62: Repeat basic test epoll1, but exercising the new syscall. epoll63: Pass a timespec and exercise the timeout wakeup path. Link: https://lkml.kernel.org/r/20201121144401.3727659-5-willemdebruijn.kernel@gmail.comSigned-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Willem de Bruijn authored
Split off from prev patch in the series that implements the syscall. Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.comSigned-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Willem de Bruijn authored
Add syscall epoll_pwait2, an epoll_wait variant with nsec resolution that replaces int timeout with struct timespec. It is equivalent otherwise. int epoll_pwait2(int fd, struct epoll_event *events, int maxevents, const struct timespec *timeout, const sigset_t *sigset); The underlying hrtimer is already programmed with nsec resolution. pselect and ppoll also set nsec resolution timeout with timespec. The sigset_t in epoll_pwait has a compat variant. epoll_pwait2 needs the same. For timespec, only support this new interface on 2038 aware platforms that define __kernel_timespec_t. So no CONFIG_COMPAT_32BIT_TIME. Link: https://lkml.kernel.org/r/20201121144401.3727659-3-willemdebruijn.kernel@gmail.comSigned-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Willem de Bruijn authored
Patch series "add epoll_pwait2 syscall", v4. Enable nanosecond timeouts for epoll. Analogous to pselect and ppoll, introduce an epoll_wait syscall variant that takes a struct timespec instead of int timeout. This patch (of 4): Make epoll more consistent with select/poll: pass along the timeout as timespec64 pointer. In anticipation of additional changes affecting all three polling mechanisms: - add epoll_pwait2 syscall with timespec semantics, and share poll_select_set_timeout implementation. - compute slack before conversion to absolute time, to save one ktime_get_ts64 call. Link: https://lkml.kernel.org/r/20201121144401.3727659-1-willemdebruijn.kernel@gmail.com Link: https://lkml.kernel.org/r/20201121144401.3727659-2-willemdebruijn.kernel@gmail.comSigned-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
We call ep_events_available() under lock when timeout is 0, and then call it without locks in the loop for the other cases. Instead, call ep_events_available() without lock for all cases. For non-zero timeouts, we will recheck after adding the thread to the wait queue. For zero timeout cases, by definition, user is opportunistically polling and will have to call epoll_wait again in the future. Note that this lock was kept in c5a282e9 because the whole loop was historically under lock. This patch results in a 1% CPU/RPC reduction in RPC benchmarks. Link: https://lkml.kernel.org/r/20201106231635.3528496-9-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
The existing loop is pointless, and the labels make it really hard to follow the structure. Replace that control structure with a simple loop that returns when there are new events, there is a signal, or the thread has timed out. Link: https://lkml.kernel.org/r/20201106231635.3528496-8-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
This is a no-op change which simplifies the follow up patches. Link: https://lkml.kernel.org/r/20201106231635.3528496-7-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
ep_events_available() is called multiple times around the busy loop logic, even though the logic is generally not used. ep_reset_busy_poll_napi_id() is similarly always called, even when busy loop is not used. Eliminate ep_reset_busy_poll_napi_id() and inline it inside ep_busy_loop(). Make ep_busy_loop() return whether there are any events available after the busy loop. This will eliminate unnecessary loads and branches, and simplifies the loop. Link: https://lkml.kernel.org/r/20201106231635.3528496-6-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
This is a no-op change and simply to make the code more coherent. Link: https://lkml.kernel.org/r/20201106231635.3528496-5-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
To simplify the code, pull in checking the fatal signals into ep_send_events(). ep_send_events() is called only from ep_poll(). Note that, previously, we were always checking fatal events, but it is checked only if eavail is true. This should be fine because the goal of that check is to quickly return from epoll_wait() when there is a pending fatal signal. Link: https://lkml.kernel.org/r/20201106231635.3528496-4-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
Check signals before locking ep->lock, and immediately return -EINTR if there is any signal pending. This saves a few loads, stores, and branches from the hot path and simplifies the loop structure for follow up patches. Link: https://lkml.kernel.org/r/20201106231635.3528496-3-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
Patch series "simplify ep_poll". This patch series is a followup based on the suggestions and feedback by Linus: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com The first patch in the series is a fix for the epoll race in presence of timeouts, so that it can be cleanly backported to all affected stable kernels. The rest of the patch series simplify the ep_poll() implementation. Some of these simplifications result in minor performance enhancements as well. We have kept these changes under self tests and internal benchmarks for a few days, and there are minor (1-2%) performance enhancements as a result. This patch (of 8): After abc610e0 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout"), we break out of the ep_poll loop upon timeout, without checking whether there is any new events available. Prior to that patch-series we always called ep_events_available() after exiting the loop. This can cause races and missed wakeups. For example, consider the following scenario reported by Guantao Liu: Suppose we have an eventfd added using EPOLLET to an epollfd. Thread 1: Sleeps for just below 5ms and then writes to an eventfd. Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an event of the eventfd, it will write back on that fd. Thread 3: Calls epoll_wait with a negative timeout. Prior to abc610e0, it is guaranteed that Thread 3 will wake up either by Thread 1 or Thread 2. After abc610e0, Thread 3 can be blocked indefinitely if Thread 2 sees a timeout right before the write to the eventfd by Thread 1. Thread 2 will be woken up from schedule_hrtimeout_range and, with evail 0, it will not call ep_send_events(). To fix this issue: 1) Simplify the timed_out case as suggested by Linus. 2) while holding the lock, recheck whether the thread was woken up after its time out has reached. Note that (2) is different from Linus' original suggestion: It do not set "eavail = ep_events_available(ep)" to avoid unnecessary contention (when there are too many timed-out threads and a small number of events), as well as races mentioned in the discussion thread. This is the first patch in the series so that the backport to stable releases is straightforward. Link: https://lkml.kernel.org/r/20201106231635.3528496-1-soheil.kdev@gmail.com Link: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com Link: https://lkml.kernel.org/r/20201106231635.3528496-2-soheil.kdev@gmail.com Fixes: abc610e0 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Guantao Liu <guantaol@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Guantao Liu <guantaol@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hui Su authored
mem_cgroup_page_lruvec() in memcontrol.c and mem_cgroup_lruvec() in memcontrol.h is very similar except for the param(page and memcg) which also can be convert to each other. So rewrite mem_cgroup_page_lruvec() with mem_cgroup_lruvec(). [alex.shi@linux.alibaba.com: add missed warning in mem_cgroup_lruvec] Link: https://lkml.kernel.org/r/94f17bb7-ec61-5b72-3555-fabeb5a4d73b@linux.alibaba.com [lstoakes@gmail.com: warn on missing memcg on mem_cgroup_page_lruvec()] Link: https://lkml.kernel.org/r/20201125112202.387009-1-lstoakes@gmail.com Link: https://lkml.kernel.org/r/20201108143731.GA74138@rlkSigned-off-by: Hui Su <sh_def@163.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shakeel Butt authored
A VCPU of a VM can allocate couple of pages which can be mmap'ed by the user space application. At the moment this memory is not charged to the memcg of the VMM. On a large machine running large number of VMs or small number of VMs having large number of VCPUs, this unaccounted memory can be very significant. So, charge this memory to the memcg of the VMM. Please note that lifetime of these allocations corresponds to the lifetime of the VMM. Link: https://lkml.kernel.org/r/20201106202923.2087414-1-shakeelb@google.comSigned-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
Some definitions are left unused, just clean them. Link: https://lkml.kernel.org/r/20201108003834.12669-1-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alex Shi authored
Add VM_WARN_ON_ONCE_PAGE() macro. Since readahead page is charged on memcg too, in theory we don't have to check this exception now. Before safely remove them all, add a warning for the unexpected !memcg. Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alex Shi authored
Patch series "bail out early for memcg disable". These 2 patches are indepenedent from per memcg lru lock, and may encounter unexpected warning, so let's move out them from per memcg lru locking patchset. This patch (of 2): We could bail out early when memcg wasn't enabled. Link: https://lkml.kernel.org/r/1604283436-18880-1-git-send-email-alex.shi@linux.alibaba.com Link: https://lkml.kernel.org/r/1604283436-18880-2-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christian Brauner authored
This test is a minimalized version of the reproducer given by syzbot (cf. [1]). After introducing CLOSE_RANGE_CLOEXEC syzbot reported a crash when CLOSE_RANGE_CLOEXEC is specified in conjunction with CLOSE_RANGE_UNSHARE. When CLOSE_RANGE_UNSHARE is specified the caller will receive a private file descriptor table in case their file descriptor table is currently shared. For the case where the caller has requested all file descriptors to be actually closed via e.g. close_range(3, ~0U, 0) the kernel knows that the caller does not need any of the file descriptors anymore and will optimize the close operation by only copying all files in the range from 0 to 3 and no others. However, if the caller requested CLOSE_RANGE_CLOEXEC together with CLOSE_RANGE_UNSHARE the caller wants to still make use of the file descriptors so the kernel needs to copy all of them and can't optimize. The original patch didn't account for this and thus could cause oopses as evidenced by the syzbot report. Add tests for this regression. We first create a huge gap in the fd table. When we now call CLOSE_RANGE_UNSHARE with a shared fd table and and with ~0U as upper bound the kernel will only copy up to fd1 file descriptors into the new fd table. If the kernel is buggy and doesn't handle CLOSE_RANGE_CLOEXEC correctly it will not have copied all file descriptors and we will oops! This test passes on a fixed kernel and will trigger an oops on a buggy kernel. [1]: https://syzkaller.appspot.com/text?tag=KernelConfig&x=db720fe37a6a41d8 Cc: Giuseppe Scrivano <gscrivan@redhat.com> Cc: linux-fsdevel@vger.kernel.org Link: syzbot+96cfd2b22b3213646a93@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20201218145415.801063-4-christian.brauner@ubuntu.comSigned-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Add a test to verify that CLOSE_RANGE_UNSHARE works correctly when combined with CLOSE_RANGE_CLOEXEC for the single-threaded case. Cc: Giuseppe Scrivano <gscrivan@redhat.com> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20201218145415.801063-3-christian.brauner@ubuntu.comSigned-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
This improves the syscall number handling in the close_range() selftests. This should handle any architecture. Cc: Giuseppe Scrivano <gscrivan@redhat.com> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20201218145415.801063-2-christian.brauner@ubuntu.comSigned-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Tobias Klauser authored
XFAIL was removed in commit 9847d24a ("selftests/harness: Refactor XFAIL into SKIP") and its use in close_range_test was already replaced by commit 1d44d0dd ("selftests: core: use SKIP instead of XFAIL in close_range_test.c"). However, commit 23afeaef ("selftests: core: add tests for CLOSE_RANGE_CLOEXEC") introduced usage of XFAIL in TEST(close_range_cloexec). Use SKIP there as well. Fixes: 23afeaef ("selftests: core: add tests for CLOSE_RANGE_CLOEXEC") Cc: Giuseppe Scrivano <gscrivan@redhat.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20201218112428.13662-1-tklauser@distanz.ch Link: https://lore.kernel.org/r/20201218145415.801063-1-christian.brauner@ubuntu.comSigned-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
After introducing CLOSE_RANGE_CLOEXEC syzbot reported a crash when CLOSE_RANGE_CLOEXEC is specified in conjunction with CLOSE_RANGE_UNSHARE. When CLOSE_RANGE_UNSHARE is specified the caller will receive a private file descriptor table in case their file descriptor table is currently shared. For the case where the caller has requested all file descriptors to be actually closed via e.g. close_range(3, ~0U, 0) the kernel knows that the caller does not need any of the file descriptors anymore and will optimize the close operation by only copying all files in the range from 0 to 3 and no others. However, if the caller requested CLOSE_RANGE_CLOEXEC together with CLOSE_RANGE_UNSHARE the caller wants to still make use of the file descriptors so the kernel needs to copy all of them and can't optimize. The original patch didn't account for this and thus could cause oopses as evidenced by the syzbot report because it assumed that all fds had been copied. Fix this by handling the CLOSE_RANGE_CLOEXEC case. syzbot reported ================================================================== BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: null-ptr-deref in atomic64_read include/asm-generic/atomic-instrumented.h:837 [inline] BUG: KASAN: null-ptr-deref in atomic_long_read include/asm-generic/atomic-long.h:29 [inline] BUG: KASAN: null-ptr-deref in filp_close+0x22/0x170 fs/open.c:1274 Read of size 8 at addr 0000000000000077 by task syz-executor511/8522 CPU: 1 PID: 8522 Comm: syz-executor511 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 __kasan_report mm/kasan/report.c:549 [inline] kasan_report.cold+0x5/0x37 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:192 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic64_read include/asm-generic/atomic-instrumented.h:837 [inline] atomic_long_read include/asm-generic/atomic-long.h:29 [inline] filp_close+0x22/0x170 fs/open.c:1274 close_files fs/file.c:402 [inline] put_files_struct fs/file.c:417 [inline] put_files_struct+0x1cc/0x350 fs/file.c:414 exit_files+0x12a/0x170 fs/file.c:435 do_exit+0xb4f/0x2a00 kernel/exit.c:818 do_group_exit+0x125/0x310 kernel/exit.c:920 get_signal+0x428/0x2100 kernel/signal.c:2792 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x124/0x200 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x447039 Code: Unable to access opcode bytes at RIP 0x44700f. RSP: 002b:00007f1b1225cdb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: 0000000000000001 RBX: 00000000006dbc28 RCX: 0000000000447039 RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000006dbc2c RBP: 00000000006dbc20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c R13: 00007fff223b6bef R14: 00007f1b1225d9c0 R15: 00000000006dbc2c ================================================================== syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by: syzbot+96cfd2b22b3213646a93@syzkaller.appspotmail.com Tested on: commit: 10f7cddd selftests/core: add regression test for CLOSE_RAN.. git tree: git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git vfs kernel config: https://syzkaller.appspot.com/x/.config?x=5d42216b510180e3 dashboard link: https://syzkaller.appspot.com/bug?extid=96cfd2b22b3213646a93 compiler: gcc (GCC) 10.1.0-syz 20200507 Reported-by: syzbot+96cfd2b22b3213646a93@syzkaller.appspotmail.com Fixes: 582f1fb6 ("fs, close_range: add flag CLOSE_RANGE_CLOEXEC") Cc: Giuseppe Scrivano <gscrivan@redhat.com> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20201217213303.722643-1-christian.brauner@ubuntu.comSigned-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Jason Andryuk authored
commit bfda93ae ("xen: Kconfig: nest Xen guest options") accidentally re-added X86_64 as a dependency to XEN_512GB. It was originally removed in commit a13f2ef1 ("x86/xen: remove 32-bit Xen PV guest support"). Remove it again. Fixes: bfda93ae ("xen: Kconfig: nest Xen guest options") Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20201216140838.16085-1-jandryuk@gmail.comSigned-off-by: Juergen Gross <jgross@suse.com>
-
- 18 Dec, 2020 1 commit
-
-
Kent Overstreet authored
If iter->count is 0 and iocb->ki_pos is page aligned, this causes nr_pages to be 0. Then in generic_file_buffered_read_get_pages() find_get_pages_contig() returns 0 - because we asked for 0 pages, so we call generic_file_buffered_read_no_cached_page() which attempts to add a page to the page cache, which fails with -EEXIST, and then we loop. Oops... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Reported-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-