- 08 Feb, 2021 1 commit
-
-
Andrew Price authored
Turn on rgrplvb by default for sb_fs_format > 1801. Mount options still have to override this so a new args field to differentiate between 'off' and 'not specified' is added, and the new default is applied only when it's not specified. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 05 Feb, 2021 2 commits
-
-
Bob Peterson authored
Patch fb6791d1 was designed to allow gfs2 to unmount quicker by skipping the step where it tells dlm to unlock glocks in EX with lvbs. This was done because when gfs2 unmounts a file system, it destroys the dlm lockspace shortly after it destroys the glocks so it doesn't need to unlock them all: the unlock is implied when the lockspace is destroyed by dlm. However, that patch introduced a use-after-free in dlm: as part of its normal dlm_recoverd process, it can call ls_recovery to recover dead locks. In so doing, it can call recover_rsbs which calls recover_lvb for any mastered rsbs. Func recover_lvb runs through the list of lkbs queued to the given rsb (if the glock is cached but unlocked, it will still be queued to the lkb, but in NL--Unlocked--mode) and if it has an lvb, copies it to the rsb, thus trying to preserve the lkb. However, when gfs2 skips the dlm unlock step, it frees the glock and its lvb, which means dlm's function recover_lvb references the now freed lvb pointer, copying the freed lvb memory to the rsb. This patch changes the check in gdlm_put_lock so that it calls dlm_unlock for all glocks that contain an lvb pointer. Fixes: fb6791d1 ("GFS2: skip dlm_unlock calls in unmount") Cc: stable@vger.kernel.org # v3.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
In gfs2_recover_one, fix a sd_log_flush_lock imbalance when a recovery pass fails. Fixes: c9ebc4b7 ("gfs2: allow journal replay to hold sd_log_flush_lock") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 25 Jan, 2021 4 commits
-
-
Bob Peterson authored
The recovery func can recover multiple journals, but they were all using the same bio. This resulted in use-after-free related to sdp->sd_log_bio. This patch moves the variable to the journal descriptor, jd, so that every recovery can operate on its own bio. And hopefully we never run out. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Bob Peterson authored
If go_free is defined, function signal_our_withdraw is supposed to synchronize on the GLF_FREEING flag of the inode glock, but it accidentally does that on the live glock. Fix that and disambiguate the glock variables. Fixes: 601ef0d5 ("gfs2: Force withdraw to replay journals and wait for it to finish") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Bob Peterson authored
This reverts commit 428fd95d. Patch 428fd95d85b2 added a call to log_flush_wait to function gfs2_log_flush. Then gfs2_log_flush calls log_write_header which submits a write request with the REQ_PREFLUSH flag which also forces it to wait. This patch removes the unnecessary call to log_flush_wait. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Andrew Price authored
Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 22 Jan, 2021 1 commit
-
-
Zhaoyang Huang authored
As gfs2_quotad_cachep and gfs2_glock_cachep have registered shrinkers, amending SLAB_RECLAIM_ACCOUNT when creating them, which improves slab accounting. Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 31 Dec, 2020 1 commit
-
-
Bob Peterson authored
Function gfs2_log_write_page is only used in lops.c, so make it static. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 22 Dec, 2020 2 commits
-
-
Bob Peterson authored
Before this patch, sister functions gfs2_make_fs_rw and gfs2_make_fs_ro locked (held) the freeze glock by calling gfs2_freeze_lock and gfs2_freeze_unlock. The problem is, not all the callers of gfs2_make_fs_ro should be doing this. The three callers of gfs2_make_fs_ro are: remount (gfs2_reconfigure), signal_our_withdraw, and unmount (gfs2_put_super). But when unmounting the file system we can get into the following circular lock dependency: deactivate_super down_write(&s->s_umount); <-------------------------------------- s_umount deactivate_locked_super gfs2_kill_sb kill_block_super generic_shutdown_super gfs2_put_super gfs2_make_fs_ro gfs2_glock_nq_init sd_freeze_gl freeze_go_sync if (freeze glock in SH) freeze_super (vfs) down_write(&sb->s_umount); <------- s_umount This patch moves the hold of the freeze glock outside the two sister rw/ro functions to their callers, but it doesn't request the glock from gfs2_put_super, thus eliminating the circular dependency. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Many places in the gfs2 code queued and dequeued the freeze glock. Almost all of them acquire it in SHARED mode, and need to specify the same LM_FLAG_NOEXP and GL_EXACT flags. This patch adds common helper functions gfs2_freeze_lock and gfs2_freeze_unlock to make the code more readable, and to prepare for the next patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 20 Dec, 2020 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds authored
Pull gfs2 updates from Andreas Gruenbacher: - Don't wait for unfreeze of the wrong filesystems - Remove an obsolete delete_work_func hack and an incorrect sb_start_write - Minor documentation updates and cosmetic care * tag 'gfs2-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only gfs2: Remove sb_start_write from gfs2_statfs_sync gfs2: remove trailing semicolons from macro definitions Revert "GFS2: Prevent delete work from occurring on glocks used for create" gfs2: Make inode operations static MAINTAINERS: Add gfs2 bug tracker link Documentation: Update filesystems/gfs2.rst
-
Heiko Carstens authored
Commit b0a0c261 ("epoll: wire up syscall epoll_pwait2") wired up the 64 bit syscall instead of the compat variant in a couple of places. Fixes: b0a0c261 ("epoll: wire up syscall epoll_pwait2") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 19 Dec, 2020 27 commits
-
-
Linus Torvalds authored
Merge tag 'close-range-cloexec-unshare-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull close_range fix from Christian Brauner: "syzbot reported a bug when asking close_range() to unshare the file descriptor table and making all fds close-on-exec. If CLOSE_RANGE_UNSHARE the caller will receive a private file descriptor table in case their file descriptor table is currently shared before operating on the requested file descriptor range. For the case where the caller has requested all file descriptors to be actually closed via e.g. close_range(3, ~0U, CLOSE_RANGE_UNSHARE) the kernel knows that the caller does not need any of the file descriptors anymore and will optimize the close operation by only copying all files in the range from 0 to 3 and no others. However, if the caller requested CLOSE_RANGE_CLOEXEC together with CLOSE_RANGE_UNSHARE the caller wants to still make use of the file descriptors so the kernel needs to copy all of them and can't optimize. The original patch didn't account for this and thus could cause oopses as evidenced by the syzbot report because it assumed that all fds had been copied. Fix this by handling the CLOSE_RANGE_CLOEXEC case and copying all fds if the two flags are specified together. This should've been caught in the selftests but the original patch didn't cover this case and I didn't catch it during review. So in addition to the bugfix I'm also adding selftests. They will reliably reproduce the bug on a non-fixed kernel and allows us to catch regressions and verify correct behavior. Note, the kernel selftest tree contained a bunch of changes that made the original selftest fail to compile so there are small fixups in here make them compile without warnings" * tag 'close-range-cloexec-unshare-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests/core: add regression test for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC selftests/core: add test for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC selftests/core: handle missing syscall number for close_range selftests/core: fix close_range_test build after XFAIL removal close_range: unshare all fds for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull more xen updates from Juergen Gross: "Some minor cleanup patches and a small series disentangling some Xen related Kconfig options" * tag 'for-linus-5.11-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Kconfig: remove X86_64 depends from XEN_512GB xen/manage: Fix fall-through warnings for Clang xen-blkfront: Fix fall-through warnings for Clang xen: remove trailing semicolon in macro definition xen: Kconfig: nest Xen guest options xen: Remove Xen PVH/PVHVM dependency on PCI x86/xen: Convert to DEFINE_SHOW_ATTRIBUTE
-
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linuxLinus Torvalds authored
Pull pcmcia updates from Dominik Brodowski: "Besides a few PCMCIA odd fixes, the NEC VRC4173 CARDU driver is removed, as it has not compiled in ages" * 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: omap: Fix error return code in omap_cf_probe() pcmcia: Remove NEC VRC4173 CARDU pcmcia: db1xxx_ss: remove unneeded semicolon pcmcia/electra_cf: Fix some return values in 'electra_cf_probe()' in case of error
-
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linuxLinus Torvalds authored
Pull i3c updates from Boris Brezillon: - Add the HCI driver - Add a missing destroy_workqueue() in an error path - Flag Alexandre Belloni as the new maintainer * tag 'i3c/for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c/master/mipi-i3c-hci: quiet maybe-unused variable warning i3c: Resign from my maintainer role i3c/master: Fix uninitialized variable next_addr i3c/master: introduce the mipi-i3c-hci driver dt-bindings: i3c: MIPI I3C Host Controller Interface i3c master: fix missing destroy_workqueue() on error in i3c_master_register
-
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supplyLinus Torvalds authored
Pull power supply and reset updates from Sebastian Reichel: "Battery/charger driver changes: - collie_battery, generic-adc-battery, s3c-adc-battery: convert to GPIO descriptors (incl ARM board files) - misc cleanup and fixes Reset drivers: - new poweroff driver for force disabling a regulator - use printk format symbol resolver - ocelot: add support for Luton and Jaguar2" * tag 'for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (31 commits) power: supply: Fix a typo in warning message Documentation: DT: binding documentation for regulator-poweroff power: reset: new driver regulator-poweroff power: supply: ab8500: Use dev_err_probe() for IIO channels power: supply: ab8500_fg: Request all IRQs as threaded power: supply: ab8500_charger: Oneshot threaded IRQs power: supply: ab8500: Convert to dev_pm_ops power: supply: ab8500: Use local helper power: supply: wm831x_power: remove unneeded break power: supply: bq24735: Drop unused include power: supply: bq24190_charger: Drop unused include power: supply: generic-adc-battery: Use GPIO descriptors power: supply: collie_battery: Convert to GPIO descriptors power: supply: bq24190_charger: fix reference leak power: supply: s3c-adc-battery: Convert to GPIO descriptors power: reset: Use printk format symbol resolver power: supply: axp20x_usb_power: Use power efficient workqueue for debounce power: supply: axp20x_usb_power: fix typo power: supply: max8997-charger: Improve getting charger status power: supply: max8997-charger: Fix platform data retrieval ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsiLinus Torvalds authored
Pull HSI updates from Sebastian Reichel: "Misc cleanups" * tag 'hsi-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: HSI: core: fix a kernel-doc markup HSI: omap_ssi: Don't jump to free ID in ssi_add_controller()
-
Linus Torvalds authored
Merge tag 'pwm/for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This is a fairly big release cycle from the PWM framework's point of view. There's a large patcheset here which converts drivers to use the new devm_platform_ioremap_resource() helper and a bunch of minor fixes to existing drivers. Some of the existing drivers also add support for more hardware, such as Atmel SAMA 5D2 and Mediatek MT8183. Finally there's a couple of new drivers for Intel Keem Bay and LGM SoCs as well as the DesignWare PWM controller" * tag 'pwm/for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (66 commits) pwm: sun4i: Remove erroneous else branch pwm: sl28cpld: Set driver data before registering the PWM chip pwm: Remove unused function pwmchip_add_inversed() pwm: imx27: Fix overflow for bigger periods pwm: bcm2835: Support apply function for atomic configuration pwm: keembay: Fix build failure with -Os pwm: core: Use octal permission pwm: lpss: Make compilable with COMPILE_TEST pwm: Fix dependencies on HAS_IOMEM pwm: Use -EINVAL for unsupported polarity pwm: sti: Remove unnecessary blank line pwm: sti: Avoid conditional gotos pwm: Add PWM fan controller driver for LGM SoC Add DT bindings YAML schema for PWM fan controller of LGM SoC pwm: Add DesignWare PWM Controller Driver dt-bindings: pwm: mtk-disp: add MT8167 SoC binding pwm: mediatek: Add MT8183 SoC support pwm: mediatek: Always use bus clock dt-bindings: pwm: pwm-mediatek: Add documentation for MT8183 SoC pwm: Add PWM driver for Intel Keem Bay ...
-
Linus Torvalds authored
Merge still more updates from Andrew Morton: "18 patches. Subsystems affected by this patch series: mm (memcg and cleanups) and epoll" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/Kconfig: fix spelling mistake "whats" -> "what's" selftests/filesystems: expand epoll with epoll_pwait2 epoll: wire up syscall epoll_pwait2 epoll: add syscall epoll_pwait2 epoll: convert internal api to timespec64 epoll: eliminate unnecessary lock for zero timeout epoll: replace gotos with a proper loop epoll: pull all code between fetch_events and send_event into the loop epoll: simplify and optimize busy loop logic epoll: move eavail next to the list_empty_careful check epoll: pull fatal signal checks into ep_send_events() epoll: simplify signal handling epoll: check for events when removing a timed out thread from the wait queue mm/memcontrol:rewrite mem_cgroup_page_lruvec() mm, kvm: account kvm_vcpu_mmap to kmemcg mm/memcg: remove unused definitions mm/memcg: warning on !memcg after readahead page charged mm/memcg: bail early from swap accounting if memcg disabled
-
Colin Ian King authored
There is a spelling mistake in the Kconfig help text. Fix it. Link: https://lkml.kernel.org/r/20201217172717.58203-1-colin.king@canonical.comSigned-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Willem de Bruijn authored
Code coverage for the epoll_pwait2 syscall. epoll62: Repeat basic test epoll1, but exercising the new syscall. epoll63: Pass a timespec and exercise the timeout wakeup path. Link: https://lkml.kernel.org/r/20201121144401.3727659-5-willemdebruijn.kernel@gmail.comSigned-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Willem de Bruijn authored
Split off from prev patch in the series that implements the syscall. Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.comSigned-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Willem de Bruijn authored
Add syscall epoll_pwait2, an epoll_wait variant with nsec resolution that replaces int timeout with struct timespec. It is equivalent otherwise. int epoll_pwait2(int fd, struct epoll_event *events, int maxevents, const struct timespec *timeout, const sigset_t *sigset); The underlying hrtimer is already programmed with nsec resolution. pselect and ppoll also set nsec resolution timeout with timespec. The sigset_t in epoll_pwait has a compat variant. epoll_pwait2 needs the same. For timespec, only support this new interface on 2038 aware platforms that define __kernel_timespec_t. So no CONFIG_COMPAT_32BIT_TIME. Link: https://lkml.kernel.org/r/20201121144401.3727659-3-willemdebruijn.kernel@gmail.comSigned-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Willem de Bruijn authored
Patch series "add epoll_pwait2 syscall", v4. Enable nanosecond timeouts for epoll. Analogous to pselect and ppoll, introduce an epoll_wait syscall variant that takes a struct timespec instead of int timeout. This patch (of 4): Make epoll more consistent with select/poll: pass along the timeout as timespec64 pointer. In anticipation of additional changes affecting all three polling mechanisms: - add epoll_pwait2 syscall with timespec semantics, and share poll_select_set_timeout implementation. - compute slack before conversion to absolute time, to save one ktime_get_ts64 call. Link: https://lkml.kernel.org/r/20201121144401.3727659-1-willemdebruijn.kernel@gmail.com Link: https://lkml.kernel.org/r/20201121144401.3727659-2-willemdebruijn.kernel@gmail.comSigned-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
We call ep_events_available() under lock when timeout is 0, and then call it without locks in the loop for the other cases. Instead, call ep_events_available() without lock for all cases. For non-zero timeouts, we will recheck after adding the thread to the wait queue. For zero timeout cases, by definition, user is opportunistically polling and will have to call epoll_wait again in the future. Note that this lock was kept in c5a282e9 because the whole loop was historically under lock. This patch results in a 1% CPU/RPC reduction in RPC benchmarks. Link: https://lkml.kernel.org/r/20201106231635.3528496-9-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
The existing loop is pointless, and the labels make it really hard to follow the structure. Replace that control structure with a simple loop that returns when there are new events, there is a signal, or the thread has timed out. Link: https://lkml.kernel.org/r/20201106231635.3528496-8-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
This is a no-op change which simplifies the follow up patches. Link: https://lkml.kernel.org/r/20201106231635.3528496-7-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
ep_events_available() is called multiple times around the busy loop logic, even though the logic is generally not used. ep_reset_busy_poll_napi_id() is similarly always called, even when busy loop is not used. Eliminate ep_reset_busy_poll_napi_id() and inline it inside ep_busy_loop(). Make ep_busy_loop() return whether there are any events available after the busy loop. This will eliminate unnecessary loads and branches, and simplifies the loop. Link: https://lkml.kernel.org/r/20201106231635.3528496-6-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
This is a no-op change and simply to make the code more coherent. Link: https://lkml.kernel.org/r/20201106231635.3528496-5-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
To simplify the code, pull in checking the fatal signals into ep_send_events(). ep_send_events() is called only from ep_poll(). Note that, previously, we were always checking fatal events, but it is checked only if eavail is true. This should be fine because the goal of that check is to quickly return from epoll_wait() when there is a pending fatal signal. Link: https://lkml.kernel.org/r/20201106231635.3528496-4-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
Check signals before locking ep->lock, and immediately return -EINTR if there is any signal pending. This saves a few loads, stores, and branches from the hot path and simplifies the loop structure for follow up patches. Link: https://lkml.kernel.org/r/20201106231635.3528496-3-soheil.kdev@gmail.comSigned-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soheil Hassas Yeganeh authored
Patch series "simplify ep_poll". This patch series is a followup based on the suggestions and feedback by Linus: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com The first patch in the series is a fix for the epoll race in presence of timeouts, so that it can be cleanly backported to all affected stable kernels. The rest of the patch series simplify the ep_poll() implementation. Some of these simplifications result in minor performance enhancements as well. We have kept these changes under self tests and internal benchmarks for a few days, and there are minor (1-2%) performance enhancements as a result. This patch (of 8): After abc610e0 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout"), we break out of the ep_poll loop upon timeout, without checking whether there is any new events available. Prior to that patch-series we always called ep_events_available() after exiting the loop. This can cause races and missed wakeups. For example, consider the following scenario reported by Guantao Liu: Suppose we have an eventfd added using EPOLLET to an epollfd. Thread 1: Sleeps for just below 5ms and then writes to an eventfd. Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an event of the eventfd, it will write back on that fd. Thread 3: Calls epoll_wait with a negative timeout. Prior to abc610e0, it is guaranteed that Thread 3 will wake up either by Thread 1 or Thread 2. After abc610e0, Thread 3 can be blocked indefinitely if Thread 2 sees a timeout right before the write to the eventfd by Thread 1. Thread 2 will be woken up from schedule_hrtimeout_range and, with evail 0, it will not call ep_send_events(). To fix this issue: 1) Simplify the timed_out case as suggested by Linus. 2) while holding the lock, recheck whether the thread was woken up after its time out has reached. Note that (2) is different from Linus' original suggestion: It do not set "eavail = ep_events_available(ep)" to avoid unnecessary contention (when there are too many timed-out threads and a small number of events), as well as races mentioned in the discussion thread. This is the first patch in the series so that the backport to stable releases is straightforward. Link: https://lkml.kernel.org/r/20201106231635.3528496-1-soheil.kdev@gmail.com Link: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com Link: https://lkml.kernel.org/r/20201106231635.3528496-2-soheil.kdev@gmail.com Fixes: abc610e0 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Guantao Liu <guantaol@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Guantao Liu <guantaol@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hui Su authored
mem_cgroup_page_lruvec() in memcontrol.c and mem_cgroup_lruvec() in memcontrol.h is very similar except for the param(page and memcg) which also can be convert to each other. So rewrite mem_cgroup_page_lruvec() with mem_cgroup_lruvec(). [alex.shi@linux.alibaba.com: add missed warning in mem_cgroup_lruvec] Link: https://lkml.kernel.org/r/94f17bb7-ec61-5b72-3555-fabeb5a4d73b@linux.alibaba.com [lstoakes@gmail.com: warn on missing memcg on mem_cgroup_page_lruvec()] Link: https://lkml.kernel.org/r/20201125112202.387009-1-lstoakes@gmail.com Link: https://lkml.kernel.org/r/20201108143731.GA74138@rlkSigned-off-by: Hui Su <sh_def@163.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shakeel Butt authored
A VCPU of a VM can allocate couple of pages which can be mmap'ed by the user space application. At the moment this memory is not charged to the memcg of the VMM. On a large machine running large number of VMs or small number of VMs having large number of VCPUs, this unaccounted memory can be very significant. So, charge this memory to the memcg of the VMM. Please note that lifetime of these allocations corresponds to the lifetime of the VMM. Link: https://lkml.kernel.org/r/20201106202923.2087414-1-shakeelb@google.comSigned-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
Some definitions are left unused, just clean them. Link: https://lkml.kernel.org/r/20201108003834.12669-1-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alex Shi authored
Add VM_WARN_ON_ONCE_PAGE() macro. Since readahead page is charged on memcg too, in theory we don't have to check this exception now. Before safely remove them all, add a warning for the unexpected !memcg. Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alex Shi authored
Patch series "bail out early for memcg disable". These 2 patches are indepenedent from per memcg lru lock, and may encounter unexpected warning, so let's move out them from per memcg lru locking patchset. This patch (of 2): We could bail out early when memcg wasn't enabled. Link: https://lkml.kernel.org/r/1604283436-18880-1-git-send-email-alex.shi@linux.alibaba.com Link: https://lkml.kernel.org/r/1604283436-18880-2-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christian Brauner authored
This test is a minimalized version of the reproducer given by syzbot (cf. [1]). After introducing CLOSE_RANGE_CLOEXEC syzbot reported a crash when CLOSE_RANGE_CLOEXEC is specified in conjunction with CLOSE_RANGE_UNSHARE. When CLOSE_RANGE_UNSHARE is specified the caller will receive a private file descriptor table in case their file descriptor table is currently shared. For the case where the caller has requested all file descriptors to be actually closed via e.g. close_range(3, ~0U, 0) the kernel knows that the caller does not need any of the file descriptors anymore and will optimize the close operation by only copying all files in the range from 0 to 3 and no others. However, if the caller requested CLOSE_RANGE_CLOEXEC together with CLOSE_RANGE_UNSHARE the caller wants to still make use of the file descriptors so the kernel needs to copy all of them and can't optimize. The original patch didn't account for this and thus could cause oopses as evidenced by the syzbot report. Add tests for this regression. We first create a huge gap in the fd table. When we now call CLOSE_RANGE_UNSHARE with a shared fd table and and with ~0U as upper bound the kernel will only copy up to fd1 file descriptors into the new fd table. If the kernel is buggy and doesn't handle CLOSE_RANGE_CLOEXEC correctly it will not have copied all file descriptors and we will oops! This test passes on a fixed kernel and will trigger an oops on a buggy kernel. [1]: https://syzkaller.appspot.com/text?tag=KernelConfig&x=db720fe37a6a41d8 Cc: Giuseppe Scrivano <gscrivan@redhat.com> Cc: linux-fsdevel@vger.kernel.org Link: syzbot+96cfd2b22b3213646a93@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20201218145415.801063-4-christian.brauner@ubuntu.comSigned-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-