- 21 Mar, 2022 8 commits
-
-
Xiubo Li authored
Reset the last_readdir at the same time, and add a comment explaining why we don't free last_readdir when dir_emit returns false. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Dan Carpenter authored
If read_mapping_folio() fails then "inline_version" is printed without being initialized. [ jlayton: use CEPH_INLINE_NONE instead of "-1" ] Fixes: 083db6fd ("ceph: uninline the data on a file opened for writing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Venky Shankar authored
Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Venky Shankar authored
stdev is computed in `cephfs-top` tool - clients forward square of sums and IO count required to calculate stdev. Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Venky Shankar authored
Make the math a bit simpler to understand (should not affect execution speeds). Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Venky Shankar authored
Latencies are of type ktime_t, coverting from jiffies is incorrect. Also, switch to "struct ceph_timespec" for r/w/m latencies. Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
The ceph_find_inode() may will fail and return NULL. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
The ceph_get_inode() will search for or insert a new inode into the hash for the given vino, and return a reference to it. If new is non-NULL, its reference is consumed. We should release the reference when in error handing cases. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 01 Mar, 2022 19 commits
-
-
Xiubo Li authored
To make the logs more readable such as for log likes: ceph: will move 00000000a42b796b to split realm 100000003ed 000000007146df45 With this it will always show the inode numbers instead the inode addresses. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
This will reduce very possible but unnecessary frequently memory allocate/free in this loop. URL: https://tracker.ceph.com/issues/44100Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
The global snaprealm would be created and then destroyed immediately every time when updating it. URL: https://tracker.ceph.com/issues/54362Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
Ceph have removed this macro and the 0x3 will be use for global dummy snaprealm. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Xiubo has been doing stellar kernel work lately, and has graciously volunteered to help with maintainer duties. Add him on as co-maintainer in for ceph.ko and libceph.ko. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
Use a list instead of recursion to avoid possible stack overflow. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
We will only track the uppest parent snapshot realm from which we need to rebuild the snapshot contexts _downward_ in hierarchy. For all the others having no new snapshot we will do nothing. This fix will avoid calling ceph_queue_cap_snap() on some inodes inappropriately. For example, with the code in mainline, suppose there are 2 directory hierarchies (with 6 directories total), like this: /dir_X1/dir_X2/dir_X3/ /dir_Y1/dir_Y2/dir_Y3/ Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a root snapshot under /.snap/root_snap. Every time we make snapshots under /dir_Y1/..., the kclient will always try to rebuild the snap context for snap_X2 realm and finally will always try to queue cap snaps for dir_Y2 and dir_Y3, which makes no sense. That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when creating a new snapshot under /dir_Y1/... the new seq will be 4, and the mds will send the kclient a snapshot backtrace in _downward_ order: seqs 4, 3. When ceph_update_snap_trace() is called, it will always rebuild the from the last realm, that's the root_snap. So later when rebuilding the snap context, the current logic will always cause it to rebuild the snap_X2 realm and then try to queue cap snaps for all the inodes related in that realm, even though it's not necessary. This is accompanied by a lot of these sorts of dout messages: "ceph: queue_cap_snap 00000000a42b796b nothing dirty|writing" Fix the logic to avoid this situation. Also, the 'invalidate' word is not precise here. In actuality, it will cause a rebuild of the existing snapshot contexts or just build non-existent ones. Rename it to 'rebuild_snapcs'. URL: https://tracker.ceph.com/issues/44100Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
This potentially will cause a bug in future if using an old ceph version that sends a smaller inode struct, which can cause some members to be skipped in handle_reply. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
There could be huge number of capsnaps around at any given time. On x86_64 the structure is 248 bytes, which will be rounded up to 256 bytes by kzalloc. Move this to a dedicated slabcache to save 8 bytes for each. [ jlayton: use kmem_cache_zalloc ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Milind Changire authored
Problem: Some directory vxattrs (e.g. ceph.dir.pin.random) are governed by information that isn't necessarily shared with the client. Add support for the new GETVXATTR operation, which allows the client to query the MDS directly for vxattrs. When the client is queried for a vxattr that doesn't have a special handler, have it issue a GETVXATTR to the MDS directly. Solution: Adds new getvxattr op to fetch ceph.dir.pin*, ceph.dir.layout* and ceph.file.layout* vxattrs. If the entire layout for a dir or a file is being set, then it is expected that the layout be set in standard JSON format. Individual field value retrieval is not wrapped in JSON. The JSON format also applies while setting the vxattr if the entire layout is being set in one go. As a temporary measure, setting a vxattr can also be done in the old format. The old format will be deprecated in the future. URL: https://tracker.ceph.com/issues/51062Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Just call set_in_bvec in the non-conditional part. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
hongnanli authored
inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
If MDS return ESTALE, that means the MDS has already iterated all the possible active MDSes including the auth MDS or the inode is under purging. No need to retry in auth MDS and will just return ESTALE directly. Retrying in this situation will cause an infinite loop. Also, retrying like this would prevent the kernel VFS layer ESTALE handling from working properly. An ESTALE error is usually an indication that the dcache is wrong, so we want to allow the VFS to redo the lookup and revalidate it properly. URL: https://tracker.ceph.com/issues/53504Signed-off-by: Xiubo Li <xiubli@redhat.com> Acked-by: Greg Farnum <gfarnum@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Currently, we only wake the waiters if we got a req->r_target_inode out of the reply. In the case where the create fails though, we may not have one. If there is a non-zero result from the create, then ensure that we wake anything waiting on the inode after we shut it down. URL: https://tracker.ceph.com/issues/54067Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
If we haven't received a reply to an async create request, then we don't want to send any cap messages to the MDS for that inode yet. Just have ceph_check_caps and __kick_flushing_caps return without doing anything, and have ceph_write_inode wait for the reply if we were asked to wait on the inode writeback. URL: https://tracker.ceph.com/issues/54107Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
...and instead just pass the wait function on the stack. Make ceph_mdsc_wait_request non-static, and add an argument for wait for completion. Then have ceph_lock_message call ceph_mdsc_submit_request, and ceph_mdsc_wait_request and pass in the pointer to ceph_lock_wait_for_completion. While we're in there, rearrange some fields in ceph_mds_request, so we save a total of 24 bytes per. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
David Howells authored
If a ceph file is made up of inline data, uninline that in the ceph_open() rather than in ceph_page_mkwrite(), ceph_write_iter(), ceph_fallocate() or ceph_write_begin(). This makes it easier to convert to using the netfs library for VM write hooks. Should this also take the inode lock for the duration on uninlining to prevent a race with truncation? [ jlayton: fix up folio locking, update i_inline_version after write ] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
David Howells authored
Make ceph_netfs_issue_op() handle inlined data on page 0. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
One fewer pointer dereference, and in the future we may not be able to count on the mapping pointer being populated (e.g. in the DIO case). Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 27 Feb, 2022 4 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fix from Thomas Gleixner: "A single fix for a regression caused by the recent PCI/MSI rework which resulted in a recursive locking problem in the VMD driver. The cure is to cache the relevant information upfront instead of retrieving it at runtime" * tag 'irq-urgent-2022-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI: vmd: Prevent recursive locking on interrupt allocation
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fix from Christoph Hellwig: - fix a swiotlb info leak (Halil Pasic) * tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix info leak with DMA_FROM_DEVICE
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrlLinus Torvalds authored
Pull pin control fixes from Linus Walleij: - Fix some drive strength and pull-up code in the K210 driver. - Add the Alder Lake-M ACPI ID so it starts to work properly. - Use a static name for the StarFive GPIO irq_chip, forestalling an upcoming fixes series from Marc Zyngier. - Fix an ages old bug in the Tegra 186 driver where we were indexing at random into struct and being lucky getting the right member. * tag 'pinctrl-v5-17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: gpio: tegra186: Fix chip_data type confusion pinctrl: starfive: Use a static name for the GPIO irq_chip pinctrl: tigerlake: Revert "Add Alder Lake-M ACPI ID" pinctrl: k210: Fix bias-pull-up pinctrl: fix loop in k210_pinconf_get_drive()
-
- 26 Feb, 2022 9 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds authored
Pull tracing fixes from Steven Rostedt: - rtla (Real-Time Linux Analysis tool): - fix typo in man page - Update API -e to -E before it is released - Error message fix and memory leak fix - Partially uninline trace event soft disable to shrink text - Fix function graph start up test - Have triggers affect the trace instance they are in and not top level - Have osnoise sleep in the units it says it uses - Remove unused ftrace stub function - Remove event probe redundant info from event in the buffer - Fix group ownership setting in tracefs - Ensure trace buffer is minimum size to prevent crashes * tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rtla/osnoise: Fix error message when failing to enable trace instance rtla/osnoise: Free params at the exit rtla/hist: Make -E the short version of --entries tracing: Fix selftest config check for function graph start up test tracefs: Set the group ownership in apply_options() not parse_options() tracing/osnoise: Make osnoise_main to sleep for microseconds ftrace: Remove unused ftrace_startup_enable() stub tracing: Ensure trace buffer is at least 4096 bytes large tracing: Uninline trace_trigger_soft_disabled() partly eprobes: Remove redundant event type information tracing: Have traceon and traceoff trigger honor the instance tracing: Dump stacktrace trigger to the corresponding instance rtla: Fix systme -> system typo on man page
-
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblockLinus Torvalds authored
Pull memblock fix from Mike Rapoport: "Use kfree() to release kmalloced memblock regions memblock.{reserved,memory}.regions may be allocated using kmalloc() in memblock_double_array(). Use kfree() to release these kmalloced regions" * tag 'fixes-2022-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: use kfree() to release kmalloced memblock regions
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "12 patches. Subsystems affected by this patch series: MAINTAINERS, mailmap, memfd, and mm (hugetlb, kasan, hugetlbfs, pagemap, selftests, memcg, and slab)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: selftests/memfd: clean up mapping in mfd_fail_write mailmap: update Roman Gushchin's email MAINTAINERS, SLAB: add Roman as reviewer, git tree MAINTAINERS: add Shakeel as a memcg co-maintainer MAINTAINERS: remove Vladimir from memcg maintainers MAINTAINERS: add Roman as a memcg co-maintainer selftest/vm: fix map_fixed_noreplace test failure mm: fix use-after-free bug when mm->mmap is reused after being freed hugetlbfs: fix a truncation issue in hugepages parameter kasan: test: prevent cache merging in kmem_cache_double_destroy mm/hugetlb: fix kernel crash with hugetlb mremap MAINTAINERS: add sysctl-next git tree
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: - A fix for the K210 sdcard defconfig, to avoid using a fixed delay for the root FS - A fix to make sure there's a proper call frame for trace_hardirqs_{on,off}(). * tag 'riscv-for-linus-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: fix oops caused by irqsoff latency tracer riscv: fix nommu_k210_sdcard_defconfig
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds authored
Pull xfs fixes from Darrick Wong: "Nothing exciting, just more fixes for not returning sync_filesystem error values (and eliding it when it's not necessary). Summary: - Only call sync_filesystem when we're remounting the filesystem readonly readonly, and actually check its return value" * tag 'xfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: only bother with sync_filesystem during readonly remount
-
Mike Kravetz authored
Running the memfd script ./run_hugetlbfs_test.sh will often end in error as follows: memfd-hugetlb: CREATE memfd-hugetlb: BASIC memfd-hugetlb: SEAL-WRITE memfd-hugetlb: SEAL-FUTURE-WRITE memfd-hugetlb: SEAL-SHRINK fallocate(ALLOC) failed: No space left on device ./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs opening: ./mnt/memfd fuse: DONE If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE test the mfd_fail_write routine maps the file, but does not unmap. As a result, two hugetlb pages remain reserved for the mapping. When the fallocate call in the SEAL-SHRINK test attempts allocate all hugetlb pages, it is short by the two reserved pages. Fix by making sure to unmap in mfd_fail_write. Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.comSigned-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roman Gushchin authored
I'm moving to a @linux.dev account. Map my old addresses. Link: https://lkml.kernel.org/r/20220221200006.416377-1-roman.gushchin@linux.devSigned-off-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
The slab code has an overlap with kmem accounting, where Roman has done a lot of work recently and it would be useful to make sure he's CC'd on patches that potentially affect it. Thus add him as a reviewer for the SLAB subsystem. Also while at it, add the link to slab git tree. Link: https://lkml.kernel.org/r/20220222103104.13241-1-vbabka@suse.czSigned-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shakeel Butt authored
I have been contributing and reviewing to the memcg codebase for last couple of years. So, making it official. Link: https://lkml.kernel.org/r/20220224060148.4092228-1-shakeelb@google.comSigned-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-