Skip to content

Commit

Permalink
Merge tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/k…
Browse files Browse the repository at this point in the history
…ernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Support caching symlink lengths in inodes

     The size is stored in a new union utilizing the same space as
     i_devices, thus avoiding growing the struct or taking up any more
     space

     When utilized it dodges strlen() in vfs_readlink(), giving about
     1.5% speed up when issuing readlink on /initrd.img on ext4

   - Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag

     If a file system supports uncached buffered IO, it may set
     FOP_DONTCACHE and enable support for RWF_DONTCACHE.

     If RWF_DONTCACHE is attempted without the file system supporting
     it, it'll get errored with -EOPNOTSUPP

   - Enable VBOXGUEST and VBOXSF_FS on ARM64

     Now that VirtualBox is able to run as a host on arm64 (e.g. the
     Apple M3 processors) we can enable VBOXSF_FS (and in turn
     VBOXGUEST) for this architecture.

     Tested with various runs of bonnie++ and dbench on an Apple MacBook
     Pro with the latest Virtualbox 7.1.4 r165100 installed

  Cleanups:

   - Delay sysctl_nr_open check in expand_files()

   - Use kernel-doc includes in fiemap docbook

   - Use page->private instead of page->index in watch_queue

   - Use a consume fence in mnt_idmap() as it's heavily used in
     link_path_walk()

   - Replace magic number 7 with ARRAY_SIZE() in fc_log

   - Sort out a stale comment about races between fd alloc and dup2()

   - Fix return type of do_mount() from long to int

   - Various cosmetic cleanups for the lockref code

  Fixes:

   - Annotate spinning as unlikely() in __read_seqcount_begin

     The annotation already used to be there, but got lost in commit
     52ac39e ("seqlock: seqcount_t: Implement all read APIs as
     statement expressions")

   - Fix proc_handler for sysctl_nr_open

   - Flush delayed work in delayed fput()

   - Fix grammar and spelling in propagate_umount()

   - Fix ESP not readable during coredump

     In /proc/PID/stat, there is the kstkesp field which is the stack
     pointer of a thread. While the thread is active, this field reads
     zero. But during a coredump, it should have a valid value

     However, at the moment, kstkesp is zero even during coredump

   - Don't wake up the writer if the pipe is still full

   - Fix unbalanced user_access_end() in select code"

* tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  gfs2: use lockref_init for qd_lockref
  erofs: use lockref_init for pcl->lockref
  dcache: use lockref_init for d_lockref
  lockref: add a lockref_init helper
  lockref: drop superfluous externs
  lockref: use bool for false/true returns
  lockref: improve the lockref_get_not_zero description
  lockref: remove lockref_put_not_zero
  fs: Fix return type of do_mount() from long to int
  select: Fix unbalanced user_access_end()
  vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64
  pipe_read: don't wake up the writer if the pipe is still full
  selftests: coredump: Add stackdump test
  fs/proc: do_task_stat: Fix ESP not readable during coredump
  fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
  fs: sort out a stale comment about races between fd alloc and dup2
  fs: Fix grammar and spelling in propagate_umount()
  fs: fc_log replace magic number 7 with ARRAY_SIZE()
  fs: use a consume fence in mnt_idmap()
  file: flush delayed work in delayed fput()
  ...
  • Loading branch information
torvalds committed Jan 20, 2025
2 parents d582952 + c859df5 commit 4b84a4c
Show file tree
Hide file tree
Showing 33 changed files with 415 additions and 180 deletions.
49 changes: 15 additions & 34 deletions Documentation/filesystems/fiemap.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,21 +12,10 @@ returns a list of extents.
Request Basics
--------------

A fiemap request is encoded within struct fiemap::

struct fiemap {
__u64 fm_start; /* logical offset (inclusive) at
* which to start mapping (in) */
__u64 fm_length; /* logical length of mapping which
* userspace cares about (in) */
__u32 fm_flags; /* FIEMAP_FLAG_* flags for request (in/out) */
__u32 fm_mapped_extents; /* number of extents that were
* mapped (out) */
__u32 fm_extent_count; /* size of fm_extents array (in) */
__u32 fm_reserved;
struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
};
A fiemap request is encoded within struct fiemap:

.. kernel-doc:: include/uapi/linux/fiemap.h
:identifiers: fiemap

fm_start, and fm_length specify the logical range within the file
which the process would like mappings for. Extents returned mirror
Expand Down Expand Up @@ -60,6 +49,8 @@ FIEMAP_FLAG_XATTR
If this flag is set, the extents returned will describe the inodes
extended attribute lookup tree, instead of its data tree.

FIEMAP_FLAG_CACHE
This flag requests caching of the extents.

Extent Mapping
--------------
Expand All @@ -77,18 +68,10 @@ complete the requested range and will not have the FIEMAP_EXTENT_LAST
flag set (see the next section on extent flags).

Each extent is described by a single fiemap_extent structure as
returned in fm_extents::

struct fiemap_extent {
__u64 fe_logical; /* logical offset in bytes for the start of
* the extent */
__u64 fe_physical; /* physical offset in bytes for the start
* of the extent */
__u64 fe_length; /* length in bytes for the extent */
__u64 fe_reserved64[2];
__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
__u32 fe_reserved[3];
};
returned in fm_extents:

.. kernel-doc:: include/uapi/linux/fiemap.h
:identifiers: fiemap_extent

All offsets and lengths are in bytes and mirror those on disk. It is valid
for an extents logical offset to start before the request or its logical
Expand Down Expand Up @@ -175,6 +158,8 @@ FIEMAP_EXTENT_MERGED
userspace would be highly inefficient, the kernel will try to merge most
adjacent blocks into 'extents'.

FIEMAP_EXTENT_SHARED
This flag is set to request that space be shared with other files.

VFS -> File System Implementation
---------------------------------
Expand All @@ -191,14 +176,10 @@ each discovered extent::
u64 len);

->fiemap is passed struct fiemap_extent_info which describes the
fiemap request::

struct fiemap_extent_info {
unsigned int fi_flags; /* Flags as passed from user */
unsigned int fi_extents_mapped; /* Number of mapped extents */
unsigned int fi_extents_max; /* Size of fiemap_extent array */
struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
};
fiemap request:

.. kernel-doc:: include/linux/fiemap.h
:identifiers: fiemap_extent_info

It is intended that the file system should not need to access any of this
structure directly. Filesystem handlers should be tolerant to signals and return
Expand Down
2 changes: 1 addition & 1 deletion drivers/virt/vboxguest/Kconfig
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
config VBOXGUEST
tristate "Virtual Box Guest integration support"
depends on X86 && PCI && INPUT
depends on (ARM64 || X86) && PCI && INPUT
help
This is a driver for the Virtual Box Guest PCI device used in
Virtual Box virtual machines. Enabling this driver will add
Expand Down
3 changes: 1 addition & 2 deletions fs/dcache.c
Original file line number Diff line number Diff line change
Expand Up @@ -1681,9 +1681,8 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
/* Make sure we always see the terminating NUL character */
smp_store_release(&dentry->d_name.name, dname); /* ^^^ */

dentry->d_lockref.count = 1;
dentry->d_flags = 0;
spin_lock_init(&dentry->d_lock);
lockref_init(&dentry->d_lockref, 1);
seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
dentry->d_inode = NULL;
dentry->d_parent = dentry;
Expand Down
3 changes: 1 addition & 2 deletions fs/erofs/zdata.c
Original file line number Diff line number Diff line change
Expand Up @@ -747,8 +747,7 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
if (IS_ERR(pcl))
return PTR_ERR(pcl);

spin_lock_init(&pcl->lockref.lock);
pcl->lockref.count = 1; /* one ref for this request */
lockref_init(&pcl->lockref, 1); /* one ref for this request */
pcl->algorithmformat = map->m_algorithmformat;
pcl->length = 0;
pcl->partial = true;
Expand Down
3 changes: 2 additions & 1 deletion fs/ext4/inode.c
Original file line number Diff line number Diff line change
Expand Up @@ -5006,10 +5006,11 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
if (IS_ENCRYPTED(inode)) {
inode->i_op = &ext4_encrypted_symlink_inode_operations;
} else if (ext4_inode_is_fast_symlink(inode)) {
inode->i_link = (char *)ei->i_data;
inode->i_op = &ext4_fast_symlink_inode_operations;
nd_terminate_link(ei->i_data, inode->i_size,
sizeof(ei->i_data) - 1);
inode_set_cached_link(inode, (char *)ei->i_data,
inode->i_size);
} else {
inode->i_op = &ext4_symlink_inode_operations;
}
Expand Down
4 changes: 3 additions & 1 deletion fs/ext4/namei.c
Original file line number Diff line number Diff line change
Expand Up @@ -3418,7 +3418,6 @@ static int ext4_symlink(struct mnt_idmap *idmap, struct inode *dir,
inode->i_op = &ext4_symlink_inode_operations;
} else {
inode->i_op = &ext4_fast_symlink_inode_operations;
inode->i_link = (char *)&EXT4_I(inode)->i_data;
}
}

Expand All @@ -3434,6 +3433,9 @@ static int ext4_symlink(struct mnt_idmap *idmap, struct inode *dir,
disk_link.len);
inode->i_size = disk_link.len - 1;
EXT4_I(inode)->i_disksize = inode->i_size;
if (!IS_ENCRYPTED(inode))
inode_set_cached_link(inode, (char *)&EXT4_I(inode)->i_data,
inode->i_size);
}
err = ext4_add_nondir(handle, dentry, &inode);
if (handle)
Expand Down
22 changes: 7 additions & 15 deletions fs/file.c
Original file line number Diff line number Diff line change
Expand Up @@ -279,17 +279,17 @@ static int expand_files(struct files_struct *files, unsigned int nr)
if (nr < fdt->max_fds)
return 0;

/* Can we expand? */
if (nr >= sysctl_nr_open)
return -EMFILE;

if (unlikely(files->resize_in_progress)) {
spin_unlock(&files->file_lock);
wait_event(files->resize_wait, !files->resize_in_progress);
spin_lock(&files->file_lock);
goto repeat;
}

/* Can we expand? */
if (unlikely(nr >= sysctl_nr_open))
return -EMFILE;

/* All good, so we try */
files->resize_in_progress = true;
error = expand_fdtable(files, nr);
Expand Down Expand Up @@ -1231,17 +1231,9 @@ __releases(&files->file_lock)

/*
* We need to detect attempts to do dup2() over allocated but still
* not finished descriptor. NB: OpenBSD avoids that at the price of
* extra work in their equivalent of fget() - they insert struct
* file immediately after grabbing descriptor, mark it larval if
* more work (e.g. actual opening) is needed and make sure that
* fget() treats larval files as absent. Potentially interesting,
* but while extra work in fget() is trivial, locking implications
* and amount of surgery on open()-related paths in VFS are not.
* FreeBSD fails with -EBADF in the same situation, NetBSD "solution"
* deadlocks in rather amusing ways, AFAICS. All of that is out of
* scope of POSIX or SUS, since neither considers shared descriptor
* tables and this condition does not arise without those.
* not finished descriptor.
*
* POSIX is silent on the issue, we return -EBUSY.
*/
fdt = files_fdtable(files);
fd = array_index_nospec(fd, fdt->max_fds);
Expand Down
7 changes: 4 additions & 3 deletions fs/file_table.c
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,7 @@ static struct ctl_table fs_stat_sysctls[] = {
.data = &sysctl_nr_open,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.proc_handler = proc_douintvec_minmax,
.extra1 = &sysctl_nr_open_min,
.extra2 = &sysctl_nr_open_max,
},
Expand Down Expand Up @@ -478,6 +478,8 @@ static void ____fput(struct callback_head *work)
__fput(container_of(work, struct file, f_task_work));
}

static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);

/*
* If kernel thread really needs to have the final fput() it has done
* to complete, call this. The only user right now is the boot - we
Expand All @@ -491,11 +493,10 @@ static void ____fput(struct callback_head *work)
void flush_delayed_fput(void)
{
delayed_fput(NULL);
flush_delayed_work(&delayed_fput_work);
}
EXPORT_SYMBOL_GPL(flush_delayed_fput);

static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);

void fput(struct file *file)
{
if (file_ref_put(&file->f_ref)) {
Expand Down
2 changes: 1 addition & 1 deletion fs/fs_context.c
Original file line number Diff line number Diff line change
Expand Up @@ -493,7 +493,7 @@ static void put_fc_log(struct fs_context *fc)
if (log) {
if (refcount_dec_and_test(&log->usage)) {
fc->log.log = NULL;
for (i = 0; i <= 7; i++)
for (i = 0; i < ARRAY_SIZE(log->buffer) ; i++)
if (log->need_free & (1 << i))
kfree(log->buffer[i]);
kfree(log);
Expand Down
3 changes: 1 addition & 2 deletions fs/gfs2/quota.c
Original file line number Diff line number Diff line change
Expand Up @@ -236,8 +236,7 @@ static struct gfs2_quota_data *qd_alloc(unsigned hash, struct gfs2_sbd *sdp, str
return NULL;

qd->qd_sbd = sdp;
qd->qd_lockref.count = 0;
spin_lock_init(&qd->qd_lockref.lock);
lockref_init(&qd->qd_lockref, 0);
qd->qd_id = qid;
qd->qd_slot = -1;
INIT_LIST_HEAD(&qd->qd_lru);
Expand Down
34 changes: 19 additions & 15 deletions fs/namei.c
Original file line number Diff line number Diff line change
Expand Up @@ -5272,19 +5272,16 @@ SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newna
getname(newname), 0);
}

int readlink_copy(char __user *buffer, int buflen, const char *link)
int readlink_copy(char __user *buffer, int buflen, const char *link, int linklen)
{
int len = PTR_ERR(link);
if (IS_ERR(link))
goto out;
int copylen;

len = strlen(link);
if (len > (unsigned) buflen)
len = buflen;
if (copy_to_user(buffer, link, len))
len = -EFAULT;
out:
return len;
copylen = linklen;
if (unlikely(copylen > (unsigned) buflen))
copylen = buflen;
if (copy_to_user(buffer, link, copylen))
copylen = -EFAULT;
return copylen;
}

/**
Expand All @@ -5304,6 +5301,9 @@ int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
const char *link;
int res;

if (inode->i_opflags & IOP_CACHED_LINK)
return readlink_copy(buffer, buflen, inode->i_link, inode->i_linklen);

if (unlikely(!(inode->i_opflags & IOP_DEFAULT_READLINK))) {
if (unlikely(inode->i_op->readlink))
return inode->i_op->readlink(dentry, buffer, buflen);
Expand All @@ -5322,7 +5322,7 @@ int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
if (IS_ERR(link))
return PTR_ERR(link);
}
res = readlink_copy(buffer, buflen, link);
res = readlink_copy(buffer, buflen, link, strlen(link));
do_delayed_call(&done);
return res;
}
Expand Down Expand Up @@ -5391,10 +5391,14 @@ EXPORT_SYMBOL(page_put_link);

int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
{
const char *link;
int res;

DEFINE_DELAYED_CALL(done);
int res = readlink_copy(buffer, buflen,
page_get_link(dentry, d_inode(dentry),
&done));
link = page_get_link(dentry, d_inode(dentry), &done);
res = PTR_ERR(link);
if (!IS_ERR(link))
res = readlink_copy(buffer, buflen, link, strlen(link));
do_delayed_call(&done);
return res;
}
Expand Down
2 changes: 1 addition & 1 deletion fs/namespace.c
Original file line number Diff line number Diff line change
Expand Up @@ -3839,7 +3839,7 @@ int path_mount(const char *dev_name, struct path *path,
data_page);
}

long do_mount(const char *dev_name, const char __user *dir_name,
int do_mount(const char *dev_name, const char __user *dir_name,
const char *type_page, unsigned long flags, void *data_page)
{
struct path path;
Expand Down
19 changes: 10 additions & 9 deletions fs/pipe.c
Original file line number Diff line number Diff line change
Expand Up @@ -253,7 +253,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
size_t total_len = iov_iter_count(to);
struct file *filp = iocb->ki_filp;
struct pipe_inode_info *pipe = filp->private_data;
bool was_full, wake_next_reader = false;
bool wake_writer = false, wake_next_reader = false;
ssize_t ret;

/* Null read succeeds. */
Expand All @@ -264,14 +264,13 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
mutex_lock(&pipe->mutex);

/*
* We only wake up writers if the pipe was full when we started
* reading in order to avoid unnecessary wakeups.
* We only wake up writers if the pipe was full when we started reading
* and it is no longer full after reading to avoid unnecessary wakeups.
*
* But when we do wake up writers, we do so using a sync wakeup
* (WF_SYNC), because we want them to get going and generate more
* data for us.
*/
was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
for (;;) {
/* Read ->head with a barrier vs post_one_notification() */
unsigned int head = smp_load_acquire(&pipe->head);
Expand Down Expand Up @@ -340,8 +339,10 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
buf->len = 0;
}

if (!buf->len)
if (!buf->len) {
wake_writer |= pipe_full(head, tail, pipe->max_usage);
tail = pipe_update_tail(pipe, buf, tail);
}
total_len -= chars;
if (!total_len)
break; /* common path: read succeeded */
Expand Down Expand Up @@ -377,7 +378,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
* _very_ unlikely case that the pipe was full, but we got
* no data.
*/
if (unlikely(was_full))
if (unlikely(wake_writer))
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);

Expand All @@ -390,15 +391,15 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
if (wait_event_interruptible_exclusive(pipe->rd_wait, pipe_readable(pipe)) < 0)
return -ERESTARTSYS;

mutex_lock(&pipe->mutex);
was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
wake_writer = false;
wake_next_reader = true;
mutex_lock(&pipe->mutex);
}
if (pipe_empty(pipe->head, pipe->tail))
wake_next_reader = false;
mutex_unlock(&pipe->mutex);

if (was_full)
if (wake_writer)
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
if (wake_next_reader)
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
Expand Down
8 changes: 4 additions & 4 deletions fs/pnode.c
Original file line number Diff line number Diff line change
Expand Up @@ -611,10 +611,10 @@ int propagate_umount(struct list_head *list)
continue;
} else if (child->mnt.mnt_flags & MNT_UMOUNT) {
/*
* We have come accross an partially unmounted
* mount in list that has not been visited yet.
* Remember it has been visited and continue
* about our merry way.
* We have come across a partially unmounted
* mount in a list that has not been visited
* yet. Remember it has been visited and
* continue about our merry way.
*/
list_add_tail(&child->mnt_umounting, &visited);
continue;
Expand Down
Loading

0 comments on commit 4b84a4c

Please sign in to comment.