Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Enable using direct IO with localio
   - Added localio related tracepoints

  Bugfixes:
   - Sunrpc fixes for working with a very large cl_tasks list
   - Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
   - Fixes for handling reconnections with localio
   - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
   - Fix COPY_NOTIFY xdr_buf size calculations
   - pNFS/Flexfiles fix for retrying requesting a layout segment for
     reads
   - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is
     expired

  Cleanups:
   - Various other nfs & nfsd localio cleanups
   - Preparatory patches for async copy improvements that are under
     development
   - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other
     xprts
   - Add netns inum and srcaddr to debugfs rpc_xprt info"

* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
  SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
  sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
  pnfs/flexfiles: retry getting layout segment for reads
  NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
  NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
  NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
  NFS: Rename struct nfs4_offloadcancel_data
  NFS: Fix typo in OFFLOAD_CANCEL comment
  NFS: CB_OFFLOAD can return NFS4ERR_DELAY
  nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
  nfs: fix incorrect error handling in LOCALIO
  nfs: probe for LOCALIO when v3 client reconnects to server
  nfs: probe for LOCALIO when v4 client reconnects to server
  nfs/localio: remove redundant code and simplify LOCALIO enablement
  nfs_common: add nfs_localio trace events
  nfs_common: track all open nfsd_files per LOCALIO nfs_client
  nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
  nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
  nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
  nfsd: update percpu_ref to manage references on nfsd_net
  ...
torvalds committed Jan 28, 2025
2 parents 3673f5b + 6f56971 commit b88fe2b
Showing 36 changed files with 838 additions and 317 deletions.
104 changes: 52 additions & 52 deletions Documentation/filesystems/nfs/localio.rst
@@ -218,64 +218,30 @@ NFS Client and Server Interlock
===============================

LOCALIO provides the nfs_uuid_t object and associated interfaces to
allow proper network namespace (net-ns) and NFSD object refcounting:

We don't want to keep a long-term counted reference on each NFSD's
net-ns in the client because that prevents a server container from
completely shutting down.

So we avoid taking a reference at all and rely on the per-cpu
reference to the server (detailed below) being sufficient to keep
the net-ns active. This involves allowing the NFSD's net-ns exit
code to iterate all active clients and clear their ->net pointers
(which are needed to find the per-cpu-refcount for the nfsd_serv).

Details:

- Embed nfs_uuid_t in nfs_client. nfs_uuid_t provides a list_head
that can be used to find the client. It does add the 16-byte
uuid_t to nfs_client so it is bigger than needed (given that
uuid_t is only used during the initial NFS client and server
LOCALIO handshake to determine if they are local to each other).
If that is really a problem we can find a fix.

- When the nfs server confirms that the uuid_t is local, it moves
the nfs_uuid_t onto a per-net-ns list in NFSD's nfsd_net.

- When each server's net-ns is shutting down - in a "pre_exit"
handler, all these nfs_uuid_t have their ->net cleared. There is
an rcu_synchronize() call between pre_exit() handlers and exit()
handlers so any caller that sees nfs_uuid_t ->net as not NULL can
safely manage the per-cpu-refcount for nfsd_serv.

- The client's nfs_uuid_t is passed to nfsd_open_local_fh() so it
can safely dereference ->net in a private rcu_read_lock() section
to allow safe access to the associated nfsd_net and nfsd_serv.

So LOCALIO required the introduction and use of NFSD's percpu_ref to
interlock nfsd_destroy_serv() and nfsd_open_local_fh(), to ensure each
nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh(), and
allow proper network namespace (net-ns) and NFSD object refcounting.

LOCALIO required the introduction and use of NFSD's percpu nfsd_net_ref
to interlock nfsd_shutdown_net() and nfsd_open_local_fh(), to ensure
each net-ns is not destroyed while in use by nfsd_open_local_fh(), and
warrants a more detailed explanation:

nfsd_open_local_fh() uses nfsd_serv_try_get() before opening its
nfsd_open_local_fh() uses nfsd_net_try_get() before opening its
nfsd_file handle and then the caller (NFS client) must drop the
reference for the nfsd_file and associated nn->nfsd_serv using
nfs_file_put_local() once it has completed its IO.
reference for the nfsd_file and associated net-ns using
nfsd_file_put_local() once it has completed its IO.

This interlock working relies heavily on nfsd_open_local_fh() being
afforded the ability to safely deal with the possibility that the
NFSD's net-ns (and nfsd_net by association) may have been destroyed
by nfsd_destroy_serv() via nfsd_shutdown_net() -- which is only
possible given the nfs_uuid_t ->net pointer managemenet detailed
above.

All told, this elaborate interlock of the NFS client and server has been
verified to fix an easy to hit crash that would occur if an NFSD
instance running in a container, with a LOCALIO client mounted, is
shutdown. Upon restart of the container and associated NFSD the client
would go on to crash due to NULL pointer dereference that occurred due
to the LOCALIO client's attempting to nfsd_open_local_fh(), using
nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
by nfsd_destroy_serv() via nfsd_shutdown_net().

This interlock of the NFS client and server has been verified to fix an
easy to hit crash that would occur if an NFSD instance running in a
container, with a LOCALIO client mounted, is shutdown. Upon restart of
the container and associated NFSD, the client would go on to crash due
to NULL pointer dereference that occurred due to the LOCALIO client's
attempting to nfsd_open_local_fh() without having a proper reference on
NFSD's net-ns.

NFS Client issues IO instead of Server
======================================
@@ -306,10 +272,26 @@ is issuing IO to the underlying local filesystem that it is sharing with
the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and
fs/nfs/localio.c:nfs_local_commit().

With normal NFS that makes use of RPC to issue IO to the server, if an
application uses O_DIRECT the NFS client will bypass the pagecache but
the NFS server will not. The NFS server's use of buffered IO affords
applications to be less precise with their alignment when issuing IO to
the NFS client. But if all applications properly align their IO, LOCALIO
can be configured to use end-to-end O_DIRECT semantics from the NFS
client to the underlying local filesystem, that it is sharing with
the NFS server, by setting the 'localio_O_DIRECT_semantics' nfs module
parameter to Y, e.g.:

echo Y > /sys/module/nfs/parameters/localio_O_DIRECT_semantics

Once enabled, it will cause LOCALIO to use end-to-end O_DIRECT semantics
(but again, this may cause IO to fail if applications do not properly
align their IO).

Security
========

Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
LOCALIO is only supported when UNIX-style authentication (AUTH_UNIX, aka
AUTH_SYS) is used.

Care is taken to ensure the same NFS security mechanisms are used
@@ -324,6 +306,24 @@ client is afforded this same level of access (albeit in terms of the NFS
protocol via SUNRPC). No other namespaces (user, mount, etc) have been
altered or purposely extended from the server to the client.

Module Parameters
=================

/sys/module/nfs/parameters/localio_enabled (bool)
controls if LOCALIO is enabled, defaults to Y. If client and server are
local but 'localio_enabled' is set to N then LOCALIO will not be used.

/sys/module/nfs/parameters/localio_O_DIRECT_semantics (bool)
controls if O_DIRECT extends down to the underlying filesystem, defaults
to N. Application IO must be logical blocksize aligned, otherwise
O_DIRECT will fail.

/sys/module/nfsv3/parameters/nfs3_localio_probe_throttle (uint)
controls if NFSv3 read and write IOs will trigger (re)enabling of
LOCALIO every N (nfs3_localio_probe_throttle) IOs, defaults to 0
(disabled). Must be power-of-2, admin keeps all the pieces if they
misconfigure (too low a value or non-power-of-2).

Testing
=======

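The alignment requirement that the localio.rst text above attaches to localio_O_DIRECT_semantics can be illustrated with a small userspace sketch. Everything in it is an assumption made for illustration: the helper names (dio_multiple_of, dio_request_aligned, dio_alloc) are hypothetical and belong to neither the kernel nor libc, and the check on the buffer address reflects general Linux O_DIRECT behaviour rather than anything this commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: true if x is a multiple of blksz.
 * Precondition: blksz is a nonzero power of two. */
static bool dio_multiple_of(uint64_t x, size_t blksz)
{
	assert(blksz != 0 && (blksz & (blksz - 1)) == 0);
	return (x & ((uint64_t)blksz - 1)) == 0;
}

/* Hypothetical helper: check that buffer address, file offset and length
 * are all multiples of the logical block size. Returns false for an
 * invalid block size or a negative offset. */
static bool dio_request_aligned(const void *buf, off_t offset, size_t len,
				size_t blksz)
{
	if (blksz == 0 || (blksz & (blksz - 1)) != 0)
		return false;
	if (offset < 0)
		return false;
	return dio_multiple_of((uint64_t)(uintptr_t)buf, blksz) &&
	       dio_multiple_of((uint64_t)offset, blksz) &&
	       dio_multiple_of((uint64_t)len, blksz);
}

/* Hypothetical helper: allocate a buffer suitable for such a request.
 * len must be a nonzero multiple of blksz; release it with free(). */
static void *dio_alloc(size_t len, size_t blksz)
{
	if (blksz == 0 || (blksz & (blksz - 1)) != 0)
		return NULL;
	if (len == 0 || (len & (blksz - 1)) != 0)
		return NULL;
	return aligned_alloc(blksz, len);
}
```

An application could call dio_request_aligned() before issuing a read or write on a file opened with O_DIRECT. For a block device, the logical block size can be queried with the BLKSSZGET ioctl; a hard-coded value such as 512 is only an example.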
3 changes: 2 additions & 1 deletion fs/nfs/Kconfig
@@ -170,7 +170,8 @@ config ROOT_NFS

config NFS_FSCACHE
bool "Provide NFS client caching support"
depends on NFS_FS=m && NETFS_SUPPORT || NFS_FS=y && NETFS_SUPPORT=y
depends on NFS_FS
select NETFS_SUPPORT
select FSCACHE
help
Say Y here if you want NFS data to be cached locally on disc through
2 changes: 1 addition & 1 deletion fs/nfs/callback_proc.c
@@ -718,7 +718,7 @@ __be32 nfs4_callback_offload(void *data, void *dummy,

copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_KERNEL);
if (!copy)
return htonl(NFS4ERR_SERVERFAULT);
return cpu_to_be32(NFS4ERR_DELAY);

spin_lock(&cps->clp->cl_lock);
rcu_read_lock();
6 changes: 3 additions & 3 deletions fs/nfs/client.c
@@ -38,7 +38,7 @@
#include <linux/sunrpc/bc_xprt.h>
#include <linux/nsproxy.h>
#include <linux/pid_namespace.h>

#include <linux/nfslocalio.h>

#include "nfs4_fs.h"
#include "callback.h"
@@ -186,7 +186,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
seqlock_init(&clp->cl_boot_lock);
ktime_get_real_ts64(&clp->cl_nfssvc_boot);
nfs_uuid_init(&clp->cl_uuid);
spin_lock_init(&clp->cl_localio_lock);
INIT_WORK(&clp->cl_local_probe_work, nfs_local_probe_async_work);
#endif /* CONFIG_NFS_LOCALIO */

clp->cl_principal = "*";
@@ -244,7 +244,7 @@ static void pnfs_init_server(struct nfs_server *server)
*/
void nfs_free_client(struct nfs_client *clp)
{
nfs_local_disable(clp);
nfs_localio_disable_client(clp);

/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
1 change: 1 addition & 0 deletions fs/nfs/direct.c
@@ -303,6 +303,7 @@ static void nfs_read_sync_pgio_error(struct list_head *head, int error)
static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
{
get_dreq(hdr->dreq);
set_bit(NFS_IOHDR_ODIRECT, &hdr->flags);
}

static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = {
52 changes: 34 additions & 18 deletions fs/nfs/flexfilelayout/flexfilelayout.c
@@ -164,18 +164,17 @@ decode_name(struct xdr_stream *xdr, u32 *id)
}

static struct nfsd_file *
ff_local_open_fh(struct nfs_client *clp, const struct cred *cred,
ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx,
struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, fmode_t mode)
{
if (mode & FMODE_WRITE) {
/*
* Always request read and write access since this corresponds
* to a rw layout.
*/
mode |= FMODE_READ;
}
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);

return nfs_local_open_fh(clp, cred, fh, mode);
return nfs_local_open_fh(clp, cred, fh, &mirror->nfl, mode);
#else
return NULL;
#endif
}

static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
@@ -247,6 +246,7 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
spin_lock_init(&mirror->lock);
refcount_set(&mirror->ref, 1);
INIT_LIST_HEAD(&mirror->mirrors);
nfs_localio_file_init(&mirror->nfl);
}
return mirror;
}
@@ -257,6 +257,7 @@ static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)

ff_layout_remove_mirror(mirror);
kfree(mirror->fh_versions);
nfs_close_local_fh(&mirror->nfl);
cred = rcu_access_pointer(mirror->ro_cred);
put_cred(cred);
cred = rcu_access_pointer(mirror->rw_cred);
@@ -847,6 +848,9 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
struct nfs4_pnfs_ds *ds;
u32 ds_idx;

if (NFS_SERVER(pgio->pg_inode)->flags &
(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR))
pgio->pg_maxretrans = io_maxretrans;
retry:
pnfs_generic_pg_check_layout(pgio, req);
/* Use full layout for now */
@@ -860,6 +864,8 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
if (!pgio->pg_lseg)
goto out_nolseg;
}
/* Reset wb_nio, since getting layout segment was successful */
req->wb_nio = 0;

ds = ff_layout_get_ds_for_read(pgio, &ds_idx);
if (!ds) {
@@ -876,14 +882,24 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;

pgio->pg_mirror_idx = ds_idx;

if (NFS_SERVER(pgio->pg_inode)->flags &
(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR))
pgio->pg_maxretrans = io_maxretrans;
return;
out_nolseg:
if (pgio->pg_error < 0)
return;
if (pgio->pg_error < 0) {
if (pgio->pg_error != -EAGAIN)
return;
/* Retry getting layout segment if lower layer returned -EAGAIN */
if (pgio->pg_maxretrans && req->wb_nio++ > pgio->pg_maxretrans) {
if (NFS_SERVER(pgio->pg_inode)->flags & NFS_MOUNT_SOFTERR)
pgio->pg_error = -ETIMEDOUT;
else
pgio->pg_error = -EIO;
return;
}
pgio->pg_error = 0;
/* Sleep for 1 second before retrying */
ssleep(1);
goto retry;
}
out_mds:
trace_pnfs_mds_fallback_pg_init_read(pgio->pg_inode,
0, NFS4_MAX_UINT64, IOMODE_READ,
@@ -1820,7 +1836,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;

/* Start IO accounting for local read */
localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, FMODE_READ);
localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh, FMODE_READ);
if (localio) {
hdr->task.tk_start = ktime_get();
ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
@@ -1896,7 +1912,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;

/* Start IO accounting for local write */
localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
hdr->task.tk_start = ktime_get();
@@ -1981,7 +1997,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
data->args.fh = fh;

/* Start IO accounting for local commit */
localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
data->task.tk_start = ktime_get();
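The bounded retry that this change adds to ff_layout_pg_init_read() (retry on -EAGAIN, give up once the retry counter exceeds pg_maxretrans, report -ETIMEDOUT for softerr mounts and -EIO otherwise) can be sketched outside the kernel as follows. This is a simplified model, not the kernel code: struct retry_state, get_segment_with_retry() and always_eagain() are hypothetical names, the delay callback stands in for ssleep(1), and the fallback to the MDS is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lower layer that fetches a layout segment.
 * Returns 0 on success or a negative errno value. */
typedef int (*get_segment_fn)(void *ctx);

/* Hypothetical model of the fields the hunk consults. */
struct retry_state {
	unsigned int nio;        /* models req->wb_nio */
	unsigned int maxretrans; /* models pgio->pg_maxretrans; 0 = no limit */
	bool softerr;            /* models the NFS_MOUNT_SOFTERR flag */
};

static int get_segment_with_retry(get_segment_fn fn, void *ctx,
				  struct retry_state *st,
				  void (*delay)(void))
{
	assert(fn != NULL && st != NULL);
	for (;;) {
		int err = fn(ctx);

		if (err == 0) {
			/* Success: reset the counter, as the hunk does. */
			st->nio = 0;
			return 0;
		}
		if (err != -EAGAIN)
			return err; /* only -EAGAIN is retried */
		/* Same test as the hunk: post-increment, then compare. */
		if (st->maxretrans && st->nio++ > st->maxretrans)
			return st->softerr ? -ETIMEDOUT : -EIO;
		if (delay)
			delay(); /* the kernel code sleeps for 1 second here */
	}
}

/* Hypothetical stub for demonstration: always fails with -EAGAIN and
 * counts how often it was called through the int that ctx points to. */
static int always_eagain(void *ctx)
{
	++*(int *)ctx;
	return -EAGAIN;
}
```

Because the counter is compared before it is incremented, a limit of N allows N + 2 attempts in this model before the error is returned. With maxretrans set to 0 the loop retries indefinitely, which corresponds to the limit only being set for soft and softerr mounts.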
1 change: 1 addition & 0 deletions fs/nfs/flexfilelayout/flexfilelayout.h
@@ -83,6 +83,7 @@ struct nfs4_ff_layout_mirror {
nfs4_stateid stateid;
const struct cred __rcu *ro_cred;
const struct cred __rcu *rw_cred;
struct nfs_file_localio nfl;
refcount_t ref;
spinlock_t lock;
unsigned long flags;
3 changes: 3 additions & 0 deletions fs/nfs/inode.c
@@ -1137,6 +1137,8 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
ctx->lock_context.open_context = ctx;
INIT_LIST_HEAD(&ctx->list);
ctx->mdsthreshold = NULL;
nfs_localio_file_init(&ctx->nfl);

return ctx;
}
EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
@@ -1168,6 +1170,7 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
nfs_sb_deactive(sb);
put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
kfree(ctx->mdsthreshold);
nfs_close_local_fh(&ctx->nfl);
kfree_rcu(ctx, rcu_head);
}

9 changes: 6 additions & 3 deletions fs/nfs/internal.h
@@ -455,11 +455,13 @@ extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);

#if IS_ENABLED(CONFIG_NFS_LOCALIO)
/* localio.c */
extern void nfs_local_disable(struct nfs_client *);
extern void nfs_local_probe(struct nfs_client *);
extern void nfs_local_probe_async(struct nfs_client *);
extern void nfs_local_probe_async_work(struct work_struct *);
extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *,
const struct cred *,
struct nfs_fh *,
struct nfs_file_localio *,
const fmode_t);
extern int nfs_local_doio(struct nfs_client *,
struct nfsd_file *,
@@ -471,11 +473,12 @@ extern int nfs_local_commit(struct nfsd_file *,
extern bool nfs_server_is_local(const struct nfs_client *clp);

#else /* CONFIG_NFS_LOCALIO */
static inline void nfs_local_disable(struct nfs_client *clp) {}
static inline void nfs_local_probe(struct nfs_client *clp) {}
static inline void nfs_local_probe_async(struct nfs_client *clp) {}
static inline struct nfsd_file *
nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, const fmode_t mode)
struct nfs_fh *fh, struct nfs_file_localio *nfl,
const fmode_t mode)
{
return NULL;
}