author | Anand Avati <avati@gluster.com> | 2012-01-13 13:27:15 +0530 |
---|---|---|
committer | Anand Avati <avati@gluster.com> | 2012-01-20 05:03:42 -0800 |
commit | 7e1f8e3bac201f88e2d9ef62fc69a044716dfced (patch) | |
tree | 77540dbf1def2c864f8ae55f2293dba4a1d47488 /xlators/protocol/client/src | |
parent | 33c568ce1a28c1739f095611b40b7acf40e4e6df (diff) |
core: GFID filehandle based backend and anonymous FDs
1. What
--------
This change introduces an infrastructure change in the filesystem
which lets filesystem operations address objects (inodes) by GFID
alone. Thus far GFID has been a unique identifier of a user-visible
inode. But in terms of addressability the only mechanism thus far has
been the backend filesystem path, which could be derived from the
GFID only if it was cached in the inode table along with the entire set
of dentry ancestry leading up to the root.
This change essentially decouples addressability from the namespace. It
is no longer necessary to be aware of the parent directory to address a
file or directory.
2. Why
-------
The biggest use case for such a feature is NFS for generating
persistent filehandles. So far the technique for generating filehandles
in NFS has been to encode path components so that the appropriate
inode_t can be repopulated into the inode table by means of a recursive
lookup of each component top-down.
Another use case is the ability to perform more intelligent self-healing
and rebalancing of inodes with hardlinks and also to detect renames.
A derived feature from GFID filehandles is anonymous FDs. An anonymous FD
is an internal USABLE "fd_t" which does not map to a user opened file
descriptor or to an internal ->open()'d fd. The ability to address a file
by the GFID eliminates the need to have a persistent ->open()'d fd for the
purpose of avoiding the namespace. This improves NFS read/write performance
significantly by eliminating open/close calls, and also fixes some of today's
limitations (like keeping an FD open longer than necessary, resulting
in disk space leakage).
3. How
-------
At each storage/posix translator level, every file is hardlinked inside
a hidden .glusterfs directory (under the top level export), with the GFID
in standard ASCII UUID string format as the link name. For reasons of
performance and scalability, those hardlinks are classified two tiers deep,
under directories named after the initial parts of the UUID string.
For directories (which cannot be hardlinked), the approach is to use a symlink
which dereferences the parent GFID path along with basename of the directory.
The parent GFID dereference will in turn be a dereference of the grandparent
with the parent's basename, and so on recursively up to the root export.
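The layout described above can be sketched as two path builders. This is a minimal illustration, not the storage/posix code: the helper names are hypothetical, and the sketch assumes each tier is named after two hex characters of the UUID string and that a directory's symlink target is written relative to the directory holding the symlink.

```c
#include <stdio.h>
#include <stddef.h>

/* Format a 16-byte GFID as the canonical 36-character UUID string.
 * out must have room for 37 bytes (36 characters plus the NUL). */
void
gfid_to_str (const unsigned char gfid[16], char out[37])
{
        snprintf (out, 37,
                  "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                  "%02x%02x%02x%02x%02x%02x",
                  gfid[0], gfid[1], gfid[2], gfid[3],
                  gfid[4], gfid[5], gfid[6], gfid[7],
                  gfid[8], gfid[9], gfid[10], gfid[11],
                  gfid[12], gfid[13], gfid[14], gfid[15]);
}

/* Hypothetical helper: build "<export>/.glusterfs/xx/yy/<uuid>".
 * Assumption: each tier uses two hex characters of the UUID string.
 * Returns 0 on success, -1 if buf is too small. */
int
gfid_handle_path (const char *export_dir, const unsigned char gfid[16],
                  char *buf, size_t len)
{
        char uuid[37];
        int  n;

        gfid_to_str (gfid, uuid);
        n = snprintf (buf, len, "%s/.glusterfs/%.2s/%.2s/%s",
                      export_dir, uuid, uuid + 2, uuid);
        return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Hypothetical helper: build the symlink target for a directory as
 * "../../xx/yy/<parent-uuid>/<basename>", i.e. the parent's GFID path
 * followed by the directory's own basename (relative form assumed). */
int
gfid_dir_symlink_target (const unsigned char pargfid[16],
                         const char *basename, char *buf, size_t len)
{
        char uuid[37];
        int  n;

        gfid_to_str (pargfid, uuid);
        n = snprintf (buf, len, "../../%.2s/%.2s/%s/%s",
                      uuid, uuid + 2, uuid, basename);
        return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

Resolving a directory's symlink walks the parent's entry, which is itself a symlink to the grandparent's entry, so the kernel's path resolution performs the recursion up to the root export.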
4. Development
---------------
4a. To leverage the ability to address an inode by its GFID, the technique is
to perform a "nameless lookup". This means populating a loc_t structure as:
    loc_t {
        pargfid: NULL
        parent: NULL
        name: NULL
        path: NULL
        gfid: GFID to be looked up [out parameter]
        inode: inode_new () result [in parameter]
    }
Performing such a lookup returns in its callback an inode_t
populated with the right contexts and a struct iatt which can be
used to perform an inode_link () on the inode (without a parent and
basename). The inode will then be hashed and linked in the inode table
and findable via inode_find().
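As a rough illustration of the nameless-lookup loc, the sketch below fills a structure shaped like the one listed above. The `sketch_*` types and functions are simplified stand-ins invented for this example; they are not the libglusterfs definitions, and the all-zero `pargfid` array plays the role of the NULL shown in the listing.

```c
#include <string.h>
#include <stddef.h>

/* Simplified stand-ins for illustration only; the real loc_t and inode_t
 * live in libglusterfs and carry more fields. */
typedef unsigned char sketch_gfid_t[16];

struct sketch_inode {
        sketch_gfid_t gfid;
};

struct sketch_loc {
        sketch_gfid_t        pargfid;
        struct sketch_inode *parent;
        const char          *name;
        const char          *path;
        sketch_gfid_t        gfid;
        struct sketch_inode *inode;
};

/* Fill a loc for a nameless lookup: only the gfid and a freshly created
 * inode are set; pargfid, parent, name and path stay empty. */
void
sketch_loc_fill_nameless (struct sketch_loc *loc, const sketch_gfid_t gfid,
                          struct sketch_inode *new_inode)
{
        memset (loc, 0, sizeof (*loc));
        memcpy (loc->gfid, gfid, sizeof (loc->gfid));
        loc->inode = new_inode;
}

/* A loc counts as nameless here when it carries a gfid but no
 * (pargfid, name) addressing. */
int
sketch_loc_is_nameless (const struct sketch_loc *loc)
{
        static const sketch_gfid_t null_gfid = {0};

        return loc->parent == NULL && loc->name == NULL
               && memcmp (loc->pargfid, null_gfid, sizeof (null_gfid)) == 0
               && memcmp (loc->gfid, null_gfid, sizeof (null_gfid)) != 0;
}
```

In the real codebase the inode would come from inode_new () and the filled loc would be wound down as a lookup; the sketch only shows which fields are populated.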
A fundamental change moving forward is that the primary fields in a
loc_t structure are now going to be (pargfid, name) and (gfid) depending
on the kind of FOP. So far path had been the primary field for operations.
The remaining fields only serve as hints/helpers.
4b. If read/write is to be performed on an inode_t, the approach so far
has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and
then perform STACK_WIND(read, fd) etc. With anonymous fds you can now do
fd_anonymous (inode), STACK_WIND (read, fd). This results in a great
performance boost in the built-in NFS server.
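On the protocol/client side, the client.h hunk of this commit stops failing with EBADFD when an fd has no context; it assigns a remote fd of -2 instead and each fd-based request also carries the inode's GFID. The sketch below models that behaviour with stand-in types. The `sketch_*` names are hypothetical, and reading -2 as "anonymous fd, resolve by GFID" is an interpretation of the diff rather than something the commit message states.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Stand-in types for illustration; not the real client translator structs. */
struct sketch_fd_ctx {
        int64_t remote_fd;
};

struct sketch_req {
        int64_t       fd;
        unsigned char gfid[16];
};

/* Value the patched macro assigns when no fd context exists. */
#define SKETCH_ANON_REMOTE_FD (-2)

/* Mirror of the macro's decision: use the opened remote fd when there is
 * a context, otherwise fall back to the anonymous marker. */
int64_t
sketch_remote_fd (const struct sketch_fd_ctx *ctx)
{
        return ctx ? ctx->remote_fd : SKETCH_ANON_REMOTE_FD;
}

/* Every fd-based request carries both the remote fd and the GFID, so the
 * receiving side can address the inode even without an open()'d fd. */
void
sketch_fill_req (struct sketch_req *req, const struct sketch_fd_ctx *ctx,
                 const unsigned char gfid[16])
{
        req->fd = sketch_remote_fd (ctx);
        memcpy (req->gfid, gfid, sizeof (req->gfid));
}
```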
5. Misc
-------
The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent
with the rest of the codebase.
Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f
BUG: 781318
Reviewed-on: http://review.gluster.com/669
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
Diffstat (limited to 'xlators/protocol/client/src')
-rw-r--r-- | xlators/protocol/client/src/client.h | 24 | ||||
-rw-r--r-- | xlators/protocol/client/src/client3_1-fops.c | 124 |
2 files changed, 77 insertions, 71 deletions
diff --git a/xlators/protocol/client/src/client.h b/xlators/protocol/client/src/client.h
index 459ceed70..69830db9d 100644
--- a/xlators/protocol/client/src/client.h
+++ b/xlators/protocol/client/src/client.h
@@ -35,28 +35,18 @@
 #define CLIENT_CMD_DISCONNECT "trusted.glusterfs.client-disconnect"
 #define CLIENT_DUMP_LOCKS "trusted.glusterfs.clientlk-dump"

-#define CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, label) \
+#define CLIENT_GET_REMOTE_FD(conf, fd, remote_fd, label) \
         do { \
+                clnt_fd_ctx_t *fdctx = NULL; \
                 pthread_mutex_lock (&conf->lock); \
                 { \
-                        fdctx = this_fd_get_ctx (args->fd, this); \
+                        fdctx = this_fd_get_ctx (fd, THIS); \
                 } \
                 pthread_mutex_unlock (&conf->lock); \
-                \
-                if (fdctx == NULL) { \
-                        gf_log (this->name, GF_LOG_WARNING, \
-                                "(%s): failed to get fd ctx. EBADFD", \
-                                uuid_utoa (args->fd->inode->gfid)); \
-                        op_errno = EBADFD; \
-                        goto label; \
-                } \
-                \
-                if (fdctx->remote_fd == -1) { \
-                        gf_log (this->name, GF_LOG_WARNING, \
-                                "(%s): failed to get fd ctx. EBADFD", \
-                                uuid_utoa (args->fd->inode->gfid)); \
-                        op_errno = EBADFD; \
-                        goto label; \
+                if (!fdctx) { \
+                        remote_fd = -2; \
+                } else { \
+                        remote_fd = fdctx->remote_fd; \
                 } \
         } while (0);

diff --git a/xlators/protocol/client/src/client3_1-fops.c b/xlators/protocol/client/src/client3_1-fops.c
index 9b0fd63cc..6300b264f 100644
--- a/xlators/protocol/client/src/client3_1-fops.c
+++ b/xlators/protocol/client/src/client3_1-fops.c
@@ -2600,7 +2600,10 @@ client3_1_lookup (call_frame_t *frame, xlator_t *this,
         }

         req.path = (char *)args->loc->path;
-        req.bname = (char *)args->loc->name;
+        if (args->loc->name)
+                req.bname = (char *)args->loc->name;
+        else
+                req.bname = "";
         req.dict.dict_len = dict_len;

         ret = client_submit_request (this, &req, frame, conf->fops,
@@ -2750,7 +2753,7 @@ client3_1_ftruncate (call_frame_t *frame, xlator_t *this,
                      void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         gfs3_ftruncate_req req = {{0,},};
         int op_errno = EINVAL;
@@ -2763,10 +2766,11 @@ client3_1_ftruncate (call_frame_t *frame, xlator_t *this,

         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         req.offset = args->offset;
-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_FTRUNCATE,
@@ -3510,7 +3514,7 @@ client3_1_readv (call_frame_t *frame, xlator_t *this,
                  void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         int op_errno = ESTALE;
         gfs3_read_req req = {{0,},};
@@ -3526,11 +3530,12 @@ client3_1_readv (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         req.size = args->size;
         req.offset = args->offset;
-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         /* TODO: what is the size we should send ? */
         rsp_iobuf = iobuf_get (this->ctx->iobuf_pool);
@@ -3601,7 +3606,7 @@ int32_t
 client3_1_writev (call_frame_t *frame, xlator_t *this, void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         gfs3_write_req req = {{0,},};
         int op_errno = ESTALE;
@@ -3613,11 +3618,12 @@ client3_1_writev (call_frame_t *frame, xlator_t *this, void *data)
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         req.size = args->size;
         req.offset = args->offset;
-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_vec_request (this, &req, frame, conf->fops,
                                          GFS3_OP_WRITE, client3_1_writev_cbk, args->vector,
@@ -3647,7 +3653,7 @@ client3_1_flush (call_frame_t *frame, xlator_t *this,
 {
         clnt_args_t *args = NULL;
         gfs3_flush_req req = {{0,},};
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         clnt_local_t *local = NULL;
         int op_errno = ESTALE;
@@ -3659,7 +3665,7 @@ client3_1_flush (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         conf = this->private;

@@ -3674,7 +3680,8 @@ client3_1_flush (call_frame_t *frame, xlator_t *this,
         local->owner = frame->root->lk_owner;
         frame->local = local;

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_FLUSH, client3_1_flush_cbk, NULL,
@@ -3699,7 +3706,7 @@ client3_1_fsync (call_frame_t *frame, xlator_t *this,
 {
         clnt_args_t *args = NULL;
         gfs3_fsync_req req = {{0,},};
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         int op_errno = 0;
         int ret = 0;
@@ -3710,10 +3717,11 @@ client3_1_fsync (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.data = args->flags;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_FSYNC, client3_1_fsync_cbk, NULL,
@@ -3738,7 +3746,7 @@ client3_1_fstat (call_frame_t *frame, xlator_t *this,
 {
         clnt_args_t *args = NULL;
         gfs3_fstat_req req = {{0,},};
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         int op_errno = ESTALE;
         int ret = 0;
@@ -3749,9 +3757,10 @@ client3_1_fstat (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_FSTAT, client3_1_fstat_cbk, NULL,
@@ -3834,7 +3843,7 @@ int32_t
 client3_1_fsyncdir (call_frame_t *frame, xlator_t *this, void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         int op_errno = ESTALE;
         gfs3_fsyncdir_req req = {{0,},};
@@ -3846,10 +3855,11 @@ client3_1_fsyncdir (call_frame_t *frame, xlator_t *this, void *data)
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.data = args->flags;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         conf = this->private;

@@ -3994,7 +4004,7 @@ client3_1_fsetxattr (call_frame_t *frame, xlator_t *this,
                      void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         gfs3_fsetxattr_req req = {{0,},};
         int op_errno = ESTALE;
@@ -4007,11 +4017,11 @@ client3_1_fsetxattr (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.flags = args->flags;
-        memcpy (req.gfid, args->fd->inode->gfid, 16);
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         if (args->dict) {
                 ret = dict_allocate_and_serialize (args->dict,
@@ -4056,7 +4066,7 @@ client3_1_fgetxattr (call_frame_t *frame, xlator_t *this,
                      void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         gfs3_fgetxattr_req req = {{0,},};
         int op_errno = ESTALE;
@@ -4074,7 +4084,7 @@ client3_1_fgetxattr (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         local = GF_CALLOC (1, sizeof (*local),
                            gf_client_mt_clnt_local_t);
@@ -4108,12 +4118,13 @@ client3_1_fgetxattr (call_frame_t *frame, xlator_t *this,
         rsp_iobref = NULL;

         req.namelen = 1; /* Use it as a flag */
-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.name = (char *)args->name;
         if (!req.name) {
                 req.name = "";
                 req.namelen = 0;
         }
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_FGETXATTR,
@@ -4413,7 +4424,7 @@ client3_1_fxattrop (call_frame_t *frame, xlator_t *this,
                     void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         gfs3_fxattrop_req req = {{0,},};
         int op_errno = ESTALE;
@@ -4432,11 +4443,11 @@ client3_1_fxattrop (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.flags = args->flags;
-        memcpy (req.gfid, args->fd->inode->gfid, 16);
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         local = GF_CALLOC (1, sizeof (*local),
                            gf_client_mt_clnt_local_t);
@@ -4580,7 +4591,7 @@ client3_1_lk (call_frame_t *frame, xlator_t *this,
         gfs3_lk_req req = {{0,},};
         int32_t gf_cmd = 0;
         int32_t gf_type = 0;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_local_t *local = NULL;
         clnt_conf_t *conf = NULL;
         int op_errno = ESTALE;
@@ -4597,7 +4608,7 @@ client3_1_lk (call_frame_t *frame, xlator_t *this,
                 goto unwind;
         }

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         ret = client_cmd_to_gf_cmd (args->cmd, &gf_cmd);
         if (ret) {
@@ -4624,10 +4635,11 @@ client3_1_lk (call_frame_t *frame, xlator_t *this,
         local->fd = fd_ref (args->fd);
         frame->local = local;

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.cmd = gf_cmd;
         req.type = gf_type;
         gf_proto_flock_from_flock (&req.flock, args->flock);
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_LK, client3_1_lk_cbk, NULL,
@@ -4733,7 +4745,7 @@ client3_1_finodelk (call_frame_t *frame, xlator_t *this,
         gfs3_finodelk_req req = {{0,},};
         int32_t gf_cmd = 0;
         int32_t gf_type = 0;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         int op_errno = ESTALE;
         int ret = 0;
@@ -4744,7 +4756,7 @@ client3_1_finodelk (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         if (args->cmd == F_GETLK || args->cmd == F_GETLK64)
                 gf_cmd = GF_LK_GETLK;
@@ -4771,10 +4783,11 @@ client3_1_finodelk (call_frame_t *frame, xlator_t *this,
         }

         req.volume = (char *)args->volume;
-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.cmd = gf_cmd;
         req.type = gf_type;
         gf_proto_flock_from_flock (&req.flock, args->flock);
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_FINODELK,
@@ -4857,7 +4870,7 @@ client3_1_fentrylk (call_frame_t *frame, xlator_t *this,
 {
         clnt_args_t *args = NULL;
         gfs3_fentrylk_req req = {{0,},};
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         int op_errno = ESTALE;
         int ret = 0;
@@ -4868,9 +4881,9 @@ client3_1_fentrylk (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.cmd = args->cmd_entrylk;
         req.type = args->type;
         req.volume = (char *)args->volume;
@@ -4879,6 +4892,7 @@ client3_1_fentrylk (call_frame_t *frame, xlator_t *this,
                 req.name = (char *)args->basename;
                 req.namelen = 1;
         }
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_FENTRYLK,
@@ -4903,7 +4917,7 @@ client3_1_rchecksum (call_frame_t *frame, xlator_t *this,
                      void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         gfs3_rchecksum_req req = {0,};
         int op_errno = ESTALE;
@@ -4915,11 +4929,11 @@ client3_1_rchecksum (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         req.len = args->len;
         req.offset = args->offset;
-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_RCHECKSUM,
@@ -4945,7 +4959,7 @@ client3_1_readdir (call_frame_t *frame, xlator_t *this,
                    void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         gfs3_readdir_req req = {{0,},};
         gfs3_readdir_rsp rsp = {0, };
@@ -4965,7 +4979,7 @@ client3_1_readdir (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         readdir_rsp_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_readdir_rsp, &rsp)
                 + args->size;
@@ -5004,7 +5018,8 @@ client3_1_readdir (call_frame_t *frame, xlator_t *this,

         req.size = args->size;
         req.offset = args->offset;
-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_READDIR,
@@ -5047,7 +5062,7 @@ client3_1_readdirp (call_frame_t *frame, xlator_t *this,
         clnt_args_t *args = NULL;
         gfs3_readdirp_req req = {{0,},};
         gfs3_readdirp_rsp rsp = {0,};
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         int op_errno = ESTALE;
         int ret = 0;
@@ -5065,7 +5080,7 @@ client3_1_readdirp (call_frame_t *frame, xlator_t *this,
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

         readdirp_rsp_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_readdirp_rsp, &rsp)
                 + args->size;
@@ -5106,7 +5121,8 @@ client3_1_readdirp (call_frame_t *frame, xlator_t *this,

         req.size = args->size;
         req.offset = args->offset;
-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
+        memcpy (req.gfid, args->fd->inode->gfid, 16);

         ret = client_submit_request (this, &req, frame, conf->fops,
                                      GFS3_OP_READDIRP,
@@ -5192,7 +5208,7 @@ int32_t
 client3_1_fsetattr (call_frame_t *frame, xlator_t *this, void *data)
 {
         clnt_args_t *args = NULL;
-        clnt_fd_ctx_t *fdctx = NULL;
+        int64_t remote_fd = -1;
         clnt_conf_t *conf = NULL;
         gfs3_fsetattr_req req = {0,};
         int op_errno = ESTALE;
@@ -5204,9 +5220,9 @@ client3_1_fsetattr (call_frame_t *frame, xlator_t *this, void *data)
         args = data;
         conf = this->private;

-        CLIENT_GET_FD_CTX(conf, args, fdctx, op_errno, unwind);
+        CLIENT_GET_REMOTE_FD(conf, args->fd, remote_fd, unwind);

-        req.fd = fdctx->remote_fd;
+        req.fd = remote_fd;
         req.valid = args->valid;
         gf_stat_from_iatt (&req.stbuf, args->stbuf);