| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Remove client side self-healing completely (opendir, openfd, lookup)
- Re-work readdir-failover to work reliably in case of NFS
- Remove unused/dead lock recovery code
- Consistently use xdata in both calls and callbacks in all FOPs
- Per-inode event generation, used to force inode ctx refresh
- Implement dirty flag support (in place of pending counts)
- Eliminate inode ctx structure, use read subvol bits + event_generation
- Implement inode ctx refreshing based on event generation
- Provide backward compatibility in transactions
- remove unused variables and functions
- make code more consistent in style and pattern
- regularize and clean up inode-write transaction code
- regularize and clean up dir-write transaction code
- regularize and clean up common FOPs
- reorganize transaction framework code
- skip setting xattrs in pending dict if nothing is pending
- re-write self-healing code using syncops
- re-write simpler self-heal-daemon
Change-Id: I1e4080c9796c8a2815c2dab4be3073f389d614a8
BUG: 1021686
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6010
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glfs_zerofill() can be potentially called to zero-out entire file and
hence allow for bigger value of length parameter.
Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63
BUG: 1028673
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-on: http://review.gluster.org/6266
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com>
Tested-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
the specified range. This fop will be useful when a whole file needs to
be initialized with zero (could be useful for zero filled VM disk image
provisioning or during scrubbing of VM disk images).
Client/application can issue this FOP for zeroing out. Gluster server
will zero out required range of bytes ie server offloaded zeroing. In
the absence of this fop, client/application has to repetitively issue
write (zero) fop to the server, which is very inefficient method because
of the overheads involved in RPC calls and acknowledgements.
WRITESAME is a SCSI T10 command that takes a block of data as input and
writes the same data to other blocks and this write is handled
completely within the storage and hence is known as offload . Linux ,now
has support for SCSI WRITESAME command which is exposed to the user in
the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to
implement this fop. Thus zeroing out operations can be completely
offloaded to the storage device , making it highly efficient.
The fop takes two arguments offset and size. It zeroes out 'size' number
of bytes in an opened file starting from 'offset' position.
This patch adds zerofill support to the following areas:
- libglusterfs
- io-stats
- performance/md-cache,open-behind
- quota
- cluster/afr,dht,stripe
- rpc/xdr
- protocol/client,server
- io-threads
- marker
- storage/posix
- libgfapi
Client applications can exloit this fop by using glfs_zerofill introduced in
libgfapi.FUSE support to this fop has not been added as there is no system call
for this fop.
Changes from previous version 3:
* Removed redundant memory failure log messages
Changes from previous version 2:
* Rebased and fixed build error
Changes from previous version 1:
* Rebased for latest master
TODO :
* Add zerofill support to trace xlator
* Expose zerofill capability as part of gluster volume info
Here is a performance comparison of server offloaded zeofill vs zeroing
out using repeated writes.
[root@llmvm02 remote]# time ./offloaded aakash-test log 20
real 3m34.155s
user 0m0.018s
sys 0m0.040s
[root@llmvm02 remote]# time ./manually aakash-test log 20
real 4m23.043s
user 0m2.197s
sys 0m14.457s
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;
real 4m28.363s
user 0m0.021s
sys 0m0.025s
[root@llmvm02 remote]# time ./manually aakash-test log 25
real 5m34.278s
user 0m2.957s
sys 0m18.808s
The argument log is a file which we want to set for logging purpose and
the third argument is size in GB .
As we can see there is a performance improvement of around 20% with this
fop.
Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
BUG: 1028673
Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com>
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5327
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia7b324b86d6a7051d187106d7a060155e77defc5
BUG: 910217
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5238
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If truncate/ftruncate is called with the offset as the current size
of file, then skip the durability fsync and unwind quickly.
Change-Id: I0baec68d96c6d4d8217d33bd9738f7ed0d1b40c5
BUG: 958118
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5737
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Idea is to not leave the file in FOOL-FOOL scenario in case on
all the bricks data transaction failed with EDQUOT to avoid
increasing un-necessary load of self-heals in the system.
For directory transactions don't leave pending changelog in case
the failures are seen on all the subvolumes.
Change-Id: I38a5561d1d581a78347a76a4a509514e4a0c3fb7
BUG: 969461
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5709
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ib0c3af6babc61dc3ed45252582876e2f243d6446
BUG: 958118
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5635
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For Write Once Read Many times type of work-load choosing largest
file to be the source will always resolve fool-fool
scenarios correctly. In other cases we fsync() the files and
will have a reliable 'wise man'.
Change-Id: Ic4dbea8d06db6d578fbcb866fb65ee2d066ac7ba
BUG: 958118
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5519
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Durability of appending writes is implicit in the file size. Therefore
performing an explicit fsync() is unnecessary in such cases as self-heal
can check for the size of file when pending changelog is not unambiguous.
Change-Id: I05446180a91d20e0dbee5de5a7085b87d57f178a
BUG: 927146
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5501
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lets say mount1 has eager-lock(full-lock) and after the eager-lock
is taken mount2 opened the same file, it won't be able to
perform any data operations until mount1 releases eager-lock.
To avoid such scenario do not enable eager-lock for transaction
if open-fd-count is > 1. Delaying of changelog piggybacking is
avoided in this situation.
Change-Id: I51b45d6a7c216a78860aff0265a0b8dabc6423a5
BUG: 910217
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5432
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: venkatesh somyajulu <vsomyaju@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The changelogging scheme of AFR stores information about the state
of all replicas in all replicas (in the extended attribute of the
respective files on each server) in the form of 'pending counts'
of operations (effectively "dirty flags"). These xattrs are blindly
trusted while performing self-heal, and therefore utmost care has
to be taken while updating and maintaing them.
The most critical updation is the clearing of the pending counts
corresponding to the *other* server in the changelog of a given
server. Before clearing the pending count, we need durability
guarantee of the write which was performed on the other server.
To obtain such a guarantee, it may be necessary to explicitly
introduce an fsync() phase (if the file itself wasn't already
opened with O_SYNC).
This patch introduces the detection of unstable stable writes on
a file and issues explicit fsync() on the servers before performing
the POST-OP clearing of pending flags.
Change-Id: I2171b86a74ec91e40e5877eef0a4e7379578ecf7
BUG: 927146
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4721
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before Anonymous fds are available, afr had to queue up
transactions if the file is not opened on one of its
subvolumes. This happens until the attempt to open the
file either succeeds or fails. These attempts happen
until the file is successfully opened on the subvolume.
Now client xlator uses anonymous fds to perform the fops
if the fd used for the fop is not 'opened'.
Fops will be successful even when the file is not opened
so there is no need to queue up the transactions anymore in afr.
Open is attempted on the subvolume where it is not
opened independent of the fop.
Change-Id: Id1a4b4ebe6f89f9efe8f6a8247918b91247d0819
BUG: 913051
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4568
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fd based operations such as readv checked only for data split brain
instead of complete split-brain (i.e both data + metadata) assuming that
open would have done the complete split-brain check. However open-behind
would have unwound open, without winding to afr thus preventing the complete
split-brain check and some appliations will be able to read the contents
of the file even though the file has metadata split-brain. So let all
the fd based fops do a defensive check of complete split-brain.
Change-Id: Ia90b35f2b08426dfcad804b7f8105278c86fbd2d
BUG: 846240
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4548
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no necessity for the delayed-post-op to wait until
the next fop phase on the fd completes. Change-log,
locks are inherited by the time next fop phase is attempted
so the wakeup can happen just before the fop phase is started.
Change-Id: I0b8e591f591b0f7565eb55265ab51f476ed2b165
BUG: 908302
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4073
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generally inode-write fops do transaction.unwind then
transaction.resume, but writev needs to make sure that
delayed post-op frame is placed in fdctx before unwind
happens. This prevents the race of flush doing the
changelog wakeup first in fuse thread and then this
writev placing its delayed post-op frame in fdctx.
This helps flush make sure all the delayed post-ops are
completed.
Change-Id: Ia78ca556f69cab3073c21172bb15f34ff8c3f4be
BUG: 888174
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4428
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current failure to handle short writes on writev fops leaves
us open to file corruption. A short write on a user request is
ignored and leaves replicas in an inconsistent state. A short write
during a self-heal is ignored and incorrectly marks the files as
consistent if the heal completes.
Modify user writev handling to return the best case return value
from each of the replicas. Short writes that occur relative to this
value are marked as failed and will require a heal. Modify
self-heal to set an error on a short write and abort the heal.
BUG: 853690
Change-Id: I18b30f58702326249230eeebb361b29e40b535f5
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4150
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values from all children need to be aggregated into a dictionary
and serialized buffer of this aggregated dictionary has to be
the value of GF_XATTR_LOCKINFO_KEY in the dict sent as a result of
fgetxattr.
Change-Id: Ie877f7c637c07feaee4c44d7ef86aa967a17b7e7
BUG: 808400
Signed-off-by: Raghavendra G <raghavendra@gluster.com>
Reviewed-on: http://review.gluster.org/4121
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the callings of GF_CALLOC can seldom come to a failure, glusterfs client
will crash due to segment fault. We should have returned once the variables
of transaction's local can't be alloced.
Change-Id: Ia3798b8349d832b23c7825e64dbad93ebe29cd1b
BUG: 861335
Signed-off-by: linbaiye <linbaiye@gmail.com>
Reviewed-on: http://review.gluster.org/4005
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* ie, don't dereference dict_t pointer, instead use APIs everywhere
* other than dict_t only 'data_t' should be the valid export from dict.h
* added 'dict_foreach_fnmatch()' API
* changed dict_lookup() to use data_t, instead of data_pair_t
Change-Id: I400bb0dd55519a7c5d2a107e67c8e7a7207228dc
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 850917
Reviewed-on: http://review.gluster.org/3829
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
See comments in http://bugzilla.redhat.com/839925 for
the code to perform this change.
Signed-off-by: Jim Meyering <meyering@redhat.com>
BUG: 839925
Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a
Reviewed-on: http://review.gluster.com/3661
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that the license was not changed in any of the following:
.../argp-standalone/...
.../booster/...
.../cli/...
.../contrib/...
.../extras/...
.../glusterfsd/...
.../glusterfs-hadoop/...
.../mod_clusterfs/...
.../scheduler/...
.../swift/...
The license was not changed in any of the non-building xlators. The
license was not changed in any of the xlators that seemed — to me — to
be clearly server-side only, e.g. protocol/server
Note too that copyright was changed along with the license; I did
not change the copyright in files where the license did not change.
If you find any errors or ommissions please don't hesitate to let me know.
The complete list of files with the license change is:
libglusterfs/src/byte-order.h
libglusterfs/src/call-stub.c
libglusterfs/src/call-stub.h
libglusterfs/src/checksum.c
libglusterfs/src/checksum.h
libglusterfs/src/circ-buff.c
libglusterfs/src/circ-buff.h
libglusterfs/src/common-utils.c
libglusterfs/src/common-utils.h
libglusterfs/src/compat-errno.c
libglusterfs/src/compat-errno.h
libglusterfs/src/compat.c
libglusterfs/src/compat.h
libglusterfs/src/daemon.c
libglusterfs/src/daemon.h
libglusterfs/src/defaults.c
libglusterfs/src/defaults.h
libglusterfs/src/dict.c
libglusterfs/src/dict.h
libglusterfs/src/event-history.c
libglusterfs/src/event-history.h
libglusterfs/src/event.c
libglusterfs/src/event.h
libglusterfs/src/fd-lk.c
libglusterfs/src/fd-lk.h
libglusterfs/src/fd.c
libglusterfs/src/fd.h
libglusterfs/src/gf-dirent.c
libglusterfs/src/gf-dirent.h
libglusterfs/src/globals.c
libglusterfs/src/globals.h
libglusterfs/src/glusterfs.h
libglusterfs/src/graph-print.c
libglusterfs/src/graph-utils.h
libglusterfs/src/graph.c
libglusterfs/src/hashfn.c
libglusterfs/src/hashfn.h
libglusterfs/src/iatt.h
libglusterfs/src/inode.c
libglusterfs/src/inode.h
libglusterfs/src/iobuf.c
libglusterfs/src/iobuf.h
libglusterfs/src/latency.c
libglusterfs/src/latency.h
libglusterfs/src/list.h
libglusterfs/src/lkowner.h
libglusterfs/src/locking.h
libglusterfs/src/logging.c
libglusterfs/src/logging.h
libglusterfs/src/mem-pool.c
libglusterfs/src/mem-pool.h
libglusterfs/src/mem-types.h
libglusterfs/src/options.c
libglusterfs/src/options.h
libglusterfs/src/rbthash.c
libglusterfs/src/rbthash.h
libglusterfs/src/run.c
libglusterfs/src/run.h
libglusterfs/src/scheduler.c
libglusterfs/src/scheduler.h
libglusterfs/src/stack.c
libglusterfs/src/stack.h
libglusterfs/src/statedump.c
libglusterfs/src/statedump.h
libglusterfs/src/syncop.c
libglusterfs/src/syncop.h
libglusterfs/src/syscall.c
libglusterfs/src/syscall.h
libglusterfs/src/timer.c
libglusterfs/src/timer.h
libglusterfs/src/trie.c
libglusterfs/src/trie.h
libglusterfs/src/xlator.c
libglusterfs/src/xlator.h
libglusterfsclient/src/libglusterfsclient-dentry.c
libglusterfsclient/src/libglusterfsclient-internals.h
libglusterfsclient/src/libglusterfsclient.c
libglusterfsclient/src/libglusterfsclient.h
rpc/rpc-lib/src/auth-glusterfs.c
rpc/rpc-lib/src/auth-null.c
rpc/rpc-lib/src/auth-unix.c
rpc/rpc-lib/src/protocol-common.h
rpc/rpc-lib/src/rpc-clnt.c
rpc/rpc-lib/src/rpc-clnt.h
rpc/rpc-lib/src/rpc-transport.c
rpc/rpc-lib/src/rpc-transport.h
rpc/rpc-lib/src/rpcsvc-auth.c
rpc/rpc-lib/src/rpcsvc-common.h
rpc/rpc-lib/src/rpcsvc.c
rpc/rpc-lib/src/rpcsvc.h
rpc/rpc-lib/src/xdr-common.h
rpc/rpc-lib/src/xdr-rpc.c
rpc/rpc-lib/src/xdr-rpc.h
rpc/rpc-lib/src/xdr-rpcclnt.c
rpc/rpc-lib/src/xdr-rpcclnt.h
rpc/rpc-transport/rdma/src/name.c
rpc/rpc-transport/rdma/src/name.h
rpc/rpc-transport/rdma/src/rdma.c
rpc/rpc-transport/rdma/src/rdma.h
rpc/rpc-transport/socket/src/name.c
rpc/rpc-transport/socket/src/name.h
rpc/rpc-transport/socket/src/socket.c
rpc/rpc-transport/socket/src/socket.h
xlators/cluster/afr/src/afr-common.c
xlators/cluster/afr/src/afr-dir-read.c
xlators/cluster/afr/src/afr-dir-read.h
xlators/cluster/afr/src/afr-dir-write.c
xlators/cluster/afr/src/afr-dir-write.h
xlators/cluster/afr/src/afr-inode-read.c
xlators/cluster/afr/src/afr-inode-read.h
xlators/cluster/afr/src/afr-inode-write.c
xlators/cluster/afr/src/afr-inode-write.h
xlators/cluster/afr/src/afr-lk-common.c
xlators/cluster/afr/src/afr-mem-types.h
xlators/cluster/afr/src/afr-open.c
xlators/cluster/afr/src/afr-self-heal-algorithm.c
xlators/cluster/afr/src/afr-self-heal-algorithm.h
xlators/cluster/afr/src/afr-self-heal-common.c
xlators/cluster/afr/src/afr-self-heal-common.h
xlators/cluster/afr/src/afr-self-heal-data.c
xlators/cluster/afr/src/afr-self-heal-entry.c
xlators/cluster/afr/src/afr-self-heal-metadata.c
xlators/cluster/afr/src/afr-self-heal.h
xlators/cluster/afr/src/afr-self-heald.c
xlators/cluster/afr/src/afr-self-heald.h
xlators/cluster/afr/src/afr-transaction.c
xlators/cluster/afr/src/afr-transaction.h
xlators/cluster/afr/src/afr.c
xlators/cluster/afr/src/afr.h
xlators/cluster/afr/src/pump.c
xlators/cluster/afr/src/pump.h
xlators/cluster/dht/src/dht-common.c
xlators/cluster/dht/src/dht-common.h
xlators/cluster/dht/src/dht-diskusage.c
xlators/cluster/dht/src/dht-hashfn.c
xlators/cluster/dht/src/dht-helper.c
xlators/cluster/dht/src/dht-inode-read.c
xlators/cluster/dht/src/dht-inode-write.c
xlators/cluster/dht/src/dht-layout.c
xlators/cluster/dht/src/dht-linkfile.c
xlators/cluster/dht/src/dht-mem-types.h
xlators/cluster/dht/src/dht-rebalance.c
xlators/cluster/dht/src/dht-rename.c
xlators/cluster/dht/src/dht-selfheal.c
xlators/cluster/dht/src/dht.c
xlators/cluster/dht/src/nufa.c
xlators/cluster/dht/src/switch.c
xlators/cluster/stripe/src/stripe-helpers.c
xlators/cluster/stripe/src/stripe-mem-types.h
xlators/cluster/stripe/src/stripe.c
xlators/cluster/stripe/src/stripe.h
xlators/features/index/src/index-mem-types.h ¹
xlators/features/index/src/index.c ¹
xlators/features/index/src/index.h ¹
xlators/performance/io-cache/src/io-cache.c
xlators/performance/io-cache/src/io-cache.h
xlators/performance/io-cache/src/ioc-inode.c
xlators/performance/io-cache/src/ioc-mem-types.h
xlators/performance/io-cache/src/page.c
xlators/performance/io-threads/src/io-threads.c
xlators/performance/io-threads/src/io-threads.h
xlators/performance/io-threads/src/iot-mem-types.h
xlators/performance/md-cache/src/md-cache-mem-types.h
xlators/performance/md-cache/src/md-cache.c
xlators/performance/quick-read/src/quick-read-mem-types.h
xlators/performance/quick-read/src/quick-read.c
xlators/performance/quick-read/src/quick-read.h
xlators/performance/read-ahead/src/page.c
xlators/performance/read-ahead/src/read-ahead-mem-types.h
xlators/performance/read-ahead/src/read-ahead.c
xlators/performance/read-ahead/src/read-ahead.h
xlators/performance/symlink-cache/src/symlink-cache.c
xlators/performance/write-behind/src/write-behind-mem-types.h
xlators/performance/write-behind/src/write-behind.c
xlators/protocol/auth/addr/src/addr.c ¹
xlators/protocol/auth/login/src/login.c ¹
xlators/protocol/client/src/client-callback.c
xlators/protocol/client/src/client-handshake.c
xlators/protocol/client/src/client-helpers.c
xlators/protocol/client/src/client-lk.c
xlators/protocol/client/src/client-mem-types.h
xlators/protocol/client/src/client.c
xlators/protocol/client/src/client.h
xlators/protocol/client/src/client3_1-fops.c
¹ Copyright only, license reverted to original
Change-Id: If560e826c61b6b26f8b9af7bed6e4bcbaeba31a8
BUG: 820551
Signed-off-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.com/3304
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ie009fb4b8b7745ebd5b76f7a40287998be35eef3
BUG: 804914
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3045
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with this change, the xlator APIs will have a dictionary as extra
argument, which is passed between all the layers. This can be
utilized for overloading in some of the operations.
Change-Id: I58a8186b3ef647650280e63f3e5e9b9de7827b40
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 782265
Reviewed-on: http://review.gluster.com/2960
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Each xlator prevents the user from removing xlator-specific
xattrs like trusted.gfid by handling it in respective removexattr
functions.
* For xlators which did not define remove and fremovexattr,
the functions have been implemented with appropriate checks.
xlator | fops-added
_______________|__________________________
|
1. stripe | removexattr and fremovexattr
2. quota | removexattr and fremovexattr
Change-Id: I98e22109717978134378bc75b2eca83fefb2abba
BUG: 783525
Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
Reviewed-on: http://review.gluster.com/2836
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in each translator, which uses 'frame->local', we are using
GF_CALLOC/GF_FREE, which would be costly considering the
number of allocation happening in a lifetime of 'fop'. It
would be good to utilize the mem pool framework for xlator's
local structures, so there is no allocation overhead.
Change-Id: Ida6e65039a24d9c219b380aa1c3559f36046dc94
Signed-off-by: Amar Tumballi <amar@gluster.com>
BUG: 765336
Reviewed-on: http://review.gluster.com/2772
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
needed to implement a proper handling of open flag alterations
using fcntl() on fd.
Change-Id: Ic280d5db6f1dc0418d5c439abb8db1d3ac21ced0
Signed-off-by: Amar Tumballi <amar@gluster.com>
BUG: 782265
Reviewed-on: http://review.gluster.com/2723
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in debug/* and cluster/* translators and a syncop_fsetxattr()
added a test case for testing the working of 'f-fop()' on
fuse mount.
Change-Id: I0c2aeeb30a0fb382ef2495cca1e66b00abaffd35
Signed-off-by: Amar Tumballi <amar@gluster.com>
BUG: 766571
Reviewed-on: http://review.gluster.com/802
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
so operations can be done on fd for extended attribute removal
Change-Id: Ie026f1b53793aeb4ae33e96ea5408c7a97f34bf6
Signed-off-by: Amar Tumballi <amar@gluster.com>
BUG: 766571
Reviewed-on: http://review.gluster.com/778
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. What
--------
This change introduces an infrastructure change in the filesystem
which lets filesystem operation address objects (inodes) just by its
GFID. Thus far GFID has been a unique identifier of a user-visible
inode. But in terms of addressability the only mechanism thus far has
been the backend filesystem path, which could be derived from the
GFID only if it was cached in the inode table along with the entire set
of dentry ancestry leading up to the root.
This change essentially decouples addressability from the namespace. It
is no more necessary to be aware of the parent directory to address a
file or directory.
2. Why
-------
The biggest use case for such a feature is NFS for generating
persistent filehandles. So far the technique for generating filehandles
in NFS has been to encode path components so that the appropriate
inode_t can be repopulated into the inode table by means of a recursive
lookup of each component top-down.
Another use case is the ability to perform more intelligent self-healing
and rebalancing of inodes with hardlinks and also to detect renames.
A derived feature from GFID filehandles is anonymous FDs. An anonymous FD
is an internal USABLE "fd_t" which does not map to a user opened file
descriptor or to an internal ->open()'d fd. The ability to address a file
by the GFID eliminates the need to have a persistent ->open()'d fd for the
purpose of avoiding the namespace. This improves NFS read/write performance
significantly eliminating open/close calls and also fixes some of today's
limitations (like keeping an FD open longer than necessary resulting
in disk space leakage)
3. How
-------
At each storage/posix translator level, every file is hardlinked inside
a hidden .glusterfs directory (under the top level export) with the name
as the ascii-encoded standard UUID format string. For reasons of performance
and scalability there is a two-tier classification of those hardlinks
under directories with the initial parts of the UUID string as the directory
names.
For directories (which cannot be hardlinked), the approach is to use a symlink
which dereferences the parent GFID path along with basename of the directory.
The parent GFID dereference will in turn be a dereference of the grandparent
with the parent's basename, and so on recursively up to the root export.
4. Development
---------------
4a. To leverage the ability to address an inode by its GFID, the technique is
to perform a "nameless lookup". This means, to populate a loc_t structure as:
loc_t {
pargfid: NULL
parent: NULL
name: NULL
path: NULL
gfid: GFID to be looked up [out parameter]
inode: inode_new () result [in parameter]
}
and performing such lookup will return in its callback an inode_t
populated with the right contexts and a struct iatt which can be
used to perform an inode_link () on the inode (without a parent and
basename). The inode will now be hashed and linked in the inode table
and findable via inode_find().
A fundamental change moving forward is that the primary fields in a
loc_t structure are now going to be (pargfid, name) and (gfid) depending
on the kind of FOP. So far path had been the primary field for operations.
The remaining fields only serve as hints/helpers.
4b. If read/write is to be performed on an inode_t, the approach so far
has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and
then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do
fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost
in performance in the inbuilt NFS server.
5. Misc
-------
The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent
with the rest of the codebase.
Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f
BUG: 781318
Reviewed-on: http://review.gluster.com/669
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In afr_open_fd_fix we were unlocking the local->fd->lock, without
holding the lock on it if we were not able to get the fd context.
Now we are directly going to out and returning, instead of going
to unlock without holding the lock.
Change-Id: I0da638bbd2c269127cf111b3aac707e4a95d20c6
BUG: 783036
Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
Reviewed-on: http://review.gluster.com/2658
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Each xlator prevents the user from setting glusterfs-internal
xattrs like trusted.gfid by handling it in respective setxattr
functions. The speacial case of trusted.gfid is handled in
fuse (Not in posix because posix_setxattr is used to set gfid).
* For xlators which did not define setxattr and/or fsetxattr,
the functions have been implemented with appropriate checks.
xlator | fops-added
_______________|__________________________
|
1. afr | fsetxattr
2. stripe | setxatrr and fsetxattr
3. quota | setxattr and fsetxattr
Change-Id: Ib62abb7067415b23a708002f884d30e8866fbf48
BUG: 765487
Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
Reviewed-on: http://review.gluster.com/685
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fop should unwind with appropriate errno
- Local is de-allocated on errors
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Change-Id: I4db40342ae184fe1cc29e51072e8fea72ef2cb15
BUG: 770513
Reviewed-on: http://review.gluster.com/2539
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Ia52ddb551e24c27969f7f5fa0f94c1044789731f
BUG: 3823
Reviewed-on: http://review.gluster.com/743
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I2f123ef93989862aa796903a45682981d5d7fc3c
BUG: 3533
Reviewed-on: http://review.gluster.com/473
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I425e2d23e9e45f10ddeff2eacf918dd90f8baee7
BUG: 3744
Reviewed-on: http://review.gluster.com/639
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Afr transaction performs lock, pre-op, op, post-op and unlock steps in that
order. The child_up[] is overloaded with the information of where all
the first two steps succeeded. This works perfectly fine for
Transaction, but the locking/unlocking part of the code is re-used by
data self-heal. In that each loop_frame does lock, rchecksum,
read-from-source and write-to-sinks, unlock steps.
Rchecksum fop assumes that the fop needs to happen on one source + all
sinks and sets the call_count to that number. But if the lock step fails
on any of the sinks it will mark the child_up of that child to 0, which
will result in call_count mismatch and the frame will hang thinking that
some more cbks need to come. When this happens loop_frame will never go
to unlock step leading to hangs on that file.
Change-Id: I3dd0449cc6193a980bacf637d935881f4b22210a
BUG: 3597
Reviewed-on: http://review.gluster.com/474
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default, lookup triggers data self-heal but that is not the preferred way
of operating replicated volumes. We would like the data self heals to be
triggered in open instead.
Number of back-ground self-heals allowed is 16 and lookups block until
self-heal is completed. We want to prevent blocking in fops. We can not make
lookups independent of self-heal frames because when there are gfid conflicts
the decision of which file is correct is determined in self-heal phase.
So in afr, lookup self-heal is going to guarantee name space consistency
and open/fd fops will take responsibility for data consistency, these
are non blocking. The user needs to set the option cluster.data-self-heal
"open" for this behavior.
Change-Id: If9463cdb9ebac114708558ec13bbca0270acd659
BUG: 3503
Reviewed-on: http://review.gluster.com/334
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Idce22a6266c354e327d5d717715d2e62533eec58
BUG: 3448
Reviewed-on: http://review.gluster.com/292
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ie026ebed98cf5ff75ae1a13437d29f67d0e0254a
BUG: 3448
Reviewed-on: http://review.gluster.com/286
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I861b3c4494735b0ba6e038cdc39c50b9866747a8
BUG: 3448
Reviewed-on: http://review.gluster.com/283
Reviewed-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I206571c77f2d7b3c9f9d7bb82a936366fd99ce5c
BUG: 3182
Reviewed-on: http://review.gluster.com/141
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I2d10f2be44f518f496427f257988f1858e888084
BUG: 3348
Reviewed-on: http://review.gluster.com/200
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I3914467611e573cccee0d22df93920cf1b2eb79f
BUG: 3348
Reviewed-on: http://review.gluster.com/182
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2840 (files not getting self-healed when the first child goes down)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2840
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 3077 (afr [f]truncate locks wrong region in transaction)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3077
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
number
take the least significant 64bit from gfid and assign it to 'ia_ino',
hence for a given file (or directory), the 'ia_ino' number is always
same, and we need not worry about the 'itransform' in 'cluster/*'
translators.
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 3042 (inode number should be constant on storage)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3042
|
|
|
|
|
|
|
|
| |
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2346 (Log message enhancements in GlusterFS - phase 1)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
|
|
|
|
|
|
|
|
| |
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2346 (Log message enhancements in GlusterFS - phase 1)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
|
|
|
|
|
|
|
|
| |
Signed-off-by: Vijay Bellur <vijay@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 971 (dynamic volume management)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1388 ()
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1388
|