| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The purpose of this patch is to let protocol/client know when its transports can
be disconnected, without application running on gluster mount noticing any
effects of graph switch.
In order to do this, we migrate all fds and blocked locks to new graph.
Once this migration is complete and there are no in-transit frames as viewed
by fuse-bridge, we send a PARENT_DOWN event to its children. protocol/client
on receiving this event, can disconnect up its transports.
Change-Id: Idcea4bc43e23fb077ac16538b61335ebad84ba16
BUG: 767862
Reviewed-on: http://review.gluster.com/2734
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ic31b8bb10a28408da2a623f4ecc0c60af01c64af
BUG: 795421
Signed-off-by: Krishna Srinivas <ksriniva@redhat.com>
Reviewed-on: http://review.gluster.com/2711
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently(with out this patch), on a disconnect the server cleans up
the transport which inturn closes the fd's and releases the locks acquired on
those fd's by that client. On a reconnect, client just reopens the fd's but
doesn't reacquire the locks. The application that had previously acquired
the locks still is under the assumption that it is the owner of those locks
which might have been granted to other clients(if they request) by the server
leading to data corruption.
This patch allows the client to reacquire the fcntl locks (held on the fd's)
during client-server handshake.
* The server identifies the client via process-uuid-xl (which is a combination
of uuid and client-protocol name, it is assumed to be unique) and lk-version
number.
* The client maintains a list of process-uuid-xl, lk-version pair for each
accepted connection. On a connect, the server traverses the list for a
matching pair, if a matching pair is not found the the server returns
lk-version with value 0, else it returns the lk-version it has in store.
* On a disconnect, the server and client enter grace period, and on the
completion of the grace period, the client bumps up its lk-version number
(which means, it will reacquire the locks the next time) and the server will
distroy the connection. If reconnection happens within the grace period, the
server will find the matching (process-uuid-xl, lk-version) pair in its list
which guarantees that the fd's and there corresponding locks are still valid
for this client.
Configurable options:
To set grace-timeout, the following options are
option server.grace-timeout value
option client.grace-timeout value
To enable or disable the lk-heal,
option lk-heal [on|off]
gluster volume set command can be used to configurable options
Change-Id: Id677ef1087b300d649f278b8b2aa0d94eae85ed2
BUG: 795386
Signed-off-by: Mohammed Junaid <junaid@redhat.com>
Reviewed-on: http://review.gluster.com/2766
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. What
--------
This change introduces an infrastructure change in the filesystem
which lets filesystem operation address objects (inodes) just by its
GFID. Thus far GFID has been a unique identifier of a user-visible
inode. But in terms of addressability the only mechanism thus far has
been the backend filesystem path, which could be derived from the
GFID only if it was cached in the inode table along with the entire set
of dentry ancestry leading up to the root.
This change essentially decouples addressability from the namespace. It
is no more necessary to be aware of the parent directory to address a
file or directory.
2. Why
-------
The biggest use case for such a feature is NFS for generating
persistent filehandles. So far the technique for generating filehandles
in NFS has been to encode path components so that the appropriate
inode_t can be repopulated into the inode table by means of a recursive
lookup of each component top-down.
Another use case is the ability to perform more intelligent self-healing
and rebalancing of inodes with hardlinks and also to detect renames.
A derived feature from GFID filehandles is anonymous FDs. An anonymous FD
is an internal USABLE "fd_t" which does not map to a user opened file
descriptor or to an internal ->open()'d fd. The ability to address a file
by the GFID eliminates the need to have a persistent ->open()'d fd for the
purpose of avoiding the namespace. This improves NFS read/write performance
significantly eliminating open/close calls and also fixes some of today's
limitations (like keeping an FD open longer than necessary resulting
in disk space leakage)
3. How
-------
At each storage/posix translator level, every file is hardlinked inside
a hidden .glusterfs directory (under the top level export) with the name
as the ascii-encoded standard UUID format string. For reasons of performance
and scalability there is a two-tier classification of those hardlinks
under directories with the initial parts of the UUID string as the directory
names.
For directories (which cannot be hardlinked), the approach is to use a symlink
which dereferences the parent GFID path along with basename of the directory.
The parent GFID dereference will in turn be a dereference of the grandparent
with the parent's basename, and so on recursively up to the root export.
4. Development
---------------
4a. To leverage the ability to address an inode by its GFID, the technique is
to perform a "nameless lookup". This means, to populate a loc_t structure as:
loc_t {
pargfid: NULL
parent: NULL
name: NULL
path: NULL
gfid: GFID to be looked up [out parameter]
inode: inode_new () result [in parameter]
}
and performing such lookup will return in its callback an inode_t
populated with the right contexts and a struct iatt which can be
used to perform an inode_link () on the inode (without a parent and
basename). The inode will now be hashed and linked in the inode table
and findable via inode_find().
A fundamental change moving forward is that the primary fields in a
loc_t structure are now going to be (pargfid, name) and (gfid) depending
on the kind of FOP. So far path had been the primary field for operations.
The remaining fields only serve as hints/helpers.
4b. If read/write is to be performed on an inode_t, the approach so far
has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and
then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do
fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost
in performance in the inbuilt NFS server.
5. Misc
-------
The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent
with the rest of the codebase.
Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f
BUG: 781318
Reviewed-on: http://review.gluster.com/669
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I2d10f2be44f518f496427f257988f1858e888084
BUG: 3348
Reviewed-on: http://review.gluster.com/200
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I3914467611e573cccee0d22df93920cf1b2eb79f
BUG: 3348
Reviewed-on: http://review.gluster.com/182
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Revert "nfs3: Unref & unbind dir fd with inode lock on EOF"
This reverts commit 4e6fb304ce41acbaf7c9ba67c06bf443e65082e8.
The above commit (which unbinds fds at EOF) does not fix the original
bug (1619) because a readdir from a second app could have already
started before the readdir_cbk of the first app's readdir reaches
NFS code. Hence the race still exists.
Performing extra unrefs when EOF is received is not a reliable way
of detecting that a client has performed a closedir (and to close
the fd ourselves). Neither is interpreting a 0 cookies a new opendir.
Clients can always use telldir/seekdir and hit EOFs twice.
Due to the way NFS3 protocol is designed, it is just not possible
for the server to reliably detect opendirs/closedirs performed by
the client and map the corresponding readdirs to the same dir fd
on the server side.
The only reliable way of fixing this is to perform opendir/closedir
at the cost of performance. Any optimization towards keeping dir fds
open attempting to map them with application's opendir/closedir will
either result in fd leaks or extra fd unrefs.
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 2061 (NFS server crashes in readdir_fstat_cbk due to extra fd unref)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2061
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now the code has lived up to the glorious state of not relying
on getting the FLUSH whenever a file is released. So we don't need
to forge one in release for the cases when the kernel doesn't send
it.
Undo commits:
- 155ffe5c
- c50bc710
- b8779318 (partly, just release related parts)
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 223 (flush not sent)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=223
|
|
|
|
|
|
|
|
| |
Signed-off-by: Vijay Bellur <vijay@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 971 (dynamic volume management)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1388 ()
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1388
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
..so that when EOF is reached on this fd, any further
requests on the same inode do not get handled through this
fd but result in a new fd being opened.
Unbinding results in the fd getting deleted from the inode's fd list.
Signed-off-by: Shehjar Tikoo <shehjart@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1619 (glusterfs nfs server crashed on dht+replica(2x2))
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1619
|
|
|
|
|
|
|
|
| |
Signed-off-by: Raghavendra G <raghavendra@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1059 (enhancements for getting statistics from performance translators)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1059
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now fuse is fully complaint with DVM, as even if there is a fop
request on inode belonging to old graph, it will be resolved
corresponding to new graph and operations will be performed wrt.
new graph, which makes DVM truely spontaneous.
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 1240 (DVM: after graph change, inodes should resolve to new inode-table)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1240
|
|
|
|
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 971 (dynamic volume management)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
|
|
|
|
|
|
|
|
|
| |
from the kernel
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 223 (flush not sent)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=223
|
|
|
|
|
|
|
|
|
| |
Removing unused 'dict_t *ctx' from both inode and fd structures.
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 128 (cleanup unwanted ctx dictionary in 'inode' and 'fd' structures.)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=128
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
|
|
|
|
|
|
|
|
|
| |
This change is being brought in so that we can
differentiate between fdentry_ts when debugging using
gdb.
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
|
|
|
|
|
|
|
|
|
| |
This commit reduces CPU usage of gf_fd_unused_get drastically by
making it O(1) instead of O(n).
Related to: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=16
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
|
|
|
|
|
|
|
|
| |
Added __fd_ctx_get
__fd_ctx_set
__fd_ctx_del which do not hold any lock.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
- first phase, which happens when POLLERR is received on transport,
releases all locks, flushes all open fds.
- second phase, which happens when both the transports of connection destroyed,
destroys the containers like lock table, fd table along with the connection.
- the first phase, clears up any references to transport held by translators
like posix-locks(in the form of blocked locks) paving way for the second phase.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
| |
updated copyright header to include 2009.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
| |
fd->_ctx access and modifications are now protected by fd->lock.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|