summaryrefslogtreecommitdiffstats
path: root/xlators/protocol/server
Commit message (Collapse)AuthorAgeFilesLines
...
* protocol/server: validate connection object before dereferencingAmar Tumballi2012-04-111-19/+10
| | | | | | | | | | | | | | | in 'release()' and 'releasedir()' fops the check for 'connection object' was not done before dereferencing it. the check was in place for all other fops. handling the missing cases now. also removed some warnings related to 'set-but-unused' Change-Id: I47b95318e8f2f28233179be509ce090b2fb7276d Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 801411 Reviewed-on: http://review.gluster.com/3125 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Fix compiler warnings and typos from Debian build.Jeff Darcy2012-04-101-1/+1
| | | | | | | | | | | | | Mostly to do with "-Werror=format-security" being buggy, but while we're here we might as well fix some typos and such. Credit goes to Patrick Matthäi <pmatthaei@debian.org> for pointing these out. Change-Id: Ia32d1111d7c10b1f213df85d86b17a1326248ffd BUG: 811387 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.com/3117 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Replace GPLV3 MD5 with OpenSSL MD5Kaleb KEITHLEY2012-04-042-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ric asked me to look at replacing the GPL licensed MD5 code with something better, i.e. perhaps faster, and with a less restrictive license, etc. So I took a couple hour holiday from working on wrapping up the client_t and did this. OpenSSL (nee SSLeay) is released under the OpenSSL license, a BSD/MIT style license. OpenSSL (libcrypto.so) is used on Linux, OS X and *BSD, Open Solaris, etc. IOW it's universally available on the platforms we care about. It's written by Eric Young (eay), now at EMC/RSA, and I can say from experience that the OpenSSL implementation of MD5 (at least) is every bit as fast as RSA's proprietary implementation (primarily because the implementations are very, very similar.) The last time I surveyed MD5 implementations I found they're all pretty much the same speed. I changed the APIs (and ABIs) for the strong and weak checksums. Strictly speaking I didn't need to do that. They're only called on short strings of data, i.e. pathnames, so using int32_t and uint32_t is ostensibly okay. My change is arguably a better, more general API for this sort of thing. It's also what bit me when gerrit/jenkins validation failed due to glusterfs segv-ing. (I didn't pay close enough attention to the implementation of the weak checksum. But it forced me to learn what gerrit/jenkins are doing and going forward I can do better testing before submitting to gerrit.) Now resubmitting with a BZ Change-Id: I545fade1604e74fc68399894550229bd57a5e0df BUG: 807718 Signed-off-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.com/3019 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cli, glusterd: "volume set help" additionsKaushal M2012-03-281-3/+11
| | | | | | | | | | | | "auth.allow/reject" and "server.statedump" options are included in "volume set help" now. Change-Id: I7f9ba89d1782c26792347ffd2cd4042c3c396934 BUG: 783390 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/3025 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: adding extra data for fopsAmar Tumballi2012-03-224-407/+1049
| | | | | | | | | | | | | with this change, the xlator APIs will have a dictionary as extra argument, which is passed between all the layers. This can be utilized for overloading in some of the operations. Change-Id: I58a8186b3ef647650280e63f3e5e9b9de7827b40 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 782265 Reviewed-on: http://review.gluster.com/2960 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/server: remove the transport from the list irrespective of ↵Raghavendra Bhat2012-03-201-6/+11
| | | | | | | | | | | | | | | lock-heal is on/off Upon getting disconnect, remove the transport from the list of transports that protocol server maintains irrespective of whether lock-heal is on or off, since upon reconnect a new transport would be created. Change-Id: I18a269a93487d6e2cb11c345b6dde813d089809c BUG: 803815 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/2980 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* protocol/server: Handle server send reply failure gracefully.Mohammed Junaid2012-03-192-22/+64
| | | | | | | | | | | | | Server send reply failure should not call server connection cleanup because if a reconnection happens with in the grace-timeout the connection object is reused. We must cleanup only on grace-timeout. Change-Id: I7d171a863382646ff392031c2b845fe4f0d3d5dc BUG: 803365 Signed-off-by: Mohammed Junaid <junaid@redhat.com> Reviewed-on: http://review.gluster.com/2947 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* print bound_xl only once in server statedumpRahul C S2012-03-191-0/+12
| | | | | | | | | | | | | since, currently there is only one bound_xl for all connection objects, it does not make sense to print the bound_xl for every conn object. Change-Id: Iaf4a170fe63fb7e80133c449a20f298f0e86364c BUG: 765495 Signed-off-by: Rahul C S <rahulcs@redhat.com> Reviewed-on: http://review.gluster.com/2975 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Logs: Improved logs in lock/unlock execution pathPranith Kumar K2012-03-182-2/+3
| | | | | | | | | | | Statedump will now start showing the lk-owner of the stack. Change-Id: I9f650ce9a8b528cd626c8bb595c1bd1050462c86 BUG: 803209 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2968 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/server: Clear internal locks on disconnectPranith Kumar K2012-03-184-12/+26
| | | | | | | | | | | | | | | If there is a disconnect observed on the client when the inode/entry unlock is issued, but the reconnection to server happens with in the grace-time period the inode/entry lk will live and the unlock will never come from that client. The internal locks should be cleared on disconnect. Change-Id: Ib45b1035cfe3b1de381ef3b331c930011e7403be BUG: 803209 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2966 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* protocol/server: send forget on the renamed inodeRaghavendra Bhat2012-03-181-0/+28
| | | | | | | | | | | | | | | | | | | | If rename is given on a file "a" to "b" ("b" is already existing file), then after rename, the inode for "b" would still be in the inode table and would not get forget (for fuse client, the fuse kernel module would send, but on server forget will not come on that inode), thus leading to inode leak even when the mount point is empty. To avoid that before doing inode rename, unlink the previous inode that "b" is pointing to and send forget on that, if "b" is the last dentry for that inode. Change-Id: Ie4dcc39ea190ee8f28029b4d7661df576d9cf319 BUG: 799833 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/2874 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd : volume set validation fixesKaushal M2012-03-181-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | This is the new version of the patch by Kaushik at review.gluster.com/699 The following new option types have been introduced: * GF_OPTION_TYPE_INTERNET_ADDRESS_LIST * GF_OPTION_TYPE_PRIORITY_LIST * GF_OPTION_TYPE_SIZE_LIST and option types of several options in translators have been updated to use the new types. valid_internet_address(), valid_ipv4_address() & valid_ipv6_address() functions has been updated for * wildcard matching. Previously used standalone wildcard address checking functions have been removed. Changes have been done to stripe translator to correctly set, update and use stripe-blocksize. Also minimum value for block-size has been set to 16KB. Change-Id: I2aa484ff695f6a915a8fc9a9f965cf0344f41d59 BUG: 765248 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/2899 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <shishirng@gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/server: memory leak fixes.Raghavendra G2012-03-172-0/+9
| | | | | | | | | | Change-Id: I203832d9d52373f068f90e30dc7672329d65bbea BUG: 803675 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.com/2954 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/server: add and remove the transports from the list, inside the lockRaghavendra Bhat2012-03-172-26/+54
| | | | | | | | | | | | | | | | | | | | | | | Till now for graph changes, glusterfs client used to remember the old graph also. Hence the transport object on the server corresponding the old graph never received disconnect. But now since the graph cleanup is happening, transport on the server side gets disconnect for the cleaned up graph. Server maintains, all the transports in a list. But addition of the new transport to the list, or removal of the transport from the list is not happening within the lock. Thus if a thread is accessing a transport (in cases of statedump, where each transprt's information is dumped), and the server gets a disconnect on that transport, then it leads to segfault of the process. To avoid it do the list (of transports) manipulation inside the lock. Change-Id: I50e8389d5ec8f1c52b8d401ef8c8ddd262e82548 BUG: 803815 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/2958 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/server: Avoid race in add/del locker, connection_cleanupPranith Kumar K2012-03-174-38/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | conn->ltable address keeps changing in server_connection_cleanup every time it is called. i.e. New ltable is created every time it is called. Here is the race that happened: --------------------------------------------------- thread-1 | thread-2 add_locker is called with | conn->ltable. lets call the | ltable address lt1 | | connection cleanup is called | and do_lock_table_cleanup is | triggered for lt1. locker | lists are splice_inited under | the lt1->lock lt1 adds the locker under | lt1->lock (lets call this l1) | | GF_FREE(lt1) happens in | do_lock_table_cleanup The locker l1 that is added just before lt1 is freed will never be cleared in the subsequent server_connection_cleanups as there does not exist a reference to the locker. The stale lock remains in the locks xlator even though the transport on which it was issued is destroyed. Change-Id: I0a02f16c703d1e7598b083aa1057cda9624eb3fe BUG: 787601 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2957 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/server: Remove connection from conf->conns w.o. racePranith Kumar K2012-03-134-8/+47
| | | | | | | | | | | | | | | | | | | | | | | | 1) Adding the connection to conf->conns used to happen in conf->mutex, but removing happened under conn->lock. Fixed that as below. When the connection object is created conn's ref, bind_ref count is set to '1'. For bind_ref ref/unref happens under conf->mutex whenever server_connection_get, put is called. When bind_ref goes to '0' connection object is removed from conf->conns under conf->mutex. After it is removed from the list, conn_unref is called outside the conf->mutex. conn_ref/unref still happens under conn->lock. 2) Fixed races in server_connection_cleaup in grace_timer_handler and server_setvolume. Change-Id: Ie7b63b10f658af909a11c3327066667f5b7bd114 BUG: 801675 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2911 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* protocol/server: Make conn object ref-countedPranith Kumar K2012-03-015-198/+90
| | | | | | | | | Change-Id: I992a7f8a75edfe7d75afaa1abe0ad45e8f351c8b BUG: 796581 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2806 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* transport/socket: configuring tcp window-sizeRajesh Amaravathi2012-02-292-0/+7
| | | | | | | | | | | | | | | | | | | | | | | Till now, send and recieve buffer window sizes for sockets were set to a default glusterfs-specific value. Linux's default window sizes have been found to be better w.r.t performance, and hence, no more setting it to any default value. However, if one wishes, there's the new configuration option: network.tcp-window-size <sane_size> which takes a size value (int or human readable) and will set the window size of sockets for both clients and servers. Nfs clients will also be updated with the same. Change-Id: I841479bbaea791b01086c42f58401ed297ff16ea BUG: 795635 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/2821 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* glusterfsd: unref the dict and free the memory to avoid memleakRaghavendra Bhat2012-02-271-0/+2
| | | | | | | | | | Change-Id: Ib7a1f8cbab039fefb73dc35560a035d5688b0e32 BUG: 796186 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/2808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* mempool: adjustments in pool sizesAmar Tumballi2012-02-222-4/+4
| | | | | | | | | | | | | | | | | * while creating 'rpc_clnt', the caller knows what would be the ideal load on it, so an extra argument to set some pool sizes * while creating 'rpcsvc', the caller knows what would be the ideal load of it, so an extra argument to set request pool size * cli memory footprint is reduced Change-Id: Ie245216525b450e3373ef55b654b4cd30741347f Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 765336 Reviewed-on: http://review.gluster.com/2784 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* protocol/client,server: fcntl lock self healing.Mohammed Junaid2012-02-206-21/+213
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently(with out this patch), on a disconnect the server cleans up the transport which inturn closes the fd's and releases the locks acquired on those fd's by that client. On a reconnect, client just reopens the fd's but doesn't reacquire the locks. The application that had previously acquired the locks still is under the assumption that it is the owner of those locks which might have been granted to other clients(if they request) by the server leading to data corruption. This patch allows the client to reacquire the fcntl locks (held on the fd's) during client-server handshake. * The server identifies the client via process-uuid-xl (which is a combination of uuid and client-protocol name, it is assumed to be unique) and lk-version number. * The client maintains a list of process-uuid-xl, lk-version pair for each accepted connection. On a connect, the server traverses the list for a matching pair, if a matching pair is not found the the server returns lk-version with value 0, else it returns the lk-version it has in store. * On a disconnect, the server and client enter grace period, and on the completion of the grace period, the client bumps up its lk-version number (which means, it will reacquire the locks the next time) and the server will distroy the connection. If reconnection happens within the grace period, the server will find the matching (process-uuid-xl, lk-version) pair in its list which guarantees that the fd's and there corresponding locks are still valid for this client. Configurable options: To set grace-timeout, the following options are option server.grace-timeout value option client.grace-timeout value To enable or disable the lk-heal, option lk-heal [on|off] gluster volume set command can be used to configurable options Change-Id: Id677ef1087b300d649f278b8b2aa0d94eae85ed2 BUG: 795386 Signed-off-by: Mohammed Junaid <junaid@redhat.com> Reviewed-on: http://review.gluster.com/2766 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* protocol: remove the 'path<>' from rename() and link()Amar Tumballi2012-02-161-8/+0
| | | | | | | | | | | | missed it in the previous round of cleanup, path is completely useless in resolve function. Change-Id: I1aef0f5276afb77dfacfcc0c337ac80b4fcacc55 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 790298 Reviewed-on: http://review.gluster.com/2756 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: add an extra flag to readv()/writev() APIAmar Tumballi2012-02-141-2/+5
| | | | | | | | | | | | needed to implement a proper handling of open flag alterations using fcntl() on fd. Change-Id: Ic280d5db6f1dc0418d5c439abb8db1d3ac21ced0 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 782265 Reviewed-on: http://review.gluster.com/2723 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol: code cleanupAmar Tumballi2012-02-141-437/+120
| | | | | | | | | | | make dict serialize and unserialization code a macro Change-Id: I459c77c6c1f54118c6c94390162670f4159b9690 BUG: 764890 Signed-off-by: Amar Tumballi <amar@gluster.com> Reviewed-on: http://review.gluster.com/2742 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli, protocol/server : improve validation for the option auth.(allow/reject)Kaushal M2012-02-051-1/+42
| | | | | | | | | | | | | | | | | | cli now checks validity of address list given for 'volume set auth.*' Server xlator checks addresses supplied to auth.(allow/reject) option including wildcards for correctness in case volfile is manually edited. Original patch done by shylesh@gluster.com Original patch is at http://patches.gluster.com/patch/7566/ Change-Id: Icf52d6eeef64d6632b15aa90a379fadacdf74fef BUG: 764197 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/306 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cli: Extend "volume status" with statedump infoKaushal M2012-01-271-3/+136
| | | | | | | | | | | | | | | | | This patch enhances and extends the "volume status" command with information obtained from the statedump of the bricks of volumes. Adds new status types : clients, inode, fd, mem, callpool The new syntax of "volume status" is, #gluster volume status [all|{<volname> [<brickname>] [misc-details|clients|inode|fd|mem|callpool]}] Change-Id: I8d019718465bbc3de727653a839de7238f45da5c BUG: 765495 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/2637 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kp@gluster.com>
* core: add 'fremovexattr()' fopAmar Tumballi2012-01-251-0/+94
| | | | | | | | | | | so operations can be done on fd for extended attribute removal Change-Id: Ie026f1b53793aeb4ae33e96ea5408c7a97f34bf6 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 766571 Reviewed-on: http://review.gluster.com/778 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: get xattrs also as part of readdirpAmar Tumballi2012-01-253-5/+60
| | | | | | | | | | | | | readdirp_req() call sends a dict_t * as an argument, which contains all the xattr keys for which the entries got in readdirp_rsp() are having xattr value filled dictionary. Change-Id: I8b7e1290740ea3e884e67d19156ce849227167c0 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 765785 Reviewed-on: http://review.gluster.com/771 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: change lk-owner as a 1k bufferAmar Tumballi2012-01-244-33/+40
| | | | | | | | | | | | | so, NLM can send the lk-owner field directly to the locks translators, while doing the same effort, also enabled sending maximum of 500 aux gid over protocol. Change-Id: I87c2514392748416f7ffe21d5154faad2e413969 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 767229 Reviewed-on: http://review.gluster.com/779 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* rpc: extend actors with flag signing if privilege is requiredCsaba Henk2012-01-212-47/+47
| | | | | | | | | | | | Currently we allow the following RPC messages for unprivileged users: GLUSTER_CLI_GETWD, GLUSTER_CLI_MOUNT, GLUSTER_CLI_UMOUNT Change-Id: I05414f3ca7cbe47de45c5e5cfba1537efc774e6c BUG: 781256 Signed-off-by: Csaba Henk <csaba@gluster.com> Reviewed-on: http://review.gluster.com/2641 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: GFID filehandle based backend and anonymous FDsAnand Avati2012-01-204-300/+195
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. What -------- This change introduces an infrastructure change in the filesystem which lets filesystem operation address objects (inodes) just by its GFID. Thus far GFID has been a unique identifier of a user-visible inode. But in terms of addressability the only mechanism thus far has been the backend filesystem path, which could be derived from the GFID only if it was cached in the inode table along with the entire set of dentry ancestry leading up to the root. This change essentially decouples addressability from the namespace. It is no more necessary to be aware of the parent directory to address a file or directory. 2. Why ------- The biggest use case for such a feature is NFS for generating persistent filehandles. So far the technique for generating filehandles in NFS has been to encode path components so that the appropriate inode_t can be repopulated into the inode table by means of a recursive lookup of each component top-down. Another use case is the ability to perform more intelligent self-healing and rebalancing of inodes with hardlinks and also to detect renames. A derived feature from GFID filehandles is anonymous FDs. An anonymous FD is an internal USABLE "fd_t" which does not map to a user opened file descriptor or to an internal ->open()'d fd. The ability to address a file by the GFID eliminates the need to have a persistent ->open()'d fd for the purpose of avoiding the namespace. This improves NFS read/write performance significantly eliminating open/close calls and also fixes some of today's limitations (like keeping an FD open longer than necessary resulting in disk space leakage) 3. How ------- At each storage/posix translator level, every file is hardlinked inside a hidden .glusterfs directory (under the top level export) with the name as the ascii-encoded standard UUID format string. For reasons of performance and scalability there is a two-tier classification of those hardlinks under directories with the initial parts of the UUID string as the directory names. For directories (which cannot be hardlinked), the approach is to use a symlink which dereferences the parent GFID path along with basename of the directory. The parent GFID dereference will in turn be a dereference of the grandparent with the parent's basename, and so on recursively up to the root export. 4. Development --------------- 4a. To leverage the ability to address an inode by its GFID, the technique is to perform a "nameless lookup". This means, to populate a loc_t structure as: loc_t { pargfid: NULL parent: NULL name: NULL path: NULL gfid: GFID to be looked up [out parameter] inode: inode_new () result [in parameter] } and performing such lookup will return in its callback an inode_t populated with the right contexts and a struct iatt which can be used to perform an inode_link () on the inode (without a parent and basename). The inode will now be hashed and linked in the inode table and findable via inode_find(). A fundamental change moving forward is that the primary fields in a loc_t structure are now going to be (pargfid, name) and (gfid) depending on the kind of FOP. So far path had been the primary field for operations. The remaining fields only serve as hints/helpers. 4b. If read/write is to be performed on an inode_t, the approach so far has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost in performance in the inbuilt NFS server. 5. Misc ------- The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent with the rest of the codebase. Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f BUG: 781318 Reviewed-on: http://review.gluster.com/669 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* protocol/server: Do connection cleanup if reply failsPranith Kumar K2011-12-223-18/+14
| | | | | | | | | | | | | | | | | | | | We observed that after the first connection cleanup happens on DISCONNECT the lock calls in transit are granted or added in blocked locks queue. These locks were never cleaned up after that because no unlock would come up on that connection. This would leave references on that transport so it would never be destroyed. Now, the connection cleanup happens whenever the reply submission fails. Also cleaned up the old code which is not used any more. Change-Id: Ie4fe6f388ed18d9c907cf8ae06b0b7fd0601a660 BUG: 765430 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/809 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: remove 'ino' variable from 'inode_t' structureAmar Tumballi2011-11-164-208/+182
| | | | | | | | Change-Id: I0f078d1753db65d2f2e0380d1b0450c114cf40dd BUG: 3518 Reviewed-on: http://review.gluster.com/522 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* statedump: do not print the inode number in the statedumpRaghavendra Bhat2011-10-011-10/+5
| | | | | | | | | | | | | Since gfid is used to uniquely identify a inode, in the statedump printing inode number is not necessary. Its suffecient if the gfid of the inode is printed. And do not print the the inodelks, entrylks and posixlks if the lock count is 0. Change-Id: Idac115fbce3a5684a0f02f8f5f20b194df8fb27f BUG: 3476 Reviewed-on: http://review.gluster.com/530 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* protocol/server: check for the fd being NULL and unwindRaghavendra Bhat2011-09-291-0/+8
| | | | | | | | Change-Id: I400e515431cf739fe0b2f90840359496a2b529d2 BUG: 3158 Reviewed-on: http://review.gluster.com/528 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <shishirng@gluster.com>
* cli : new volume statedump commandKaushal M2011-09-271-2/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes: 1. Add a new 'volume statedump' command, that performs statedumps of all the bricks in the volume and saves them in a specified location. 2. Add new server option 'server.statedump-path'. 3. Remove multiple function definitions in glusterd.h Statedump Information: The 'volume statedump' command performs statedumps on all the bricks in a given volume. The syntax of the command is, gluster volume statedump <VOLNAME> [type]...... Types include, * all * mem * iobuf * callpool * priv * fd * inode Defaults to 'all' when no type is specified. The statedump files are created by default in /tmp directory of the server on which the bricks are present. This path can be changed by setting the 'server.statedump-path' option. The statedump files will be named as, <brick-name>.<pid of brick process>.dump Change-Id: I01c0e1a8aad490da818e086d89f292bd2ed06fd4 BUG: 1964 Reviewed-on: http://review.gluster.com/321 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* glusterfs protocol: handshake to log the version of the peerAmar Tumballi2011-09-212-5/+15
| | | | | | | | | | | | | | | | | * As RPC program's name is just used for logging, we now have 'PACKAGE_VERSION' part of the string, which gets logged in client side. * From client, we send the PACKAGE_VERSION in handshake dictionary, which gets logged on serverside handshake. The change doesn't break any compatibility between client or server as it would only enhance the logging part of handshake. Change-Id: Ie7f498af2f5d3f97be37c8d982061cb6021883ce BUG: 3589 Reviewed-on: http://review.gluster.com/467 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* distribute rebalance: handle the open file migrationAmar Tumballi2011-09-121-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Complexity involved: To migrate a file with open fd, we have to notify the other client process which has the open fd, and make sure the write()s happening on that fd is properly synced to the migrated file. Once the migration is complete, the client process which has open-fd should get notified and it should start performing all the operations on the new subvolume, instead of earlier cached volume. How to solve the notification part: We can overload the 'postbuf' attribute in the _cbk() function to understand if a file is 'under-migration' or 'migration-complete' state. (This will be something similar to deciding whether a file is DHT-linkfile by its 'mode'). Overall change includes below mentioned major changes: 1. dht_linkfile is decided by only 2 factors (mode(01000), xattr(trusted.glusterfs.dht.linkto)), instead of earlier 3 factors (size==0) 2. in linkfile self-heal part (in 'dht_lookup_everywhere_cbk()'), don't delete a linkfile if there is a open-fd on it. It means, there may be a migration in progress. 3. if a file's revalidate fails with ENOENT, it may be due to file migration, and hence need a lookup_everywhere() 4. There will be 2 phases of file-migration. -> Phase 1: Migration in progress * The source data file will have SGID and STICKY bit set in its mode. * The source data file will have a 'linkto' xattr pointing the destination. * Destination file will have mode set to '01000', and 'linkto' xattr set to itself. -> Phase 2: File migration Complete * The source data file will have mode '01000', and will be 'truncated' to size 0. * The destination file will have inherited mode from the source. (without sgid and sticky bit) and its 'linkto' attribute will be removed. 4. Changes in distribute to work smoothly with a file which is in migration / got migrated. The 'fops' are divided into 3 categories, inode-read, inode-write and others. inode-read fops need to handle only 'phase 2' notification, where as, the inode-write fops need to handle both 'phase 1' and phase2. The inode-write operations will be done on source file, and if any of 'file-migration' procedures are detected in _cbk(), then the operations should be performed on the destination too. when a phase-2 is detected, then the inode-ctx itself should be changed to represent a new layout. With these changes, the open file migration will work smoothly with multiple clients. Change-Id: I512408463814e650f34c62ed009bf2101d016fd6 BUG: 3071 Reviewed-on: http://review.gluster.com/209 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Eliminate many "var set but not used" warnings with newer gcc.Jeff Darcy2011-09-071-2/+0
| | | | | | | | | | | | | | | | This fixes ~200 such warnings, but leaves three categories untouched. (1) Rpcgen code. (2) Macros which set variables in the outer (calling function) scope. (3) Variables which are set via function calls which may have side effects. Change-Id: I6554555f78ed26134251504b038da7e94adacbcd BUG: 2550 Reviewed-on: http://review.gluster.com/371 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* modify to the way we used XDR definitions files (.x files)Amar Tumballi2011-09-074-129/+58
| | | | | | | | | | | | | | | | | | | | | | | | Earlier: step 1: copy the existing <xdr>.x files to /tmp step 2: generate '.[ch]' files using 'rpcgen <xdr>.x' step 3: check diff with the to the existing files, add only your part of changes back to the original file. (ignore other changes). step 4: there is another file to write wrapper functions to convert structures to/from XDR buffers, update it with your new structure. step 5: use these wrapper functions in the newly written procedures. step 6: commit :-| Now: step 1: update (mostly adding only) the <xdr>.x file step 2: run '<path-to-src>/extras/generate-xdr-files.sh <xdr>.x' command step 3: implement rpc procedure to handle the request/response. step 4: commit :-) Change-Id: I219f9159fc980438c86e847c6b030be96e595ea2 BUG: 3488 Reviewed-on: http://review.gluster.com/341 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* glusterfs protocol: bring in variable sized iobuf supportAmar Tumballi2011-09-074-76/+131
| | | | | | | | | | | is a step towards reducing glusterfs memory footprint. should also help a bit in overall performance. Change-Id: I074d5813602b2c960d59562e792b3dc6e43d2f42 BUG: 3475 Reviewed-on: http://review.gluster.com/322 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* xlator options: revamp xlator option validation/reconfigure codeAnand Avati2011-08-182-30/+3
| | | | | | | | | | | | | | | | | - move option handling to options.c (new file) - remove duplication of option validation code - remove duplication of gf_log / sprintf - get rid of xlator_t->validate_options - get rid of option validation in rpc-transport - get rid of validate_options() in every xlator - use xlator_volume_option_get to clean up many functions - introduce primitives to init/reconfigure option types Change-Id: I51798af72c8dc0a2b9e017424036eb3667dfc7ff BUG: 3415 Reviewed-on: http://review.gluster.com/235 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* Change Copyright current yearPranith Kumar K2011-08-1010-10/+10
| | | | | | | | Change-Id: I2d10f2be44f518f496427f257988f1858e888084 BUG: 3348 Reviewed-on: http://review.gluster.com/200 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* pass xlator pointer to rpcsvc_init() so that it can init svc->mydata to xlatorv3.3.0qa1krishna2011-08-091-1/+1
| | | | | | | | Change-Id: Icfd95cc67400c16a951d6a9f922fbdc07f40c5b6 BUG: 3314 Reviewed-on: http://review.gluster.com/180 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* LICENSE: s/GNU Affero General Public/GNU General Public/Pranith Kumar K2011-08-0610-30/+30
| | | | | | | | Change-Id: I3914467611e573cccee0d22df93920cf1b2eb79f BUG: 3348 Reviewed-on: http://review.gluster.com/182 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* RPC unify code changekrishna2011-07-291-1/+26
| | | | | | | | | Change-Id: Ibe18a2a63fd023ac57652c4dfc8ac8a69d983b83 BUG: 3112 Signed-off-by: krishna <krishna@gluster.com> Reviewed-on: http://review.gluster.com/116 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* server: Reassociating 'old' conn can lead to chaos.Krishnan P2011-07-121-22/+12
| | | | | | | | | | | | | | | | | Since we moved to a single socket connection b/w client and server, inherting connection state is unnecessary and can sometimes be dangerous when clients reconnect even before the server detects a socket error on the old connection. Dirty detail: This reassociation results in 'ref count' not decreasing in tandem with the connection disconnects. This results in a resource leak. Signed-off-by: Krishnan Parthasarathi <kp@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 3104 (Self-heal does not work in dis-rep!) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3104
* protocol/server: logging changesRaghavendra Bhat2011-06-082-11/+2
| | | | | | | | Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 2982 ([glusterfs-3.2.1qa2]: logging changes is server protocol) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2982
* prevent few excessive logsAmar Tumballi2011-04-211-7/+5
| | | | | | | | Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 2346 (Log message enhancements in GlusterFS - phase 1) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
* remove excessive logs due to log enhancementAmar Tumballi2011-04-131-5/+5
| | | | | | | | Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 2346 (Log message enhancements in GlusterFS - phase 1) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346