glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	gNFS: Fix multi-homed m/c issue in NFS subdir auth	Santosh Kumar Pradhan	2014-06-25	1	-27/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NFS subdir authentication doesn't correctly handle multi-homed (host with multiple NIC having multiple IP addr) OR multi-protocol (IPv4 and IPv6) network addresses. When user/admin sets HOSTNAME in gluster CLI for NFS subdir auth, mnt3_verify_auth() routine does not iterate over all the resolved n/w addrs returned by getaddrinfo() n/w API. Instead, it just tests with the one returned first. 1. Iterate over all the n/w addrs (linked list) returned by getaddrinfo(). 2. Move the n/w mask calculation part to mnt3_export_fill_hostspec() instead of doing it in mnt3_verify_auth() i.e. calculating for each mount request. It does not change for MOUNT req. 3. Integrate "subnet support code rpc-auth.addr.<volname>.allow" and "NFS subdir auth code" to remove code duplication. Change-Id: I26b0def52c22cda35ca11766afca3df5fd4360bf BUG: 1102293 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/8048 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
*	glusterd: Fail peer probe/detach commands when peer detach is ongoing	Krishnan Parthasarathi	2014-06-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	Change-Id: Ifd8099bc235eb395e8fd9ead3197bef71c78042b BUG: 1109812 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/8079 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	Get snapshot info dynamically via new rpc and infra for snapview-server to ↵	Anand Subramanian	2014-06-15	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \|	refresh snaplist BUG: 1105439 Change-Id: I4bb312a53d88f6f4955e69a3ef2b4955ec17f26d Signed-off-by: Anand Subramanian <anands@redhat.com> Reviewed-on: http://review.gluster.org/8001 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cleanup: Fix order of arguments passed in log message	Krutika Dhananjay	2014-06-13	1	-1/+1
\| \| \| \| \| \| \| \| \|	Change-Id: Iae85cdfc223875688ea17155fffcf2a3a435d245 BUG: 764890 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8044 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cleanup: Fix domain in log message	Krutika Dhananjay	2014-06-12	1	-1/+1
\| \| \| \| \| \| \| \| \|	Change-Id: I554b9bcacf6c8acd6dffea0a485fc50e82c3dc04 BUG: 764890 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8043 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	rpc: Reconfigure() does not work for auth-reject	Santosh Kumar Pradhan	2014-06-09	1	-6/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If volume is set for rpc-auth.addr.<volname>.reject with value as "host1", ideally the NFS mount from "host1" should FAIL. It works as expected. But when the volume is RESET, then previous value set for auth-reject should go off, and further NFS mount from "host1" should PASS. But it FAILs because of stale value in dict for key "rpc-auth.addr.<volname>.reject". It does not impact rpc-auth.addr.<volname>.allow key because, each time NFS volfile gets generated, allow key ll have "*" as default value. But reject key does not have default value. FIX: Delete the OLD value for key irrespective of anything. Add NEW value for the key, if and only if that is SET in the reconfigured new volfile. Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Change-Id: Ie80bd16cd1f9e32c51f324f2236122f6d118d860 BUG: 1103050 Reviewed-on: http://review.gluster.org/7931 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	gNFS: Make NFS DRC off by default	Niels de Vos	2014-06-09	1	-10/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DRC in NFS causes memory bloat and there are known memory corruptions. It would be good to disable drc by default till the feature is stable. Change-Id: I93db6ef5298672c56fb117370bb582a5e5550b17 BUG: 1105524 Original-patch-by: Santosh Kumar Pradhan <spradhan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8004 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd: Handle rpc_connect failure in the event handler	Vijaikumar M	2014-06-05	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently rpc_connect calls the notification function on failure in the same thread, glusterd notification holds the big_lock and hence big_lock is released before rpc_connect In snapshot creation, releasing the big-lock before completeing operation can cause problem like deadlock or memory corruption. Bricks are started as part of snapshot created operation. brick_start releases the big_lock when doing brick_connect and this might cause glusterd crash. There is a similar issue in bug# 1088355. Solution is let the event handler handle the failure than doing it in the rpc_connect. Change-Id: I088d44092ce845a07516c1d67abd02b220e08b38 BUG: 1101507 Signed-off-by: Vijaikumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/7843 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
*	glusterd: Disable ping-timer between glusterd and brick process	Vijaikumar M	2014-05-19	3	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there are too many IO happening, brick process epoll thread will be busy and fails to respond to the glusterd pick packet within 30sec. Also epoll thread can be blocked by a big-lock. Solution is to disable ping-timer by default and only enable where ever required Later when the epoll thread model changed and made lighter, we need to revert back this change. http://review.gluster.com/3842 is one such approach. Change-Id: I7f80ad3eb00f7d9c4d4527305932f7cf4920e73f BUG: 1097224 Signed-off-by: Vijaikumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/7753 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpcsvc: Validate RPC procedure number before fetch	Santosh Kumar Pradhan	2014-05-17	1	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While accessing the procedures of given RPC program in, rpcsvc_get_program_vector_sizer(), It was not checking boundary conditions which would cause buffer overflow and subsequently SEGV. Make sure rpcsvc_actor_t arrays have numactors number of actors. FIX: Validate the RPC procedure number before fetching the actor. Special Thanks to: Murray Ketchion, Grant Byers Change-Id: I8b5abd406d47fab8fca65b3beb73cdfe8cd85b72 BUG: 1096020 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/7726 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	libgfapi: Added support to fetch volume info from glusterd and store in ↵	Soumya Koduri	2014-05-11	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	glfs object. Defined new APIs in the libgfapi module, given a glfs object, * to send handshake RPC call to glusterd process to fetch UUID of the volume * store it in the glusterfs_context linked to the glfs object. * to parse UUID from its cannonical string format into 16-byte array before sending it to the libgfapi users. Defined a RPC call in glusterd which can be used to query volume related info by other processes using 'clnt_handshake_procs'. Note - Currently this RPC call to glusterd process is used only to fetch UUID. But it can be extended to get other volume related structures as well. In addition to the above, defined a new variable to keep track of such handshake RPCs still in progress to make sure all the corresponding RPC callbacks have been processed before libgfapi returns the glfs object initialized. Also bumping up the GFAPI current version number since there is a new API "glfs_get_volume_id" defined and exposed by libgfapi as part of these changes. Change-Id: I303f76d7177d32d25bdb301b1dbcf5cd73f42807 BUG: 1090363 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/7218 Reviewed-by: Anand Avati <avati@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpc: don't stop sending ping packets to an active server.	Krishnan Parthasarathi	2014-05-06	2	-40/+35
\| \| \| \| \| \| \| \| \| \| \| \| \|	- Removed an unnecessary ref on rpc_clnt object. - Removed saved_frames_delete function, which was unused. Change-Id: Ie8a9c4bb20c1fd59744b64b56eb043eca095e5e3 BUG: 1094655 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/7678 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
*	glusterd : Volname, brickpath & volfpath length validation	Atin Mukherjee	2014-05-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While creating a volume and adding a brick validation for _POSIX_PATH_MAX is done on absolute pathname instead of relative pathname due to which a brickpath having less than _POSIX_PATH_MAX may also fail the validation if the directory length is greater than (_POSIX_PATH_MAX -strlen(brickpath/volume name). Also this fix addresses one cli response message correction which says the volume file is too long instead of brick path is too long (when brickpath length validation doesn't fail and vol file length validation fails.) It is also important to note that with the current design of volfile naming, it can not be guranteed that volname and brickpath can have max of _POSIX_PATH_MAX characters. Change-Id: I1283d1f9dea96ae797620002c8723719f26a866d BUG: 1085330 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/7420 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd: Ping timer implmentation	Krishnan Parthasarathi	2014-04-29	7	-4/+331
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch refactors the existing client ping timer implementation, and makes use of the common code for implementing both client ping timer and the glusterd ping timer. A new gluster rpc program for ping is introduced. The ping timer is only started for peers that have this new program. The deafult glusterd ping timeout is 30 seconds. It is configurable by setting the option 'ping-timeout' in glusterd.vol . Also, this patch introduces changes in the glusterd-handshake path. The client programs for a peer are now set in the callback of dump_versions, for both the older handshake and the newer op-version handshake. This is the only place in the handshake process where we know what programs a peer supports. Change-Id: I035815ac13449ca47080ecc3253c0a9afbe9016a BUG: 1038261 Signed-off-by: Vijaikumar M <vmallika@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5202 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	cli: Add a cli command to enable/disable barrier	Kaushal M	2014-04-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new 'gluster volume barrier <VOLNAME> {enable\|disable}' cli command. This helps in testing the brick op code path when testing the barrier xlator. This patch can be reverted later if not required for end users. Change-Id: Icd86a2d13e7f276dda1ecbb2593d60638ece7dcd BUG: 1060002 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6958 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd: Add a barrier brick-op	Kaushal M	2014-04-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces a new 'barrier' brick-op which will be used to activate/deactivate the barriering on the bricks. This includes barriering in the barrier xlator and in the changelog xlator. All the required code has been including a bricks select function, a payload builder and a brick-op handler. Change-Id: I91d9d77f691c2e89823f7dc4e84900ec40dc4dd2 BUG: 1060002 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6943 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd/snapshot: Compare and update snapshots during peer handshake	Avra Sengupta	2014-04-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During a peer-handshake, after the volumes have synced, and the list of missed snapshots have synced, the node will perform the pending deletes and restores on this list. At this point, the current snapshot list in the node will be updated, and hence in case of conflicts arising during snapshot handshake, the peer hosting the bricks will be given precedence Likewise, if there will be a conflict, and both peers will be in the same state, i.e either both would be hosting bricks or both would not be hosting bricks, then a decision can't be taken and a peer-reject will happen. glusterd_compare_and_update_snap() implements the following algorithm to perform the above task: Step 1: Start. Step 2: Check if the peer is missing a delete on the said snap. If yes, goto step 6. Step 3: Check if there is a conflict between the peer's data and the local snap. If no, goto step 5. Step 4: As there is a conflict, check if both the peer and the local nodes are hosting bricks. Based on the results perform the following: Peer Hosts Bricks Local Node Hosts Bricks Action Yes Yes Goto Step 7 No No Goto Step 7 Yes No Goto Step 8 No Yes Goto Step 6 Step 5: Check if the local node is missing the peer's data. If yes, goto step 9. Step 6: It's a no-op. Goto step 10 Step 7: Peer Reject. Goto step 10 Step 8: Delete local node's data. Step 9: Accept Peer Data. Step 10: Stop Change-Id: I79be0f0f5f2a4f5c72277a4e77c2be732af432e1 BUG: 1061685 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/7525 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	gNFS: gNFS drc cache failed to detecte duplicates.	Yuan Ding	2014-04-28	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After the drc cache get full, message "DRC failed to detect duplicates" keep printed in the log. The root cause is drc_compare_reqs use the wrong compare type. This function should use type drc_cache_op_t as its input type. Since all rbtree related code (except function rpcsvc_drc_lookup) in drc cache pass drc_cache_op_t as compare type. Only rpcsvc_drc_lookup use type rpcsvc_request_t. It has been modified too. Change-Id: I925c097debe6b82f267986961fd4e7755f3de9af BUG: 1089676 Signed-off-by: Yuan Ding <beback198611@gmail.com> Reviewed-on: http://review.gluster.org/7519 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterd/snapshot-handshake: Perform handshake of missed_snaps_list.	Avra Sengupta	2014-04-25	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a handshake, create a union of the missed_snap_lists of the two peers. If an entry is present, its no op. If an entry is pendng, and the peer entry is done, mark own entry as done. If an entry is done, and the peer ertry is pending, its a no-op. If its a new entry, add it. Change-Id: Idbfa49cc34871631ba8c7c56d915666311024887 BUG: 1061685 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/7453 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpcgen: Remove autogenerated files instead build on demand	Harshavardhana	2014-04-25	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid modifying autogenerated files and keeping them in repository - autogenerate them on demand from ".x" files Change-Id: I2cdb1fe9b99768ceb80a8cb100fa00bd1d8fe2c6 BUG: 1090807 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/7526 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	rpcsvc: Ignore INODELK/ENTRYLK/LK for throttling	Pranith Kumar K	2014-04-24	1	-16/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When iozone is in progress, number of blocking inodelks sometimes becomes greater than the threshold number of rpc requests allowed for that client (RPCSVC_DEFAULT_OUTSTANDING_RPC_LIMIT). Subsequent requests from that client will not be read until all the outstanding requests are processed and replied to. But because no more requests are read from that client, unlocks on the already granted locks will never come thus the number of outstanding requests would never come down. This leads to a ping-timeout on the client. Fix: Do not account INODELK/ENTRYLK/LK for throttling Change-Id: I59c6b54e7ec24ed7375ff977e817a9cb73469806 BUG: 1089470 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7531 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	build: MacOSX Porting fixes	Harshavardhana	2014-04-24	6	-30/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git@forge.gluster.org:~schafdog/glusterfs-core/osx-glusterfs Working functionality on MacOSX - GlusterD (management daemon) - GlusterCLI (management cli) - GlusterFS FUSE (using OSXFUSE) - GlusterNFS (without NLM - issues with rpc.statd) Change-Id: I20193d3f8904388e47344e523b3787dbeab044ac BUG: 1089172 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Signed-off-by: Dennis Schafroth <dennis@schafroth.com> Tested-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Dennis Schafroth <dennis@schafroth.com> Reviewed-on: http://review.gluster.org/7503 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	gNFS: Support wildcard in RPC auth allow/reject	Santosh Kumar Pradhan	2014-04-22	1	-2/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RFE: Support wildcard in "nfs.rpc-auth-allow" and "nfs.rpc-auth-reject". e.g. .redhat.com 192.168.1[1-5]. 192.168.1[1-5]., .redhat.com, 192.168.21.9 Along with wildcard, support for subnetwork or IP range e.g. 192.168.10.23/24 The option will be validated for following categories: 1) Anonymous i.e. "" 2) Wildcard pattern i.e. string containing any ('', '?', '[') 3) IPv4 address 4) IPv6 address 5) FQDN 6) subnetwork or IPv4 range Currently this does not support IPv6 subnetwork. Change-Id: Iac8caf5e490c8174d61111dad47fd547d4f67bf4 BUG: 1086097 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/7485 Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	gluster: GlusterFS Volume Snapshot Feature	Avra Sengupta	2014-04-11	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the initial patch for the Snapshot feature. Current patch includes following features: * Snapshot create * Snapshot delete * Snapshot restore * Snapshot list * Snapshot info * Snapshot status * Snapshot config Change-Id: I2f46920c0d61c515f6a60e0f8b46fff886d9f6a9 BUG: 1061685 Signed-off-by: shishir gowda <sgowda@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Signed-off-by: Vijaikumar M <vmallika@redhat.com> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/7128 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpc: warn and truncate grouplist if RPC/AUTH can not hold everything	Niels de Vos	2014-04-08	3	-8/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The GlusterFS protocol currently uses AUTH_GLUSTERFS_V2 in the RPC/AUTH header. This header contains the uid, gid and auxiliary groups of the user/process that accesses the Gluster Volume. The AUTH_GLUSTERFS_V2 structure allows up to 65535 auxiliary groups to be passed on. Unfortunately, the RPC/AUTH header is limited to 400 bytes by the RPC specification: http://tools.ietf.org/html/rfc5531#section-8.2 In order to not cause complete failures on the client-side when trying to encode a AUTH_GLUSTERFS_V2 that would result in more than 400 bytes, we can calculate the expected size of the other elements: 1 \| pid 1 \| uid 1 \| gid 1 \| groups_len XX \| groups_val (GF_MAX_AUX_GROUPS=65535) 1 \| lk_owner_len YY \| lk_owner_val (GF_MAX_LOCK_OWNER_LEN=1024) ----+------------------------------------------- 5 \| total xdr-units one XDR-unit is defined as BYTES_PER_XDR_UNIT = 4 bytes MAX_AUTH_BYTES = 400 is the maximum, this is 100 xdr-units. XX + YY can be 95 to fill the 100 xdr-units. Note that the on-wire protocol has tighter requirements than the internal structures. It is possible for xlators to use more groups and a bigger lk_owner than that can be sent by a GlusterFS-client. This change prevents overflows when allocating the RPC/AUTH header. Two new macros are introduced to calculate the number of groups that fit in the RPC/AUTH header, when taking the size of the lk_owner in account. In case the list of groups exceeds the maximum possible, only the first groups are passed over the RPC/GlusterFS protocol to the bricks. A warning is added to the logs, so that most system administrators will get informed. The reducing of the number of groups is not a new inventions. The RPC/AUTH header (AUTH_SYS or AUTH_UNIX) that NFS uses has a limit of 16 groups. Most, if not all, NFS-clients will reduce any bigger number of groups to 16. (nfs.server-aux-gids can be used to workaround the limit of 16 groups, but the Gluster NFS-server will be limited to a maximum of 93 groups, or fewer in case the lk_owner structure contains more items.) Change-Id: I8410e59d0fd246d601b54b961d3ae9cb5a858c10 BUG: 1053579 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/7202 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpc: transport may be destroyed while rpc isn't	Krishnan Parthasarathi	2014-03-05	2	-60/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rpc_clnt object is destroyed after the corresponding transport object is destroyed. But rpc_clnt_reconnect, a timer driven function, refers to the transport object beyond its 'life'. Instead, using the embedded connection object prevents use after free problem wrt transport object. Also, access transport object under conn->lock. Change-Id: Iae28e8a657d02689963c510114ad7cb7e6764e62 BUG: 962619 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6751 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	rpc: Fix a crash due to NULL dereference	Vijay Bellur	2014-02-16	1	-2/+2
\| \| \| \| \| \| \| \| \|	Change-Id: Ib2bf6dd564fb7e754d5441c96715b65ad2e21441 BUG: 1065611 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/7007 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	protocol/server: do not do root-squashing for trusted clients	Raghavendra Bhat	2014-02-10	2	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* As of now clients mounting within the storage pool using that machine's ip/hostname are trusted clients (i.e clients local to the glusterd). * Be careful when the request itself comes in as nfsnobody (ex: posix tests). So move the squashing part to protocol/server when it creates a new frame for the request, instead of auth part of rpc layer. * For nfs servers do root-squashing without checking if it is trusted client, as all the nfs servers would be running within the storage pool, hence will be trusted clients for the bricks. * Provide one more option for mounting which actually says root-squash should/should not happen. This value is given priority only for the trusted clients. For non trusted clients, the volume option takes the priority. But for trusted clients if root-squash should not happen, then they have to be mounted with root-squash=no option. (This is done because by default blocking root-squashing for the trusted clients will cause problems for smb and UFO clients for which the requests have to be squashed if the option is enabled). * For geo-replication and defrag clients do not do root-squashing. * Introduce a new option in open-behind for doing read after successful open. Change-Id: I8a8359840313dffc34824f3ea80a9c48375067f0 BUG: 954057 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4863 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd: Volume locks and transaction specific opinfos	Avra Sengupta	2014-02-10	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this patch we are replacing the existing cluster-wide lock taken on glusterds across the cluster, with volume locks which are also taken on glusterds across the cluster, but are volume specific. So with the volume locks we are able to perform more than one gluster operation at the same time, as long as the operations are being performed on different volumes. We maintain a global list of volume-locks (using a dict for a list) where the key is the volume name, and which saves the uuid of the originator glusterd. These locks are held and released per volume transaction. In order to acheive multiple gluster operations occuring at the same time, we also separate opinfos in the op-state-machine, as a part of this patch. To do so, we generate a unique transaction-id (uuid) per gluster transaction. An opinfo is then associated with this transaction id, which is used throughout the transaction. We maintain a run-time global list(using a dict) of transaction-ids, and their respective opinfos to achieve this. Upstream Feature Page: http://www.gluster.org/community/documentation/index.php/Features/glusterd-volume-locks Change-Id: Iaad505a854bac8de8f83beec0357eb6cde3f7ea8 BUG: 1011470 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5994 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpc: use GF_FREE when a string is gf_strdup'd.	Krishnan Parthasarathi	2014-01-22	1	-1/+1
\| \| \| \| \| \| \| \|	Change-Id: I522c30a600e712be9cc09393104e228e4d8e13f5 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6752 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	build: Start using library versioning for various libraries	Harshavardhana	2014-01-18	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to libtool three individual numbers stand for CURRENT:REVISION:AGE, or C:R:A for short. The libtool script typically tacks these three numbers onto the end of the name of the .so file it creates. The formula for calculating the file numbers on Linux and Solaris is /path/to/library/<library_name>.(C - A).(A).(R) As you release new versions of your library, you will update the library's C:R:A. Although the rules for changing these version numbers can quickly become confusing, a few simple tips should help keep you on track. The libtool documentation goes into greater depth. In essence, every time you make a change to the library and release it, the C:R:A should change. A new library should start with 0:0:0. Each time you change the public interface (i.e., your installed header files), you should increment the CURRENT number. This is called your interface number. The main use of this interface number is to tag successive revisions of your API. The AGE number is how many consecutive versions of the API the current implementation supports. Thus if the CURRENT library API is the sixth published version of the interface and it is also binary compatible with the fourth and fifth versions (i.e., the last two), the C:R:A might be 6:0:2. When you break binary compatibility, you need to set AGE to 0 and of course increment CURRENT. The REVISION marks a change in the source code of the library that doesn't affect the interface-for example, a minor bug fix. Anytime you increment CURRENT, you should set REVISION back to 0. Change-Id: Id72e74c1642c804fea6f93ec109135c7c16f1810 BUG: 862082 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/5645 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	gNFS: Set default outstanding RPC limit to 16	Santosh Kumar Pradhan	2014-01-15	2	-23/+28
\| \| \| \| \| \| \| \| \| \| \| \| \|	With 64, NFS server hangs with large I/O load (~ 64 threads writing to NFS server). The test results from Ben England (Performance expert) suggest to set it as 16 instead of 64. Change-Id: I418ff5ba0a3e9fdb14f395b8736438ee1bbd95f4 BUG: 1008301 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6696 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: ben england <bengland@redhat.com>
*	rpc/auth: Avoid NULL dereference in rpcsvc_auth_request_init()	Harshavardhana	2014-01-08	6	-46/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code section is bogus! ------------------------------------------ 370: if (!auth->authops->request_init) 371: ret = auth->authops->request_init (req, auth->authprivate); ------------------------------------------ Seems to have been never been used historically since logically above code has never been true to actually execute "authops->request_init() --> auth_glusterfs_{v2,}_request_init()" On top of that under "rpcsvc_request_init()" verf.flavour and verf.datalen are initialized from what is provided through 'callmsg'. ------------------------------------------ req->verf.flavour = rpc_call_verf_flavour (callmsg); req->verf.datalen = rpc_call_verf_len (callmsg); /* AUTH */ rpcsvc_auth_request_init (req); return req; ------------------------------------------ So the code in 'auth_glusterfs_{v2,}_request_init()' performing this operation will over-write the original flavour and datalen. ------------------------------------------ if (!req) return -1; memset (req->verf.authdata, 0, GF_MAX_AUTH_BYTES); req->verf.datalen = 0; req->verf.flavour = AUTH_NULL; ------------------------------------------ Refactoring the whole code into a more understandable version and also avoiding a potential NULL dereference Change-Id: I1a430fcb4d26de8de219bd0cb3c46c141649d47d BUG: 1049735 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6591 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	gNFS: Small memory leak in rpcsvc_drc_init()	Santosh Kumar Pradhan	2014-01-03	1	-25/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. The routine rpcsvc_drc_init() is only used while initialization of NFS xlator. It should just check for nfs.drc option and init DRC feature accordingly. If it's set to OFF, then rpcsvc_drc_init() allocates memory for svc.drc and set ret value to 0 and goes to out: block where drc is leaked. 2. rpcsvc_drc_init() should just allocate svc.drc and init it. Here svc.drc can never be valid. 3. If svc.drc gets init'd here, no point of checking for drc type here. Change-Id: I4085771cdb8c9c15d1b9c548b77929a37f27c124 BUG: 1047902 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6628 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	gNFS: Possible SEGV crash in NFS while DRC is OFF	Santosh Kumar Pradhan	2014-01-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In rpcsvc_submit_generic(), FILE: rpc/rpc-lib/src/rpcsvc.c, while caching the reply (DRC), the code does not check if DRC is ON and goes ahead assuming DRC is on and try to take a LOCK on drc. FIX: Put a check on svc->drc by rpcsvc_need_drc(). Change-Id: I52c57280487e6061c68fd0b784e1cafceb2f3690 BUG: 1048072 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6632 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpc/server: add anonuid and anongid options for root-squash	Niels de Vos	2013-12-30	3	-4/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce new options to modify the behaviour of server.root-squash. With server.anonuid and server.anongid the uid/gid can be specified and the root user (uid=0 and gid=0) will be mapped to the given uid/gid instead of nfsnobody (uid=65534 and gid=65534). Many thanks to Vikhyat Umrao for writing the majority of the test-case! Change-Id: I6379a3d2ef52b9b9707f2f6f0529657580c8d779 BUG: 1043886 CC: Vikhyat Umrao <vumrao@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/6546 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Vikhyat Umrao <vumrao@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpc,glusterd: Use rpc_clnt notifyfn to cleanup mydata	Kaushal M	2013-12-16	2	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rpc: - On a RPC_TRANSPORT_CLEANUP event, rpc_clnt_notify calls the registered notifyfn with a RPC_CLNT_DESTROY event. The notifyfn should properly cleanup the saved mydata on this event. - Break the reconnect chain when an rpc client is disabled. This will prevent new disconnect events which can lead to crashes. glusterd: - Added support for RPC_CLNT_DESTROY in glusterd_brick_rpc_notify - Use a common glusterd_rpc_clnt_unref() function throught glusterd in place of rpc_clnt_unref(). This function correctly gives up the big-lock before performing the unref. Change-Id: I93230441c5089039643fc9f5632477ef1b695348 BUG: 962619 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5512 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd/geo-rep: more glusterd and cli fixes for geo-rep.	Ajeet Jha	2013-12-12	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-> handle option validation cases in reset case. -> Creating valid conf path when glusterd restarts. -> Reading the gsyncd worker thread status and displaying it. -> Displaying status-detail per worker. -> Fetch checkpoint info in geo-rep status. -> use-tarssh value validation added. misc: misc geo-rep fixes based on cluster, logrotate etc.. -> cluster/dht: fix 'stime' getxattr getting overwritten. -> cluster/afr: return max of 'stime' values in subvol. -> geo-rep-logrotate: Sending SIGHUP to geo-rep auxiliary. -> cluster/dht: fix convoluted logic while aggregating. -> cluster/*: fix 'stime' min/max fetch logic. Change-Id: I811acea0bbd6194797a3e55d89295d1ea021ac85 BUG: 1036552 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6405 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	features/quota: Improvements to quota	Raghavendra G	2013-11-26	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Two stages of quota enforcement is done: Soft and hard quota Upon reaching soft quota limit on the directory it logs/alerts in the quota daemon log (ie DEFAULT_LOG_DIR/quotad.log) and no more writes allowed after hard quota limit. After reaching the soft-limit the daemon alerts the user/admin repeatively for every 'alert-time', which is configurable. * Quota enforcer is moved to server-side. It takes care of enforcing quota. Since enforcer doesn't have the cluster view, it relies on another service called quota-aggregator. Aggregator, on query can return the size of a directory based on the cluster view. Enforcer is always loaded in the server graph and is by passed if the feature is not enabled. Options specific to enforcer: server-quota - Specifies whether the feature is on/off. It is used to by pass the quota if turned off. deem-statfs - If set to on, it takes quota limits into consideration while estimating fs size. (df command). The algorithm followed is, i. Adjust statvfs based on limit configured on root. ii. If limit is set on the inode passed, use size/limits on that inode to populate statvfs. Otherwise, use size/limits configured on root. iii. Upon statvfs, update the ctx->size on the inode. iv. Don't let DHT aggregate, instead take the maximum of the usages from the subvols of the DHT, since each of it contains the complete information. Enforcer also makes use of gfid-to-path conversion functionality to work correctly when a client like nfs predominently relies on nameless lookups. * Quota Aggregator acts as a thin client to provide cluster view Its a lightweight gluster client process with no mount point, started upon enabling quota or restarting the volume. This is a single process run on each brick, which can answer queries on all volumes in the cluster. Its volfile stored in GLUSTERD_DEFAULT_WORKING_DIR/quotad/quotad.vol. Credits: Raghavendra Bhat <rabhat@redhat.com> Varun Shastry <vshastry@redhat.com> Shishir Gowda <sgowda@redhat.com> Kruthika Dhananjay <kdhananj@redhat.com> Brian Foster <bfoster@redhat.com> Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: Id1cb25b414951da34c665a55f77385d482e0f9de BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5952 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	gNFS: More clean up for Gluster NFS	Santosh Kumar Pradhan	2013-11-25	2	-39/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) Fix the typo in NFS default ACL The typo was introduced as part of the Fix to BZ 1009210 i.e. http://review.gluster.org/5980. The user ACL xattr structure was passed to default ACL xattr. 2) Clean up NFS code to avoid unnecessary SEGV in rpcsvc_drc_reconfigure() which was not validating the svc->drc. Add a routine rpcsvc_drc_deinit() to handle the clean up of DRC specific data structures. For init(), use rpcsvc_drc_init(). 3) nfs_init_state() was returning wrong value even if the registration with portmapper failed, causing the NFS server process to hang around. As a result it used to get SEGV during rpcsvc_drc_reconfigure(). 4) Clean up memfactor usage across nfs.c nfs3.c. Change-Id: I5cea26cb68dd8a822ec0ae104952f67fe63fa703 BUG: 1009210 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6329 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	gNFS: RFE for NFS connection behavior	Santosh Kumar Pradhan	2013-11-14	7	-39/+269
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement reconfigure() for NFS xlator so that volume set/reset wont restart the NFS server process. But few options can not be reconfigured dynamically e.g. nfs.mem-factor, nfs.port etc which needs NFS to be restarted. Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6236 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	bd_map: Remove bd_map xlator	M. Mohan Kumar	2013-11-13	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \|	Remove bd_map xlator and CLI related changes. Change-Id: If7086205df1907127c1a1fa4ba603f1c48421d09 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5747 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterfs: zerofill support	M. Mohan Kumar	2013-11-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero filled VM disk image provisioning or during scrubbing of VM disk images). Client/application can issue this FOP for zeroing out. Gluster server will zero out required range of bytes ie server offloaded zeroing. In the absence of this fop, client/application has to repetitively issue write (zero) fop to the server, which is very inefficient method because of the overheads involved in RPC calls and acknowledgements. WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks and this write is handled completely within the storage and hence is known as offload . Linux ,now has support for SCSI WRITESAME command which is exposed to the user in the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device , making it highly efficient. The fop takes two arguments offset and size. It zeroes out 'size' number of bytes in an opened file starting from 'offset' position. This patch adds zerofill support to the following areas: - libglusterfs - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi Client applications can exloit this fop by using glfs_zerofill introduced in libgfapi.FUSE support to this fop has not been added as there is no system call for this fop. Changes from previous version 3: * Removed redundant memory failure log messages Changes from previous version 2: * Rebased and fixed build error Changes from previous version 1: * Rebased for latest master TODO : * Add zerofill support to trace xlator * Expose zerofill capability as part of gluster volume info Here is a performance comparison of server offloaded zeofill vs zeroing out using repeated writes. [root@llmvm02 remote]# time ./offloaded aakash-test log 20 real 3m34.155s user 0m0.018s sys 0m0.040s [root@llmvm02 remote]# time ./manually aakash-test log 20 real 4m23.043s user 0m2.197s sys 0m14.457s [root@llmvm02 remote]# time ./offloaded aakash-test log 25; real 4m28.363s user 0m0.021s sys 0m0.025s [root@llmvm02 remote]# time ./manually aakash-test log 25 real 5m34.278s user 0m2.957s sys 0m18.808s The argument log is a file which we want to set for logging purpose and the third argument is size in GB . As we can see there is a performance improvement of around 20% with this fop. Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18 BUG: 1028673 Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5327 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpcsvc: implement per-client RPC throttling	Anand Avati	2013-10-28	5	-0/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement a limit on the total number of outstanding RPC requests from a given cient. Once the limit is reached the client socket is removed from POLL-IN event polling. Change-Id: I8071b8c89b78d02e830e6af5a540308199d6bdcd BUG: 1008301 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6114 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	rpc: add remote peer's hostname to call_bail log msgs	Krishnan Parthasarathi	2013-10-17	1	-3/+4
\| \| \| \| \| \| \| \| \| \|	Change-Id: I982cf7619463983c04b401d70a76635991d072d2 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6091 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
*	libglusterfs: Add monotonic clocking counter for timer thread	Harshavardhana	2013-10-15	1	-11/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gettimeofday() returns the current wall clock time and timezone. Using these functions in order to measure the passage of time (how long an operation took) therefore seems like a no-brainer. This time suffer's from some limitations: a. They have a low resolution: “High-performance” timing by definition, requires clock resolutions into the microseconds or better. b. They can jump forwards and backwards in time: Computer clocks all tick at slightly different rates, which causes the time to drift. Most systems have NTP enabled which periodically adjusts the system clock to keep them in sync with “actual” time. The adjustment can cause the clock to suddenly jump forward (artificially inflating your timing numbers) or jump backwards (causing your timing calculations to go negative or hugely positive). In such cases timer thread could go into an infinite loop. From 'man gettimeofday': ---------- .. .. The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2). .. .. ---------- Rationale: For calculating interval timing for Timer thread, all that’s needed should be clock as a simple counter that increments at a stable rate. This is necessary to avoid the jumps which are caused by using "wall time", this counter must be monotonic that can never “tick” backwards, ever. Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb BUG: 1017993 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6070 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	cluster/afr: [Feature] Command implementation to get heal-count	Venkatesh Somyajulu	2013-10-14	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently to know the number of files to be healed, either user has to go to backend and check the number of entries present in indices/xattrop directory. But if a volume consists of large number of bricks, going to each backend and counting the number of entries is a time-taking task. Otherwise user can give gluster volume heal vol-name info command but with this approach if no. of entries are very hugh in the indices/ xattrop directory, it will comsume time. So as a feature, new command is implemented. Command 1: gluster volume heal vn statistics heal-count This command will get the number of entries present in every brick of a volume. The output displays only entries count. Command 2: gluster volume heal vn statistics heal-count replica 192.168.122.1:/home/user/brickname Here if we are concerned with just one replica. So providing any one of the brick of a replica will get the number of entries to be healed for that replica only. Example: Replicate volume with replica count 2. Backend status: -------------- [root@dhcp-0-17 xattrop]# ls -lia \| wc -l 1918 NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy entries so actual no. of entries to be healed are 1916. [root@dhcp-0-17 xattrop]# pwd /home/user/2ty/.glusterfs/indices/xattrop Command output: -------------- Gathering count of entries to be healed on volume volume3 has been successful Brick 192.168.122.1:/home/user/22iu Status: Brick is Not connected Entries count is not available Brick 192.168.122.1:/home/user/2ty Number of entries: 1916 Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290 BUG: 1015990 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6044 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	cluster/afr : Implementation of command "gluster volume heal vn statistics"	Venkatesh Somyajulu	2013-10-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"gluster volume heal volumename statistics" command gives the summary of the afr crawl done based on the entries present in the xattrop directory. Whenever afr crawls are attempted, the beginning time of crawl, end time of crawl, no of files healed, heal-failed count and number of files in split brain are shown along with the type of the crawl. If crawl is already in progress then it will give the number of files healed, heal failed count and number of files in split-brain from the beginning of the crawl and instead of telling the end time of the crawl, "CRAWL IN PROGRESS" message will be shown. Output format: command: "gluster volume heal volume-name statistics" Output: Gathering afr crawl statistics crawl statistics on volume volume-name has been successful ------------------------------------------------ Crawl statistics for brick no 0 Hostname of brick 192.168.122.248 Starting time of crawl: Wed Jul 10 15:52:38 2013 Ending time of crawl: Wed Jul 10 15:52:38 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 Starting time of crawl: Wed Jul 10 15:52:38 2013 Ending time of crawl: Wed Jul 10 15:52:38 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 ------------------------------------------------ Crawl statistics for brick no 1 Hostname of brick 192.168.122.1 Starting time of crawl: Wed Jul 10 15:52:42 2013 Ending time of crawl: Wed Jul 10 15:52:42 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 Starting time of crawl: Wed Jul 10 15:52:42 2013 Ending time of crawl: Wed Jul 10 15:52:42 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 -------------------------------------------------- Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9 BUG: 949400 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4790 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	gNFS: Incorrect NFS ACL encoding for XFS	Santosh Kumar Pradhan	2013-09-29	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Incorrect NFS ACL encoding causes "system.posix_acl_default" setxattr failure on bricks on XFS file system. XFS (potentially others?) doesn't understand when the 0x10 prefix is added to the ACL type field for default ACLs (which the Linux NFS client adds) which causes setfacl()->setxattr() to fail silently. NFS client adds NFS_ACL_DEFAULT(0x1000) for default ACL. FIX: Mask the prefix (added by NFS client) OFF, so the setfacl is not rejected when it hits the FS. Original patch by: "Richard Wareing" Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396 BUG: 1009210 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5980 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	rpcsvc: allocate large auxgid list on demand	Anand Avati	2013-09-17	5	-1/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For rpc requests having large aux group list, allocate large list on demand. Else use small static array by default. Without this patch, glusterfsd allocates 140+MB of resident memory just to get started and initialized. Change-Id: I3a07212b0076079cff67cdde18926e8f3b196258 Signed-off-by: Anand Avati <avati@redhat.com> BUG: 953694 Reviewed-on: http://review.gluster.org/5927 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>