path: root/rpc/rpc-lib/src
Commit log (subject, author, date; files changed, lines -/+):
* rpc: Don't reset auth_value in disconnect (Kotresh HR, 2018-05-24; 1 file, -12/+48)

  The auth_value was being reset to AUTH_GLUSTERFS_v2 during rpc disconnect. It should not be
  reset. A disconnect during the portmap request can race with the handshake: if the handshake
  happens first and the disconnect later, auth_value is set back to the default value and is
  never restored to the actual auth_value.

  fixes: bz#1579276
  Change-Id: Ib46c9e01a97f6defb3fd1e0423fdb4b899b4a361
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* rpc: update tirpc registration to "force" unregister old mapping before re-registering
  (Shreyas Siravara, 2018-03-28; 2 files, -0/+8)

  > Reviewed-on: https://review.gluster.org/16849
  > Reviewed-by: Shreyas Siravara <sshreyas@fb.com>

  Change-Id: I05ed6b7c715a71e5819fbe8116e7c3146010f836
  BUG: 1521030
  Signed-off-by: Kevin Vigor <kvigor@fb.com>
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

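  As a rough illustration of the "force unregister, then re-register" idea, here is a minimal
  sketch using standard TI-RPC calls (build against libtirpc with -ltirpc); this is not the
  GlusterFS code itself, and the function name force_reregister is made up:

      #include <rpc/rpc.h>      /* TI-RPC: rpcb_unset(), rpcb_set() */
      #include <netconfig.h>    /* struct netconfig */

      static int
      force_reregister (rpcprog_t prog, rpcvers_t vers,
                        const struct netconfig *nconf, const struct netbuf *addr)
      {
              /* drop any stale mapping first; ignore "nothing to remove" */
              (void) rpcb_unset (prog, vers, NULL);
              if (!rpcb_set (prog, vers, nconf, addr))
                      return -1;
              return 0;
      }
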
* rpc: simplify parameters when a saved frame is forced to unwind (Zhang Huan, 2018-03-28; 1 file, -4/+2)

  When a saved frame is to be forcibly unwound, there is no need to pass an empty iovector that
  does not point to any data.

  Change-Id: I6e858fb38644326e22239b83272b15db656035e5
  BUG: 1523122
  Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>

* rpc: fix incorrect return value when xdr decode fails (Zhang Huan, 2018-03-28; 1 file, -1/+0)

  xdr_replymsg is called to decode the reply message, and it returns failure if the message is
  corrupted. However, the return value retrieved from the global errno is 0 even when
  xdr_replymsg fails. Fix this by simply returning a negative value if the call to xdr_replymsg
  fails.

  Change-Id: I2b9a1dc97652fbb6cf6568ea617f120713784a55
  BUG: 1523122
  Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>

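  A minimal sketch of the check described above (the wrapper name decode_reply is hypothetical;
  xdr_replymsg() is the real XDR routine):

      #include <rpc/rpc.h>   /* XDR, struct rpc_msg, xdr_replymsg() */

      static int
      decode_reply (XDR *xdrs, struct rpc_msg *reply)
      {
              /* return a negative value directly on decode failure; do not
               * rely on the global errno, which may still be 0 here */
              if (!xdr_replymsg (xdrs, reply))
                      return -1;
              return 0;
      }
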
* cleanup: xlator_t structure's 'client_latency' variable is not used (Sven Fischer, 2018-03-19; 1 file, -2/+0)

  - Removed the unused struct member and its one-time usage.
  - Cleaned up wrong white space.

  The member 'client_latency' was not used otherwise since it was added by

      commit 07cc8679cdf3b29680f4f105d0222da168d8bfc1
      Author: Kevin Vigor <kvigor@fb.com>
      Date:   Tue Mar 21 08:23:25 2017 -0700

          Halo Replication feature for AFR translator

  Change-Id: Ibb0ea828d4090bbe8897f6af326b317884162a00
  BUG: 1495153
  Signed-off-by: Sven Fischer <sven@fischer-abc.de>

* build: address linkage issues (Kaleb S. KEITHLEY, 2018-03-05; 6 files, -1/+170)

  We have the following undefined symbol errors from protocol/server.so:

      glusterfs_mgmt_pmap_signout
      glusterfs_autoscale_threads

  See https://review.gluster.org/19225 (bz#1532238) and https://review.gluster.org/19657
  (bz#1550895). (Why are there two different bzs for the same bug?)

  IMO this is a cleaner solution, i.e. moving the above two functions to libgfrpc
  (.../rpc/rpc-lib/...). I would also, for (foolish) consistency's sake, like to see
  glusterfs_mgmt_pmap_signin() moved from glusterfsd to libgfrpc as well.

  This works on f28/rawhide, with its new, more restrictive run-time link semantics. The smoke
  and regression tests on earlier Fedora and CentOS will confirm that it works on those
  platforms too.

  Change-Id: I9cfbd1cc15e7ebd9fc31b56ac791287fa2c584de
  BUG: 1550895
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>

* rpcsvc: scale rpcsvc_request_handler threads (Milind Changire, 2018-02-26; 3 files, -17/+123)

  Scale rpcsvc_request_handler threads to match the scaling of event handler threads.

  Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c51 for a discussion
  about why we need multi-threaded rpcsvc request handlers.

  Change-Id: Ib6838fb8b928e15602a3d36fd66b7ba08999430b
  Signed-off-by: Milind Changire <mchangir@redhat.com>

* Fetch backup volfile servers from glusterd2 (Prashanth Pai, 2018-02-16; 1 file, -0/+5)

  Clients will request a list of volfile servers from glusterd2 by setting an (optional) flag
  in the GETSPEC RPC call. glusterd2 will check for the presence of this flag and accordingly
  return a list of glusterd2 servers in the GETSPEC RPC reply. Currently, the returned list
  only contains servers which have bricks belonging to the volume.

  See:
  https://github.com/gluster/glusterd2/issues/382
  https://github.com/gluster/glusterfs/issues/351

  Updates #351
  Change-Id: I0eee3d0bf25a87627e562380ef73063926a16b81
  Signed-off-by: Prashanth Pai <ppai@redhat.com>

* rpc: Adds rpcbind6 programs to libgfrpc symbols (Sheena Artrip, 2018-02-13; 1 file, -0/+2)

  Building with --with-default-ipv6 causes shared components of gluster that call the rpcbind6
  functions to fail. Adding the symbols to the list is all that is necessary. Building without
  ipv6 keeps the same behavior. No test cases, as this is a build-specific fix.

  Change-Id: I248d3291bf17326b07d152d9b79cdcfaf9068f0d
  BUG: 1544961
  Signed-off-by: Sheena Artrip <sheenobu@fb.com>

* protocol: utilize the version 4 xdr (Amar Tumballi, 2018-02-01; 1 file, -1/+2)

  updates #384
  Change-Id: Id80bf470988dbecc69779de9eb64088559cb1f6a
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* protocol: Implement put fop (Poornima G, 2018-01-31; 1 file, -0/+1)

  Updates #353
  Change-Id: I755b9208690be76935d763688fa414521eba3a40
  Signed-off-by: Poornima G <pgurusid@redhat.com>

* rpc/*: auth-header changes (Amar Tumballi, 2018-01-17; 7 files, -119/+362)

  Introduce another authentication header which can now send more data. This is useful because
  this data can be common to all the fops, and we don't need to change all the signatures. As
  part of this, rpc-clnt.c was made a little more modular to support multiple authentication
  structures. The stack.h changes are a placeholder for ctime etc. and can be moved later based
  on need.

  updates #384
  Change-Id: I6111c13cfd2ec92e2b4e9295896bf62a8a33b2c7
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* locks: added inodelk/entrylk contention upcall notifications (Xavier Hernandez, 2018-01-16; 1 file, -0/+2)

  The locks xlator is now able to send a contention notification to the current owner of the
  lock. This is only a notification; it can be used to improve the performance of some
  client-side operations that might benefit from extended duration of lock ownership. Nothing
  is done if the lock owner decides to ignore the message and not release the lock. For forced
  release of acquired resources, leases must be used.

  Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>

* rpc: use export map to minimize exported symbols in libgf{rpc,xdr}.so (Kaleb S. KEITHLEY, 2018-01-12; 2 files, -1/+69)

  Without an export map (at link time), libgfrpc and libgfxdr export over 150 and 450 symbols
  respectively. Many are not used by anything else. (It is unclear what the unused symbols are;
  some may be simple sloppiness, e.g. not declaring functions static that should be. Others may
  be intra-library calls that can't be static but aren't part of the API, per se.) By linking
  with an export map the number of exported symbols is reduced to ~60 and ~250 respectively.

  This parallels the similar change made to libglusterfs recently and the older changes to the
  xlators to minimize the symbols that are visible (exported) from the .so. And I don't know,
  do we want to go all the way to symbol versions? For these libs? And for libglusterfs?

  fixes gluster/glusterfs#392
  Change-Id: I9cdc3eee10e5f1408d7e7f2f29fad597c97e4003
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>

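  For reference, an export map is a small linker version script of roughly this shape, passed
  to the link with something like -Wl,--version-script=<file>; the symbol names below are
  illustrative, not the actual libgfrpc export list:

      {
              global:
                      rpc_clnt_new;
                      rpc_clnt_submit;
                      rpcsvc_init;
              local:
                      *;      /* everything else stays hidden */
      };
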
* rpc: fix use-after-free of clnt after rpc transport cleanup (Kinglong Mee, 2017-12-27; 1 file, -1/+4)

  If the transport object is freed in rpc_transport_unref, a notification of
  RPC_TRANSPORT_CLEANUP is pushed to rpc_clnt_notify, where the rpc_clnt (which contains conn)
  is freed. After that, any use of conn after rpc_transport_unref is a use-after-free.

  Change-Id: I5cac8a8e7ced7c1079930080a12abf02d46667d5
  Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>

* Use RTLD_LOCAL for symbol resolution (Prashanth Pai, 2017-12-27; 1 file, -1/+1)

  RTLD_LOCAL is the default value of the symbol visibility flag of dlopen() on Linux and
  NetBSD. Using it avoids conflicts during symbol resolution. This also allows us to detect
  xlators that have not been explicitly linked with the libraries that they use; this used to
  go unnoticed when RTLD_GLOBAL was being used.

  BUG: 1193929
  Change-Id: I50db6ea14ffdee96596060c4d6bf71cd3c432f7b
  Signed-off-by: Prashanth Pai <ppai@redhat.com>

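  A small standalone sketch of the behaviour being relied on (the .so path is illustrative
  only; build with -ldl on older glibc):

      #include <dlfcn.h>
      #include <stdio.h>

      int
      main (void)
      {
              void *handle = dlopen ("/usr/lib64/glusterfs/xlator/features/quota.so",
                                     RTLD_NOW | RTLD_LOCAL);
              if (!handle) {
                      /* with RTLD_LOCAL, a missing library link shows up here as an
                       * "undefined symbol" error instead of being silently resolved
                       * against symbols some other object happened to export */
                      fprintf (stderr, "dlopen: %s\n", dlerror ());
                      return 1;
              }
              dlclose (handle);
              return 0;
      }
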
* all: Simplify component message id's definition (Xavier Hernandez, 2017-12-14; 1 file, -60/+23)

  This patch creates a new way of defining message id's that is easier and less error prone
  because it doesn't require so many manual changes each time a new component is defined or a
  new message created.

  Change-Id: I71ba8af9ac068f5add7e74f316a2478bc991c67b
  Signed-off-by: Xavier Hernandez <jahernan@redhat.com>

* glusterfs: Use gcc builtin ATOMIC operator to increase/decrease refcount. (Mohit Agrawal, 2017-12-12; 4 files, -22/+9)

  Problem: In the glusterfs code base we call mutex_lock/unlock to take a reference / drop a
  reference on an object. Sometimes this can also be a reason for lock contention.

  Solution: There is no need to use a mutex to increase/decrease the ref counter; instead of a
  mutex, use the gcc builtin ATOMIC operations.

  Test: I have not yet measured how much performance is gained by this patch specifically for
  glusterfs, but I have tested the same with the small program below (both mutex and atomic
  versions) and got a good difference.

      #include <stdio.h>
      #include <stdlib.h>
      #include <stdint.h>
      #include <pthread.h>

      static int64_t glob;
      static int numOuterLoops;

      static void *
      threadFunc (void *arg)
      {
              int j;

              for (j = 0; j < numOuterLoops; j++) {
                      __atomic_add_fetch (&glob, 1, __ATOMIC_ACQ_REL);
              }
              return NULL;
      }

      int
      main (int argc, char *argv[])
      {
              int opt, s, j;
              int numThreads;
              pthread_t *thread;
              int verbose;
              int64_t n = 0;

              if (argc < 2) {
                      printf (" Please provide 2 args Num of threads && Outer Loop\n");
                      exit (-1);
              }
              numThreads = atoi (argv[1]);
              numOuterLoops = atoi (argv[2]);

              if (1) {
                      printf ("\tthreads: %d; outer loops: %d;\n", numThreads, numOuterLoops);
              }
              thread = calloc (numThreads, sizeof (pthread_t));
              if (thread == NULL) {
                      printf ("calloc error so exit\n");
                      exit (-1);
              }

              __atomic_store (&glob, &n, __ATOMIC_RELEASE);

              for (j = 0; j < numThreads; j++) {
                      s = pthread_create (&thread[j], NULL, threadFunc, NULL);
                      if (s != 0) {
                              printf ("pthread_create failed so exit\n");
                              exit (-1);
                      }
              }
              for (j = 0; j < numThreads; j++) {
                      s = pthread_join (thread[j], NULL);
                      if (s != 0) {
                              printf ("pthread_join failed so exit\n");
                              exit (-1);
                      }
              }
              printf ("glob value is %ld\n", __atomic_load_n (&glob, __ATOMIC_RELAXED));
              exit (0);
      }

      time ./thr_count 800 800000
        threads: 800; outer loops: 800000;
      glob value is 640000000
      real    1m10.288s
      user    0m57.269s
      sys     3m31.565s

      time ./thr_count_atomic 800 800000
        threads: 800; outer loops: 800000;
      glob value is 640000000
      real    0m20.313s
      user    1m20.558s
      sys     0m0.028

  Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

* rpc: Fix format warnings when using IPV6_DEFAULT (Shreyas Siravara, 2017-12-07; 2 files, -2/+2)

  Change-Id: I22e622212f30defe6f2af1a67d7b48a88d37a097
  BUG: 1520974
  Signed-off-by: Shreyas Siravara <sshreyas@fb.com>

* metrics: provide options to dump metrics from xlators (Amar Tumballi, 2017-12-06; 1 file, -0/+1)

  * Introduce xlator methods to allow dumping of metrics
  * Separate options to get the metrics dumped in a path

  Updates #168
  Change-Id: I7df80df33b71d6f449f03c2332665b4a45f6ddf2
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* xdr: Fix build errors due to missing xdr symbol when building against TIRPC (Shreyas Siravara, 2017-12-06; 1 file, -0/+3)

  Change-Id: Ic52045f5dd19e551612242450b8982f42ff327e9
  Signed-off-by: Shreyas Siravara <sshreyas@fb.com>

* rio/everywhere: add icreate/namelink fop (Susant Palai, 2017-12-05; 1 file, -0/+2)

  icreate creates an inode, while namelink links the basename to its parent gfid. For now mkdir
  is the primary user of these fops. Better distribution is achieved by creating the inode on
  (say) mds1 and linking the basename to its parent gfid on mds2. The inode serves readdirp,
  stat etc.

  More details about the fops are present at:
  https://review.gluster.org/#/c/13395/3/design/DHT2/DHT2_Icreate_Namelink_Notes.md

  This is a backport of three patches from the experimental branch:
  1- https://review.gluster.org/#/c/18085/
  2- https://review.gluster.org/#/c/18086/
  3- https://review.gluster.org/#/c/18094/

  Updates gluster/glusterfs#243
  Change-Id: I1bd3d5a441a3cfab1acfeb52f15c6c867d362592
  Signed-off-by: Susant Palai <spalai@redhat.com>

* rpc: Eliminate conn->lock contention by using more granular locks (Mohit Agrawal, 2017-11-28; 3 files, -14/+6)

  rpc_clnt_submit() acquires conn->lock before the call to rpc_transport_submit_request() and
  the subsequent queuing of the frame into the saved_frames list. However, as part of handling
  the RPC_TRANSPORT_MSG_RECEIVED and RPC_TRANSPORT_MSG_SENT notifications in rpc_clnt_notify(),
  conn->lock is again used to atomically update the conn->last_received and conn->last_sent
  event timestamps. So when conn->lock is acquired as part of submitting a request, a parallel
  POLLIN notification gets blocked at the rpc layer until the request submission completes and
  the lock is released.

  To get around this, this patch calls clock_gettime (instead of gettimeofday) to update the
  event timestamps in conn->last_received and conn->last_sent; calling clock_gettime does not
  require taking the mutex because clock_gettime is a thread-safe call.

  Note: Running fio on a VM shows improved iops after applying the patch.

  Change-Id: I347b5031d61c426b276bc5e07136a7172645d763
  BUG: 1467614
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>

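  The gist of the change, sketched with simplified field names (not the actual rpc-clnt
  structures):

      #include <time.h>

      struct conn_stub {
              struct timespec last_received;
              struct timespec last_sent;
      };

      static void
      mark_received (struct conn_stub *conn)
      {
              /* clock_gettime() is thread safe, so the timestamp can be
               * refreshed without serialising on conn->lock */
              clock_gettime (CLOCK_REALTIME, &conn->last_received);
      }
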
* rpc-lib: coverity fixes (Milind Changire, 2017-11-22; 5 files, -38/+70)

  Scan URL:
  https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2017-11-10-0f524f07/html/

  ID: 9 (BAD_SHIFT)
  ID: 58 (CHECKED_RETURN)
  ID: 98 (DEAD_CODE)
  ID: 249, 250, 251, 252 (MIXED_ENUMS)
  ID: 289, 297 (NULL_RETURNS)
  ID: 609, 613, 622, 644, 653, 655 (UNUSED_VALUE)
  ID: 432 (RESOURCE_LEAK)

  Change-Id: I2349877214dd38b789e08b74be05539f09b751b9
  BUG: 789278
  Signed-off-by: Milind Changire <mchangir@redhat.com>

* rpc: Change the way client uuid is built (Poornima G, 2017-11-20; 1 file, -0/+1)

  Problem: Today the main users of client uuid are the protocol layers, locks and leases. The
  protocol layers require each client uuid to be unique, even across connects and disconnects.
  Locks and leases on the server side also use the same client uid, which changes across file
  migrations; this makes graph switches and file migration tedious for locks and leases. File
  migration across bricks becomes difficult as the client uuid for the same client is different
  on the other brick. The exact set of issues exists for leases as well.

  Solution: Introduce a constant in the client-uid string which locks and leases can use to
  identify the owner client across bricks.

  Client uuid currently:
  %s(ctx uuid)-%s(protocol client name)-%d(graph id)%s(setvolume count/reconnect count)

  Proposed client uuid:
  "CTX_ID:%s-GRAPH_ID:%d-PID:%d-HOST:%s-PC_NAME:%s-RECON_NO:%s"
  - CTX_ID: This will be constant per client.
  - GRAPH_ID, PID, HOST, PC_NAME (protocol client name), RECON_NO (setvolume count) remain the
    same.

  Change-Id: Ia81d57a9693207cd325d7b26aee4593fcbd6482c
  BUG: 1369028
  Signed-off-by: Susant Palai <spalai@redhat.com>

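  A hypothetical helper showing how the proposed layout could be composed (illustrative only;
  the real builder lives in the rpc/client code):

      #include <stdio.h>

      static int
      build_client_uid (char *buf, size_t len, const char *ctx_id, int graph_id,
                        int pid, const char *host, const char *pc_name,
                        const char *recon_no)
      {
              return snprintf (buf, len,
                               "CTX_ID:%s-GRAPH_ID:%d-PID:%d-HOST:%s-PC_NAME:%s-RECON_NO:%s",
                               ctx_id, graph_id, pid, host, pc_name, recon_no);
      }
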
* rpc: bring a new protocol version (Amar Tumballi, 2017-11-07; 1 file, -0/+2)

  * xdr: add gfid to the on-wire format for fsetattr/rchecksum
  * as this is a change in the on-wire XDR format, backward-compatible RPC programs are needed

  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 827334
  Change-Id: Id0a2da3632516dc1a5560dde2b151b2e5f0be8e5

* rpc: optimize fop program lookup (Milind Changire, 2017-11-06; 2 files, -4/+9)

  Ensure that the fop program is the first in the program list so that the minimum amount of
  time is spent searching for the program in the most frequently needed use case.

  Change-Id: I45c3dcdbf39ec90ba39d914432d13a2ace00a5ee
  BUG: 1509647
  Signed-off-by: Milind Changire <mchangir@redhat.com>

* stack: change gettimeofday() to clock_gettime() (Amar Tumballi, 2017-11-06; 1 file, -1/+2)

  Achieving the above also needed the changes below:
  * more sanity in how 'frame->op' is assigned
  * infra to have 'stats' as a separate section in the 'xlator_t' structure

  Updates #137
  Change-Id: I36679bf9577f3ed00a695b4e7d92870dcb3db8e1
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* rpc: make actor search parallel (Milind Changire, 2017-11-06; 2 files, -28/+28)

  Problem: On a service request, the actor is searched under an exclusive mutex lock, which is
  not really necessary since most of the time the actor list is only searched, not modified.

  Solution: Use a read-write lock instead of a mutex lock. Only modify operations on a service
  need to be done under the write lock, which grants exclusive access to the code.

  Change-Id: Ia227351b3f794bd8eee70c7a76d833cc716ab113
  BUG: 1509644
  Signed-off-by: Milind Changire <mchangir@redhat.com>

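  The locking pattern in a nutshell, using plain pthread primitives (names and structure are
  illustrative, not the rpcsvc code):

      #include <pthread.h>

      static pthread_rwlock_t prog_lock = PTHREAD_RWLOCK_INITIALIZER;

      /* the common case: many threads may search the program list in parallel */
      static void
      search_actor_locked (void (*search) (void))
      {
              pthread_rwlock_rdlock (&prog_lock);
              search ();
              pthread_rwlock_unlock (&prog_lock);
      }

      /* the rare case: (de)registering a program needs exclusive access */
      static void
      modify_programs_locked (void (*modify) (void))
      {
              pthread_rwlock_wrlock (&prog_lock);
              modify ();
              pthread_rwlock_unlock (&prog_lock);
      }
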
* gluster: IPv6 single stack support (Kevin Vigor, 2017-10-24; 3 files, -0/+97)

  Summary:
  - This diff changes all locations in the code to prefer the inet6 family instead of inet.
    This allows GlusterFS to operate via IPv6 instead of IPv4 for all internal operations while
    still being able to serve (FUSE or NFS) clients via IPv4.
  - The changes apply to NFS as well.
  - This diff ports D1892990, D1897341 & D1896522 to the 3.8 branch.

  Test Plan: Prove tests!

  Reviewers: dph, rwareing

  Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
  Change-Id: I34fdaaeb33c194782255625e00616faf75d60c33
  BUG: 1406898
  Reviewed-on-3.8-fb: http://review.gluster.org/16059
  Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
  Tested-by: Shreyas Siravara <sshreyas@fb.com>

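  Preferring the inet6 family usually boils down to resolver hints along these lines (a generic
  sketch, not the GlusterFS transport code):

      #include <sys/socket.h>
      #include <netdb.h>
      #include <string.h>

      static int
      resolve_prefer_inet6 (const char *host, const char *port, struct addrinfo **res)
      {
              struct addrinfo hints;

              memset (&hints, 0, sizeof (hints));
              hints.ai_family   = AF_INET6;
              hints.ai_socktype = SOCK_STREAM;
              /* v4-mapped addresses keep IPv4-only peers reachable on a v6 socket */
              hints.ai_flags    = AI_V4MAPPED | AI_ADDRCONFIG;
              return getaddrinfo (host, port, &hints, res);
      }
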
* rpc: free conn->name in rpc_clnt_destroy() (Niels de Vos, 2017-10-11; 1 file, -0/+1)

  Change-Id: Idf99908aa48718a7faf7f0bbc647a679ec548282
  BUG: 1443145
  Signed-off-by: Niels de Vos <ndevos@redhat.com>

* rpc: free registered callback programs (Niels de Vos, 2017-10-11; 1 file, -0/+7)

  Change-Id: I8c6f6b642f025d1faf74015b8f7aaecd7ebfd4d5
  BUG: 1443145
  Signed-off-by: Niels de Vos <ndevos@redhat.com>

* heal: New feature heal info summary to list the status of brick and count of entries to be
  healed (Mohamed Ashiq Liyazudeen, 2017-09-15; 1 file, -0/+1)

  Command output:

      Brick 192.168.2.8:/brick/1
      Status: Connected
      Total Number of entries: 363
      Number of entries in heal pending: 362
      Number of entries in split-brain: 0
      Number of entries possibly healing: 1

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <cliOutput>
        <healInfo>
          <bricks>
            <brick hostUuid="9105dd4b-eca8-4fdb-85b2-b81cdf77eda3">
              <name>192.168.2.8:/brick/1</name>
              <status>Connected</status>
              <totalNumberOfEntries>363</totalNumberOfEntries>
              <numberOfEntriesInHealPending>362</numberOfEntriesInHealPending>
              <numberOfEntriesInSplitBrain>0</numberOfEntriesInSplitBrain>
              <numberOfEntriesPossiblyHealing>1</numberOfEntriesPossiblyHealing>
            </brick>
          </bricks>
        </healInfo>
        <opRet>0</opRet>
        <opErrno>0</opErrno>
        <opErrstr/>
      </cliOutput>

  Change-Id: I40cb6f77a14131c9e41b292f4901b41a228863d7
  BUG: 1261463
  Signed-off-by: Mohamed Ashiq Liyazudeen <mliyazud@redhat.com>
  Reviewed-on: https://review.gluster.org/12154
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Tested-by: Karthik U S <ksubrahm@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>

* rpc: destroy transport after client_t (Milind Changire, 2017-08-31; 1 file, -3/+0)

  Problem:
  1. The ref count increment on the client_t object is done in rpcsvc_request_init(), which is
     incorrect.
  2. A ref is not taken when delegating to grace_time_handler().

  Solution:
  1. Only fop requests which require processing down the graph via stack 'frames' now ref count
     the request, in get_frame_from_request().
  2. Take a ref on the client_t object in server_rpc_notify() but avoid dropping it in
     RPCSVC_EVENT_TRANSPORT_DESTROY. Drop the ref unconditionally when exiting
     grace_time_handler(). Also, avoid dropping the ref on client_t in
     RPCSVC_EVENT_TRANSPORT_DESTROY when ref management has been delegated to
     grace_time_handler().

  Change-Id: Ic16246bebc7ea4490545b26564658f4b081675e4
  BUG: 1481600
  Reported-by: Raghavendra G <rgowdapp@redhat.com>
  Signed-off-by: Milind Changire <mchangir@redhat.com>
  Reviewed-on: https://review.gluster.org/17982
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>

* Revert "rpc: disable client on disconnection from rebalance"Milind Changire2017-08-241-4/+0
| | | | | | | | | | | | | | This reverts commit 5b14c11d3cae38bc66006b02217ede485ae30dea. BUG: 1484225 Change-Id: I3269d3fc64de3f3cc6f670ea564a87d7725e10fd Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: https://review.gluster.org/18113 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* rpc: disable client on disconnection from rebalance (Milind Changire, 2017-08-23; 1 file, -0/+4)

  Problem: The glusterd rpc code path attempts to reconnect to the rebalance process via the
  reconnect timer even after the rebalance process has disconnected.

  Solution: Set the clnt->disabled flag to 1 to avoid reconnection and cause the clnt object to
  be unref'd.

  Change-Id: I4e38eaef45d2fdea86d25e9dff9f1af0cd29cf66
  BUG: 1484225
  Signed-off-by: Milind Changire <mchangir@redhat.com>
  Reviewed-on: https://review.gluster.org/18093
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

* tier: separation of attach-tier from add-brick (hari gowtham, 2017-08-01; 1 file, -0/+1)

  PROBLEM: Both attach-tier and add-brick have the same RPC and set of code. This becomes a
  hurdle while trying to implement add-brick on a tiered volume.

  FIX: This patch separates add-brick and attach-tier, giving them separate RPCs.

  Change-Id: Iec57e972be968a9ff00b15b507e56a4f6dc398a2
  BUG: 1376326
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Reviewed-on: https://review.gluster.org/15503
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Tested-by: hari gowtham <hari.gowtham005@gmail.com>
  Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

* rpc: prevent logging null client [Invalid argument] (Prashanth Pai, 2017-07-25; 1 file, -1/+3)

  The following log entry is observed during volume operations such as volume create:

      [2017-07-20 05:13:43.213797] E [client_t.c:321:gf_client_ref]
      (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_request_create+0x1a4) [0x7f987f66cd20]
      -->/usr/local/lib/libgfrpc.so.0(rpcsvc_request_init+0xd0) [0x7f987f66ca23]
      -->/usr/local/lib/libglusterfs.so.0(gf_client_ref+0x56) [0x7f987f91cbd5] )
      0-client_t: null client [Invalid argument]

  Change-Id: I49ba753e8d1a828bb275b0ccb1a181706774f387
  BUG: 1193929
  Signed-off-by: Prashanth Pai <ppai@redhat.com>
  Reviewed-on: https://review.gluster.org/17848
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Amar Tumballi <amarts@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

* rpc: include current second in timed out frame cleanup (Milind Changire, 2017-07-21; 1 file, -1/+1)

  Problem: frames which time out in the current second are missed.

  Solution: change the test to include frames timing out in the current second, i.e.
  timeout <= current.tv_sec instead of timeout < current.tv_sec

  Change-Id: I459d47856ade2b657a0289e49f7f63da29186d6e
  BUG: 1468433
  Signed-off-by: Milind Changire <mchangir@redhat.com>
  Reviewed-on: https://review.gluster.org/17722
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>

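  The boundary condition in isolation (simplified types and a made-up helper name; the real
  code walks the saved_frames list):

      #include <time.h>
      #include <sys/time.h>

      static int
      frame_has_timed_out (time_t frame_timeout, struct timeval now)
      {
              return frame_timeout <= now.tv_sec;   /* previously '<', which skipped
                                                       frames expiring this very second */
      }
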
* libglusterfs: Name threads on creation (Raghavendra Talur, 2017-07-19; 1 file, -1/+1)

  Set names to threads on creation for easier debugging.

  Output of top -H -p <PID-OF-GLUSTERFSD>

  Before:
      19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
      19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
      19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
      19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
       5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
       7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd

  After:
      19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer
      19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
      19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep
      19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0
      19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1
      19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0
      19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker
      19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0
      19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign
      19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker
      19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon
      19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0
      19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1
      19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2
      19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan
      19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy
      25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1
       5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2
       7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc

  Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
  BUG: 1254002
  Updates: #271
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-on: https://review.gluster.org/11926
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

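  On glibc the naming itself is a one-liner per thread; a minimal sketch (the thread name and
  wrapper below are illustrative, not the GlusterFS helper):

      #define _GNU_SOURCE            /* pthread_setname_np() */
      #include <pthread.h>

      static void *
      worker (void *arg)
      {
              /* names are limited to 15 characters plus the terminating NUL */
              pthread_setname_np (pthread_self (), "glusterepoll0");
              /* ... event loop ... */
              return NULL;
      }

      int
      main (void)
      {
              pthread_t t;

              if (pthread_create (&t, NULL, worker, NULL) != 0)
                      return 1;
              pthread_join (t, NULL);
              return 0;
      }
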
* program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program
  (Raghavendra G, 2017-07-18; 2 files, -5/+103)

  Since the poller thread bears the brunt of execution until the request is handed over to
  io-threads, the poller thread experiences lock contention in the control flow up to
  io-threads, which slows it down. This delay invariably affects reading ping requests from the
  network and responding to them, resulting in increased ping latencies, which sometimes
  results in a ping-timer expiry on the client, leading to disconnection of the transport.

  So, this patch aims to free the poller thread from executing the code of the Glusterfs
  Program. We do this by making:
  * the Glusterfs Program register itself asking rpcsvc to execute its actors in its own
    threads.
  * the GF-DUMP Program register itself asking rpcsvc to _NOT_ execute its actors in its own
    threads. Otherwise the program's own threads become a bottleneck in processing ping
    traffic. This means that the poller thread reads a ping packet, invokes its actor and hands
    the response msg to the transport queue.

  Change-Id: I526268c10bdd5ef93f322a4f95385137550a6a49
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  BUG: 1421938
  Reviewed-on: https://review.gluster.org/17105
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>

* Link against missed libraries to resolve symbols (Prashanth Pai, 2017-07-03; 1 file, -1/+2)

  When external programs perform a dlopen("..so", RTLD_LAZY|RTLD_LOCAL) on some shared objects
  like xlators, it can fail with dlerror set to the error string "undefined symbol <some-type>".
  This was observed for the following shared objects: fuse.so, quota.so, quotad.so, server.so,
  libgfrpc.so and socket.so.

  P.S: This was found while running a Go program which fetches the list of xlator options
  (volume_option_t) from an xlator's shared object.

  BUG: 1193929
  Change-Id: I7b958409cf11fb67c2be32a3f85a96fb1260236b
  Signed-off-by: Prashanth Pai <ppai@redhat.com>
  Reviewed-on: https://review.gluster.org/17659
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>

* multiple: fix struct/typedef inconsistencies (Jeff Darcy, 2017-06-30; 1 file, -1/+1)

  The most common pattern, both in our code and elsewhere, is this:

      struct _xyz {
              ...
      };
      typedef struct _xyz xyz_t;

  These exceptions - especially call_frame/call_stack - have been slowing down code navigation
  for years. By converging on a single pattern, navigating from xyz_t in code to the actual
  definition of struct _xyz (i.e. without having to visit the typedef first) might even be
  automatable.

  Change-Id: I0e5dd1f51f98e000173c62ef4ddc5b21d9ec44ed
  Signed-off-by: Jeff Darcy <jdarcy@fb.com>
  Reviewed-on: https://review.gluster.org/17650
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Tested-by: Jeff Darcy <jeff@pl.atyp.us>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>

* rpc: use GF_ATOMIC_INC to generate rpc_clnt's callid (Zhou Zhengping, 2017-05-08; 2 files, -17/+3)

  Change-Id: I57ad970411db1ccd3d2c56c504c7da9cc221051f
  BUG: 1448692
  Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
  Reviewed-on: https://review.gluster.org/17198
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>

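  The underlying idea, sketched with the raw GCC builtin (GF_ATOMIC_INC is GlusterFS's own
  wrapper; the counter name here is made up):

      #include <stdint.h>

      static uint64_t callid_counter;

      static uint64_t
      next_callid (void)
      {
              /* no conn->lock needed just to hand out a unique callid */
              return __atomic_add_fetch (&callid_counter, 1, __ATOMIC_SEQ_CST);
      }
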
* rpc: Remove accidental IPV6 changes (Kaushal M, 2017-05-05; 3 files, -97/+0)

  They snuck in with the HALO patch (07cc8679c).

  Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
  BUG: 1447953
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: https://review.gluster.org/17174
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>

* Halo Replication feature for AFR translator (Kevin Vigor, 2017-05-02; 5 files, -12/+160)

  Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write
  locally to their region (as defined by a latency "halo" or threshold, if you like), and have
  their writes asynchronously propagate from their origin to the rest of the cluster. Clients
  can also write synchronously to the cluster simply by specifying a halo-latency which is very
  large (e.g. 10 seconds), which will include all bricks. In other words, it allows clients to
  decide at mount time if they desire synchronous or asynchronous IO into a cluster, and the
  cluster can support both of these modes for any number of clients simultaneously.

  There are a few new volume options due to this feature:
    halo-shd-latency:  The threshold below which self-heal daemons will consider children
                       (bricks) connected.
    halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks)
                       connected.
    halo-latency:      The threshold below which all other clients will consider children
                       (bricks) connected.
    halo-min-replicas: The minimum number of replicas which are to be enforced regardless of
                       latency specified in the above 3 options. If the number of children
                       falls below this threshold, the next best (chosen by latency) shall be
                       swapped in.

  New FUSE mount options:
    halo-latency & halo-min-replicas: As described above.

  This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool
  geo-replication possibilities.

  Operational Notes:
  - Global consistency is guaranteed for synchronous clients; this is provided by the existing
    entry-locking mechanism.
  - Asynchronous clients, on the other hand, are merely consistent within their region. Writes
    & deletes will be protected via entry-locks as usual, preventing concurrent writes into
    files which are undergoing replication. Read operations, on the other hand, should never
    block.
  - Writes are allowed from _any_ region and propagated from the origin to all other regions.
    The takeaway from this is that care should be taken to ensure multiple writers do not write
    the same files, resulting in a gfid split-brain which will require resolution via
    split-brain policies (majority, mtime & size). The recommended method for preventing this
    is using the nfs-auth feature to define which region for each share has RW permissions;
    tiers not in the origin region should have RO perms.

  TODO:
  - Synchronous clients (including the SHD) should choose clients from their own region as
    preferred sources for reads. Most of the plumbing is in place for this via the
    child_latency array.
  - Better GFID split-brain handling & better dentry-type split-brain handling (i.e. create a
    trash can and move the offending files into it).
  - Tagging in addition to latency as a means of defining which children you wish to
    synchronously write to.

  Test Plan:
  - The usual suspects: clang, gcc w/ address sanitizer & valgrind
  - Prove tests

  Reviewers: jackl, dph, cjh, meyering
  Reviewed By: meyering
  Subscribers: ethanr
  Differential Revision: https://phabricator.fb.com/D1272053
  Tasks: 4117827
  Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
  BUG: 1428061
  Signed-off-by: Kevin Vigor <kvigor@fb.com>
  Reviewed-on: http://review.gluster.org/16099
  Reviewed-on: https://review.gluster.org/16177
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>

* glusterd: Fix removing pmap entry on rpc disconnect (Prashanth Pai, 2017-04-28; 1 file, -1/+1)

  Problem: The following line of code is intended to remove the pmap entry for the connection
  during disconnects:

      pmap_registry_remove (this, 0, NULL, GF_PMAP_PORT_NONE, xprt);

  However, no pmap entry will have its type set to GF_PMAP_PORT_NONE at any point in time, so a
  call to pmap_registry_search_by_xprt() in pmap_registry_remove() will always fail to find a
  match.

  Fix: Optionally ignore the pmap entry's type in pmap_registry_search_by_xprt().

  BUG: 1193929
  Change-Id: I705f101739ab1647ff52a92820d478354407264a
  Signed-off-by: Prashanth Pai <ppai@redhat.com>
  Reviewed-on: https://review.gluster.org/17129
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>

* glusterd: Disallow peer detach if snapshot bricks exist on it (Gaurav Yadav, 2017-03-31; 1 file, -0/+1)

  Problem:
  - Deploy gluster on 2 nodes, one brick each, one volume replicated
  - Create a snapshot
  - Lose one server
  - Add a replacement peer and new brick with a new IP address
  - replace-brick the missing brick onto the new server (wait for replication to finish)
  - peer detach the old server
  - After doing the above steps, glusterd fails to restart.

  Solution: With the fix, detach peer will produce an error: "N2 is part of existing snapshots.
  Remove those snapshots before proceeding." In doing so we force the user to either stay with
  that peer or delete all snapshots.

  Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
  BUG: 1322145
  Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
  Reviewed-on: https://review.gluster.org/16907
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

* rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanup (Atin Mukherjee, 2017-03-20; 1 file, -1/+3)

  Commit 086436a introduced a generation number (cleanup_gen) to ensure that the rpc layer
  doesn't end up cleaning up the connection object if the application layer has already
  destroyed it. Bumping up cleanup_gen was done only in rpc_clnt_connection_cleanup(). However
  the same is needed in rpc_clnt_reconnect_cleanup() too, as without it, if the object gets
  destroyed through the reconnect event in the application layer, the rpc layer will still end
  up trying to delete the object, resulting in a double free and crash.

  Peer probing an invalid host/IP was the basic test to catch this issue.

  Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
  BUG: 1433578
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/16914
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Milind Changire <mchangir@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>

* rpc: avoid logging success on failure (Milind Changire, 2017-03-07; 1 file, -0/+5)

  Avoid logging Success in the event of failure, especially when errno has no meaningful value
  w.r.t. the failure. In this case errno is set to zero when there's indeed a failure at the
  RPC level.

  Change-Id: If2cc81aa1e590023ed22892dacbef7cac213e591
  BUG: 1426032
  Signed-off-by: Milind Changire <mchangir@redhat.com>
  Reviewed-on: https://review.gluster.org/16730
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>