summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src
Commit message (Collapse)AuthorAgeFilesLines
* gfapi : New APIs have been added to use lease feature in glusterSoumya Koduri2018-01-261-0/+1
| | | | | | | | | | | Following APIs glfs_h_lease(), glfs_lease() added, so that gfapi applications can set and get lease which enables more efficient client side caching. Updates: #350 Change-Id: Iede85be9af1d4df969b890d0937ed0afa4ca6596 Signed-off-by: Poornima G <pgurusid@redhat.com> Signed-off-by: Soumya Koduri <skoduri@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* build: use libtirpc by default, even if ipv6 is not the defaultKaleb S. KEITHLEY2018-01-261-1/+2
| | | | | | | | | | | | | | Another error snuck in with Change-Id I86f847dfd, or more accurately I think, with Change-Id: Ic47065e9c2... All libs, not just libgfrpc, need to be linked with libtirpc, especially on systems that still have xdr functions in (g)libc where you will get a mixture of calls to libtirpc functions and glibc functions, with catastrophic results. BUG: 1536186 Change-Id: I97dc39c7844f44c36fe210aa813480c219e1e415 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* dentry fop serializer: added new server side xlator for dentry fop serializationSakshi Bansal2018-01-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problems addressed by this xlator : [1]. To prevent race between parallel mkdir,mkdir and lookup etc. Fops like mkdir/create, lookup, rename, unlink, link that happen on a particular dentry must be serialized to ensure atomicity. Another possible case can be a fresh lookup to find existance of a path whose gfid is not set yet. Further, storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (lookup & mkdir), we eliminate the race altogether. [2]. Staleness of dentries This causes exponential increase in traversal time for any inode in the subtree of the directory pointed by stale dentry. Cause : Stale dentry is created because of following two operations: a. dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create and b. dentry unlinking due to various operations like rmdir, rename, unlink. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially. Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock. Some documentation from email conversation: [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html With this patch, the feature is optional, enable it by running: `gluster volume set $volname features.sdfs enable` Also the feature is tested for a month without issues in the experiemental branch for all the regression. Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b Fixes: #397 BUG: 1304962 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* libglusterfs: Reset errno before callv4.1devNigel Babu2018-01-231-1/+4
| | | | | | | | This was causing Gluster to return a failure when testing on Centos7. BUG: 1536913 Change-Id: Idb90baef05058123a7f69e94a51dd79abd371815 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* libgfapi: Add new api for supporting mandatory-locksAnoop C S2018-01-221-2/+3
| | | | | | | | | | | | | | | | The current API for byte-range locks [glfs_posix_lock()] doesn't allow applications to specify whether it is advisory or mandatory type locks. This particular change is to introduce an extended byte-range lock API with an additional argument for including the byte-range lock mode to be one among advisory(default) or mandatory. Patch also includes a gfapi test case which make use of this new api to acquire mandatory locks. Ref: https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.8/Mandatory%20Locks.md Change-Id: Ia09042c755d891895d96da857321abc4ce03e20c Updates #393 Signed-off-by: Anoop C S <anoopcs@redhat.com>
* md-cache: Implement dynamic configuration of xattr list for cachingPoornima G2018-01-223-0/+21
| | | | | | | | | | | | | | | Currently, the list of xattrs that md-cache can cache is hard coded in the md-cache.c file, this necessiates code change and rebuild everytime a new xattr needs to be added to md-cache xattr cache list. With this patch, the user will be able to configure a comma seperated list of xattrs to be cached by md-cache Updates #297 Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e Signed-off-by: Poornima G <pgurusid@redhat.com>
* protocol: make on-wire-change of protocol using new XDR definition.Amar Tumballi2018-01-191-209/+0
| | | | | | | | | | | | | | | | | | | | | | With this patchset, some major things are changed in XDR, mainly: * Naming: Instead of gfs3/gfs4 settle for gfx_ for xdr structures * add iattx as a separate structure, and add conversion methods * the *_rsp structure is now changed, and is also reduced in number (ie, no need for different strucutes if it is similar to other response). * use proper XDR methods for sending dict on wire. Also, with the change of xdr structure, there are changes needed outside of xlator protocol layer to handle these properly. Mainly because the abstraction was broken to support 0-copy RDMA with payload for write and read FOP. This made transport layer know about the xdr payload, hence with the change of xdr payload structure, transport layer needed to know about the change. Updates #384 Change-Id: I1448fbe9deab0a1b06cb8351f2f37488cefe461f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* gfapi : added glfs_setfsleaseid() for setting lease idSoumya Koduri2018-01-191-0/+1
| | | | | | | | | | | | A new function glfs_setfsleaseid() added in gfapi. Currently lock owner is saved in the thread context. Similarly the leaseid attribute can be saved using glfs_setfsleaseid(). Updates: #350 Change-Id: I55966cca01d0f2649c32b87bd255568c3ffd1262 Signed-off-by: Poornima G <pgurusid@redhat.com> Signed-off-by: Soumya Koduri <skoduri@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* cluster/afr: Adding option to take full file lockkarthik-us2018-01-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In replica 3 volumes there is a possibilities of ending up in split brain scenario, when multiple clients writing data on the same file at non overlapping regions in parallel. Scenario: - Initially all the copies are good and all the clients gets the value of data readables as all good. - Client C0 performs write W1 which fails on brick B0 and succeeds on other two bricks. - C1 performs write W2 which fails on B1 and succeeds on other two bricks. - C2 performs write W3 which fails on B2 and succeeds on other two bricks. - All the 3 writes above happen in parallel and fall on different ranges so afr takes granular locks and all the writes are performed in parallel. Since each client had data-readables as good, it does not see file going into split-brain in the in_flight_split_brain check, hence performs the post-op marking the pending xattrs. Now all the bricks are being blamed by each other, ending up in split-brain. Fix: Have an option to take either full lock or range lock on files while doing data transactions, to prevent the possibility of ending up in split brains. With this change, by default the files will take full lock while doing IO. If you want to make use of the old range lock change the value of "cluster.full-lock" to "no". Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962 BUG: 1535438 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* rpc/*: auth-header changesAmar Tumballi2018-01-171-0/+5
| | | | | | | | | | | | | | | | | Introduce another authentication header which can now send more data. This is useful because this data can be common for all the fops, and we don't need to change all the signatures. As part of this, made rpc-clnt.c little more modular to support multiple authentication structures. stack.h changes are placeholder for the ctime etc, can be moved later based on need. updates #384 Change-Id: I6111c13cfd2ec92e2b4e9295896bf62a8a33b2c7 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* dict: add another type to handle backward compatibilityAmar Tumballi2018-01-174-6/+35
| | | | | | | | | | | | | | | | This new type helps to avoid excessive logs. It should be set only in case of * volume graph building (graph.y) * dict unserialize (happens once a dictionary is received on wire in old protocol) All other dict set and get should have proper check and warning logs if there is a mismatch. updates #220 Change-Id: I1cccb304a877aa80c07aaac95f10f5005e35b9c5 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* locks: added inodelk/entrylk contention upcall notificationsXavier Hernandez2018-01-161-0/+17
| | | | | | | | | | | | | | The locks xlator now is able to send a contention notification to the current owner of the lock. This is only a notification that can be used to improve performance of some client side operations that might benefit from extended duration of lock ownership. Nothing is done if the lock owner decides to ignore the message and to not release the lock. For forced release of acquired resources, leases must be used. Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
* dict: fix VALIDATE_DATA_AND_LOG callAtin Mukherjee2018-01-071-2/+2
| | | | | | | | | | Couple of instances doesn't pass enough number of parameters to the function resulting compilation to fail. Updates #203 Change-Id: Id8caa6fe7fc611645ad7ff11d81a2462e4ec6bab Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* libglusterfs: Include key name in data type validationN Balachandran2018-01-051-27/+27
| | | | | | | | | | Printing the key name makes it easier for developers to figure out which keys have dict data type mismatches. Updates #337 Change-Id: I21d9a22488a4c5e5a8d991ca2d53f1e3039f7685 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dict: add more types for valuesAmar Tumballi2018-01-053-6/+162
| | | | | | | | | | Added 2 more types which are present in gluster codebase, mainly IATT and UUID. Updates #203 Change-Id: Ib6d6d6aefb88c3494fbf93dcbe08d9979484968f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: export minimum necessary symbolsKaleb S. KEITHLEY2018-01-022-2/+1093
| | | | | | | | | | | | | | | | | | | | | | minimize risk of symbol collisions in global namespace. see https://review.gluster.org/#/c/5697/ which Amar has resurrected. This is a strawman proposal to use an export-list to only export the necessary symbols from libglusterfs. I suppose some of this could be fixed by smarter use of static in the function definitions. It's a bit scary to see some of the names we expose. And then there are the names we use in the reserved namespace. One step short of going all the way to symbol versions fixes gluster/glusterfs#382 Change-Id: Ifb848dfc655ef735dd27c73b7729e1188eb817f1 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* posix: Introduce flags for validity of iatt membersRavishankar N2017-12-291-10/+49
| | | | | | | | | | | | | | | | | | v1 of the patch started off as adding new fields to iatt that can be filled up using statx but the discussions were more around introducing masks to check the validity of different fields from a RIO perspective. To that extent, I have dropped the statx call in this version and introduced a 64 bit mask for existing fields. The masks I have defined are similar with the statx() flags' masks. I have *not* changed iatt_to_stat() to use the macros IATT_TYPE_VALID, IATT_GFID_VALID etc before blindly copying from struct iatt to struct. Also fixed warnings in xlators because of atime/mtime/ctime seconds field change from uint32_t to int64_t. Change-Id: I4ac614f1e8d5c8246fc99d5bc2d2a23e7941512b Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* Use RTLD_LOCAL for symbol resolutionPrashanth Pai2017-12-271-2/+2
| | | | | | | | | | | | | RTLD_LOCAL is the default value for symbol visibility flag of dlopen() in Linux and NetBSD. Using it avoids conflicts during symbol resolution. This also allows us to detect xlators that have not been explicitly linked with libraries that they use. This used to go unnoticed when RTLD_GLOBAL was being used. BUG: 1193929 Change-Id: I50db6ea14ffdee96596060c4d6bf71cd3c432f7b Signed-off-by: Prashanth Pai <ppai@redhat.com>
* Set log path correctly when clients use UDSPrashanth Pai2017-12-271-5/+23
| | | | | | | | | | | | | | | | When a libgapi client passes a path to Unix socket file as the "host" parameter to glfs_set_volfile_server() and doesn't explicitly specify a log file, the default log file path being generated was invalid. Example: ERROR: failed to create logfile "/var/log/glusterfs//tmp/gd2/ w1/run/glusterd2.socket-test-10368.log" (No such file or directory) With this fix, it is set to: /var/log/glusterfs/tmp-gd2-w1-run-glusterd2.socket-test-31869.log Change-Id: Ibb4b58382c72eab0d104543781e0e966ebf4c47f Signed-off-by: Prashanth Pai <ppai@redhat.com>
* dict: support better on-wire transferAmar Tumballi2017-12-272-67/+220
| | | | | | | | | | | | | | | This patch brings data type awareness to dictionary, and also makes sure valid data is properly sent to the other side of the wire using XDR. Next step is to allow people to add more data types (for example, Bool, UUID, iatt etc), and then make it part of every fop signature in wire. Fixes #203 Change-Id: Ie0eee2db847bea2bf7dad80dec89ce3e7c5917c1 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* cluster/dht: Add migration checks to dht_(f)xattropN Balachandran2017-12-261-0/+1
| | | | | | | | | | | | The dht_(f)xattrop implementation did not implement migration phase1/phase2 checks which could cause issues with rebalance on sharded volumes. This does not solve the issue where fops may reach the target out of order. Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c BUG: 1471031 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* xlator.h: move options and other variables to the top of structureAmar Tumballi2017-12-221-22/+22
| | | | | | | | | | | | This helps external applications which wants to consume xlator_api to read only fields (and not functions) using dlopen() to write smaller structures/objects and still achieve their requirements. One such example is GD2 project. Updates #168 Change-Id: I8737939c8c72f6572ee1514201e9f9f8e4f37b40 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* cluster/ec: Change [f]getxattr to parallel-dispatch-onePranith Kumar K2017-12-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment in EC, [f]getxattr operations wait to acquire a lock while other operations are in progress even when it is in the same mount with a lock on the file/directory. This happens because [f]getxattr operations follow the model where the operation is wound on 'k' of the bricks and are matched to make sure the data returned is same on all of them. This consistency check requires that no other operations are on-going while [f]getxattr operations are wound to the bricks. We can perform [f]getxattr in another way as well, where we find the good_mask from the lock that is already granted and wind the operation on any one of the good bricks and unwind the answer after adjusting size/blocks to the parent xlator. Since we are taking into account good_mask, the reply we get will either be before or after a possible on-going operation. Using this method, the operation doesn't need to depend on completion of on-going operations which could be taking long time (In case of some slow disks and writes are in progress etc). Thus we reduce the time to serve [f]getxattr requests. I changed [f]getxattr to dispatch-one and added extra logic in ec_link_has_lock_conflict() to not have any conflicts for fops with EC_MINIMUM_ONE as fop->minimum to achieve the effect described above. Modified scripts to make sure READ fop is received in EC to trigger heals. Updates gluster/glusterfs#368 Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* rchecksum/fips: Replace MD5 usage to enable fips supportKotresh HR2017-12-212-4/+6
| | | | | | | | | rchecksum uses MD5 which is not fips compliant. Hence using sha256 for the same. Updates: #230 Change-Id: I7fad016fcc2a9900395d0da919cf5ba996ec5278 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* snapshot: fix several coverity issues in glusterd-snapshot.cSunny Kumar2017-12-212-0/+17
| | | | | | | | | | This patch fixes issues 157, 426, 428, 431, 432, 437,439, 482 from [1]. [1] https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2017-12-13-e255385a/html/ Change-Id: Iff9df12bd9802db29434155badb1beda045aba5b BUG: 789278 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* fips: Replace md5sum usage to enable fips supportKotresh HR2017-12-192-12/+0
| | | | | | | | | | | | md5sum is not fips compliant. Using xxhash64 instead of md5sum for socket file generation in glusterd and changelog to enable fips support. NOTE: md5sum is 128 bit hash. xxhash used is 64 bit. Updates: #230 Change-Id: I1bf2ea05905b9151cd29fa951f903685ab0dc84c Signed-off-by: Kotresh HR <khiremat@redhat.com>
* all: Simplify component message id's definitionXavier Hernandez2017-12-143-2013/+301
| | | | | | | | | This patch creates a new way of defining message id's that is easier and less error prone because it doesn't require so many manual changes each time a new component is defined or a new message created. Change-Id: I71ba8af9ac068f5add7e74f316a2478bc991c67b Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* logging: free the strdup'd filename and ident on gf_log_fini()Niels de Vos2017-12-131-34/+44
| | | | | | | | | | | | | | Every time glfs_set_logging() is called, the strings containing the filename and the identity of the logfile leaks. Both should be free'd before a new gf_strdup() is done and in gf_log_fini(). In addition to the free'ing of the filename, the code has been adapted to use sys_open() and fdopen() so that unneeded closes and re-opening of the filedescriptor are prevented. Change-Id: I63e3e757ac990a4db419206dfad141ab68dbfa06 BUG: 1443145 Signed-off-by: Niels de Vos <ndevos@redhat.com>
* write-behind: Allow trickling-writes to be configurableCsaba Henk2017-12-131-0/+2
| | | | | | | | | | | | | | This is the undisputed/trivial part of Shreyas' patch he attached to https://bugzilla.redhat.com/1364740 (of which the current bug is a clone). We need more evaluation for the page_size and window_size bits before taking them on. Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9 BUG: 1428060 Co-authored-by: Shreyas Siravara <sshreyas@fb.com> Signed-off-by: Csaba Henk <csaba@redhat.com>
* glusterfs: Use gcc builtin ATOMIC operator to increase/decreate refcount.Mohit Agrawal2017-12-129-162/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In glusterfs code base we call mutex_lock/unlock to take reference/dereference for a object.Sometime it could be reason for lock contention also. Solution: There is no need to use mutex to increase/decrease ref counter, instead of using mutex use gcc builtin ATOMIC operation. Test: I have not observed yet how much performance gain after apply this patch specific to glusterfs but i have tested same with below small program(mutex and atomic both) and get good difference. static int numOuterLoops; static void * threadFunc(void *arg) { int j; for (j = 0; j < numOuterLoops; j++) { __atomic_add_fetch (&glob, 1,__ATOMIC_ACQ_REL); } return NULL; } int main(int argc, char *argv[]) { int opt, s, j; int numThreads; pthread_t *thread; int verbose; int64_t n = 0; if (argc < 2 ) { printf(" Please provide 2 args Num of threads && Outer Loop\n"); exit (-1); } numThreads = atoi(argv[1]); numOuterLoops = atoi (argv[2]); if (1) { printf("\tthreads: %d; outer loops: %d;\n", numThreads, numOuterLoops); } thread = calloc(numThreads, sizeof(pthread_t)); if (thread == NULL) { printf ("calloc error so exit\n"); exit (-1); } __atomic_store (&glob, &n, __ATOMIC_RELEASE); for (j = 0; j < numThreads; j++) { s = pthread_create(&thread[j], NULL, threadFunc, NULL); if (s != 0) { printf ("pthread_create failed so exit\n"); exit (-1); } } for (j = 0; j < numThreads; j++) { s = pthread_join(thread[j], NULL); if (s != 0) { printf ("pthread_join failed so exit\n"); exit (-1); } } printf("glob value is %ld\n",__atomic_load_n (&glob,__ATOMIC_RELAXED)); exit(0); } time ./thr_count 800 800000 threads: 800; outer loops: 800000; glob value is 640000000 real 1m10.288s user 0m57.269s sys 3m31.565s time ./thr_count_atomic 800 800000 threads: 800; outer loops: 800000; glob value is 640000000 real 0m20.313s user 1m20.558s sys 0m0.028 Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* core/memacct: save allocs in mem_acct_rec listN Balachandran2017-12-073-1/+42
| | | | | | | | | | | | | | | With configure --enable-debug, add all object allocations to a list in the corresponding mem_acct_rec. This allows us to see all objects of a particular type and allows for additional debugging in case of memory leaks. This is not compiled in by default and must be explicitly enabled. It is intended to be used by developers. Change-Id: I7cf2dbeadecf994423d7e7591e85f18d2575cce8 BUG: 1522662 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* Fixes gNFSd gf_update_latency crashesRichard Wareing2017-12-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | Summary: - Per title, does a bounds check on the frame->op and bails from the function if it's invalid preventing the crash Test Plan: Prove tests Reviewers: dph, jackl Reviewed By: jackl FB-commit-id: e67cc15 Change-Id: If1a5a9c0630573d4a6615050a9114ccf532551c7 BUG: 1522847 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: https://review.gluster.org/16847 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
* performance/io-threads: Reduce the number of timing calls in iot_workerMax Rijevski2017-12-071-0/+4
| | | | | | | | | | | | | | | | | Summary: - Reduce the amount of unnecessary timing calls in iot_worker servicing. - The current logic is unnecessarily accurate and hurts performance for many small FOPS. Change-Id: I6db4f1ad9a48d9d474bb251a2204969061021954 BUG: 1522950 Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Reviewed-on: http://review.gluster.org/16081 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kevin Vigor <kvigor@fb.com>
* libglusterfs: specify ctx in gf_log_set_loglevelZhang Huan2017-12-063-9/+5
| | | | | | | | | specify ctx in gf_log_set_loglevel, instead of getting it from a thread specific variable. Change-Id: I498f826e8e32231235a6b0005026a27c327727fd BUG: 1521213 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* metrics: provide options to dump metrics from xlatorsAmar Tumballi2017-12-069-18/+331
| | | | | | | | | | * Introduce xlator methods to allow dumping of metrics * Separate options to get the metrics dumped in a path Updates #168 Change-Id: I7df80df33b71d6f449f03c2332665b4a45f6ddf2 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* rio/everywhere: add icreate/namelink fopSusant Palai2017-12-0514-1/+339
| | | | | | | | | | | | | | | | | | | | | icreate creates inode, while namelink links the basename to it's parent gfid. For now mkdir is the primary user of these fops. Better distribution is acheived by creating the inode on ,(say) mds1 and linking the basename to it's parent gfid on mds2. The inode serves readdirp, stat etc. More details about the fops are present at: https://review.gluster.org/#/c/13395/3/design/DHT2/DHT2_Icreate_Namelink_Notes.md This backport of three patches from experimental branch. 1- https://review.gluster.org/#/c/18085/ 2- https://review.gluster.org/#/c/18086/ 3- https://review.gluster.org/#/c/18094/ Updates gluster/glusterfs#243 Change-Id: I1bd3d5a441a3cfab1acfeb52f15c6c867d362592 Signed-off-by: Susant Palai <spalai@redhat.com>
* libglusterfs: Add put fopPoornima G2017-12-0514-0/+271
| | | | | | | | | | | | | | | | | Problem: It had been a longtime request to implement put fop in gluster. put fop in gluster may not have the exact sementics of HTTP PUT, but can be easily extended to do so. The subsequent patches, will contain more semantics on the put fop and its guarentees. Why compound fop framework is not used for put? Compound fop framework currently doesn't allow compounding of entry fop and inode fops, i.e. fops on multiple inodes cannot be combined in compound fop. Updates #353 Change-Id: Idb7891b3e056d46d570bb7e31bad1b6a28656ada Signed-off-by: Poornima G <pgurusid@redhat.com>
* libglusterfs: use rwlock for fdtableZhang Huan2017-11-302-25/+24
| | | | | | | | | | | To resolve a fd from client requests, need to take a mutex lock for the fdtable to do the lookup. When a client is busy doing read and write, the mutex lock could introduce contention. Therefore, use rwlock instead of mutex to reduce the contention. Change-Id: Ic833aed738a178a7ea1abafed7eb13814989d28c BUG: 1518582 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* xlator: provide a xlator_api_t structure to include all exported optionsAmar Tumballi2017-11-304-37/+268
| | | | | | | | | | each translator from now on can have just 1 symbol exported called 'xlator_api', which has all the required fields in it. Updates: #164 Change-Id: I48d54f5ec59fee842b1d55877e3ac5e9ec9b6bdd Signed-off-by: Amar Tumballi <amarts@redhat.com>
* cluster-syncop: Address comments in 3ad68df725ac32f83b5ea7c0976e2327e7037c8cPranith Kumar K2017-11-291-2/+4
| | | | | Change-Id: I325f718c6c440076c9d9dcd5ad1a0c6bde5393b1 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* libglusterfs: fix the call_stack_set_group() functionCsaba Henk2017-11-242-9/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - call_stack_set_group() will take the ownership of passed buffer from caller; - to indicate the change, its signature is changed from including the buffer directly to take a pointer to it; - either the content of the buffer is copied to the groups_small embedded buffer of the call stack, or the buffer is set as groups_large member of the call stack; - the groups member of the call stack is set to, respectively, groups_small or groups_large, according to the memory management conventions of the call stack; - the buffer address is overwritten with junk to effectively prevent the caller from using it further on. Also move call_stack_set_group to stack.c from stack.h to prevent "defined but not used [-Wunused-function]" warnings (not using it anymore in call_stack_alloc_group() implementation, which saved us from this so far). protocol/server: refactor gid_resolve() In gid_resolve there are two cases: either the gid_cache_lookup() call returns a value or not. The result is caputured in the agl variable, and throughout the function, each particular stage of the implementation comes with an agl and a no-agl variant. In most cases this is explicitly indicated via an if (agl) { ... } else { ... } but some of this branching are expressed via goto constructs (obfuscating the fact we stated above, that is, each particular stage having an agl/no-agl variant). In the current refactor, we bring the agl conditional to the top, and present the agl/non-agl implementations sequentially. Also we take the opportunity to clean up and fix the agl case: - remove the spurious gl.gl_list = agl->gl_list; setting, as gl is not used in the agl caae - populate the group list of call stack from agl, fixing thus referred BUG. Also fixes BUG: 1513920 Change-Id: I61f4574ba21969f7661b9ff0c9dce202b874025d BUG: 1513928 Signed-off-by: Csaba Henk <csaba@redhat.com>
* libglusterfs: Handle FS errors gracefullyPranith Kumar K2017-11-222-93/+167
| | | | | | | | | | | | | | | Problem: FS sometimes doesn't give the expected return values. We need our common functions to guard against this. Example BUG: https://bugzilla.redhat.com/show_bug.cgi?id=864401 Fix: When the return value is not as per specification, change the return value to -1 and errno to EIO BUG: 1469487 Change-Id: I14739ab2e5ae225b1a91438b87f8928af56f2934 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster-syncop: Implement tiebreaker inodelk/entrylkPranith Kumar K2017-11-222-0/+103
| | | | | | | | | | | | In this implementation, inodelk/entrylk will be tried for the subvols given with trylock. In this attempt if all locks are obtained, then inodelk is successful, otherwise, if it gets success on the first available subvolume, then it will go for blocking lock, where as other subvolumes will not try and this acts as tie-breaker. Updates gluster/glusterfs#354 Change-Id: Ia2521b9ccb81a42bd6104ab21f610f761ba2b801 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* timer: Fix possible race during cleanupSoumya Koduri2017-11-211-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As mentioned in bug1509189, there is a possible race between gf_timer_cancel(), gf_timer_proc() and gf_timer_registry_destroy() leading to use_after_free. Problem: 1) gf_timer_proc() is called, locks reg, and gets an event. It unlocks reg, and calls the callback. 2) Meanwhile gf_timer_registry_destroy() is called, and removes reg from ctx, and joins on gf_timer_proc(). 3) gf_timer_call_cancel() is called on the event being processed. It cannot find reg (since it's been removed from reg), so it frees event. 4) the callback returns into gf_timer_proc(), and it tries to free event, but it's already free, so double free. Solution: The fix is to bail out in gf_timer_cancel() when registry is not found. The logic behind this is that, gf_timer_cancel() is called only on any existing event. That means there was a valid registry earlier while creating that event. And the only reason we cannot find that registry now is that it must have got set to NULL when context cleanup is started. Since gf_timer_proc() takes care of releasing all the remaining events active on that registry, it seems safe to bail out in gf_timer_cancel(). Change-Id: Ia9b088533141c3bb335eff2fe06b52d1575bb34f BUG: 1509189 Reported-by: Daniel Gryniewicz <dang@redhat.com> Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* libglusterfs:checked return coverity fixSubha sree Mohankumar2017-11-211-1/+8
| | | | | | | | | | Problem:Calling "gf_thread_create" without checking return value. Fix:The return value is saved and checked if gf_thread_create fails. Change-Id: Ibdaac1c90a1a8369e92ade50825598b041063da8 BUG: 789278 Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>
* rpc : Change the way client uuid is builtPoornima G2017-11-201-24/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Today the main users of client uuid are protocol layers, locks, leases. Protocol layers requires each client uuid to be unique, even across connects and disconnects. Locks and leases on the server side also use the same client uid which changes across file migrations. Which makes the graph switch and file migration tedious for locks and leases. file migration across bricks becomes difficult as client uuid for the same client, is different on the other brick. The exact set of issues exists for leases as well. Solution would be to introduce a constant in the client-uid string which the locks and leases can use to identify the owner client across bricks. Client uuid currently: %s(ctx uuid)-%s(protocol client name)-%d(graph id)%s(setvolume count/reconnect count) Proposed Client uuid: "CTX_ID:%s-GRAPH_ID:%d-PID:%d-HOST:%s-PC_NAME:%s-RECON_NO:%s" - CTX_ID: This is will be constant per client. - GRAPH_ID, PID, HOST, PC_NAME(protocol client name), RECON_NO(setvolume count) remains the same. Change-Id: Ia81d57a9693207cd325d7b26aee4593fcbd6482c BUG: 1369028 Signed-off-by: Susant Palai <spalai@redhat.com>
* dict: Fix several coverity issues in dictMohit Agrawal2017-11-153-10/+37
| | | | | | | | | | | | | This patch fixes issues 230,592,593,110,63 from [1] [1] https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2017-10-30-9aa574a5/html/ Note: Resolve FORWARD_NULL coverity issue in glusterfs_ctx_new is also fixed with this patch. BUG: 789278 Change-Id: Ic4199a144a14cc9ead7366fb1c9699197141bc86 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* libglusterfs/options: Add new member 'setkey' to xlator optionsKaushal M2017-11-151-0/+4
| | | | | | | | | | | | | | | | | The 'setkey' will be used as the key by GD2 when setting the option during volgen. 'setkey' also supports using varstrings. This is mainly to be used for options, which use a different key for 'volume set' and in volfiles. For eg. the 'auth.*' options of protocol/server. The protocol/server xlator has been updated to make use of this for the auth.allow and auth.reject options. Updates #302 Change-Id: I1fd2fd69625c9db48595bd3f494c221625255169 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs/atomic: Improved atomic supportXavier Hernandez2017-11-141-72/+425
| | | | | | | | | | | | | | | | This patch solves a detection problem in configure.ac that prevented that compilation detects builtin __atomic or __sync functions. It also adds more atomic types and support for other atomic functions. An special case has been added to support 64-bit atomics on 32-bit systems. The solution is to fallback to the mutex solution only for 64-bit atomics, but smaller atomic types will still take advantage of builtins if available. Change-Id: I6b9afc7cd6e66b28a33278715583552872278801 BUG: 1510397 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* libglusterfs/gfdb:coverity issue fixSubha sree Mohankumar2017-11-131-2/+1
| | | | | | | | | | | Coverity Id:719,748,761 problem : The value of "ret" is overwritten in init_db and add_conection_node Change-Id: Iade8ca8d61c5e25e8c311b1375219f5f61d51bc3 BUG: 789278 Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>