Commit message | Author | Age | Files | Lines
* tests: kill_brick should wait for brick status to become offline | Atin Mukherjee | 2018-08-10 | 1 | -10/+10

    Change-Id: I52e8eec7f334af37de433c444f4ddfc876fa56cc
    Fixes: bz#1614088
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests: Add ability to preserve older tarball for retried tests | ShyamsundarR | 2018-08-09 | 1 | -0/+39

    When a test is retried, the cleanup directives overwrite the older
    tarball with the latest one, thus losing the logs from the failed run.

    This patch changes run-tests.sh to rename the older tarball when
    retrying a test, thus preserving it. The tarball is renamed using a
    time stamp and optionally a trailing sequence number, in case the test
    fails within the same second. Although the sequence number is not
    strictly required as we retry only once, it provides a defence against
    any future enhancements to the same.

    Fixes: bz#1614062
    Change-Id: I9afe486b0b6f6a26f2ad0642e38bc0ba15b3ecc9
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* tests: Set heal-timeout to 5 seconds | Pranith Kumar K | 2018-08-09 | 1 | -0/+1

    Shd keeps doing heals in a loop as long as it healed at least one
    entry in the previous run. A heal is termed successful only if it does
    both the metadata heal and the entry/data heal, i.e. the entry needs
    to be completely healed by just that healer.

    In the tests/basic/afr/granular-esh/replace-brick.t test, brick-0 is
    old and brick-1 is new. After replace-brick only root-gfid will be
    present in brick-0's index.
    1) The shd-thread corresponding to brick-0 does the metadata heal;
       this creates root-gfid in brick-0's 'dirty' index.
    2) Both healer threads corresponding to brick-0 and brick-1 now try to
       heal root-gfid and brick-1 gets the heal-domain lock. brick-0's
       shd-thread will experience a failure and it goes back to waiting
       for 10 minutes (cluster.heal-timeout).
    3) When brick-1's healer-thread completes healing root-gfid it creates
       5 files which create indices in brick-0, so until brick-0 triggers
       one more heal, the heal won't happen.

    $HEAL_TIMEOUT is set at 120 seconds, which is less than
    cluster.heal-timeout, so cluster.heal-timeout is decreased to 5
    seconds so that the next heal is triggered, which will do the heals.

    fixes bz#1613807
    Change-Id: I881133fc28880d8615fbc4558a0dfa0dc63d7798
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* tests: Increase timeout for mpx restart crash test | ShyamsundarR | 2018-08-09 | 1 | -3/+6

    In lcov based regression testing environments, all tests take more
    time than they do in centos7 regressions, possibly due to code
    instrumentation for lcov purposes. Due to this, the test
    bug-1432542-mpx-restart-crash.t constantly times out.

    This patch increases the timeout for the same to enable lcov tests to
    pass on a more regular basis.

    It was also noted by Nithya that the test at times generated an OOM
    kill on the regression machines. In order to reduce the runtime memory
    footprint of the tests, FUSE mounts are unmounted as soon as the
    required test is complete.

    Fixes: bz#1608568
    Change-Id: I37f8d4b45807a69c52c7c7df4923c0fc33fab4e4
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* glusterd: more stricter checks of if brick is running in multiplex mode | Atin Mukherjee | 2018-08-09 | 1 | -32/+39

    While the gf_attach utility, which the kill_brick () function in
    tests/volume.rc uses, can help in detaching a brick instance from the
    brick process, it has the following caveats:

    1. It doesn't ensure the respective brick is marked as stopped, which
       glusterd does from glusterd_brick_stop.
    2. Sometimes, if kill_brick () is executed just after a brick stack is
       up, mgmt_rpc_notify () can take some time before setting
       priv->connected to 1. If kill_brick () is executed before that, the
       brick will fail to initiate the pmap_signout, which would in turn
       clean up the pidfile.

    To avoid such possibilities, a stricter check on whether a brick is
    running in brick multiplexing mode has been brought in: it not only
    checks for the pid's existence but also checks whether the respective
    process has the brick instance associated with it before checking the
    brick's status.

    Change-Id: I98b92df949076663b9686add7aab4ec2f24ad5ab
    Fixes: bz#1595320
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests/bitrot: Fix tests/bitrot/bug-1373520.t | Kotresh HR | 2018-08-09 | 1 | -4/+13

    The test was failing with brick-mux enabled intermittently. As the
    test depends on lookup to recover file via heal, it's advisable to
    disable all perf xlators. Hence doing the same.

    fixes: bz#1611566
    Change-Id: Ib7705e7951d53c435b8e390298164d73c6d71745
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* MAINTAINERS: Add Xavier Hernandez as peer for shard xlator | Krutika Dhananjay | 2018-08-07 | 1 | -0/+1

    Shard module never had a peer, although Pranith reviewed most of the
    patches. Over the past few months, Xavier has reviewed shard patches -
    both big and small - and also found some great bugs in his reviews of
    some complex patches. Proposing that we add him as peer for shard
    translator.

    Change-Id: I29487052673f3738340764aa63bdd7586fb28def
    fixes: bz#1612017
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* tests: Add timeout option to run-tests.sh | ShyamsundarR | 2018-08-06 | 1 | -1/+2

    Added a '-t' timeout option to run-tests.sh, to be able to set the
    timeout higher than the default 200 in case of lcov based tests, as
    those take more time due to the instrumentation added by lcov.

    Change-Id: Ibaf70e881bfa94f35e822124bcf9849b309e7cc1
    Updates: bz#1608564
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* performance/quick-read: don't update with stale data after invalidation | Raghavendra G | 2018-08-04 | 2 | -44/+233

    Once invalidated, make sure that only ops incident after invalidation
    update the cache. This makes sure that ops before invalidation don't
    repopulate cache with stale data. This patch also uses an internal
    counter instead of frame->root->unique for keeping track of
    generations.

    Change-Id: I6b38b141985283bd54b287775f3ec67b88bf6cb8
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Updates: bz#1512691
* tests: fix online_brick_count function | Atin Mukherjee | 2018-08-03 | 1 | -1/+4

    online_brick_count should discard the Bitrot and Scrubber daemons.

    Change-Id: I301373ccdbeec1d1a5e6c6b137f48ed997f22556
    Fixes: bz#1611103
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* posix: prevent crash when SEEK_DATA/HOLE is not supported | Xavi Hernandez | 2018-08-03 | 2 | -4/+4

    Instead of not defining the 'seek' fop when it's not supported on the
    compilation platform, we simply return EINVAL when it's used.

    Fixes: bz#1611834
    Change-Id: I253666d8910c5e2fffa3a3ba37085e5c1c058a8e
    Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* Revert "performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache" | Raghavendra G | 2018-08-03 | 6 | -600/+22

    This reverts commit 7131de81f72dda0ef685ed60d0887c6e14289b8c.

    With the latest master, I created a single brick volume and some files
    inside it.

        [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying again"; ls -l /mnt/fuse1
        umount: /mnt/fuse1: not mounted
        total 0
        ----------. 0 root root 0 Jan 1 1970 file-1
        ----------. 0 root root 0 Jan 1 1970 file-2
        ----------. 0 root root 0 Jan 1 1970 file-3
        ----------. 0 root root 0 Jan 1 1970 file-4
        ----------. 0 root root 0 Jan 1 1970 file-5
        d---------. 0 root root 0 Jan 1 1970 subdir
        Trying again
        total 3
        -rw-r--r--. 1 root root 33 Aug 3 14:06 file-1
        -rw-r--r--. 1 root root 33 Aug 3 14:06 file-2
        -rw-r--r--. 1 root root 33 Aug 3 14:06 file-3
        -rw-r--r--. 1 root root 33 Aug 3 14:06 file-4
        -rw-r--r--. 1 root root 33 Aug 3 14:06 file-5
        d---------. 0 root root 0 Jan 1 1970 subdir
        [root@rhgs313-6 ~]#

    The conversation can be followed on gluster-devel in the thread with
    subject: tests/bugs/distribute/bug-1122443.t - spurious failure.

    git bisect pointed to this patch as the culprit.

    Change-Id: I1eb46f6c196f44fde8ce991840a0e724e6f50862
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Updates: bz#1390050
* performance/ob: stringent synchronization between rename/unlink and open | Raghavendra G | 2018-08-03 | 2 | -67/+330

    Issue 1:
    ========
    Open all pending fds before resuming rename and unlink.

    Currently ob uses fd_lookup to find the opened-behind fd. But
    fd_lookup gives the most recent fd opened on the inode, whereas the
    oldest fd(s) (there can be multiple fds opened-behind when the very
    first opens on an inode are issued in parallel) are the candidates for
    fds with pending opens on the backend. So, this patch explicitly
    tracks the opened-behind fds on an inode and opens them before
    resuming rename or unlink.

    Similar code changes are also done for setattr and setxattr to make
    sure pending opens are complete before a permission change.

    This patch also adds a check for an open-in-progress to
    ob_get_wind_fd. If there is already an open-in-progress,
    ob_get_wind_fd won't return an anonymous fd as a result. This is done
    to make sure that rename/unlink/setattr/setxattr don't race with an
    operation like readv/fstat on an anonymous fd already in progress.

    Issue 2:
    ========
    Once renamed/unlinked, don't open-behind any future opens on the same
    inode.

    Issue 3:
    ========
    Don't use anonymous fds by default. Note that rename/unlink can race
    with a read/fd on anonymous fds and these operations can fail with
    ESTALE. So, for better consistency in default mode, don't use
    anonymous fds. If performance is needed with a tradeoff of
    consistency, one can switch on the option "use-anonymous-fd".

    Change-Id: Iaf130db71ce61ac37269f422e348a45f6ae6e82c
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Updates: bz#1512691
* glusterd: Bricks of a normal volumes should not attach to gluster_shared_storage bricks | Sanju Rakonde | 2018-08-03 | 3 | -33/+65

    Problem: In a brick multiplexing environment, bricks of a normal
    volume created by the user are getting attached to the bricks of the
    volume "gluster_shared_storage", which is created by enabling the
    enable-shared-storage option. Mounting gluster_shared_storage has
    strict authentication checks. When we attach bricks of a normal volume
    to bricks of gluster_shared_storage, mounting the normal volume
    created by the user will fail due to the strict authentication checks.

    Solution: We should not attach bricks of a normal volume to the brick
    process of the gluster_shared_storage volume and vice versa.

    fixes: bz#1610726
    Change-Id: If1b5a2a02675789a2915ba480fb48c145449163d
    Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* coverity: Fix remaining SECURE_TEMP issues reported | ShyamsundarR | 2018-08-03 | 4 | -10/+90

    Two pending SECURE_TEMP issues still exist in the coverity reports;
    these are fixed by this patch.

    In both instances (where the functions actually seem to be duplicates
    of each other) the need was for a FILE * and not an fd. Applied the
    same pattern in both places as in other parts of the code where
    mkstemp was used and later a FILE * was created from the resulting fd
    for use.

    Coverity report:
    https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-07-30-4d3c62e7/html/
    Issues numbered: 382, 383 (named SECURE_TEMP)

    Further, added tmpfile to the blacklist in symbol-check.sh, so that
    future code changes do not add the same.

    Also corrected shellcheck errors in symbol-check.sh as a result of
    updating the same.

    Updates: bz#789278
    Change-Id: I1d572a16ca5b5df2f597aeaa5f454fad34c8296e
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
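The mkstemp-then-FILE * pattern mentioned in the commit message above can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the helper name open_temp_stream and its error handling are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp() and fdopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a GlusterFS function): create a unique temporary
 * file from a template ending in "XXXXXX" and wrap the descriptor in a
 * FILE *. Returns NULL on failure; on success 'tmpl' holds the file name.
 */
static FILE *
open_temp_stream(char *tmpl, const char *mode)
{
    assert(tmpl != NULL && mode != NULL);

    int fd = mkstemp(tmpl); /* creates and opens the file atomically */
    if (fd < 0)
        return NULL;

    FILE *fp = fdopen(fd, mode);
    if (fp == NULL) {
        /* fdopen() failed: do not leak the descriptor or the file */
        close(fd);
        unlink(tmpl);
    }
    return fp;
}
```

A caller would pass a writable template such as "/tmp/example-XXXXXX" and is responsible for fclose() and unlink() afterwards. Unlike tmpfile(), the name stays visible to the caller, which matters when the file is later renamed into place.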
* geo-rep/hook-script: Fix ssh/scp options | Kotresh HR | 2018-08-03 | 5 | -22/+77

    Always use ssh and scp with the "-oPasswordAuthentication=no" and
    "-oStrictHostKeyChecking=no" options. Otherwise the post script might
    hang, leading to geo-rep setup failure.

    Also increased the geo-rep timeout. Occasionally, it's taking more
    time to reach Active/Passive status, especially on the first start
    after create.

    fixes: bz#1610405
    Change-Id: I9560d64dbe0edf5db73446a9fc97dda19b88d233
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* performance/open-behind: don't use anonymous fds for reads by default | Raghavendra G | 2018-08-02 | 2 | -1/+3

    anonymous fds interfere with working of read-ahead as read-ahead won't
    be able to store its cache in fd. Also, as seen in bz 1455872,
    anonymous fds also affect performance of large file sequential reads
    as the cost of opening fd for each read on brick stack is significant.
    So, have a proper fd which enables read-ahead to store its cache and
    brick stack to reuse the fd during reads.

    With this change test
    tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
    fails consistently. The failure can also be seen with open-behind off.
    bz 1611532 has been filed to track the issue with test.

    Thanks to Rafi <rkavunga@redhat.com> for assistance provided in
    debugging test failure.

    Change-Id: Ifa52d8ff017f115e83247f3396b9d27f0295ce3f
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Fixes: bz#1455872
* performance/md-cache: update cache only from fops issued after previous invalidation | Raghavendra G | 2018-08-02 | 6 | -100/+251

    Invalidations are triggered mainly by two codepaths - upcall and
    write-behind unwinding a cached write with zeroed out stat. For the
    case of upcall, the following race can happen:
    * stat s1 is fetched from brick
    * invalidation is detected on brick
    * invalidation is propagated to md-cache and cache is invalidated
    * s1 updates md-cache with a stale state

    For the case of write-behind, imagine the following sequence of
    operations:
    * A stat s1 was issued from application thread t1 when size of file
      was s1
    * stat s1 completes on brick stack, but is yet to reach md-cache
    * A write w1 from application thread t2 that extends the file to size
      s2 is cached in write-behind and the response is unwound with zeroed
      out stat
    * md-cache, while handling write-cbk, invalidates cache
    * md-cache receives response for s1, updates cache with stale stat
      with size of s1, overwriting invalidation state

    The fix is to remember when s1 was incident on md-cache and update the
    cache with results of s1 only if it was incident after invalidation of
    the cache.

    This patch identified some bugs in regression tests, which are tracked
    in https://bugzilla.redhat.com/show_bug.cgi?id=1608158. As a stop gap
    measure I am marking the following tests as bad:
    basic/afr/split-brain-resolution.t
    bugs/bug-1368312.t
    bugs/replicate/bug-1238398-split-brain-resolution.t
    bugs/replicate/bug-1417522-block-split-brain-resolution.t
    bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t

    Change-Id: Ia4bb9dd36494944e2d91e9e71a79b5a3974a8c77
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Updates: bz#1512691
* coding-standard: add points on structure padding | Amar Tumballi | 2018-08-02 | 1 | -0/+48

    This is a recommendation for users, and reviewers can take a point
    from this.

    Updates: bz#1193929
    Change-Id: Idcd778e42a886fd79b549da4927149a07573a20b
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
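The commit message does not reproduce the text added to the coding standard, so the following is only a generic illustration of why member order affects structure padding. The struct names and members are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members ordered carelessly: padding appears before and after 'offset'. */
struct loose_layout {
    char type;
    int64_t offset;
    char flag;
};

/* Same members, largest first: the two chars sit next to each other. */
struct tight_layout {
    int64_t offset;
    char type;
    char flag;
};

/* Bytes of padding in a struct, given the sum of its member sizes. */
static size_t
padding_bytes(size_t struct_size, size_t member_sum)
{
    assert(struct_size >= member_sum);
    return struct_size - member_sum;
}
```

On a typical x86-64 ABI the first layout occupies 24 bytes and the second 16, although the exact figures depend on the platform's alignment rules. Comparing sizeof() for both, or calling padding_bytes() with the summed member sizes, makes the difference visible.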
* performance/write-behind: synchronize rename with cached writes on src | Raghavendra G | 2018-08-02 | 1 | -0/+40

    The rename response contains a postbuf stat of src-inode. Since
    md-cache caches stat in the rename codepath too, we have to make sure
    the stat accounts for any cached writes in write-behind. So, we make
    sure rename is resumed only after any cached writes are committed to
    the backend.

    Change-Id: Ic9f2adf8edd0b58ebaf661f3a8d0ca086bc63111
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Updates: bz#1512691
* socket: Remove code duplication | Krutika Dhananjay | 2018-08-02 | 1 | -69/+13

    While I was reading code, I saw that socket_submit_request() and
    socket_submit_reply() are identical except for @a_byte and the source
    of @msg object being different. This patch moves all of the common
    code to a new function with the differing vars passed as parameters by
    the callers.

    Change-Id: I7a62ae72f10c34dc8de01b250d89a77ec5ab490d
    fixes: bz#1608991
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* doc: keep just one copy of a coredump analysis | Amar Tumballi | 2018-08-01 | 2 | -36/+5

    Keeping two copies of the files means, one would be out-of-date soon,
    and users would always be confused about which one is the source of
    truth.

    Updates: bz#1193929
    Change-Id: I568149732fdb9d282ccd583640eee9b9056963fd
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* rpm: do not build glusterfs-resource-agents on el6 | Niels de Vos | 2018-07-29 | 1 | -0/+4

    glusterfs-resource-agents depends on glusterfs-server and this is not
    available on el6 with the current Gluster releases. It is not possible
    to install glusterfs-resource-agents on el6 without running into
    dependency problems, so do not build the sub-package at all.

    Change-Id: Ibe08ad3a1b7882559b4e445603d0508b9282b755
    Fixes: bz#1609551
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
* performance/write-behind: better invalidation in readdirp | Raghavendra G | 2018-07-28 | 3 | -24/+36

    Current invalidation of stats in wb_readdirp_cbk is prone to races. As
    the deleted comment explains,

    <snip>
    We cannot guarantee integrity of entry->d_stat as there are cached
    writes. The stat is most likely stale as it doesn't account the cached
    writes. However, checking for non-empty liability list here is not a
    fool-proof solution as there can be races like,
    1. readdirp is successful on posix
    2. sync of cached write is successful on posix
    3. write-behind received sync response and removed the request from
       liability queue
    4. readdirp response is processed at write-behind.

    In the above scenario, stat for the file is sent back in readdirp
    response but it is stale.
    </snip>

    Change-Id: I6ce170985cc6ce3df2382ec038dd5415beefded5
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Updates: bz#1512691
* build: remove bundled arg-standalone | Niels de Vos | 2018-07-28 | 28 | -7003/+15

    libargp or argp-standalone is available on all commonly used
    distributions. There is no need to bundle an unmaintained version of
    argp-standalone in this repository anymore.

    FreeBSD places the argp.h file in /usr/local/include when
    argp-standalone is installed. This path is not added to CPPFLAGS by
    default, so that is done in configure.ac as well.

    Change-Id: I384a53ab0a008ec9d48fd83afeaf8fbc197e91ee
    Fixes: bz#1609337
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
* performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache | Krutika Dhananjay | 2018-07-28 | 6 | -22/+600

    PROBLEM:
    Entries that are readdirp'd ahead can undergo modification in terms of
    writes, truncates which could modify their iatts. When a readdir is
    finally wound at offset corresponding to these entries, the iatts that
    are returned to the application come from readdir-ahead's cache, which
    are stale by now. This problem gets further aggravated when caching
    translators/modules cache and continue to serve this stale
    information.

    FIX:
    Whenever a dentry undergoes modification, in the cbk of the
    modification fop, a "dirty" flag (default 0) is set in its inode ctx.
    When it's time for readdir-ahead to serve these entries, it will read
    the inode ctx and check if the entry is "dirty", and if it is, set the
    entry's attrs to all zeroes, as an indicator to fuse, md-cache etc not
    to cache these attributes.

    Also there is one tiny race between the entry creation and a readdirp
    on its parent dir, which could cause the inode-ctx setting and inode
    ctx reading to happen on two different inode objects. To prevent this,
    fuse-bridge is made to drop entries for which dentry->inode is not the
    same as linked inode, in readdirp cbk.

    Change-Id: If7396507632b5268442ca580473d5155fee9cbef
    BUG: 1390050
    Updates: bz#1390050
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* rpc: fix return value in rpc destroy | Zhang Huan | 2018-07-28 | 1 | -0/+2

    Change-Id: I73a113e2d40f508fd53b273a990a2371692c87bf
    fixes: bz#1607689
    Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* rpc: add missing free of rpc->dnscache | Zhang Huan | 2018-07-28 | 1 | -0/+8

    Change-Id: I3fa97b99bf23459cf548205d75d2cc7936b2310e
    fixes: bz#1607689
    Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* libgfapi: fix memory leak on old volume files | Zhang Huan | 2018-07-28 | 2 | -2/+5

    Fix missing free of fs->oldvolfile.

    This patch uses standard calloc/realloc to allocate memory for the vol
    file, because by the time the fs struct is destroyed THIS->ctx is
    already gone, which would otherwise cause an invalid memory access.

    Change-Id: I72ae19c76eb16e61f077714f7e52405fee721997
    fixes: bz#1607689
    Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* libgfapi: add missing unref of mgmt client in glfs_fini | Zhang Huan | 2018-07-28 | 1 | -0/+1

    Change-Id: I89cdf14cb9d822eaf7c01cf0b0220b423eb5b705
    fixes: bz#1607689
    Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* build: remove uuid from contrib/ | Niels de Vos | 2018-07-27 | 22 | -1903/+21

    Bundling libuuid is not needed anymore, all current distributions
    provide it now.

    Some OS's provide their own uuid_*() functions in libc. These may not
    be fully compatible with libuuid.so found on Linux systems. In that
    case, either e2fsprogs-libuuid can be installed, or support for the
    native uuid_*() functions can be added to
    libglusterfs/src/compat-uuid.h.

    Change-Id: Icfa48caea81307a3bca549364969c2038911942b
    Fixes: bz#1607319
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
* build: rename event.h to gf-event.h | Niels de Vos | 2018-07-27 | 21 | -22/+25

    Newer FreeBSD versions (noticed with 10.3-RELEASE) provide a event.h
    file that on occasion gets included instead of the libglusterfs file.
    When this happens, 'struct event_pool' will not be defined and
    building will fail with errors like:

        autoscale-threads.c:18:55: error: incomplete definition of type 'struct event_pool'
            int thread_count = pool->eventthreadcount;
                               ~~~~^
        autoscale-threads.c:17:16: note: forward declaration of 'struct event_pool'
            struct event_pool *pool = ctx->event_pool;
                   ^

    This problem is caused by 'pkg-config --cflags uuid' that adds
    /usr/local/include to the GF_CPPFLAGS. The use of libuuid is preferred
    so that the contrib/uuid/ directory can be removed.

    By renaming event.h to gf-event.h there is no conflict between the
    different event.h files anymore and compiling on FreeBSD works without
    issues.

    Change-Id: Ie69f6b8a4f8f8e9630d39a86693eb74674f0f763
    Updates: bz#1607319
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
* stack: Reduce stack usage for local variables to store tmpfile names | ShyamsundarR | 2018-07-27 | 4 | -22/+62

    This patch moves stack based PATH_MAX allocations for tmpfile names,
    to heap allocated names instead. Reducing the impact on stack space
    used and accruing benefits thereof.

    Change-Id: I646d9cb091018de6768b3523902788fa2ba14d96
    Updates: bz#1193929
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* coverity: Ignore most of SECURE_TEMP issues | ShyamsundarR | 2018-07-27 | 7 | -0/+9

    mkstemp, as per the Linux man page, uses 0600 as the permission bits
    when creating the file. This is hence safe, and the Coverity warning
    should be ignored.

    Further, we are mostly a multi-threaded program in all our daemons and
    cannot set and unset umask at will in a multi-threaded program to
    address the coverity issue.

    This change attempts to nudge coverity to ignore this warning, using
    the pattern,

        /* coverity[EVENT_TAG_NAME] ... */
        <line of code that has the issue>

    This commit is an experiment: if, post merge, the next coverity report
    ignores these errors, the above pattern (as found using an internet
    search) works and can be applied to certain other warnings as well.

    Change-Id: I73a184ce1a54dd9e66542952b1190a74438c826a
    Updates: bz#789278
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* rpc: merge ssl infra with epoll infra | Milind Changire | 2018-07-27 | 2 | -748/+814

    Patch attempts to use the epoll infra for handling SSL connections as
    well, instead of the socket_poller() thread func. This essentially
    makes the priv->own_thread flag redundant.

    SSL_connect()/SSL_accept() is now non-blocking, which has done away
    with the localised poll() in ssl_do(). So, ssl_do() has been updated
    appropriately.

    own_thread, and with it the socket_poller() thread for SSL processing,
    is now deprecated.

    Change-Id: I1ce54c06ddb643c16baa143598e7e4fbf16bae0a
    fixes: bz#1561332
    Signed-off-by: Milind Changire <mchangir@redhat.com>
* glusterd: Add multiple checks before attach/start a brick | Mohit Agrawal | 2018-07-27 | 11 | -50/+329

    Problem: In a brick mux scenario, sometimes glusterd is not able to
    start/attach a brick and gluster v status shows the brick as already
    running.

    Solution:
    1) To make sure a brick is running, check for brick_path in
       /proc/<pid>/fd. If the brick is consumed by the brick process, it
       means the brick stack has come up, otherwise not.
    2) Before starting/attaching a brick, check whether the brick is
       mounted or not.
    3) At the time of printing volume status, check that the brick is
       consumed by some brick process.

    Test: To test the same, the following procedure was followed:
    1) Set up a brick mux environment on a vm.
    2) Put a breakpoint in gdb in the function
       posix_health_check_thread_proc at the time of notifying the
       GF_EVENT_CHILD_DOWN event.
    3) Unmount any one brick path forcefully.
    4) Check gluster v status; it will show N/A for the brick.
    5) Try to start the volume with the force option; glusterd throws the
       message "No device available for mount brick".
    6) Mount the brick_root path.
    7) Try to start the volume with the force option.
    8) The down brick is started successfully.

    Change-Id: I91898dad21d082ebddd12aa0d1f7f0ed012bdf69
    fixes: bz#1595320
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
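The first check in the solution above, looking for the brick path among the open descriptors listed under /proc/<pid>/fd, could look roughly like this on Linux. It is a sketch only: pid_has_path_open is a hypothetical name, the prefix-matching rule is an assumption, and this is not the function glusterd uses.

```c
#define _POSIX_C_SOURCE 200809L /* for readlink() */
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (Linux only): returns 1 if process 'pid' holds an open
 * descriptor whose target is 'path' or lies below it, 0 if not, and -1 if
 * /proc/<pid>/fd cannot be read. 'path' must be absolute, without a trailing
 * slash.
 */
static int
pid_has_path_open(pid_t pid, const char *path)
{
    assert(path != NULL && path[0] == '/');

    char dirname[64];
    snprintf(dirname, sizeof(dirname), "/proc/%ld/fd", (long)pid);

    DIR *dir = opendir(dirname);
    if (dir == NULL)
        return -1;

    size_t plen = strlen(path);
    int found = 0;
    struct dirent *de;
    while (!found && (de = readdir(dir)) != NULL) {
        char entry[PATH_MAX];
        char target[PATH_MAX];

        snprintf(entry, sizeof(entry), "%s/%s", dirname, de->d_name);
        ssize_t len = readlink(entry, target, sizeof(target) - 1);
        if (len < 0)
            continue; /* "." and ".." are not symlinks */
        target[len] = '\0';

        /* match the path itself or anything underneath it */
        if (strncmp(target, path, plen) == 0 &&
            (target[plen] == '\0' || target[plen] == '/'))
            found = 1;
    }
    closedir(dir);
    return found;
}
```

A pid that exists but has no descriptor under the brick path returns 0, which corresponds to the case described in the problem statement where a pidfile alone would be misleading.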
* maintainers: change in glusto ownership | Nigel Babu | 2018-07-26 | 1 | -4/+4

    As the project has matured, Vijay and Akarsha have stepped up
    significantly. Nigel is stepping down to peer now that there are
    capable people who will be actively focusing on Glusto. Shwetha is no
    longer actively contributing and has therefore agreed to step down
    from peer.

    Change-Id: I089eb02857d2ea353e811cb8bdf71edda96fa041
    Fixes: bz#1608684
* sdfs: Fix missing NULL option list tremination | ShyamsundarR | 2018-07-25 | 1 | -0/+1

    Option list for volume_options in sdfs was not NULL terminated. This
    resulted in a crash when running in lcov based builds.

    This is rectified by this patch.

    fixes: bz#1608566
    Change-Id: I5d8730f1ae963ed6adf21d970e4921c5d5d92f62
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
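Why a missing terminator crashes can be seen from how such a table is typically walked. The sketch below uses a simplified stand-in struct and invented option names; it is not the real volume_options definition from sdfs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an option descriptor (hypothetical fields). */
struct option_desc {
    const char *key;
    const char *default_value;
};

/* The table ends with an entry whose key is NULL; walkers stop there. */
static const struct option_desc example_options[] = {
    {.key = "pass-through", .default_value = "false"},
    {.key = "log-level", .default_value = "INFO"},
    {.key = NULL}, /* sentinel: without it a walker reads past the array */
};

/* Count entries by walking until the NULL-key sentinel. */
static size_t
count_options(const struct option_desc *opts)
{
    assert(opts != NULL);
    size_t n = 0;
    while (opts[n].key != NULL)
        n++;
    return n;
}
```

Because the loop has no length to go by, it depends entirely on the sentinel. Whether an unterminated table crashes depends on what happens to follow it in memory, which may explain why the problem surfaced only in lcov based builds.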
* georep: fix hard-coded paths in gsyncd.conf.in | Kaleb S. KEITHLEY | 2018-07-25 | 1 | -2/+2

    This is part of the reason why we use autoconf (i.e. configure).

    For an ordinary clone+autogen.sh+configure SBIN_DIR is
    /usr/local/sbin; for an rpm or dpkg build it will be /usr/sbin.

    I wonder how many more are lurking in our sources?

    /usr/libexec is one that frequently bites us on Debian and Ubuntu,
    which don't have /usr/libexec. (But it's all Linux, right?)

    See https://bugzilla.redhat.com/show_bug.cgi?id=1601532

    Reported-by: lohmaier+rhbz@gmail.com
    Change-Id: I6523894416cc06236ea1f99529efd36e957bd98e
    updates: bz#1193929
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* rpc: rpc_clnt_connection_cleanup is crashed due to double free | Mohit Agrawal | 2018-07-25 | 1 | -3/+17

    Problem: The gfapi client crashes in rpc_clnt_connection_cleanup at
    the time of destroying saved_frames.

    Solution: The gfapi client crashes because the saved_frame ptr is
    already freed in rpc_clnt_destroy. To avoid this, update the code in
    rpc_clnt_destroy.

    Change-Id: Id8cce102b49f26cfd86ef88257032ed98f43192b
    fixes: bz#1607783
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* glusterd: Coverity issues with type FORWARD_NULL | Sanju Rakonde | 2018-07-24 | 4 | -11/+11

    This patch fixes coverity issues 102, 103, 112 and 119 from [1]

    [1] https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-07-23-5fa004f3/html/

    Updates: bz#789278
    Change-Id: I99762eb0bcbd974a5250434777db63520f2ce2e6
    Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: Deadcode Coverity issue | Oshank Kumar | 2018-07-24 | 1 | -2/+0

    This patch will fix coverity issue 74 from [1].

    We update the ret value at line number 5011 and immediately check
    whether ret has a non-zero value at line number 5013. Only if ret is 0
    do we continue to execute and reach line number 5036. By the time we
    reach 5036, the ret value is always 0. So this block of code is
    redundant here and is removed.

    [1] https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-07-23-5fa004f3/html/

    Updates: bz#789278
    Change-Id: Ia6e8ba2936e350f0d29a9151ab786622f5e750db
    Signed-off-by: Oshank Kumar <okumar@redhat.com>
* features/shard: Make lru limit of inode list configurable | Krutika Dhananjay | 2018-07-23 | 4 | -4/+75

    Currently this lru limit is hard-coded to 16384. This patch makes it
    configurable to make it easier to hit the lru limit and enable testing
    of different cases that arise when the limit is reached.

    The option is features.shard-lru-limit. It is by design allowed to be
    configured only in init() but not in reconfigure(). This is to avoid
    all the complexity associated with eviction of least recently used
    shards when the list is shrunk.

    Change-Id: Ifdcc2099f634314fafe8444e2d676e192e89e295
    updates: bz#1605056
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* storage/posix: Avoid log flood in posix_set_parent_ctime() | Vijay Bellur | 2018-07-23 | 1 | -4/+0

    posix_set_parent_ctime() unconditionally logs an error if consistent
    time attributes is not enabled. This log does not add any value,
    prints an incorrect errno & floods the log file. Hence nuking this log
    message in this patch.

    Change-Id: I82a78f2f8ce5ab518f8cdf6d9086a97049712f75
    fixes: bz#1607049
    Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep : fix possible crash | Sunny Kumar | 2018-07-23 | 1 | -2/+5

    Problem: In 'glusterd_verify_slave', while tokenizing the error
    message, we call 'strtok_r' and store the return value in 'tmp', which
    can be NULL. We pass this 'tmp' as the 1st argument to 'strcmp', which
    will lead to a segmentation fault.

    Solution: Before calling 'strcmp' we should NULL check 'tmp'.

    Change-Id: Ifd3864b904afe6cd09d9e5a4b55c6d0578e22b9d
    fixes: bz#1602121
    Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
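The NULL check described in the solution above can be sketched as follows. The helper name, the '|' delimiter and the compared string are assumptions for illustration; only the strtok_r-then-strcmp shape comes from the commit message.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r() */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the first '|'-separated token of
 * 'msg' equals 'expected'. 'msg' is modified in place by strtok_r().
 */
static bool
first_token_is(char *msg, const char *expected)
{
    assert(msg != NULL && expected != NULL);

    char *save = NULL;
    char *tmp = strtok_r(msg, "|", &save);
    if (tmp == NULL)
        return false; /* no token: msg was empty or only delimiters */
    return strcmp(tmp, expected) == 0;
}
```

strtok_r() returns NULL when the input is empty or consists only of delimiters, so both cases take the early return instead of reaching strcmp() with a NULL argument.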
* core (named threads): flood of -Wformat-truncation warnings with gcc-7.1 | Kaleb S. KEITHLEY | 2018-07-23 | 7 | -13/+13

    Starting in Fedora 26, which has gcc-7.1.x, -Wformat-truncation is
    enabled with -Wformat, resulting in a flood of new warnings. This many
    warnings is a concern because it makes it hard(er) to see other
    warnings that should be addressed. An example is at
    https://kojipkgs.fedoraproject.org//packages/glusterfs/3.12.0/1.fc28/data/logs/x86_64/build.log

    For more info see https://review.gluster.org/#/c/18267/

    I can't find much (or good) documentation on the heuristics the
    compiler uses for this warning. In the case of printing integer types
    it appears it looks at the available space in the destination and the
    range of values for the variable and/or its type.

    To address the specific question about why 0x3ff versus 0xfff to mask
    the value, either would suffice to hint to the compiler that the
    printed value will fit in three characters. But the loop is from
    0...1023 (or 0...0x3ff if you prefer) so I chose that as a more
    "accurate" mask to use as it exactly matches the range of values of
    the loop.

    Fixes: bz#1492847
    Change-Id: I6e309ba42159841131d8241bfc0566ef09e00aa9
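The masking trick discussed above can be illustrated with a small formatter. The buffer size, the "worker" prefix and the function name are invented for the example and do not come from the patch.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_LEN 16

/*
 * Hypothetical helper: format "worker" plus three hex digits into a fixed
 * buffer. Masking with 0x3ff bounds the value to 0..1023, which is what
 * lets the compiler see that three hex digits always suffice.
 */
static int
make_thread_name(char out[NAME_LEN], unsigned int index)
{
    assert(out != NULL);
    return snprintf(out, NAME_LEN, "worker%03x", index & 0x3ff);
}
```

With an index of 1023 the buffer holds "worker3ff". The mask is only harmless when the caller's range really is 0 to 1023, as in the loop the commit message describes, since larger values would silently wrap.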
* All: run codespell on the code and fix issues. | Yaniv Kaul | 2018-07-22 | 173 | -428/+432

    Please review, it's not always just the comments that were fixed. I've
    had to revert of course all calls to creat() that were changed to
    create() ...

    Only compile-tested!

    Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5
    updates: bz#1193929
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* quota: new volume set option to track of quota in GD2 | Hari Gowtham | 2018-07-21 | 1 | -0/+10

    quota enable as volume set needs a new option to keep track of it.

    Bugzilla ID:1600812
    Change-Id: Ib8d770936bafe859f80e717409bd861760090e59
    fixes: bz#1600812
    Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
* geo-rep: Fix issues with gfid conflict handling | Kotresh HR | 2018-07-20 | 3 | -64/+169

    1. MKDIR/RMDIR is recorded on all bricks. So if one brick succeeds in
       creating it, other bricks should ignore it. But this was not
       happening. The fix for rename of directories in hybrid crawl was
       trying to rename the directory to itself and in the process
       crashing with ENOENT if the directory is removed.

    2. If a file is created, deleted and a directory is created with the
       same name, it was failing to sync. Again the issue is around the
       fix for rename of directories in hybrid crawl. Fixed the same.

    If the same case was done with a hardlink present for the file, it was
    failing. This patch fixes that too.

    fixes: bz#1598884
    Change-Id: I6f3bca44e194e415a3d4de3b9d03cc8976439284
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* glusterd-quota.c: fix coverity warning (BAD_COMPARE) | Yaniv Kaul | 2018-07-20 | 1 | -1/+1

    See https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-07-13-1718f9c6/html/1/6glusterd-quota.c.html#error

    Only compile tested!

    Change-Id: Ief42f9fcdb02ad001bd39c4a6e27e7fa86fd2496
    updates: bz#1193929
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>