summaryrefslogtreecommitdiffstats
path: root/xlators/features/shard
Commit message (Collapse)AuthorAgeFilesLines
* Land part 2 of clang-format changesGluster Ant2018-09-121-5518/+5444
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* Land clang-format changesGluster Ant2018-09-123-315/+307
| | | | Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
* Multiple files: calloc -> mallocYaniv Kaul2018-09-042-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlators/storage/posix/src/posix-inode-fd-ops.c: xlators/storage/posix/src/posix-helpers.c: xlators/storage/bd/src/bd.c: xlators/protocol/client/src/client-lk.c: xlators/performance/quick-read/src/quick-read.c: xlators/performance/io-cache/src/page.c xlators/nfs/server/src/nfs3-helpers.c xlators/nfs/server/src/nfs-fops.c xlators/nfs/server/src/mount3udp_svc.c xlators/nfs/server/src/mount3.c xlators/mount/fuse/src/fuse-helpers.c xlators/mount/fuse/src/fuse-bridge.c xlators/mgmt/glusterd/src/glusterd-utils.c xlators/mgmt/glusterd/src/glusterd-syncop.h xlators/mgmt/glusterd/src/glusterd-snapshot.c xlators/mgmt/glusterd/src/glusterd-rpc-ops.c xlators/mgmt/glusterd/src/glusterd-replace-brick.c xlators/mgmt/glusterd/src/glusterd-op-sm.c xlators/mgmt/glusterd/src/glusterd-mgmt.c xlators/meta/src/subvolumes-dir.c xlators/meta/src/graph-dir.c xlators/features/trash/src/trash.c xlators/features/shard/src/shard.h xlators/features/shard/src/shard.c xlators/features/marker/src/marker-quota.c xlators/features/locks/src/common.c xlators/features/leases/src/leases-internal.c xlators/features/gfid-access/src/gfid-access.c xlators/features/cloudsync/src/cloudsync-plugins/src/cloudsyncs3/src/libcloudsyncs3.c xlators/features/bit-rot/src/bitd/bit-rot.c xlators/features/bit-rot/src/bitd/bit-rot-scrub.c bxlators/encryption/crypt/src/metadata.c xlators/encryption/crypt/src/crypt.c xlators/performance/md-cache/src/md-cache.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible It doesn't make sense to calloc (allocate and clear) memory when the code right away fills that memory with data. It may be optimized by the compiler, or have a microscopic performance improvement. In some cases, also changed allocation size to be sizeof some struct or type instead of a pointer - easier to read. In some cases, removed redundant strlen() calls by saving the result into a variable. 1. Only done for the straightforward cases. There's room for improvement. 2. Please review carefully, especially for string allocation, with the terminating NULL string. Only compile-tested! .. and allocate memory as much as needed. xlators/nfs/server/src/mount3.c : Don't blindly allocate PATH_MAX, but strlen() the string and allocate appropriately. Also, align error messges. updates: bz#1193929 Original-Author: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ibda6f33dd180b7f7694f20a12af1e9576fe197f5
* xlators: move from strlen() to sizeof()Yaniv Kaul2018-08-311-2/+2
| | | | | | | | | | | | | | | | | | xlators/features/index/src/index.c xlators/features/shard/src/shard.c xlators/features/upcall/src/upcall-internal.c xlators/mgmt/glusterd/src/glusterd-bitrot.c xlators/mgmt/glusterd/src/glusterd-locks.c xlators/mgmt/glusterd/src/glusterd-mountbroker.c xlators/mgmt/glusterd/src/glusterd-op-sm.c For const strings, just do compile time size calc instead of runtime. Compile-tested only! Change-Id: I995b2b89f14454b3855a4cd0ca90b3f01d5e080f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* feature/shard: Fix Coverity issueAshish Pandey2018-08-271-7/+5
| | | | | | | | | | | | | | | | | | | | | Fix following coverity issues- CID: 1394660 1394668 1394667 1389008 1389434 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=84880983&defectInstanceId=25821108&mergedDefectId=1389008 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=84880983&defectInstanceId=25821101&mergedDefectId=1389434 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=84880983&defectInstanceId=25821001&mergedDefectId=1394660 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=84880983&defectInstanceId=25821010&mergedDefectId=1394667 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=84880983&defectInstanceId=25821017&mergedDefectId=1394668 Change-Id: I08f09649dbe758ba0d367ae5330b48b18784dec3 updates: bz#789278 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* features/shard: Fix crash and test case in RENAME fopKrutika Dhananjay2018-08-141-2/+5
| | | | | | | | | | | | | | Setting the refresh flag in inode ctx in shard_rename_src_cbk() is applicable only when the dst file exists and is sharded and has a hard link > 1 at the time of rename. But this piece of code is exercised even when dst doesn't exist. In this case, the mount crashes because local->int_inodelk.loc.inode is NULL. Change-Id: Iaf85a5ee3dff8b01a76e11972f10f2bb9dcbd407 Updates: bz#1611692 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Make lru limit of inode list configurableKrutika Dhananjay2018-07-232-4/+21
| | | | | | | | | | | | | | | Currently this lru limit is hard-coded to 16384. This patch makes it configurable to make it easier to hit the lru limit and enable testing of different cases that arise when the limit is reached. The option is features.shard-lru-limit. It is by design allowed to be configured only in init() but not in reconfigure(). This is to avoid all the complexity associated with eviction of least recently used shards when the list is shrunk. Change-Id: Ifdcc2099f634314fafe8444e2d676e192e89e295 updates: bz#1605056 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Perform shards deletion in the backgroundKrutika Dhananjay2018-06-203-159/+685
| | | | | | | | | | | | | | | A synctask is created that would scan the indices from .shard/.remove_me, to delete the shards associated with the gfid corresponding to the index bname and the rate of deletion is controlled by the option features.shard-deletion-rate whose default value is 100. The task is launched on two accounts: 1. when shard receives its first-ever lookup on the volume 2. when a rename or unlink deleted an inode Change-Id: Ia83117230c9dd7d0d9cae05235644f8475e97bc3 updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Introducing ".shard/.remove_me" for atomic shard deletion ↵Krutika Dhananjay2018-06-134-421/+1040
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | (part 1) PROBLEM: Shards are deleted synchronously when a sharded file is unlinked or when a sharded file participating as the dst in a rename() is going to be replaced. The problem with this approach is it makes the operation really slow, sometimes causing the application to time out, especially with large files. SOLUTION: To make this operation atomic, we introduce a ".remove_me" directory. Now renames and unlinks will simply involve two steps: 1. creating an empty file under .remove_me named after the gfid of the file participating in unlink/rename 2. carrying out the actual rename/unlink A synctask is created (more on that in part 2) to scan this directory after every unlink/rename operation (or upon a volume mount) and clean up all shards associated with it. All of this happens in the background. The task takes care to delete the shards associated with the gfid in .remove_me only if this gfid doesn't exist in backend, ensuring that the file was successfully renamed/unlinked and its shards can be discarded now safely. Change-Id: Ia1d238b721a3e99f951a73abbe199e4245f51a3a updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Fix missing unlock in shard_fsync_shards_cbk()Vijay Bellur2018-06-011-0/+1
| | | | | | | updates: bz#789278 Change-Id: I745a98e957cf3c6ba69247fcf6b58dd05cf59c3c Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Add option to barrier parallel lookup and unlink of shardsKrutika Dhananjay2018-04-232-28/+89
| | | | | | | | | Also move the common parallel unlink callback for GF_FOP_TRUNCATE and GF_FOP_FTRUNCATE into a separate function. Change-Id: Ib0f90a5f62abdfa89cda7bef9f3ff99f349ec332 updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Make operations on internal directories genericKrutika Dhananjay2018-04-182-92/+206
| | | | | | Change-Id: Iea7ad2102220c6d415909f8caef84167ce2d6818 updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Do list_del_init() while list memory is validPranith Kumar K2018-03-201-1/+1
| | | | | | | | | | | | | | | | | Problem: shard_post_lookup_fsync_handler() goes over the list of inode-ctx that need to be fsynced and in cbk it removes each of the inode-ctx from the list. When the first member of list is removed it tries to modifies list head's memory with the latest next/prev and when this happens, there is no guarantee that the list-head which is from stack memory of shard_post_lookup_fsync_handler() is valid. Fix: Do list_del_init() in the loop before winding fsync. BUG: 1557876 Change-Id: If429d3634219e1a435bd0da0ed985c646c59c2ca Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* features/shard: Upon FSYNC from upper layers, wind fsync on all changed shardsKrutika Dhananjay2018-03-053-38/+504
| | | | | | Change-Id: Ib74354f57a18569762ad45a51f182822a2537421 BUG: 1468483 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Fix shard inode refcount when it's part of priv->lru_list.Krutika Dhananjay2018-03-021-9/+17
| | | | | | | | | | | For as long as a shard's inode is in priv->lru_list, it should have a non-zero ref-count. This patch achieves it by taking a ref on the inode when it is added to lru list. When it's time for the inode to be evicted from the lru list, a corresponding unref is done. Change-Id: I289ffb41e7be5df7489c989bc1bbf53377433c86 BUG: 1468483 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* glusterfsd: Memleak in glusterfsd process while brick mux is onMohit Agrawal2018-02-271-0/+3
| | | | | | | | | | | | | | | | | | Problem: At the time of stopping the volume while brick multiplex is enabled memory is not cleanup from all server side xlators. Solution: To cleanup memory for all server side xlators call fini in glusterfs_handle_terminate after send GF_EVENT_CLEANUP notification to top xlator. BUG: 1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Note: Run all test-cases in separate build (https://review.gluster.org/19574) with same patch after enable brick mux forcefully, all test cases are passed. Change-Id: Ia10dc7f2605aa50f2b90b3fe4eb380ba9299e2fc
* features/shard: Leverage block_num info in inode-ctx in read callbackKrutika Dhananjay2018-02-271-18/+3
| | | | | | | | | ... instead of adding this information in fd_ctx in call path and retrieving it again in the callback. Change-Id: Ibbddbbe85baadb7e24aacf5ec8a1250d493d7800 BUG: 1468483 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Pass the correct block-num to store in inode ctxKrutika Dhananjay2018-02-271-1/+1
| | | | | | Change-Id: Icf3a5d0598a081adb7d234a60bd15250a5ce1532 BUG: 1468483 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: shard options for GD2.Krutika Dhananjay2018-01-261-0/+3
| | | | | | | Updates #302 Change-Id: Ife21440ffcf5805ce5858360dc94a456ead891e5 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* core: fix some of the dict_{get,set} with proper APIsAmar Tumballi2018-01-171-7/+7
| | | | | | | updates #220 Change-Id: I6e25dbb69b2c7021e00073e8f025d212db7de0be Signed-off-by: Amar Tumballi <amarts@redhat.com>
* all: Simplify component message id's definitionXavier Hernandez2017-12-141-167/+30
| | | | | | | | | This patch creates a new way of defining message id's that is easier and less error prone because it doesn't require so many manual changes each time a new component is defined or a new message created. Change-Id: I71ba8af9ac068f5add7e74f316a2478bc991c67b Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* features/shard: Coverity fixesKrutika Dhananjay2017-11-081-3/+6
| | | | | | | | This patch fixes coverity issues 242 and 453. Change-Id: If18f40539dccc7c2fcdcf8ef9b6fa3efbb3e462f BUG: 789278 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Change default shard-block-size to 64MBKrutika Dhananjay2017-09-141-1/+1
| | | | | | | | | | | Change-Id: I55fa87e07136cff10b0d725ee24dd3151016e64e BUG: 1489823 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/18243 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Sunil Kumar Acharya <sheggodu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* features/shard: Increment counts in locksPranith Kumar K2017-09-061-2/+10
| | | | | | | | | | | | | | | | | | | Problem: Because create_count/eexist_count are incremented without locks, all the shards may not be created because call_count will be lesser than what it needs to be. This can lead to crash in shard_common_inode_write_do() because inode on which we want to do fd_anonymous() is NULL Fix: Increment the counts in frame->lock Change-Id: Ibc87dcb1021e9f4ac2929f662da07aa7662ab0d6 BUG: 1488354 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/18203 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* features/shard: Return aggregated size in stbuf of LINK fopKrutika Dhananjay2017-09-061-2/+42
| | | | | | | | | | Change-Id: I42df7679d63fec9b4c03b8dbc66c5625f097fac0 BUG: 1488546 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/18209 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Remove ctx from LRU in shard_forgetPranith Kumar K2017-06-301-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: There is a race when the following two commands are executed on the mount in parallel from two different terminals on a sharded volume, which leads to use-after-free. Terminal-1: while true; do dd if=/dev/zero of=file1 bs=1M count=4; done Terminal-2: while true; do cat file1 > /dev/null; done In the normal case this is the life-cycle of a shard-inode 1) Shard is added to LRU when it is first looked-up 2) For every operation on the shard it is moved up in LRU 3) When "unlink of the shard"/"LRU limit is hit" happens it is removed from LRU But we are seeing a race where the inode stays in Shard LRU even after it is forgotten which leads to Use-after-free and then some memory-corruptions. These are the steps: 1) Shard is added to LRU when it is first looked-up 2) For every operation on the shard it is moved up in LRU Reader-handler Truncate-handler 1) Reader handler needs shard-x to be read. 1) Truncate has just deleted shard-x 2) In shard_common_resolve_shards(), it does inode_resolve() and that leads to a hit in LRU, so it is going to call __shard_update_shards_inode_list() to move the inode to top of LRU 2) shard-x gets unlinked from the itable and inode_forget(inode, 0) is called to make sure the inode can be purged upon last unref 3) when __shard_update_shards_inode_list() is called it finds that the inode is not in LRU so it adds it back to the LRU-list Both these operations complete and call inode_unref(shard-x) which leads to the inode getting freed and forgotten, even when it is in Shard LRU list. When more inodes are added to LRU, use-after-free will happen and it leads to undefined behaviors. Fix: I see that the inode can be removed from LRU even by the protocol layers like gfapi/gNFS when LRU limit is reached. So it is better to add a check in shard_forget() to remove itself from LRU list if it exists. BUG: 1466037 Change-Id: Ia79c0c5c9d5febc56c41ddb12b5daf03e5281638 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/17644 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: Handle offset in appending writesPranith Kumar K2017-05-271-42/+67
| | | | | | | | | | | | | | | | | | | | | When a file is opened with append, all writes are appended at the end of file irrespective of the offset given in the write syscall. This needs to be considered in shard size update function and also for choosing which shard to write to. At the moment shard piggybacks on queuing from write-behind xlator for ordering of the operations. So if write-behind is disabled and two parallel appending-writes come both of which can increase the file size beyond shard-size the file will be corrupted. BUG: 1455301 Change-Id: I9007e6a39098ab0b5d5386367bd07eb5f89cb09e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/17387 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* features/shard: Set size in inode ctx before size update for truncate tooKrutika Dhananjay2017-05-101-6/+8
| | | | | | | | | | | Change-Id: I7e984bb0f50c7d42764c0648e697d94d6c768dc7 BUG: 1448299 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17184 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* features/shard: Initialize local->fop in readvKrutika Dhananjay2017-04-111-0/+1
| | | | | | | | | | | | Change-Id: I9008ca9960df4821636501ae84f93a68f370c67f BUG: 1440051 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17014 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Fix vm corruption upon fix-layoutKrutika Dhananjay2017-04-072-59/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shard's writev implementation, as part of identifying presence of participant shards that aren't in memory, first sends an MKNOD on these shards, and upon EEXIST error, looks up the shards before proceeding with the writes. The VM corruption was caused when the following happened: 1. DHT had n subvolumes initially. 2. Upon add-brick + fix-layout, the layout of .shard changed although the existing shards under it were yet to be migrated to their new hashed subvolumes. 3. During this time, there were writes on the VM falling in regions of the file whose corresponding shards were already existing under .shard. 4. Sharding xl sent MKNOD on these shards, now creating them in their new hashed subvolumes although there already exist shard blocks for this region with valid data. 5. All subsequent writes were wound on these newly created copies. The net outcome is that both copies of the shard didn't have the correct data. This caused the affected VMs to be unbootable. FIX: For want of better alternatives in DHT, the fix changes shard fops to do a LOOKUP before the MKNOD and upon EEXIST error, perform another lookup. Change-Id: I8a2e97d91ba3275fbc7174a008c7234fa5295d36 BUG: 1440051 RCA'd-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Reported-by: Mahdi Adnan <mahdi.adnan@outlook.com> Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17010 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* shared: simplify and fix crash when we run out of memoryMichael Scherer2017-04-061-5/+1
| | | | | | | | | | | | | | | | | | | Since mem_get0 can return NULL, local->op_ret is gonna crash. Found by coverity. And since we only have ENOMEM as potential error, we can also simplify the code by avoiding using 'local' for that. Change-Id: I778747b57f520b1a52347c0fc9f27efd7a7c5ca0 BUG: 789278 Signed-off-by: Michael Scherer <misc@redhat.com> Reviewed-on: https://review.gluster.org/16739 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Michael Scherer <misc@fedoraproject.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Pass the correct iatt for cache invalidationKrutika Dhananjay2017-03-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This fixes a performance issue with shard which was causing the translator to trigger unusually high number of lookups for cache invalidation even when there was no modification to the file. In shard_common_stat_cbk(), it is local->prebuf that contains the aggregated size and block count as opposed to buf which only holds the attributes for the physical copy of base shard. Passing buf for inode_ctx invalidation would always set refresh to true since the file size in inode ctx contains the aggregated size and would never be same as @buf->ia_size. This was leading to every write/read being preceded by a lookup on the base shard even when the file underwent no modification. Change-Id: Ib0349291d2d01f3782d6d0bdd90c6db5e0609210 BUG: 1436739 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/16961 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Fix EIO error on add-brickKrutika Dhananjay2017-02-232-19/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DHT seems to link inode during lookup even before initializing inode ctx with layout information, which comes after directory healing. Consider two parallel writes. As part of the first write, shard sends lookup on .shard which in its return path would cause DHT to link .shard inode. Now at this point, when a second write is wound, inode_find() of .shard succeeds and as a result of this, shard goes to create the participant shards by issuing MKNODs under .shard. Since the layout is yet to be initialized, mknod fails in dht call path with EIO, leading to VM pauses. The fix involves shard maintaining a flag to denote whether a fresh lookup on .shard completed one network trip. If it didn't, all inode_find()s in fop path will be followed by a lookup before proceeding with the next stage of the fop. Big thanks to Raghavendra G and Pranith Kumar K for the RCA and subsequent inputs and feedback on the patch. Change-Id: I9383ec7e3f24b34cd097a1b01ad34e4eeecc621f BUG: 1420623 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/14419 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* features/shard: Put onus of choosing the inode to resolve on individual fopsKrutika Dhananjay2017-02-232-26/+21
| | | | | | | | | | | | | | | ... as opposed to adding checks in "common" functions to choose the inode to resolve based local->fop, which is rather ugly and prone to errors. Change-Id: Ia46cc59992baa2979516369cb72d8991452c0274 BUG: 1420623 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/16709 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Fill loc.pargfid too for named lookups on individual shardsKrutika Dhananjay2016-11-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a sharded volume when a brick is replaced while IO is going on, named lookup on individual shards as part of read/write was failing with ENOENT on the replaced brick, and as a result AFR initiated name heal in lookup callback. But since pargfid was empty (which is what this patch attempts to fix), the resolution of the shards by protocol/server used to fail and the following pattern of logs was seen: Brick-logs: [2016-11-08 07:41:49.387127] W [MSGID: 115009] [server-resolve.c:566:server_resolve] 0-rep-server: no resolution type for (null) (LOOKUP) [2016-11-08 07:41:49.387157] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-rep-server: 91833: LOOKUP(null) (00000000-0000-0000-0000-000000000000/16d47463-ece5-4b33-9c93-470be918c0f6.82) ==> (Invalid argument) [Invalid argument] Client-logs: [2016-11-08 07:41:27.497687] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.497755] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.498500] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.499680] E [MSGID: 133010] Also, this patch makes AFR by itself choose a non-NULL pargfid even if its ancestors fail to initialize all pargfid placeholders. Change-Id: I5f85b303ede135baaf92e87ec8e09941f5ded6c1 BUG: 1392445 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15788 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* build: out-of-tree builds generates files in the wrong directoryKaleb S KEITHLEY2016-09-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And minor cleanup of a few of the Makefile.am files while we're at it. Rewrite the make rules to do what xdrgen does. Now we can get rid of xdrgen. Note 1. netbsd6's sed doesn't do -i. Why are we still running smoke tests on netbsd6 and not netbsd7? We barely support netbsd7 as it is. Note 2. Why is/was libgfxdr.so (.../rpc/xdr/src/...) linked with libglusterfs? A cut-and-paste mistake? It has no references to symbols in libglusterfs. Note3. "/#ifndef\|#define\|#endif/" (note the '\'s) is a _basic_ regex that matches the same lines as the _extended_ regex "/#(ifndef|define|endif)/". To match the extended regex sed needs to be run with -r on Linux; with -E on *BSD. However NetBSD's and FreeBSD's sed helpfully also provide -r for compatibility. Using a basic regex avoids having to use a kludge in order to run sed with the correct option on OS X. Note 4. Not copying the bit of xdrgen that inserts copyright/license boilerplate. AFAIK it's silly to pretend that machine generated files like these can be copyrighted or need license boilerplate. The XDR source files have their own copyright and license; and their copyrights are bound to be more up to date than old boilerplate inserted by a script. From what I've seen of other Open Source projects -- e.g. gcc and its C parser files generated by yacc and lex -- IIRC they don't bother to add copyright/license boilerplate to their generated files. It appears that it's a long-standing feature of make (SysV, BSD, gnu) for out-of-tree builds to helpfully pretend that the source files it can find in the VPATH "exist" as if they are in the $cwd. rpcgen doesn't work well in this situation and generates files with "bad" #include directives. E.g. if you `rpcgen ../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.x`, you get an #include directive in the generated .c file like this: ... #include "../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.h" ... which (obviously) results in compile errors on out-of-tree build because the (generated) header file doesn't exist at that location. Compared to `rpcgen ./glusterfs3-xdr.x` where you get: ... #include "glusterfs3-xdr.h" ... Which is what we need. We have to resort to some Stupid Make Tricks like the addition of various .PHONY targets to work around the VPATH "help". Warning: When doing an in-tree build, -I$(top_builddir)/rpc/xdr/... looks exactly like -I$(top_srcdir)/rpc/xdr/... Don't be fooled though. And don't delete the -I$(top_builddir)/rpc/xdr/... bits Change-Id: Iba6ab96b2d0a17c5a7e9f92233993b318858b62e BUG: 1330604 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14085 Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* shard: fix unused variable warnings/errorsKaleb S. KEITHLEY2016-08-291-3/+0
| | | | | | | | | | | | | | | | | | http://review.gluster.org/14085 fixes a/the "leak" - via the generated rpc/xdr headers - of pragmas that mask these warnings. However 14085 won't pass the smoke test until all the warnings are fixed. Change-Id: I3e53f12371b002d7f2fe2e281b63794c7769f823 BUG: 1369124 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15258 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* build: correctly format some (s)size_t messagesNiels de Vos2016-07-171-1/+1
| | | | | | | | | | | | | | | | | | On 32-bit builds the are are warnings like these: posix.c:6438: warning: format '%ld' expects type 'long int', but argument 11 has type 'ssize_t' Instead of using "%l" for (signed) size_t variables, "%z" should be used. BUG: 1198849 Change-Id: I6f57b5e8ea174dd9e3056aff5da685e497894ccf Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14933 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* core, shard: Make shards inherit main file's O_DIRECT flag if presentKrutika Dhananjay2016-06-071-0/+11
| | | | | | | | | | | | | | | | | | If the application opens a file with O_DIRECT, the shards' anon fds would also need to inherit the flag. Towards this, shard xl would be passing the odirect flag in the @flags parameter to the WRITEV fop. This will be used in anon fd resolution and subsequent opening by posix xl. Change-Id: Iddb75c9ed14ce5a8c5d2128ad09b749f46e3b0c2 BUG: 1342171 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14191 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Don't modify readv sizePranith Kumar K2016-06-061-14/+1
| | | | | | | | | | | | | | | | For o-direct reads application sends aligned size which needs to be sent as is, otherwise o-direct writes where the file-size is not aligned fails. Change-Id: I097418ad92eda6c835d7352a3d2e53ea9d8e2424 BUG: 1342298 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14623 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* posix, shard: Use page-aligned buffer for o-direct readsKrutika Dhananjay2016-06-031-0/+3
| | | | | | | | | | | | | | and also make shard_readv_do() pass the correct flags when the original fd is opened with O_DIRECT. Change-Id: Ic2f8ad900743ed3f7cab56948bcf1358d247a311 BUG: 1342171 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14639 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Get hard-link-count in {unlink,rename}_cbk before deleting ↵Krutika Dhananjay2016-05-201-143/+200
| | | | | | | | | | | | | shards Change-Id: I0606b74f11f5412c4d9af44a6505635ed9022c15 BUG: 1335858 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14334 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* Revert "features/shard: Make o-direct writes work with sharding"Krutika Dhananjay2016-05-171-6/+0
| | | | | | | | | | | | | | | | | | | | This reverts commit c272c71391cea9db817f4e7e38cfc25a7cff8bd5. This is for two reasons: 1) It introduces high fop latencies 2) Even with the patch, there is no true odirect behavior since the workaround in the patch doesn't reduce the caching done in kernel's page cache as far as writes on anon fds associated with individual shards is concerned. Change-Id: Iaf411198b56237ca0601deaf17f69ef178d7e769 BUG: 1335818 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14328 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* features/shard: Make o-direct writes work with shardingKrutika Dhananjay2016-04-111-0/+6
| | | | | | | | | | | | | | | | | | | With files opened with o-direct, the expectation is that the IO performed on the fds is byte aligned wrt the sector size of the underlying device. With files getting sharded, a single write from the application could be broken into more than one write falling on different shards which _might_ cause the original byte alignment property to be lost. To get around this, shard translator will send fsync on odirect writes to emulate o-direct-like behavior in the backend. Change-Id: Ie8a6c004df215df78deff5cf4bcc698b4e17a7ae BUG: 1322214 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13846 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* features/shard: Implement discard fopKrutika Dhananjay2016-03-111-5/+22
| | | | | | | | | | | | Change-Id: Ia5bd8d36b21a586df6556fbec3474892d5871229 BUG: 1261841 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13657 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* features/shard: Return ENOTSUP for unsupported fallocate flagsKrutika Dhananjay2016-03-081-0/+8
| | | | | | | | | | | | | | | Basis: http://lists.gnu.org/archive/html/qemu-devel/2016-02/msg05101.html Change-Id: I5bf80b6e8caed3d7f136fc57e16abfb28869e009 BUG: 1261841 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13523 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Fix NULL-dereference when fsync failsKrutika Dhananjay2016-03-021-0/+4
| | | | | | | | | | Change-Id: I4e51961c158c3b5c78791846ca7f0f6cf7fb5c4a BUG: 1313293 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13562 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* all: fixes for clang compile warningsKaleb S KEITHLEY2016-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cli/src/cli-cmd-parser.c (chenk) cli/src/cli-xml-output.c (spandit) cli/src/cli.c (chenk) libglusterfs/src/common-utils.c (vmallika) libglusterfs/src/gfdb/gfdb_sqlite3.c (jfernand +1) rpc/rpc-transport/socket/src/socket.c (?) xlators/cluster/afr/src/afr-transaction.c (?) xlators/cluster/dht/src/dht-common.h (srangana +2) xlators/cluster/dht/src/dht-selfheal.c (srangana +2) xlators/debug/io-stats/src/io-stats.c (R. Wareing) xlators/features/barrier/src/barrier.c (vshastry) xlators/features/bit-rot/src/bitd/bit-rot-scrub.h (vshankar +1) xlators/features/shard/src/shard.c (kdhananj +1) xlators/mgmt/glusterd/src/glusterd-ganesha.c (skoduri) xlators/mgmt/glusterd/src/glusterd-handler.c (atinmu) xlators/mgmt/glusterd/src/glusterd-op-sm.h (atinmu) xlators/mgmt/glusterd/src/glusterd-snapshot.c (spandit) xlators/mgmt/glusterd/src/glusterd-syncop.c (atinmu) xlators/mgmt/glusterd/src/glusterd-volgen.c (atinmu) xlators/protocol/client/src/client-messages.h (mselvaga +1) xlators/storage/bd/src/bd-helper.c (M. Mohan Kumar) xlators/storage/bd/src/bd.c (M. Mohan Kumar) xlators/storage/posix/src/posix.c (nbalacha +1) Change-Id: I85934fbcaf485932136ef3acd206f6ebecde61dd BUG: 1293133 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13031 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* shard: add seek() FOP as not supportedNiels de Vos2016-02-041-0/+12
| | | | | | | | | | | | | | | | | | For getting basic support for SEEK_DATA/SEEK_HOLE, sharding has not been implemented. Bug 1301647 has been filed to get this new feature in sharding as well. Because of a premature merge (and revert), this change is re-applying everything from commit 2ce3daa94066dcc77cdc6b54a31747b6c7c0c2fc again. BUG: 1220173 Change-Id: I0fb2d36c65af5cb2d0a064104b74f7a863ec4ed3 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/13347 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* Revert "shard: add seek() FOP as not supported"Pranith Kumar Karampuri2016-01-271-12/+0
| | | | | | | | | | | | | This reverts commit 2ce3daa94066dcc77cdc6b54a31747b6c7c0c2fc. Change-Id: Ic00337a69e0a322b14c5cfdf68c06428c5da3a19 Reviewed-on: http://review.gluster.org/13301 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>