summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr: Make read child match check in afr optionalKrutika Dhananjay2015-03-183-0/+19
| | | | | | | | | | | | | | | Having this particular check which was introduced by commit c78998c39f0857ea7aacba360632c148afc54a55 causes a drop in performance in readdirp. So the behavior is made configurable with this patch. Change-Id: I2858fc18b3539df7aa6d3f489e0d5cfaeb8a9b3c BUG: 1202669 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9917 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: remove stale index entriesRavishankar N2015-03-173-4/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: During pre-op phase, the index xlator 1. Creates the entry inside .glusterfs/indices/xattrop 2. Winds the xattrop fop to posix to mark dirty/pending changelogs. If the brick crashes after 1, the xattrop entry becomes stale and never gets removed by shd during subsequent crawls because there is nothing to heal (changelogs are zero). Though the stale entry does not get displayed in the output of 'heal info' command, it nevertheless stays there forever unless a new write transaction is performed on the file. Fix: During index self-heal if afr xattrs are found to be clean (indicated by ret value of 2 on a call to afr_shd_selfheal(), send a dummy post-op with all 0s for the xattr values, which makes the index xlator to unlink the stale entry. Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919 BUG: 1190069 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9714 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Convert quota size from n/w to host order before useKrutika Dhananjay2015-03-121-0/+4
| | | | | | | | | | | | Change-Id: I3e4fe15716556441546fcd62b8ac2833869b21cf BUG: 1200670 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9853 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Add self-heal-daemon command handlersPranith Kumar K2015-03-092-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the changes required in ec xlator to handle index/full heal. Index healer threads: Ec xlator start an index healer thread per local brick. This thread keeps waking up every minute to check if there are any files to be healed based on the indices kept in index directory. Whenever child_up event comes, then also this index healer thread wakes up and crawls the indices and triggers heal. When self-heal-daemon is disabled on this particular volume then the healer thread keeps waiting until it is enabled again to perform heals. Full healer threads: Ec xlator starts a full healer thread for the local subvolume provided by glusterd to perform full crawl on the directory hierarchy to perform heals. Once the crawl completes the thread exits if no more full heals are issued. Changed xl-op prefix GF_AFR_OP to GF_SHD_OP to make it more generic. Change-Id: Idf9b2735d779a6253717be064173dfde6f8f824b BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9787 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Handle getxattr of quota-size keyPranith Kumar K2015-03-093-61/+103
| | | | | | | | | | | | | Afr needs to query QUOTA_SIZE_KEY from all the subvolumes and return the value which is maximum of the readable bricks. Change-Id: Ibb9064c8652aea0d984796e7a06f8adca72aa971 BUG: 1199431 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9820 Reviewed-by: Anuradha Talur <atalur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/afr: Implementation of quorum-readsPranith Kumar K2015-03-055-2/+32
| | | | | | | | | | | Provide a way of disabling reads when quorum is not met. Change-Id: Ic4f57c2b87a0b8514600759de3a7a47e217fe3b5 BUG: 1187885 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9543 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Do not increment healed_count if no healing was performedKrutika Dhananjay2015-03-047-67/+92
| | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: When file modifications are happening while index heal is launched, index healer could pick up entries which appeared in indices/xattrop transiently during the course of the operations on the mount point, and do not really need any heal. This will cause index healer to keep doing index-heal in a loop as long as it finds this entry, by believing that it did successfully heal some gfids even when it didn't. FIX: afr_selfheal() now returns a 1 to indicate that it did not (need to) heal a given gfid. afr_shd_selfheal() will not increment healed_count whenever afr_selfheal() returns a 1. Change-Id: I0d97e11392a032a852e8c6508f691300ef0e5b98 BUG: 1194305 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9713 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glfs_fini: Clean up all the resources allocated in glfs_new.Poornima G2015-03-041-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially even after calling glfs_fini(), all the threads created during init and many other resources like memory pool, iobuf pool, event pool and other memory allocs were not being freed. With this patch these resources are freed in glfs_fini(). The two thumb rules followed in this patch are: - The threads are not killed, they are made to exit voluntarily, once the queued tasks are completed. The main thread waits for the other threads to exit. - Free the memory pools and destroy the graphs only after all the other threads are stopped, so that there are less chances of hitting access after free. Resources freed and its order: 1. Destroy the inode table of all the graphs - Call forget on all the inodes. This will not be required when the cleanup during graph switch is implemented to perform inode table destroy. 2. Deactivate the current graph, call fini of all the xlators. 3. Syncenv destroy - Join the synctask threads and cleanup syncenv resources Sets the destroy mode, complete the existing synctasks, then join the synctask threads. After entering the destroy mode, -if a new synctask is submitted, it fails. -if syncenv_new() is called, it will end up creating new threads, but this is called only during init. 4. Poller thread destroy Register an event handler which sets the destroy mode for the poller. Once the poller is done processing all the events, it exits. 5. Tear down the logging framework The log file is closed and the log level is set to none, after this point no log messages appear either in log file or in stderr. 6. Destroy the timer thread Set the destroy bit, once the pending timer events are processed the timer thread exits. Note: Log infrastructure should be shutdown before destroying the timer thread as gf_log uses timers. 7. Destroy the glusterfs_ctx_t For all the graphs(active and passive), free graph, xlator structs and few other lists. Free the memory pools - iobuf pool, event pool, dict, logbuf pool, stub mem pool, stack mem pool, frame mem pool. Few things not addressed in this patch: 1. rpc_transport object not destroyed, the PARENT_DOWN should have destroyed this object but has not, needs to be addressed as a part of different patch 2. Each xlator fini should clean up the local pool allocated by its xlator. Needs to be addresses as a part of different patch. 3. Each xlator should implement forget to free its inode_ctx. Needs to be addresses as a part of different patch. 3. Few other leaks reported by valgrind. 4. fd and fd contexts The numbers: The resource usage by the test case in this patch: Without the fix, Memory: ~3GB; Threads: ~81 With this fix, Memory: 300MB; Threads: 1(main thread) Change-Id: I96b9277541737aa8372b4e6c9eed380cb871e7c2 BUG: 1093594 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/7642 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* libglusterfs: Moved common functions as utils in syncop/common-utilsPranith Kumar K2015-02-273-138/+11
| | | | | | | | | | | | | | | These will be used by both afr and ec. Moved syncop_dirfd, syncop_ftw, syncop_dir_scan functions also into syncop-utils.c Change-Id: I467253c74a346e1e292d36a8c1a035775c3aa670 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9740 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : provide split-brain info by using getxattrAnuradha2015-02-253-0/+140
| | | | | | | | | | | | | | | | | | | | | | This patch is one part to enable users analyze and resolve split-brain. Problem : To know if a file is in data/metadata split-brain Solution : Performing "getfattr -n afr.split-brain-status <path-to-file>" from the mount provides this information. Also provides the list of afr children to analyse to get more information. Change-Id: I4d9b429794759a906371416cb84c84a212e2c7b9 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9633 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Enable auto-quorum for replicate with odd number of bricksPranith Kumar K2015-02-091-8/+16
| | | | | | | | | | | Change-Id: I908934f1f22cf7d2d0ceccc0dedf28a69861997f BUG: 1187885 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9517 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Re-introduce heal-timeout optionPranith Kumar K2015-02-063-1/+14
| | | | | | | | | | | Change-Id: I87484c810006a92ed7489284b6d74e9b0aecae80 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9598 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: Provide syncop_ftw and syncop_dir_scan utilsPranith Kumar K2015-02-061-257/+117
| | | | | | | | | | | | | | | | | ftw provides file tree walk. dir_scan does just a readdir not readdirp. Also changed Afr's self-heal-daemon's crawling functions to use this. These utils will be used by ec in future to do proactive/full healing. Change-Id: I05715ddb789592c1b79a71e98f1e8cc29aac5c26 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9485 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Fix parent read subvol selection policy in lookupKrutika Dhananjay2015-02-041-1/+1
| | | | | | | | | | | | | | | | | When lookup has succeeded on multiple subvols of AFR (including the read child of the parent dir) and all of them are "readable", ideally the call must be unwound with postparent from the parent's read child. But that is not the case, due to a bug introduced in the commit c78998c39f0857ea7aacba360632c148afc54a55. This patch fixes the issue. Change-Id: I83b0c26494a5d0bdbc30fcbe974fbdb6f7e9c84a BUG: 1179169 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9569 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: When parent and entry read subvols are different, set ↵Krutika Dhananjay2015-02-022-23/+94
| | | | | | | | | | | | | | | entry->inode to NULL That way a lookup would be forced on the entry, and its attributes will always be selected from its read subvol. Change-Id: Iaba25e2cd5f83e983fc8b1a1f48da3850808e6b8 BUG: 1179169 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9477 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr : Change in heal info split-brain commandAnuradha2015-01-302-5/+12
| | | | | | | | | | | | | Implementation of heal info split-brain command with glfs-heal. Change-Id: I233eb790de6eb5468a4cbb12a1cef0f97db2a1d2 BUG: 1183019 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9459 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Don't write to sparse regions of sink.Ravishankar N2015-01-301-2/+39
| | | | | | | | | | | | | | | | | | | | Problem: When data-self-heal-algorithm is set to 'full', shd just reads from source and writes to sink. If source file happened to be sparse (VM workloads), we end up actually writing 0s to the corresponding regions of the sink causing it to lose its sparseness. Fix: If the source file is sparse, and the data read from source and sink are both zeros for that range, skip writing that range to the sink. Change-Id: I787b06a553803247f43a40c00139cb483a22f9ca BUG: 1166020 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9480 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: split-brain resolution CLIRavishankar N2015-01-157-54/+336
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the AFR heal command to include automated split-brain resolution. This patch [3/3] is the final patch for afr automated split-brain resolution implementation. "gluster volume heal <VOLNAME> [full | statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [healed | heal-failed | split-brain]| split-brain {bigger-file <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]" The new additions being: 1.gluster volume heal <VOLNAME> split-brain bigger-file <FILE> Locates the replica containing the FILE, selects bigger-file as source and completes heal. 2.gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> Selects <FILE> present in <HOSTNAME:BRICKNAME> as source and completes heal. 3.gluster volume heal <VOLNAME> split-brain <HOSTNAME:BRICKNAME> Selects all split-brained files in <HOSTNAME:BRICKNAME> as source and completes heal. Note: <FILE> can be either the full file name as seen from the root of the volume (or) the gfid-string representation of the file, which sometimes gets displayed in the heal info command's output. Entry/gfid split-brain resolution is not supported. Example can be found in the test case. Change-Id: I4649733922d406f14f28ee9033a5cb627b9538b3 BUG: 1136769 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9377 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* libglusterfs: change signature of syncop_(f)getxattrRavishankar N2015-01-052-6/+8
| | | | | | | | | | | | | | | | | Pass xdata dict to syncop_(f)getxattr calls. This patch [1/3] is required as a part of afr automated split-brain resolution implementation. Change-Id: I3970b3dd6daf64681a031e37f8e9afb14fb3d668 BUG: 1136769 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9375 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: serialize inode locksPranith Kumar K2015-01-042-60/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Afr winds inodelk calls without any order, so blocking inodelks from two different mounts can lead to dead lock when mount1 gets the lock on brick-1 and blocked on brick-2 where as mount2 gets lock on brick-2 and blocked on brick-1 Fix: Serialize the inodelks whether they are blocking inodelks or non-blocking inodelks. Non-blocking locks also need to be serialized. Otherwise there is a chance that both the mounts which issued same non-blocking inodelk may endup not acquiring the lock on any-brick. Ex: Mount1 and Mount2 request for full length lock on file f1. Mount1 afr may acquire the partial lock on brick-1 and may not acquire the lock on brick-2 because Mount2 already got the lock on brick-2, vice versa. Since both the mounts only got partial locks, afr treats them as failure in gaining the locks and unwinds with EAGAIN errno. Change-Id: Ie6cc3d564638ab3aad586f9a4064d81e42d52aef BUG: 1176008 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9372 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr: Fixes to commit 85427a23c238499137cbfaafdb7b6ad27f67506aAnuradha2015-01-013-1/+10
| | | | | | | | | | | | | * Fixed a dict leak * Re-added 'return on failure' check Change-Id: I07edd03e4608fd2b7c4a91019a0e43033e6e78b2 BUG: 1163804 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9368 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Make entry-self-heal in afr-v2 compatible with afr-v1Pranith Kumar K2014-12-261-1/+28
| | | | | | | | | | | | | | | | | | | | | Problem: entry self-heal in 3.6 and above, takes full lock on the directory only for the duration of figuring out the xattrs of the directories where as 3.5 takes locks through out the entry-self-heal. If the cluster is heterogeneous then there is a chance that 3.6 self-heal is triggered and then 3.5 self-heal will also triggered and both the self-heal daemons of 3.5 and 3.6 do self-heal. Fix: In 3.6.x and above get an entry lock on a very long name before entry self-heal begins so that 3.5 entry self-heal will not get locks until 3.6.x entry self-heal completes. Change-Id: I71b6958dfe33056ed0a5a237e64e8506c3b0fccc BUG: 1168189 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9227 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: stop encoding subvolume id in readdir d_offAnand Avati2014-12-263-131/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of encoding d_off in AFR is to indicate the selected subvolume for the first readdir, and continue all further readdirs of the session on the same subvolume. This is required because, unlike files, dir d_offs are specific to the backend and cannot be re-used on another subvolume. The d_off transformation encodes the subvolume id and prevents such invalid use of d_offs on other servers. However, this approach could be quite wasteful of precious d_off bit-space. Unlike DHT, where server id can change from entry to entry and thus encoding the server id in the transformed d_off is necessary, we could take a slightly relaxed approach in AFR. The approach is to save the subvolume where the last readdir request was sent in the fd_ctx. This consumes constant space (i.e no per-entry cache), and serves the purpose of avoiding d_off "misuse" (i.e using d_off from one server on another). The compromise here is NFS resuming readdir from a non-0 cookie after an extended delay (either anonymous FD has been reclaimed, or server has restarted). In such cases a subvolume is picked freshly. To make this fresh picking more deterministic (i.e, to pick the same subvolume whenever possible, even after reboots), the function afr_hash_child (used by afr_read_subvol_select_by_policy) is modified to skip all dynamic inputs (i.e PID) for the case of directories. Change-Id: I46ad95feaeb21fb811b7e8d772866a646330c9d8 BUG: 1163161 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/9332 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: coverity fixesRavishankar N2014-12-236-19/+21
| | | | | | | | | | | | | Some fixes for the 17th Dec 2014 run. https://scan6.coverity.com:8443/reports.htm#v31028/p10714/g31029 Change-Id: Ia4410ef87a56fffb61803d0a4e62369b058e1cfb BUG: 1176089 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9314 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr : Change in volume heal info commandAnuradha2014-12-237-29/+366
| | | | | | | | | | | | | | | | gluster volume heal <volname> info command will now also display if the files listed (in the output of the command) are in split-brain or possibly being healed. This patch also fixes build warning that occurs. Change-Id: I1fc92e62137f23b2b9ddf6e05819cee6230741d1 BUG: 1163804 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9119 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* Avoid spurious directory metedata split brainEmmanuel Dreyfus2014-12-221-1/+115
| | | | | | | | | | | | | | | | | | | When directory content is modified, [mc]time is updated. On Linux, the filesystem does it, while at least on NetBSD, the kernel file-system independant code does it. This means that when entries are added while bricks are down, the kernel sends a SETATTR [mc]time which will cause metadata split brain for the directory. In this case, clear the split brain by finding the source with the most recent modification date. BUG: 1129939 Change-Id: Ic0177e0df753a4748624d0b906834ed54593adb9 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9291 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* telldir()/seekdir() portability fixesEmmanuel Dreyfus2014-12-171-6/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POSIX says that an offset obtained from telldir() can only be used on the same DIR *. Linux is abls to reuse the offset accross closedir()/opendir() for a given directory, but this is not portable and such a behavior should be fixed. An incomplete fix for the posix xlator was merged in http://review.gluster.com/8926 This change set completes it. - Perform the same fix index xlator. - Use appropriate casts and variable types so that 32 bit signed offsets obtained by telldir() do not get clobbered when copied into 64 bit signed types. - modify glfs-heal.c and afr-self-heald.c so that they do not use anonymous fd, since this will cause closedir()/opendir() between each syncop_readdir(). On failure we fallback to anonymous fs only for Linux so that we can cope with updated client vs not updated brick. - Avoid sending an EINVAL when the client request for the EOF offset. Here we fix an error in previous fix for posix xlator: since we fill each directory entry with the offset of the next entry, we must consider as EOF the offset of the last entry, and not the value of telldir() after we read it. - Add checks in regression tests that we do not hit cases where offsets fed to seekdir() are wrong. Introduce log_newer() shell function to check for messages produced by the current script. This fix gather changes from http://review.gluster.org/9047 and http://review.gluster.org/8936 making them obsolete. BUG: 1129939 Change-Id: I59fb7f06a872c4f98987105792d648141c258c6a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9071 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Eliminate locking in sh domain in metadata self-healKrutika Dhananjay2014-12-051-35/+2
| | | | | | | | | | Change-Id: I9ef25a17c9a43ba06fac2ad3f7c18cb47de91537 BUG: 1170913 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9240 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/marker: Filter internal xattrs in lookupPranith Kumar K2014-11-113-7/+21
| | | | | | | | | | | | | Afr should ignore quota-size-key as part of self-heal but should heal quota-limit key. Change-Id: Ic0b06bd20a563a00d6bfdc2dc5a76c661e533ecb BUG: 1161106 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9061 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Preserve errno in case of failures on all subvolsPranith Kumar K2014-11-051-5/+16
| | | | | | | | | | | | | | | | | | | | Problem: When quorum is enabled and the fop fails on all the subvolumes, op_errno is set to EROFS which overrides the actual errno returned from bricks. Fix: Don't override the errno when fop fails on all subvols. Change-Id: I61e57bbf1a69407230ec172a983de18d1c624fd2 BUG: 1157976 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8984 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Perform post-op in entry selfheal inside locksKrutika Dhananjay2014-10-311-3/+31
| | | | | | | | | | | | | | | | | Take entrylks in xlator domain before doing post-op (undo-pending) in entry self-heal. This is to prevent a parallel name self-heal on an entry under @fd->inode from reading pending xattrs while it is being modified by SHD after entry sh below, given that name self-heal takes locks ONLY in xlator domain and is free to read pending changelog in the absence of the following locking. Change-Id: Ie083ceab10155c460447f04bdce7688480f1ac4f BUG: 1128721 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9020 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* logs: Do selective logging for errnosPranith Kumar K2014-10-201-2/+3
| | | | | | | | | | | | | | | | | | | | | Problem: Just after replace-brick the mount logs are filled with ENOENT/ESTALE warning logs because the file is yet to be self-healed now that the brick is new. Fix: Do conditional logging for the logs. ENOENT/ESTALE will be logged at lower log level. Only when debug logs are enabled, these logs will be written to the logfile. Change-Id: If203d09e2479e8c2415ebc14fb79d4fbb81dfc95 BUG: 1151303 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8918 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* heal: glfs-heal implementationPranith Kumar K2014-10-154-10/+26
| | | | | | | | | | | Thanks a lot to Niels for helping me to get build stuff right. Change-Id: I634f24d90cd856ceab3cc0c6e9a91003f443403e BUG: 1147462 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6529 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Add afr-v1 xattr compatibilityPranith Kumar K2014-10-016-83/+330
| | | | | | | | | | | | | All the special cases v1 handles and also self-accusing pending changelog from v1 pre-op also is handled in this patch. Change-Id: Ie10f71633fb20276f01ecafbd728f20483e7029c BUG: 1128721 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8536 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Fix invalid seekdir() usageEmmanuel Dreyfus2014-09-301-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to POSIX, seekdir() should only be given offset obtained from telldir() on the same DIR * http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html Code from afr-self-heald.c and index.c is operating outside of the specification, by doing using seekdir() with offset from a previously open/close/re-open directory. This seems to work on Linux (although with no guarantee it will always in the future). On NetBSD the seekdir() with a in invalid offset is a nilpotent operation, and causes an infinite loop, since index_fill_readdir() always restart from the beginning of the directory. The situation is fixed by using a non anonymous fd in afr-self-heald.c: we explicitely open the directory so that it remains open on the brick side during the timeframe where we want to reuse offsets in seekdir(). This requires adding an opendir fop in index xlator. If the brick was not updated, the opendir will fail and we fallback to the standard violating approach for backward compatibility on Linux. On other systems we fail since it never worked. While there, add tests to check seekdir() success in index and posix xlators, so that incorrect usage from calling code produce an explicit error instead of an infinite loop. We can only do it on non Linux systems, for the sake of backward compatibility when the brick was updated but not the client. BUG: 1129939 Change-Id: I88ca90acfcfee280988124bd6addc1a1893ca7ab Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/8760 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Launch self-heal only when all the brick status is knownPranith Kumar K2014-09-291-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: File goes into split-brain because of wrong erasing of xattrs. RCA: The issue happens because index self-heal is triggered even before all the bricks are up. So what ends up happening while erasing the xattrs is, xattrs are erased only on the sink brick for the brick that it thinks is up leading to split-brain Example: lets say the xattrs before heal started are: brick 2: trusted.afr.vol1-client-2=0x000000020000000000000000 trusted.afr.vol1-client-3=0x000000020000000000000000 brick 3: trusted.afr.vol1-client-2=0x000010040000000000000000 trusted.afr.vol1-client-3=0x000000000000000000000000 if only brick-2 came up at the time of triggering the self-heal only 'trusted.afr.vol1-client-2' is erased leading to the following xattrs: brick 2: trusted.afr.vol1-client-2=0x000000000000000000000000 trusted.afr.vol1-client-3=0x000000020000000000000000 brick 3: trusted.afr.vol1-client-2=0x000010040000000000000000 trusted.afr.vol1-client-3=0x000000000000000000000000 So the file goes into split-brain. Change-Id: I1185713c688e0f41fd32bf2a5953c505d17a3173 BUG: 1142601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8755 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr : Fix incorrect looping of index healerAnuradha2014-09-291-4/+9
| | | | | | | | | | | | | | | | Sending appropriate return value from afr_selfheal() fixes the issue. Credits : Krutika Dhananjay and Pranith Kumar. Change-Id: I01dbd49476f6bfbd02028fdde1f60cc0324a1e39 BUG: 1146812 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8868 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix inode leakKrutika Dhananjay2014-09-281-0/+2
| | | | | | | | | | Change-Id: I723b65d6fcae5bd39c0daece3b2f48baf0a72166 BUG: 1145471 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8875 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: More dict_t leak fixesKrutika Dhananjay2014-09-262-33/+68
| | | | | | | | | | Change-Id: I6618b7b2df0f88a3e6252f9136135cf402147a37 BUG: 1134221 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8852 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix locking issues in entry self-healKrutika Dhananjay2014-09-241-92/+123
| | | | | | | | | | | | | Original reporter of the bug & designer of the solution: Pranith Kumar K <pkarampu@redhat.com> Change-Id: I9ed89aa92e4cd0f8049f5f6c7a3701e52989ae5e BUG: 1128721 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8837 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix spurious metadata self-healsPranith Kumar K2014-09-247-29/+86
| | | | | | | | | | | | | - Added logging for metadata and data self-heals which helped in debugging this issue. - Added checks to skip self-heals when no sinks are available to heal Change-Id: I0d50dceb84cd9ad4fe00e0b749ddf7d4ff42348a BUG: 1128721 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8709 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Fixed mem leaks in self-heal code path.Anuradha2014-09-232-1/+17
| | | | | | | | | | | | | | | | AFR_STACK_RESET previously didn't cleanup afr_local_t, leading to memory leaks. With this patch, cleanup is done. All credit goes to Pranith Kumar Karampuri. Change-Id: I3c727ff4bb323dccb81da4b3168ac69bb340d17d BUG: 1145471 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8821 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Do not reset pending xattrs on gfid or type mismatch in entry-shKrutika Dhananjay2014-09-231-18/+79
| | | | | | | | | | Change-Id: Ie27219a376382e2455a0fcc094f8b7eb243738ae BUG: 1140613 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8816 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Don't start heal when lookup succeeds on < 2 childrenPranith Kumar K2014-09-236-8/+29
| | | | | | | | | | | | | | | | | | | Problem: When self-heal code doesn't see at least 2 successes on looking up children, then self-heal can't be done. What is happening now is if all the lookups fail then the pending changelog is all zeros in xattrs so all the children are becoming sources and leading to crashes when the code paths further assume that some data structures are populated properly Fix: Don't proceed with self-heals when < 2 children succeed lookups. BUG: 1128721 Change-Id: Iffdf0feebb6f98812d9d01cdd0cf97f3e19ba76f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8698 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Set all the xattrs needed by index xlatorAnuradha2014-09-162-41/+30
| | | | | | | | | | | | | | | | | | | | | | Index xlator removes the index file from indices xattrop directory in case the value for keys sent are zero. If all the required keys are not set by afr then index file might be removed in an invalid way. With this change all the keys required by index xlator are set by afr such that invalid removal of files does not occur. Change-Id: Idbed0764a95157fd5cab8d6685057a43788fc7df BUG: 1139230 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8652 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Handle EAGAIN properly in inodelkPranith Kumar K2014-09-152-14/+130
| | | | | | | | | | | | | | | | | | Problem: When one of the brick is taken down and brough back up in a replica pair, locks on that brick will be allowed. Afr returns inodelk success even when one of the bricks already has the lock taken. Fix: If any brick returns EAGAIN return failure to parent xlator. Change-Id: I5b842d0fc094359cc4231494053d2bfeb606bbbe BUG: 1141539 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8710 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: perform list-xattr during lookupRavishankar N2014-09-054-11/+222
| | | | | | | | | | | | | Detect and heal mismatching user extended attributes during lookup. 'Forward' port of http://review.gluster.org/#/c/7444/ Change-Id: Id03c9746f083ffd3014711d0b3a2e5a71a45eed4 BUG: 1134691 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/8558 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr : Mark pending changelog xattrs for new creationsAnuradha2014-09-037-90/+148
| | | | | | | | | | | | | Based on type of file, set appropriate pending changelogs for new entries. Change-Id: Ifd124bf9bc54b996ce83ab9f39d03b3ccca7eb3c BUG: 1130892 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8555 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Minor fixes to #8574Krutika Dhananjay2014-09-031-5/+3
| | | | | | | | | | | | | | * Fixed a bug in afr_selfheal_name_type_mismatch_check() * Fixed indentation * Removed redundant 'continue' statements Change-Id: Ie58b5dec9085ce9fe46ae9f244ebae1b1cef7ac5 BUG: 1132469 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8586 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Propagate EIO on inode's type mismatchKrutika Dhananjay2014-09-027-120/+333
| | | | | | | | | | | | | Original author of the test script: Pranith Kumar K <pkarampu@redhat.com> Change-Id: If515ecefd3c17f85f175b6a8cb4b78ce8c916de2 BUG: 1132469 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8574 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>