path: root/xlators/cluster
Commit message | Author | Age | Files | Lines
* cluster/afr: Empty string should not be default option val | Pranith Kumar K | 2012-12-05 | 3 | -5/+6
  Glusterd does not allow empty string as default value. Changed afr option values to disallow empty string as value.

  Change-Id: I92a2d658907dbc6101e1139dd91f548acb5506f5
  BUG: 859927
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4271
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: mark new entry changelog for create/mknod failures | Pranith Kumar K | 2012-12-04 | 5 | -67/+228
  Problem:
  When create/mknod fails on some of the nodes, appropriate pending data/metadata changelogs are not assigned. This was not considered to be an issue because entry self-heal would do the assigning of appropriate changelog after creating new entries. But using the combination of rebalance and remove brick we can construct a case where a file with same name and gfid can be created in a dir with different data and link-to xattr without any changelog.

  Fix:
  When a create/mknod failure is observed mark the appropriate changelog on the new file created.

  Change-Id: I4c32cbf5594a13fb14deaf97ff30b2fff11cbfd6
  BUG: 858212
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4207
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* afr: use data trylock mode in read/write self-heal trigger paths | Brian Foster | 2012-12-04 | 1 | -1/+8
  Self-heal data lock contention between clients and glustershd instances can lead to long wait and user response times if the client ends up pending its lock on glustershd self-heal of a large file. We have reports of guest vm instances going completely unresponsive during self-heal of virtual disk images.

  Optimize the read/write self-heal trigger codepath (i.e., afr_open_fd_fix()) to trylock for self-heal and skip the self-heal otherwise, to minimize the likelihood of a running/active guest competing with glustershd on arrival of a brick. Note that lock contention is still possible from the client (e.g., via lookup).

  BUG: 874045
  Change-Id: I406443c061ff6acd2a851179626b78352caa5c03
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/4258
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
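The trylock-and-skip behaviour this commit describes can be modelled in a few lines. This is only a sketch in Python, not the C code in afr: `heal_if_uncontended`, its `block` flag and the string return values are hypothetical names picked for illustration.

```python
import threading


def heal_if_uncontended(data_lock, heal, block=False):
    """Run heal() under data_lock; with block=False, skip if the lock is busy."""
    # block=False is the trylock mode: acquire() returns at once with False
    # when another holder (for example a self-heal daemon) owns the lock.
    if not data_lock.acquire(blocking=block):
        return "skipped"
    try:
        heal()
        return "healed"
    finally:
        data_lock.release()
```

With `block=False` the caller never waits behind a long-running heal; it simply gives up, which is the trade-off the commit message accepts for the read/write trigger path.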
* afr: support self-heal data trylock mechanism | Brian Foster | 2012-12-04 | 4 | -8/+15
  Introduce a block flag to support an optional blocking or non-blocking mode in the self-heal data locking mechanism. All callers are modified to use blocking mode, which is the current default behavior (no change in behavior is introduced by this commit).

  BUG: 874045
  Change-Id: Ib7ff9984578fa11de4e3b6981508100cdddd37cd
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/4257
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* afr: make flush non-transactional | Brian Foster | 2012-12-04 | 3 | -139/+38
  Flush is historically a transaction to ensure all previous writes were complete. This is no longer required as write-behind has learned to make flush a barrier operation (re: conversation w/ Avati). Flush taking a full file lock causes VMs running on afr volumes to stall when a migration occurs and self-heal is in progress.

  Make afr_flush() a non-transactional operation.

  BUG: 874045
  Change-Id: If2db83823e280c86b1b29b41361eed7081601632
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/4261
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* dht: support auto-NUFA option | Jeff Darcy | 2012-12-04 | 1 | -9/+67
  Many people have asked for behavior like the old NUFA, which builds and seems to run but was previously impossible to enable/configure in a standard way. This change allows NUFA to be enabled instead of DHT from the command line, with automatic selection of the local subvolume on each host.

  Change-Id: I0065938db3922361fd450a6c1919a4cbbf6f202e
  BUG: 882278
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4234
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* fix memory leaks | Raghavendra Bhat | 2012-12-04 | 2 | -1/+3
  * write-behind: free the inode context in wb_forget
  * distribute: in readdirp callback put the allocated context to the inode
  * distribute: check if the layout is NULL before accessing it in layout_unref

  Change-Id: I7698f81b85b99d06bf6b01fc1a6e51e1593b5e27
  BUG: 790709
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/4250
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide option to disable readdir failover | Pranith Kumar K | 2012-12-03 | 4 | -25/+41
  In a replica pair, unlike files, directories may not have their content in the same order, so readdir for the same (offset, size) may not give the same entries on both the subvolumes of the replica pair. Switching over from one subvolume to another may not be a good idea sometimes. It may lead to duplicate entries or fewer entries or both. This patch provides a way to disable readdir-failover so that applications like rebalance can retry if they want to.

  Change-Id: I2b23eb224a2e84016a561362932613ac824c11a0
  BUG: 859387
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4159
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
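The failure mode behind this option is easy to reproduce in a toy model. The sketch below assumes that a readdir offset is a plain list index, which is a simplification (real offsets are opaque cookies); the entry names and the `readdir` helper are made up for illustration.

```python
def readdir(entries, offset, count):
    """Return up to `count` entries starting at position `offset`."""
    return entries[offset:offset + count]


# The same directory on two replicas, stored in a different order.
replica_a = ["a", "b", "c", "d"]
replica_b = ["c", "a", "d", "b"]

# First batch from replica A, then fail over to replica B at the same offset.
listing = readdir(replica_a, 0, 2) + readdir(replica_b, 2, 2)

duplicates = sorted({e for e in listing if listing.count(e) > 1})
missing = sorted(set(replica_a) - set(listing))
```

Here the combined listing contains `b` twice and never returns `c`, which is the "duplicate entries or fewer entries or both" case from the commit message.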
* cluster/afr: Added descriptions to afr options | Pranith Kumar K | 2012-11-29 | 1 | -11/+86
  Change-Id: I4aef1c79743ee08b62e04d7b709f3e8c6b9dc56a
  BUG: 881517
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4244
  Reviewed-by: Anand Avati <avati@redhat.com>
  Tested-by: Anand Avati <avati@redhat.com>
* cluster/dht: fail fix-layout if any of the subvol is down | shishir gowda | 2012-11-29 | 5 | -35/+47
  If any subvolume is down, and a layout is re-written and hash values change, entry names in the downed subvol can be reused in the other subvol which got the same hash range. When the downed subvol is brought back up, duplicate entries might appear.

  Also separated handling of ENOSPC and ENOTCONN error.

  Change-Id: I5ed93990425a4cee70df2dab7c7c119fdc87ad56
  BUG: 860663
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4000
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Heal dir uid/gid | shishir gowda | 2012-11-29 | 4 | -1/+105
  Identify mismatching uid/gid in lookup, and trigger a syncop heal. uid/gid of subvol with latest ctime is trusted (local->prebuf).

  Change-Id: Ib5c4bc438e7f4b1f33080e73593f40f400e997f0
  BUG: 862967
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/3964
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
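The "trust the subvolume with the latest ctime" rule can be written down as a small selection step. The following is a sketch under assumptions, not the dht code: `plan_owner_heal`, the subvolume names and the tuple layout are hypothetical.

```python
def plan_owner_heal(stats):
    """stats maps subvolume name -> (uid, gid, ctime) for one directory."""
    # Trust the subvolume whose copy of the directory has the latest ctime.
    source = max(stats, key=lambda name: stats[name][2])
    uid, gid, _ = stats[source]
    # Every subvolume whose owner differs from the trusted one needs a heal.
    targets = sorted(
        name for name, (u, g, _) in stats.items() if (u, g) != (uid, gid)
    )
    return source, (uid, gid), targets
```

For example, with one stale copy owned by root and two newer copies owned by uid/gid 1000, only the stale subvolume ends up in `targets`.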
* afr: send unique dict_t instances to replicas in self-heal fxattrop | Brian Foster | 2012-11-29 | 1 | -28/+42
  afr_sh_data_fxattrop() currently allocates and sends a single xattr dict_t instance to each replica. The callback codepath references the returned object in the self-heal in-memory state for the particular replica. If storage/posix is in the same address-space (i.e., running a single glusterfs client with a fuse->afr->posix graph), the same object is modified and returned for each child, causing corrupted in-memory state and afr xattrs.

  Allocate and send independent xattr dict_t's for each replica. This allows self-heal to work correctly in a single address-space graph.

  BUG: 868478
  Change-Id: I42832e85b5d1abb6098c28944c717e129300109e
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/4149
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
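The aliasing problem described here is generic: if every replica receives the same mutable object and fills it in place, the last reply overwrites the earlier ones. A minimal model, assuming a same-process "brick" that writes into the dict it is handed (all function names below are hypothetical, not afr identifiers):

```python
def local_replica_reply(xattr, child_index):
    """Stand-in for a same-process brick that fills in the dict it was given."""
    xattr["pending"] = child_index
    return xattr


def fxattrop_shared(child_count):
    # One dict sent to every replica: all replies alias the same object.
    xattr = {"pending": None}
    return [local_replica_reply(xattr, i) for i in range(child_count)]


def fxattrop_unique(child_count):
    # A fresh dict per replica, so replies cannot overwrite each other.
    return [local_replica_reply({"pending": None}, i) for i in range(child_count)]
```

In the shared variant both stored replies report the value written by the last child; in the unique variant each reply keeps its own value.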
* afr: handle short writes in afr_writev_wind and self-heal to avoid corruption | Brian Foster | 2012-11-29 | 5 | -16/+75
  The current failure to handle short writes on writev fops leaves us open to file corruption. A short write on a user request is ignored and leaves replicas in an inconsistent state. A short write during a self-heal is ignored and incorrectly marks the files as consistent if the heal completes.

  Modify user writev handling to return the best case return value from each of the replicas. Short writes that occur relative to this value are marked as failed and will require a heal. Modify self-heal to set an error on a short write and abort the heal.

  BUG: 853690
  Change-Id: I18b30f58702326249230eeebb361b29e40b535f5
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/4150
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
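Both rules in this commit reduce to a comparison of byte counts. The sketch below is an illustration only: it assumes each replica reports bytes written (or -1 on error), and `merge_writev_replies` and `check_heal_write` are hypothetical helper names.

```python
def merge_writev_replies(replies):
    """replies holds one return value per replica: bytes written, or -1."""
    # The caller sees the best case across replicas.
    best = max(replies)
    # Any replica that wrote less than the best case is treated as failed
    # and will need a heal.
    failed = [i for i, ret in enumerate(replies) if ret < best]
    return best, failed


def check_heal_write(requested, written):
    """During self-heal a short write is an error that aborts the heal."""
    if written < requested:
        raise OSError("short write during self-heal, aborting")
```

With replies of 4096, 1024 and 4096 bytes, the caller gets 4096 and the second replica is marked as needing a heal.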
* cluster/dht: send ACCESS call on dir to first_up_subvol if cached is down | shishir gowda | 2012-11-29 | 1 | -0/+11
  Change-Id: I4f518a969bbe3a11075e7c9ae10bd21bf059d5f3
  BUG: 867253
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4240
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/stripe: handle GF_XATTR_LOCKINFO_KEY in f(get)(set)xattr | Raghavendra | 2012-11-27 | 3 | -19/+302
  Change-Id: I4463006a7f54c05e757d877c56e1330fd91aec45
  BUG: 808400
  Signed-off-by: Raghavendra <raghavendra@gluster.com>
  Reviewed-on: http://review.gluster.org/4125
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/distribute: send getxattr on LOCKINFO to only cached subvolumes. | Raghavendra | 2012-11-27 | 1 | -1/+3
  lk is sent only to the cached subvolume. Hence there is no point in sending LOCKINFO to other children (even in case of directories).

  Change-Id: Ia20fc358dfa84cee9a52d1f613564ff6f25aa0c9
  BUG: 808400
  Signed-off-by: Raghavendra <raghavendra@gluster.com>
  Reviewed-on: http://review.gluster.org/4123
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: handle GF_XATTR_LOCKINFO_KEY appropriately. | Raghavendra G | 2012-11-27 | 2 | -26/+521
  Values from all children need to be aggregated into a dictionary, and the serialized buffer of this aggregated dictionary has to be the value of GF_XATTR_LOCKINFO_KEY in the dict sent as a result of fgetxattr.

  Change-Id: Ie877f7c637c07feaee4c44d7ef86aa967a17b7e7
  BUG: 808400
  Signed-off-by: Raghavendra G <raghavendra@gluster.com>
  Reviewed-on: http://review.gluster.org/4121
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* replicate: don't stop checking xattrs because one was absent | Jeff Darcy | 2012-11-26 | 3 | -114/+41
  The functional issue is described by the subject line. This patch also addresses several efficiency/structure issues, such as...

  * Calling dict_set_ptr once for each txn type, instead of once overall.
  * Calling afr_index_for_transaction_type once per iteration instead of once per call (or better yet zero since the conversion is unnecessary).
  * Implementation of inner functions in a different file than their one caller, creating a spurious header-file dependency.

  Change-Id: I29e0df906a820533b66b9ced73e015dfe77267d2
  BUG: 865825
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4070
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
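The functional fix named in the subject line amounts to skipping a missing key instead of ending the scan. The sketch below is a guess at the shape of that loop, not the patched C code; `collect_pending` and the example key names are illustrative assumptions.

```python
def collect_pending(xattrs, keys):
    """Gather the values of every key in `keys` that is present in `xattrs`."""
    found = {}
    for key in keys:
        if key not in xattrs:
            # An absent key is skipped; it must not end the scan, otherwise
            # later keys that are present would never be examined.
            continue
        found[key] = xattrs[key]
    return found
```

With three keys of which only the first and third are present, both present values are still collected.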
* Cluster/afr: Fix output for gluster volume heal vn info healed | Venkatesh Somyajula | 2012-11-26 | 7 | -17/+53
  Problem:
  Whenever the gluster volume heal vol full command is executed, the entries stored in the circular buffer for sh->healed are added to the dictionary in the _crawl_post_sh_action function, irrespective of whether an actual self-heal (due to non-zero values in the change log) takes place or not.

  Fix:
  The value of the key (actual-sh-done) will be set to 1 whenever self-heal takes place due to non-zero change log values. If, for some FOP, the self-heal daemon finds that no self-heal is required after examining the pending matrix, the value will be 0.

  Change-Id: I11fd0b9ee76759af17c5bca6bfafbaf66bcaacbc
  BUG: 863068
  Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com>
  Reviewed-on: http://review.gluster.org/4181
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Implement float percentage | Pranith Kumar K | 2012-11-23 | 4 | -20/+21
  Change-Id: Ia7ea63471f0bbd74686873f5f6f183475880f1a0
  BUG: 839595
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.org/4162
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: check transaction type for eager-lock after it is set | Pranith Kumar K | 2012-11-21 | 1 | -6/+6
  Problem:
  Eager locking lk-owner decision is taken before transaction type is set. Default transaction type is DATA so all transactions are treated as DATA transactions at the time of eager-locking decision.

  Fix:
  Move the code that takes lk-owner decision after the transaction type is set.

  Test:
  Checked that the transaction type is set properly in gdb at the time of the lk-owner decision.

  Change-Id: I7607c7ff4f88c7ced5416a1cddb6586cf45d88f9
  BUG: 861335
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4220
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: dump the layout information of directories only | Raghavendra Bhat | 2012-11-19 | 1 | -9/+18
  testcase:
  The changes are for removing gf_log from statedump related sections in dht and using pthread_mutex_trylock in statedump sections. Changes are internal. So tests were done by attaching gdb to the process and executing by manually changing the values of some of the pointers.

  Change-Id: I41fa76c1812b462cb76f5bbf2fd14de080e73895
  BUG: 843822
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/4117
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: ignore empty ->hashed_subvol during lookup | Venky Shankar | 2012-10-17 | 1 | -9/+1
  ->hashed_subvol is not valid (== NULL) when the subvolume the entity hashes to is down. For directories, we need not rely on ->hashed_subvol as we aggregate information from all subvolumes. So, during lookup, NULL ->hashed_subvol is ignored but logged.

  Change-Id: I306e4e274fe29d60ff028add4a6c3bcd67b2f314
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  BUG: 856459
  Reviewed-on: http://review.gluster.org/4046
  Reviewed-by: Anand Avati <avati@redhat.com>
  Tested-by: Anand Avati <avati@redhat.com>
* cluster/distribute: Always return the latest time in struct iatt. | shishir gowda | 2012-10-16 | 6 | -50/+264
  Save the a/c/mtime in inode_ctx; dht_inode_ctx_update checks the passed iatt, and updates the stat's time and inode_ctx's time accordingly.

  For preparent times, only the iatt stat to be returned is updated, not the ctx.

  With this update, WIPE is removed, as we would always be passing back the latest mtime, and hence cache times will be relevant.

  TODO: handle rename WIPE calls

  Change-Id: I8e4c738cd830f3fafeef789c9181f9c242ac96a2
  BUG: 857791
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/3737
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
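The "always return the latest time" idea can be sketched as a running maximum kept per inode. This is a simplified model with plain integer timestamps; `update_times` and the dict-based context are hypothetical stand-ins, not the dht_inode_ctx_update implementation.

```python
def update_times(ctx, stat):
    """Keep the newest atime/ctime/mtime in ctx and copy them into stat."""
    for field in ("atime", "ctime", "mtime"):
        newest = max(ctx.get(field, 0), stat[field])
        ctx[field] = newest   # remember the latest value seen so far
        stat[field] = newest  # never hand back an older time than before
    return stat
```

If a later reply from another subvolume carries an older mtime, the caller still receives the newer value remembered in the context.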
* cluster/afr : Edited log message in afr_sh_entry_expunge_entry_cbk | Venkatesh Somyajulu | 2012-10-12 | 1 | -1/+2
  Change-Id: I9f7562d28c8bc798552c403164397f929a7bd1e7
  BUG: 860246
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: http://review.gluster.org/4052
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* Preventing client crashing as the callings of GF_CALLOC has been failed. | linbaiye | 2012-10-11 | 4 | -31/+107
  Calls to GF_CALLOC can, on rare occasions, fail, in which case the glusterfs client crashes with a segmentation fault. Return early when the transaction's local variables cannot be allocated.

  Change-Id: Ia3798b8349d832b23c7825e64dbad93ebe29cd1b
  BUG: 861335
  Signed-off-by: linbaiye <linbaiye@gmail.com>
  Reviewed-on: http://review.gluster.org/4005
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: don't use synctask_new from within a synctask | Jeff Darcy | 2012-10-11 | 1 | -3/+14
  Change-Id: Iebf821ff720c63ab6da4b219d82c7f1d00769992
  BUG: 862838
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4032
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cli: Changes and enhancements to XML output | Kaushal M | 2012-10-11 | 1 | -4/+8
  This patch contains several XML-related changes which fix some bugs and introduce XML output for commands which were missing it. These include:

  * XML output for rebalance & remove-brick status
  * XML output for replace-brick
  * XML output for 'volume status all' in one XML document
  * proper XML output for "volume {create|start|stop|delete}"
  * type & status of a volume in 'volume info' is now given as a string as well

  This patch also cleans up the '#if (HAVE_LIB_XML)' sections from the code-base, so that it is not littered around.

  Change-Id: I5bb022adf0fedf7e3ead92b4b79bfa02b0b5fef5
  BUG: 828131
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.org/3869
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr Changed the message's log level from Error to Debug | Venkatesh Somyajulu | 2012-10-10 | 1 | -3/+3
  Change-Id: Ic2506561367bfec9022dc53e9b17b03dc343df95
  BUG: 859411
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: http://review.gluster.org/4055
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: check transaction type for eager-lock after it is set | Pranith Kumar K | 2012-10-10 | 1 | -6/+6
  Problem:
  Eager locking lk-owner decision is taken before transaction type is set. Default transaction type is DATA so all transactions are treated as DATA transactions at the time of eager-locking decision.

  Fix:
  Move the code that takes lk-owner decision after the transaction type is set.

  Test:
  Checked that the transaction type is set properly in gdb at the time of the lk-owner decision.

  Change-Id: Ib1c886866f28788aed67622982e86d667b2cdb80
  BUG: 864786
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.org/4053
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* build: split CPPFLAGS from CFLAGS | Jeff Darcy | 2012-10-03 | 5 | -10/+16
  Automake provides a separate variable for preprocessor flags (*_CPPFLAGS). They are already used in a few places, so make it consistent and use it everywhere.

  Note that cflags obtained from pkg-config often are cppflags, which is why LIBXML2_CFLAGS moves into AM_CPPFLAGS, for example.

  Change-Id: I15feed1d18b2ca497371271c4b5876d5ec6289dd
  BUG: 862082
  Original-author: Jan Engelhardt <jengelh@inai.de>
  Signed-off-by: Jan Engelhardt <jengelh@inai.de>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4029
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* build: remove useless explicit -fPIC -shared from CFLAGS | Jeff Darcy | 2012-10-03 | 5 | -10/+10
  libtool will automatically add "-fPIC" to the compiler command line as needed, so there is no need to specify it separately.

  "-shared" is normally a linker flag and has an odd effect when used with libtool --mode=compile, namely that it inhibits production of static objects. For that however, using AC_DISABLE_STATIC is a lot simpler.

  Change-Id: Ic4cba0fad18ffd985cf07f8d6951a976ae59a48f
  BUG: 862082
  Original-author: Jan Engelhardt <jengelh@inai.de>
  Signed-off-by: Jan Engelhardt <jengelh@inai.de>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4027
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* build: remove -nostartfiles flag | Jeff Darcy | 2012-10-02 | 5 | -5/+5
  The "-nostartfiles" is a discouraged option and is documented to potentially result in undesired behavior. Since I see no reason why it should be in glusterfs, remove it.

  Change-Id: I56f2b08874516ebad91447b2583ca2fb776bb7ab
  BUG: 862082
  Original-author: Jan Engelhardt <jengelh@inai.de>
  Signed-off-by: Jan Engelhardt <jengelh@inai.de>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4018
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* build: consolidate common compilation flags into one variable | Jeff Darcy | 2012-10-01 | 5 | -5/+5
  Some -D flags are present in all files, so collect them. This adds -D${GF_HOST_OS} to some compiler command lines, but this should not be a problem.

  Change-Id: I1aeb346143d4984c9cc4f2750c465ce09af1e6ca
  BUG: 862082
  Original-author: Jan Engelhardt <jengelh@inai.de>
  Signed-off-by: Jan Engelhardt <jengelh@inai.de>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4013
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide option to set readdir-size in entry-self-heal | Pranith Kumar K | 2012-10-01 | 3 | -1/+14
  Problem:
  Entry self-heal does lookups on all the entries that are read in readdir. The larger the readdir size, the more lookups happen in parallel. It is observed that this leads to HUGE cpu spikes, rendering everything else on the system unusable.

  Fix:
  Provided the option self-heal-readdir-size to configure the size. Default value is 1KB.

  Tests:
  Checked that the readdirs are happening with the configured value in entry-self-heal.

  Change-Id: Icaa937ad88857e6f9a12375b1e7f6a49192bc8b1
  BUG: 860895
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.org/4002
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* Fixed some general typing errors. | Varun Shastry | 2012-09-27 | 1 | -1/+1
  Eg: changed recieved to received

  Change-Id: I360fcb99c97c8a0222e373fee20ea2fccfb938db
  BUG: 860543
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/3998
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Trigger heal on local subvols on any child_up | Pranith Kumar K | 2012-09-25 | 1 | -13/+21
  Problem:
  The index in the child that comes online is generally empty because the changes would have happened on the other child which has been up. So the sync begins when the other child's poll time-out happens (i.e. 10 minutes). The expectation is that the sync must be triggered as soon as the connection with any brick is established.

  Fix:
  Whenever any child_up happens trigger the index self-heal on all local children in the replicate subvolume.

  Tests:
  1) Checked that the self-heal is triggered on all local children whenever any child comes online.
  2) Checked that the volume heal commands are working fine.

  Change-Id: I4f64737866470a2f989349a889ea52782930e11d
  BUG: 852741
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.org/3972
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Wake up post-op on non-co-operative transaction | Pranith Kumar K | 2012-09-25 | 1 | -0/+23
  Problem:
  The problem is observed when kernel untar is done. One file untar happens every second. The reason for this is that the setattr lock is blocked on the prev fd data-transaction full-lock (because of eager-lock). Because of post-op-delay the post-op (xattrop + unlock) of the prev data-transaction happens after 1 sec. Until then the setattr is blocked, resulting in performance problems in untar.

  Fix:
  Whenever a loc data or meta-data transaction comes, it should wake up the prev-post-op on the same process' fd.

  Tests:
  The performance problem in untar went away. I put a breakpoint in client_finodelk for a 2G file dd and the inodelk is hit only 4 times. This confirms that the change does not affect post-op-delay in a negative way.

  Change-Id: Ice3c2a1211f4dca6520a19bc4ba6cb9efb2902ad
  BUG: 845754
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.org/3975
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* Clean up of typepunning errors ( Strict aliasing warnings ) | Varun Shastry | 2012-09-17 | 1 | -1/+3
  Change-Id: I48733967facc526fb523a8dc9bd068f8c5cc5971
  BUG: 764282
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/3950
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* All: License message change | Varun Shastry | 2012-09-13 | 8 | -56/+48
  License message changed for server-side, dual license GPLv2 and LGPLv3+.

  Change-Id: Ia9e53061b9d2df3b3ef3bc9778dceff77db46a09
  BUG: 852318
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/3940
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* dht: improve dht_fix_layout_of_directory for better re-assignment | Anand Avati | 2012-09-12 | 1 | -145/+78
  Jeff Darcy wrote:

  > AFAICT, the fix-layout code doesn't do the same rotation that the new-directory code does. Therefore, the new bricks always claim completely predictable hash ranges for every directory, leading to either a 0-1-2-3 pattern or a 1-0-2-3 pattern. In other words, a file whose hash falls into the second quarter of the range will always be assigned to brick 2, and a file whose hash falls into the fourth quarter will always be assigned to brick 3. The rest will be split according to the original pattern. Put still another way, instead of same-named files in different directories being spread across N bricks, they might be spread across only two bricks (bad) or totally concentrated on one brick (worse) regardless of N.

  The current dht_fix_layout_of_directory() code, in an attempt to maximize overlap of new layout with existing layout (to minimize movement of data), fails to do a good job of randomizing new assignment even when it could do a better job.

  In an example where we expand from 2 nodes to 4 nodes, the current possibilities are limited in the following way (theoretical hash range: 00 - 99):

  OLD 1
  -----
  server1: 00 - 49
  server2: 50 - 99

  NEW 1
  -----
  server1: 00 - 24
  server2: 50 - 74
  server3: 25 - 49
  server4: 75 - 99

  OLD 2
  -----
  server1: 50 - 99
  server2: 00 - 49

  NEW 2
  -----
  server1: 50 - 74
  server2: 00 - 24
  server3: 25 - 49
  server4: 75 - 99

  The above shows that when add-brick from 2 bricks to 4 bricks, server3 and server4 always get the _same_ hash range no matter what the original hash range assignment was.

  The fix in this patch is first do the standard new directory assignment to a directory (with rotation etc.) and then do the reassignment to maximize overlap. This way newly added servers still get random ranges and existing servers have a probability of getting either of the quarters which were part of its half previously.

  The same principles hold for all add-brick from M to M+N.

  Change-Id: I0cbbf3bfa334645728072d66aaaa80120d0b295f
  BUG: 853258
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/3883
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
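The overlap argument in this commit can be checked with a few lines of arithmetic over the 00 - 99 example ranges. The sketch below is only a worked check of those numbers, not the dht algorithm; `overlap` and the `ALT_1` layout are illustrative additions that do not come from the patch.

```python
def overlap(old, new):
    """Count hash values that stay on the same server between two layouts.

    Each layout maps a server name to an inclusive (start, end) range.
    """
    kept = 0
    for server, (new_start, new_end) in new.items():
        if server not in old:
            continue  # a newly added server keeps nothing
        old_start, old_end = old[server]
        lo, hi = max(old_start, new_start), min(old_end, new_end)
        kept += max(0, hi - lo + 1)
    return kept


OLD_1 = {"server1": (0, 49), "server2": (50, 99)}
NEW_1 = {"server1": (0, 24), "server2": (50, 74),
         "server3": (25, 49), "server4": (75, 99)}
OLD_2 = {"server1": (50, 99), "server2": (0, 49)}
NEW_2 = {"server1": (50, 74), "server2": (0, 24),
         "server3": (25, 49), "server4": (75, 99)}

# Hypothetical alternative: existing servers keep their other quarter,
# so the new servers land on different ranges.
ALT_1 = {"server1": (25, 49), "server2": (75, 99),
         "server3": (0, 24), "server4": (50, 74)}
```

Both `NEW_1` and `ALT_1` keep 50 of the 100 hash values in place relative to `OLD_1`, so giving the new servers different quarters costs nothing in data movement, which is the room for randomization the commit message points at.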
* cluster/dht: handle percent option for 'min-free-disk' | Amar Tumballi | 2012-09-07 | 1 | -0/+11
  * with the init option cleanups, setting of 'conf->disk_unit' was reset, which made it not set the '%' in the option.
  * bring a global check, which makes the option assume it is a percentage, as long as the value is < 100.

  Change-Id: I00bd1395a309cdc596a2b2b80304c6d98696a24a
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 852889
  Reviewed-on: http://review.gluster.org/3918
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
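The "value below 100 means percent" check can be sketched as follows. This is a simplified model under assumptions: it only handles bare numbers and a trailing `%`, not size suffixes, and `parse_min_free_disk` and its return shape are hypothetical rather than the dht option parser.

```python
def parse_min_free_disk(value):
    """Return (amount, unit), where unit is 'percent' or 'bytes'."""
    text = value.strip()
    if text.endswith("%"):
        return float(text[:-1]), "percent"
    amount = float(text)
    # The global check from the commit: a bare value below 100 is taken
    # to be a percentage.
    if amount < 100:
        return amount, "percent"
    return amount, "bytes"
```

So `"10%"` and `"25"` are both read as percentages, while a large bare number is read as a byte count.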
* afr: add option description of 'open'. | Jules.Wang | 2012-09-06 | 1 | -0/+2
  Signed-off-by: Jules Wang <lancelotds@163.com>
  Change-Id: I6c7dd337c758e82e9d58d4d65f53b5aa72ac5dfb
  BUG: 764890
  Reviewed-on: http://review.gluster.org/3895
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: remove gf_log() from statedump functions | Amar Tumballi | 2012-09-06 | 1 | -3/+0
  Change-Id: I83cccab6819d6a74e96c2717ca539fa1568cac89
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 843822
  Reviewed-on: http://review.gluster.org/3912
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs/dict: make 'dict_t' a opaque object | Amar Tumballi | 2012-09-06 | 6 | -32/+26
  * i.e., don't dereference dict_t pointer, instead use APIs everywhere
  * other than dict_t only 'data_t' should be the valid export from dict.h
  * added 'dict_foreach_fnmatch()' API
  * changed dict_lookup() to use data_t, instead of data_pair_t

  Change-Id: I400bb0dd55519a7c5d2a107e67c8e7a7207228dc
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 850917
  Reviewed-on: http://review.gluster.org/3829
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
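A pattern-filtered iteration in the spirit of 'dict_foreach_fnmatch()' can be illustrated with Python's standard fnmatch module. This is a Python analogue only: the C function's real signature is not shown in this log, so the argument order, callback shape and returned match count below are assumptions.

```python
from fnmatch import fnmatchcase


def dict_foreach_fnmatch(d, pattern, fn):
    """Call fn(key, value) for every key matching the shell-style pattern."""
    matched = 0
    for key, value in d.items():
        # fnmatchcase compares case-sensitively on every platform.
        if fnmatchcase(key, pattern):
            fn(key, value)
            matched += 1
    return matched
```

Callers only see keys and values through the callback, which mirrors the commit's goal of never dereferencing the dictionary structure directly.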
* cluster/afr: Don't stop entry/data self-heal on metadata split-brain | Pranith Kumar K | 2012-08-29 | 2 | -12/+6
  Problem:
  Entry/Data self-heal is orthogonal to meta-data self-heal. Meta-data split-brain should not affect entry/data self-heal.

  Fix:
  Prevented aborting rest of the self-heals when metadata split-brain happens.

  Tests:
  1) Simulated meta-data split-brain, then checked that data self-heal succeeds on a regular file and entry self-heal succeeds on a dir.
  2) Reset meta-data change-log on one of the subvols and checked that meta-data self-heal also completes.
  3) Executed self-heal sanity script.

  Change-Id: I05ca222d855d3a6000703e3775471d0f874d35d6
  BUG: 851451
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.org/3853
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <obdurodon@gmail.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* dht/rebalance: set the correct ownership on the dst file. | shishir gowda | 2012-08-28 | 1 | -0/+8
  Currently, the dst file created has root:root ownership until migration is completed. During this phase, open fails on the dst file if uid/gid is non-root.

  Setting the dst_file to the correct ownership fixes the issue.

  Change-Id: Icfec89eb10dc866cdee38dab17695fe21174ef99
  BUG: 852361
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/3861
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* All: License message change | Varun Shastry | 2012-08-28 | 8 | -116/+42
  The license message is changed to

    Copyright (c) 2008-2012 Red Hat, Inc. <http://www.redhat.com>
    This file is part of GlusterFS.

    This file is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

  Change-Id: I07d2b63ed5fbbbd1884f1e74f2dd56013d15b0f4
  BUG: 852318
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/3858
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: Avoid excessive logging in self-heal. | Krishnan Parthasarathi | 2012-08-23 | 6 | -22/+25
  - (Excessive) Logging has been very useful as 'bread-crumbs' in many a root-cause analyses. This patch aims at avoiding logging when the information could be reconstructed using the xattrs, statedump, and/or "volume heal" CLI commands.

  Change-Id: Iebc6b10ae18f0dd9704bdc6dd03bcfe0f2a09abd
  BUG: 844804
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/3805
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* Self-heald: Prevent logging of errno ENOENT | Venkatesh Somyajulu | 2012-08-20 | 1 | -4/+4
  Change-Id: Ie56228dfbdc7e519a344681487164a835488a470
  BUG: 835423
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: http://review.gluster.org/3826
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>