summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht/src
Commit message (Collapse)AuthorAgeFilesLines
* cluster/distribute: Reopen fds in migration internally as root:rootshishir gowda2013-03-041-1/+10
| | | | | | | | | | | | | | Though linkfile_create and rebalance dst file create sent a setattr with correct ownership, there is still a race window where the linkfile open (client open due to migration) will fail, as its ownership will be root:root. BUG: 884597 Change-Id: Iba73681eae4f280d39ee6c9a40009e195768bee7 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4612 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Prevent spurious multiple defrag crawlsshishir gowda2013-03-041-9/+14
| | | | | | | | | | | | | | | | | In dht_notify, we used to create a thread to start defrag crawls after we had heard from all child subvols. This was in-correct, as a later event, could also trigger the crawl again(due to the fact that all subvols had responded). The fix is to make sure, the thread is started only once after all subvols have responded the first time BUG: 916449 Change-Id: I1619344fbb1cb51d5e1db38d8a29821fa870fa8b Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4610 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Preserve file size during rebalance migrationshishir gowda2013-03-041-0/+6
| | | | | | | | | | | | | | | | | | If holes are encountered, then we do not write these to the dst, which sometimes causes file size to be lesser than src. Data is not corrupted, as when non-zero reads are received, we do write that data. Calling a truncrate to give file size to prevent it from being truncated to less than src in case the file end has holes. Thanks to Brian Foster for providing the test case BUG: 915554 Change-Id: I7e1e0c475118b073c3ebb87e93220c1ec22e8b7d Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4609 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/distribute: Remove suprious fd_unref callshishir gowda2013-03-041-2/+0
| | | | | | | | | | | | After fix http://review.gluster.org/4282 (libglusterfsterfs/syncop: do not hold ref on the fd in cbk) was pushed, syncop_open does not take a ref anymore. BUG: 910661 Change-Id: Idedff91270966e6e70e71ee83785c0228e238d31 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4608 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/dht: Create linkfile with file uid/gidshishir gowda2013-03-044-4/+101
| | | | | | | | | | | | | | | | Currently, linkfile creation happens as root. use uid/gid returned from _cbk (link/rename) to set the correct ownership of the link files. Also added test/dht.rc to implement common dht functions BUG: 884597 Change-Id: I6bc0e04f62d4716fc033681e5678e852a1be7a2f Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/dht: pathinfo xattr changes for directoriesVenky Shankar2013-02-082-92/+224
| | | | | | | | | | | | | | | | | | | | Since directories have presence on all subvolumes there is no definite meaning of ->hashed_subvol or ->cached_subvol. getxattr() code path chooses ->cached_subvol for pathinfo extended attribute. While this makes sense of files, it makes less sense for directories. Further if a hashed or a cached subvolume is down, and there's a getxattr request for a directory, we return with an errno. This patch changes pathinfo extended attribute contents by aggregating information from all subvolumes that are up. Change-Id: I58adb741d63ccfd1d0239af75eb65f26f0fb384d Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 856455 Reviewed-on: http://review.gluster.org/4047 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Use proper libtool option -avoid-version instead of bogus -avoidversionAnand Avati2013-02-071-3/+3
| | | | | | | | | | Change-Id: I1c9541058c7d07786539a3266ca125a6a15287d8 BUG: 859835 Signed-off-by: Anand Avati <avati@redhat.com> Original-author: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com> Signed-off-by: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com> Reviewed-on: http://review.gluster.org/3967 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: better layout-optimization algorithmJeff Darcy2013-02-072-22/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This method deals with the case where swapping might gain a bigger overlap for the xlator currently under consideration, but sacrifices even more from the xlator we're swapping with. For example: A = 0x00000000 - 0x44444443 (new 0x00000000 - 0x55555554) B = 0x44444444 - 0x77777776 (new 0x55555555 - 0xaaaaaaa9) C = 0x77777777 - 0xffffffff (new 0xaaaaaaaa - 0xffffffff) Here, the new range for B has a bigger overlap with the old C than with the old B (0x33333333 vs. 0x22222222 to be precise) so looking only at that might lead us to swap. However, such a swap turns the new C's overlap from 0x55555556 (vs. old C) to *zero* (vs. old B). In other words, we've gained 0x11111111 for B but lost 0x55555556 for C, so it's a bad idea. The new algorithm accounts for all effects of the swap, so it not only avoids bad swaps but can make some good ones that would have been missed previously. For example, if swapping a range X with a later range Y would not increase the overlap for X we would previously have skipped it even if the swap would increase Y's overlap without affecting X's. This is the normal case when we're adding a new brick (which initially has zero overlap with any old range) so finding more good swaps is probably even more important than avoiding bad ones. Also, the logic in dht_overlap_calc was completely broken before, causing integer overflows instead of providing correct values, so no matter what higher-level algorithm was in place the GIGO effect would have resulted in bad decisions. Change-Id: If61ed513cfcb931916c6b51da293e3efbaaf385f BUG: 853258 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/3908 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Correct min_free_disk behaviourRaghavendra Talur2013-02-042-27/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Files were being created in subvol which had less than min_free_disk available even in the cases where other subvols with more space were available. Solution: Changed the logic to look for subvol which has more space available. In cases where all the subvols have lesser than Min_free_disk available , the one with max space and atleast one inode is available. Known Issue: Cannot ensure that first file that is created right after min-free-value is crossed on a brick will get created in other brick because disk usage stat takes some time to update in glusterprocess. Will fix that as part of another bug. Change-Id: If3ae0bf5a44f8739ce35b3ee3f191009ddd44455 BUG: 858488 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/4420 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: ignore EEXIST error in mkdir to avoid GFID mismatchAnand Avati2013-02-031-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In dht_mkdir_cbk, EEXIST error is treated like a true error. Because of this the following sequence of events can happen, eventually resulting in GFID mismatch and (and possibly leaked locks and hang, in the presence of replicate.) The issue exists when many clients concurrently attempt creation of directory and subdirectory (e.g mkdir -p /mnt/gluster/dir1/subdir) 0. First mkdir happens by one client on the hashed subvolume. Only one client wins the race. Others racing mkdirs get EEXIST. Yet other "laggers" in the race encounter the just-created directory in lookup() on the hash dir. 1. At least one "lagger" lookup() notices that there are missing directories on other subvolumes (which the "winner" mkdir is yet to create), and starts off self-heal of the directory. 2. At least on some subvolumes, self-heal's mkdir wins the race against the "winner" mkdir and creates the directory first. This causes the "winner" mkdir to experience EEXIST error on those subvolumes. 3. On other subvolumes where "winner" mkdir won the race, self-heal experiences EEXIST error, but self-heal is properly translating that into a success (but mkdir code path is not -- which is the bug.) 4. Both mkdir and self-heal assign hash layouts to the just created directory. But self-heal distributes hash range across N (total) subvolumes, whereas mkdir distributes hash range across N - M (where M is the number of subvolumes where mkdir lost the race). Both the clients "cache" their respective layouts in the near future for all future creates inside them (evidence in logs) 5. During the creation of the subdirectory, two clients race again. Ideally winner performs mkdir() on the hashed subvolume and proceeds to create other dirs, loser experiences EEXIST error on the hashed subvolume and backs off. But in this case, because the two clients have different layout views of the parent directory (because of different hash splits and assignements), the hashed subvolumes for the new directory can end up being different. Therefore, both clients now win the race (they were never fighting against each other on a common server), assigning different GFIDs to the directory on their respective (different) subvolumes. Some of the remaining subvolumes get GFID1, others GFID2. Conclusion/Fix: Making mkdir translate EEXIST error as success (just the way self-heal is already rightly doing) will bring back truth to the design claim that concurrent mkdir/self-heals perform deterministic + idempotent operations. This will prevent the differing "hash views" by different clients and thereby also avoid GFID mismatch by forcing all clients to have a "fair race", because the hashed subvolume for all will be the same (and thereby avoiding leaked locks and hangs.) Change-Id: I84592fb9b8a3f739a07e2afb23b33758a0a9a157 BUG: 907072 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4459 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* cluster/dht: stack wind with cookiev3.4.0qa8Varun Shastry2013-01-313-29/+45
| | | | | | | | | | | | | | | Default_fops uses stack_wind_tail. It winds without creating the frame leading into wrong subvol return in the cookie. To avoid the problem caused by the same, we're getting the subvol by passing the cookie. Change-Id: I51ee79b22c89e4fb0b89e9a0bc3ac96c5b469f8f BUG: 893338 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/4388 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* libglusterfs/syncop: do not hold ref on the fd in cbkRaghavendra Bhat2013-01-301-6/+6
| | | | | | | | | | | | | * Do not do fd_ref in cbks of the fops which return a fd (such as open, opendir, create). Change-Id: Ic2f5b234c5c09c258494f4fb5d600a64813823ad BUG: 885008 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4282 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: get_layout should account only available subvolsv3.4.0qa7shishir gowda2013-01-231-4/+3
| | | | | | | | | | | | | | | | The earlier logic used to check if (layout-spread-count <= subvol_cnt - decommissioned bricks). With this if a subvol was down, and layout-spread was > upsubvols, a mkdir ended up creating holes in the layout. The fix is to consider only the combination of subvols which are usable (not down or not decommissioned). Change-Id: I61ad3bcaf4589f5a75f7887cfa595c98311ae3bb BUG: 902610 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4412 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd/cli: Updated the options descriptions for "volume set help"Avra Sengupta2013-01-211-4/+24
| | | | | | | | | Change-Id: I0db00b7334bb9707ab48bd661ac03a3ad818d6e4 BUG: 893458 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4393 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: fixes for gcc's '-pedantic' flag buildAvra Sengupta2013-01-211-1/+1
| | | | | | | | | | | | | * warnings on 'void *' arguments * warnings on empty initializations * warnings on empty array (array[0]) Change-Id: Iae440f54cbd59580eb69f3ecaed5a9926c0edf95 BUG: 875913 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4219 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: If cached_subvol is down, return ENOTCONN in lookupshishir gowda2013-01-211-1/+10
| | | | | | | | | | | | | When we follow a linkfile, and the lookup returns a ENOTCONN error, return the error, as the cached subvol is down, and lookup_everywhere wont succeed, but actually ends up clearing the linkfile, and clearing the namespace. Change-Id: I772bf71531bc646e8fb62d3e8549a5fe0f3896da BUG: 893378 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4383 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: update ctx-time only if we receive the new iattVarun Shastry2013-01-172-5/+8
| | | | | | | | | | | | | | | 1. Used local->postparent(contains merged iatt of all succesful calls) instead of postparent for dht ctx time update. 2. dht_inode_ctx_time_update avoided in case of opret -1. Change-Id: Ie04a7842a41c241f911b6a3f76267b996d27fb43 BUG: 881013 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/4338 Reviewed-by: Shishir Gowda <sgowda@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Add "afr.readdir-failover=off" option the rebalance processshishir gowda2012-12-161-7/+27
| | | | | | | | | | | | | | | | | By failing over readdir (default behaviour), rebalance could get duplicate files, as readdir would re-read from offset 0. Rebalance should not attempt to migrate these files again. Additionally, we need to handle these cases as failure in rebalance crawl. No test case provided, as we cannot determine the read child in afr. Change-Id: If07508b4f92dacc17e0f695b48a866c7c66004be BUG: 859387 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4300 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: re-set layouts to prevent overlapsshishir gowda2012-12-114-8/+55
| | | | | | | | | | | | | | | | | When subvols-per-directory option is used, with bricks addition/removal the layouts might get distributed to other subvols, which were not part of the layout before. We need to clean up layouts on old subvolumes, to prevent overlaps. Also, we need to make sure if layout-cnt is never less than subvolume-cnt. Change-Id: I00994a092ca0c99aedcc41bd9412d43460f88a04 BUG: 884455 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4281 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: support auto-NUFA optionJeff Darcy2012-12-041-9/+67
| | | | | | | | | | | | | | | Many people have asked for behavior like the old NUFA, which builds and seems to run but was previously impossible to enable/configure in a standard way. This change allows NUFA to be enabled instead of DHT from the command line, with automatic selection of the local subvolume on each host. Change-Id: I0065938db3922361fd450a6c1919a4cbbf6f202e BUG: 882278 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4234 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* fix memory leaksRaghavendra Bhat2012-12-042-1/+3
| | | | | | | | | | | | | * write-behind: free the inode context in wb_forget * distribute: in readdirp callback put the allocated context to the inode * distribute: check if the layout is NULL before accessing it in layout_unref Change-Id: I7698f81b85b99d06bf6b01fc1a6e51e1593b5e27 BUG: 790709 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4250 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: fail fix-layout if any of the subvol is downshishir gowda2012-11-295-35/+47
| | | | | | | | | | | | | | | | If any subvolume is down, and a layout is re-written and hash values change, entry names in the downed subvol can be reused in the other subvol which got the same hash range. when the downed subvol is brought back up, duplicate entried might appear Also separated handling of ENOSPC and ENOTCONN error. Change-Id: I5ed93990425a4cee70df2dab7c7c119fdc87ad56 BUG: 860663 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4000 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Heal dir uid/gidshishir gowda2012-11-294-1/+105
| | | | | | | | | | | | Identify mismatching uid/gid in lookup, and trigger a syncop heal. uid/gid of subvol with latest ctime is trusted (local->prebuf). Change-Id: Ib5c4bc438e7f4b1f33080e73593f40f400e997f0 BUG: 862967 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3964 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: send ACCESS call on dir to first_up_subvol if cached is downv3.4.0qa3shishir gowda2012-11-291-0/+11
| | | | | | | | | Change-Id: I4f518a969bbe3a11075e7c9ae10bd21bf059d5f3 BUG: 867253 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4240 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/distribute: send getxattr on LOCKINFO to only cached subvolumes.Raghavendra2012-11-271-1/+3
| | | | | | | | | | | | lk is sent to only cached subvolume. Hence there is no point in sending LOCKINFO to other children (even in case of directories). Change-Id: Ia20fc358dfa84cee9a52d1f613564ff6f25aa0c9 BUG: 808400 Signed-off-by: Raghavendra <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4123 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Implement float percentagePranith Kumar K2012-11-234-20/+21
| | | | | | | | | Change-Id: Ia7ea63471f0bbd74686873f5f6f183475880f1a0 BUG: 839595 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/4162 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: dump the layout information of directories onlyRaghavendra Bhat2012-11-191-9/+18
| | | | | | | | | | | | | | | testcase: The changes are for removing gf_log from statedump related sections in dht and using pthread_mutex_trylock in statedump sections. Changes are internal. So tests were done by attaching gdb to the process and executing by manually changing the values of some of the pointers. Change-Id: I41fa76c1812b462cb76f5bbf2fd14de080e73895 BUG: 843822 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4117 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: ignore empty ->hashed_subvol during lookupVenky Shankar2012-10-171-9/+1
| | | | | | | | | | | | | | | ->hashed_subvol is not valid (== NULL) when the subvolume the entity hashes to is down. For directories, we need not rely on ->hashed_subvol as we aggregate information from all subvolumes. So, during lookup, NULL ->hashed_subvol is ingored but logged. Change-Id: I306e4e274fe29d60ff028add4a6c3bcd67b2f314 Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 856459 Reviewed-on: http://review.gluster.org/4046 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* cluster/distribute: Always return the latest time in struct iatt.shishir gowda2012-10-166-50/+264
| | | | | | | | | | | | | | | | | | | save the a/c/mtime in inode_ctx, and dht_inode_ctx_update checks the passed iatte, and updates the stat's time, and inode_ctx's time accordingly. For preparent times, only the iatt stat to be returned is updated, not the ctx. With this, update, WIPE is removed, as we would always be passing back the latest mtime, and hence cache times will be relevant. TODO-handle rename WIPE calls Change-Id: I8e4c738cd830f3fafeef789c9181f9c242ac96a2 BUG: 857791 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3737 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: split CPPFLAGS from CFLAGSJeff Darcy2012-10-031-2/+3
| | | | | | | | | | | | | | | | | Automake provides a separate variable for preprocessor flags (*_CPPFLAGS). They are already uses in a few places, so make it consistent and use it everywhere. Note that cflags obtained from pkg-config often are cppflags, which is why LIBXML2_CFLAGS moves with into AM_CPPFLAGS, for example. Change-Id: I15feed1d18b2ca497371271c4b5876d5ec6289dd BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4029 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: remove useless explicit -fPIC -shared fromJeff Darcy2012-10-031-2/+2
| | | | | | | | | | | | | | | | | | | | CFLAGS libtool will automatically add "-fPIC" to the compiler command line as needed, so there is no need to specify it separately. "-shared" is normally a linker flag and has an odd effect when used with libtool --mode=compile, namely that it inhibits production of static objects. For that however, using AC_DISABLE_STATIC is a lot simpler. Change-Id: Ic4cba0fad18ffd985cf07f8d6951a976ae59a48f BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4027 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: remove -nostartfiles flagJeff Darcy2012-10-021-1/+1
| | | | | | | | | | | | | | | The "-nostartfiles" is a discouraged option and is documented to potentially result in undesired behavior. Since I see no reason why it should be in glusterfs, remove it. Change-Id: I56f2b08874516ebad91447b2583ca2fb776bb7ab BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4018 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: consolidate common compilation flags into one variableJeff Darcy2012-10-011-1/+1
| | | | | | | | | | | | | | | Some -D flags are present in all files, so collect them. This adds -D${GF_HOST_OS} to some compiler command lines, but this should not be a problem. Change-Id: I1aeb346143d4984c9cc4f2750c465ce09af1e6ca BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4013 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Fixed some general typing errors.Varun Shastry2012-09-271-1/+1
| | | | | | | | | | | | Eg: changed recieved to received Change-Id: I360fcb99c97c8a0222e373fee20ea2fccfb938db BUG: 860543 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3998 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: improve dht_fix_layout_of_directory for better re-assignmentAnand Avati2012-09-121-145/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jeff Darcy wrote: > AFAICT, the fix-layout code doesn't do the same rotation that the > new-directory code does. Therefore, the new bricks always claim > completely predictable hash ranges for every directory, leading to > either a 0-1-2-3 pattern or a 1-0-2-3 pattern. In other words, a > file whose hash falls into the second quarter of the range will always > be assigned to brick 2, and a file whose hash falls into the fourth > quarter will always be assigned to brick 3. The rest will be split > according to the original pattern. Put still another way, instead of > same-named files in different directories being spread across N bricks, > they might be spread across only two bricks (bad) or totally > concentrated on one brick (worse) regardless of N. The current dht_fix_layout_of_directory() code, in an attempt to maximize overlap of new layout with existing layout (to minimize movement of data) fails to do a good job of randomizing new assignment even when it could do a better job. In an example where we expand from 2 nodes to 4 nodes, the current possibilities are limited in the following way - (theoretical hash range: 00 - 99) OLD 1 ----- server1: 00 - 49 server2: 50 - 99 NEW 1 ----- server1: 00 - 24 server2: 50 - 74 server3: 25 - 49 server4: 75 - 99 OLD 2 ----- server1: 50 - 99 server2: 00 - 49 NEW 2 ------ server1: 50 - 74 server2: 00 - 24 server3: 25 - 49 server4: 75 - 99 The above shows that when add-brick from 2 bricks to 4 bricks, server3 and server4 always get the _same_ hash range no matter what the original hash range assignment was. The fix in this patch is first do the standard new directory assignment to a directory (with rotation etc.) and then do the reassignment to maximize overlap. This way newly added servers still get random ranges and existing servers have a probability of getting either of the quarters which were part of its half previously. The same principles hold for all add-brick from M to M+N. Change-Id: I0cbbf3bfa334645728072d66aaaa80120d0b295f BUG: 853258 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/3883 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: handle percent option for 'min-free-disk'Amar Tumballi2012-09-071-0/+11
| | | | | | | | | | | | | | | * with the init option cleanups, setting of 'conf->disk_unit' was reset, which made it not set the '%' in the option. * bring a global check, which makes the option assume its percent, as long as value is < 100. Change-Id: I00bd1395a309cdc596a2b2b80304c6d98696a24a Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 852889 Reviewed-on: http://review.gluster.org/3918 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: remove gf_log() from statedump functionsAmar Tumballi2012-09-061-3/+0
| | | | | | | | | Change-Id: I83cccab6819d6a74e96c2717ca539fa1568cac89 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 843822 Reviewed-on: http://review.gluster.org/3912 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs/dict: make 'dict_t' a opaque objectAmar Tumballi2012-09-061-14/+11
| | | | | | | | | | | | | | | * ie, don't dereference dict_t pointer, instead use APIs everywhere * other than dict_t only 'data_t' should be the valid export from dict.h * added 'dict_foreach_fnmatch()' API * changed dict_lookup() to use data_t, instead of data_pair_t Change-Id: I400bb0dd55519a7c5d2a107e67c8e7a7207228dc Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 850917 Reviewed-on: http://review.gluster.org/3829 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht/rebalance: set the correct ownership on the dst file.shishir gowda2012-08-281-0/+8
| | | | | | | | | | | | | | | Currently, the dst file created has root:root ownership, till migration is completed. During this phase, open fails on the dst file if uid/gid is non-root. Setting the dst_file to the correct ownership fixes the issue Change-Id: Icfec89eb10dc866cdee38dab17695fe21174ef99 BUG: 852361 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3861 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: handle 'dataonly' flag in syncop_fsync()Amar Tumballi2012-08-201-1/+1
| | | | | | | | | | | | * and also in syncop_readv(), don't look at _cbk args if op_ret is < 0. Change-Id: I3ab2982bc6d186e75b6adb74c8981e4ff7058bbe Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 839950 Reviewed-on: http://review.gluster.org/3828 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: don't leak upon GF_REALLOC failureJim Meyering2012-08-191-4/+5
| | | | | | | | | Change-Id: I7dfabcc2981df5c5a1e1a54c3135400a60626cd1 BUG: 846755 Signed-off-by: Jim Meyering <meyering@redhat.com> Reviewed-on: http://review.gluster.com/3798 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Optimize readdirp calls in DHTshishir gowda2012-08-133-0/+35
| | | | | | | | | | | | | | | | | Bring in option which is supported by posix xlator to filter out directory's entries from being returned. DHT would now request non-first subvols to filter out directory entries. dht xlator-option readdir-optimize will enable this optimization Change-Id: I35224bc81c9657f54f952efac02790276c35ded5 BUG: 838199 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.com/3772 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: Suppress user xattr mismatch log messageshishir gowda2012-07-251-1/+1
| | | | | | | | | | | | | | | | | Changing the log-level to DEBUG. Xattr mismatch can occur when parallel setxattr's race, or when one of the bricks was down. A subsequent setxattr will fix the condition when all the subvols are up. In this case, the 'user.swift' xattr used by ufo was out of sync, but did not cause any other error. Change-Id: I6fdff78869b8ff72c305bbe122033e6c1d9d3cff BUG: 838197 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.com/3722 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Mohammed Junaid <junaid@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterfs_ctx_t: un-globalize the filesystem contextAnand Avati2012-07-171-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | So far there has been a global glusterfs_ctx_t object which represents the running instance of the filesystem (client or server). It contains the various graphs, connection to the management daemon over which new graphs are obtained, calls stacks issued on this filesystem, and a bunch of such things. With the introduction of libgfapi, it is no more true that there will be only one filesystem context in a process. Applications can be written to use libgfapi and obtain serveral instances of different filesystems/volumes in the same process. This involves messy untangling of assumptions inside libglusterfs that there would only be one global glusterfs_ctx_t and offload that assumption to glusterfsd/ and cli/ (where it is true). Change-Id: Ifd7d1259428c26076140a5764a2dc7361694139c BUG: 839950 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.com/3678 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* remove useless if-before-free (and free-like) functionsJim Meyering2012-07-137-70/+34
| | | | | | | | | | | | See comments in http://bugzilla.redhat.com/839925 for the code to perform this change. Signed-off-by: Jim Meyering <meyering@redhat.com> BUG: 839925 Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a Reviewed-on: http://review.gluster.com/3661 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Remove dht dependency on glusterfsd-mgmtshishir gowda2012-06-291-1/+7
| | | | | | | | | | | | | | | glusterfs_ctx->notify can be used by any xlator to talk to glusterfsd-mgmt. Note- This is for any rpc communication initiated by the xlator, and not from glusterd. Change-Id: Ic0e4af106fe1e98d797ca621facda8839b87598a BUG: 835757 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.com/3618 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: coverity fixes (mostly resource leak fixes)Amar Tumballi2012-06-053-14/+33
| | | | | | | | | | | currently working on obvious resource leak reports in coverity Change-Id: I261f4c578987b16da399ab5a504ad0fda0b176b1 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 789278 Reviewed-on: http://review.gluster.com/3265 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: set conf->defrag to NULL after freeing the defrag structureRaghavendra Bhat2012-05-311-2/+3
| | | | | | | | | | | | | Also no need to free the xlator object after rebalance is over, as the process is about to be killed. Change-Id: I6973e43c0353b5de61c0b39e52a22c618be361f4 BUG: 826584 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.com/3495 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: set the inode layout in readdirp_cbk() for filesAmar Tumballi2012-05-311-0/+16
| | | | | | | | | | | | with this, inode-linking it in readdirp_cbk will be neater. Change-Id: Ie2cd646438f851e1755e9b6a3fc9898059bee359 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 816140 Reviewed-on: http://review.gluster.com/2717 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* distribute: support user-specified layouts.Jeff Darcy2012-05-314-3/+15
| | | | | | | | | | | | | The new type is DHT_HASH_TYPE_DM_USER=1 (on disk in network byte order) and we treat it the same as DHT_HASH_TYPE_DM except that we don't stomp on it during rebalance. Change-Id: I893571a9b89577acdea2fe868915b18d3663fd77 BUG: 807312 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.com/3004 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>