summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* cluster/dht: ignore empty ->hashed_subvol during lookupVenky Shankar2012-10-171-9/+1
| | | | | | | | | | | | | | | ->hashed_subvol is not valid (== NULL) when the subvolume the entity hashes to is down. For directories, we need not rely on ->hashed_subvol as we aggregate information from all subvolumes. So, during lookup, NULL ->hashed_subvol is ingored but logged. Change-Id: I306e4e274fe29d60ff028add4a6c3bcd67b2f314 Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 856459 Reviewed-on: http://review.gluster.org/4046 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* cluster/distribute: Always return the latest time in struct iatt.shishir gowda2012-10-166-50/+264
| | | | | | | | | | | | | | | | | | | save the a/c/mtime in inode_ctx, and dht_inode_ctx_update checks the passed iatte, and updates the stat's time, and inode_ctx's time accordingly. For preparent times, only the iatt stat to be returned is updated, not the ctx. With this, update, WIPE is removed, as we would always be passing back the latest mtime, and hence cache times will be relevant. TODO-handle rename WIPE calls Change-Id: I8e4c738cd830f3fafeef789c9181f9c242ac96a2 BUG: 857791 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3737 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr : Edited log message in afr_sh_entry_expunge_entry_cbkVenkatesh Somyajulu2012-10-121-1/+2
| | | | | | | | | Change-Id: I9f7562d28c8bc798552c403164397f929a7bd1e7 BUG: 860246 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4052 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Preventing client crashing as the callings of GF_CALLOC has been failed.linbaiye2012-10-114-31/+107
| | | | | | | | | | | | | As the callings of GF_CALLOC can seldom come to a failure, glusterfs client will crash due to segment fault. We should have returned once the variables of transaction's local can't be alloced. Change-Id: Ia3798b8349d832b23c7825e64dbad93ebe29cd1b BUG: 861335 Signed-off-by: linbaiye <linbaiye@gmail.com> Reviewed-on: http://review.gluster.org/4005 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: don't use synctask_new from within a synctaskJeff Darcy2012-10-111-3/+14
| | | | | | | | | | Change-Id: Iebf821ff720c63ab6da4b219d82c7f1d00769992 BUG: 862838 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4032 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: Changes and enhancements to XML outputKaushal M2012-10-111-4/+8
| | | | | | | | | | | | | | | | | | | | This patch contains several xml related changes which fix some bugs and introduce xml output for commands which were missing it. These include, * XML output for rebalance & remove-brick status * XML output for replace-brick * XML output for 'volume status all' in on xml document * proper XML output for "volume {create|start|stop|delete}" * type & status of a volume in 'volume info' is now given as a string as well This patch also cleans up the '#if (HAVE_LIB_XML)' sections from the code-base, so that it is not littered around. Change-Id: I5bb022adf0fedf7e3ead92b4b79bfa02b0b5fef5 BUG: 828131 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/3869 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr Changed the message's log level from Error to DebugVenkatesh Somyajulu2012-10-101-3/+3
| | | | | | | | | Change-Id: Ic2506561367bfec9022dc53e9b17b03dc343df95 BUG: 859411 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4055 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: check transaction type for eager-lock after it is setPranith Kumar K2012-10-101-6/+6
| | | | | | | | | | | | | | | | | | | | | | | Problem: Eager locking lk-owner decision is taken before transaction type is set. Default transaction type is DATA so all transactions are treated as DATA transactions at the time of eager-locking decision. Fix: Move the code that takes lk-owner decision after the transaction type is set. Test: Checked that the transaction type is set properly in gdb at the time of the lk-owner decision. Change-Id: Ib1c886866f28788aed67622982e86d667b2cdb80 BUG: 864786 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/4053 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: split CPPFLAGS from CFLAGSJeff Darcy2012-10-035-10/+16
| | | | | | | | | | | | | | | | | Automake provides a separate variable for preprocessor flags (*_CPPFLAGS). They are already uses in a few places, so make it consistent and use it everywhere. Note that cflags obtained from pkg-config often are cppflags, which is why LIBXML2_CFLAGS moves with into AM_CPPFLAGS, for example. Change-Id: I15feed1d18b2ca497371271c4b5876d5ec6289dd BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4029 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: remove useless explicit -fPIC -shared fromJeff Darcy2012-10-035-10/+10
| | | | | | | | | | | | | | | | | | | | CFLAGS libtool will automatically add "-fPIC" to the compiler command line as needed, so there is no need to specify it separately. "-shared" is normally a linker flag and has an odd effect when used with libtool --mode=compile, namely that it inhibits production of static objects. For that however, using AC_DISABLE_STATIC is a lot simpler. Change-Id: Ic4cba0fad18ffd985cf07f8d6951a976ae59a48f BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4027 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: remove -nostartfiles flagJeff Darcy2012-10-025-5/+5
| | | | | | | | | | | | | | | The "-nostartfiles" is a discouraged option and is documented to potentially result in undesired behavior. Since I see no reason why it should be in glusterfs, remove it. Change-Id: I56f2b08874516ebad91447b2583ca2fb776bb7ab BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4018 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: consolidate common compilation flags into one variableJeff Darcy2012-10-015-5/+5
| | | | | | | | | | | | | | | Some -D flags are present in all files, so collect them. This adds -D${GF_HOST_OS} to some compiler command lines, but this should not be a problem. Change-Id: I1aeb346143d4984c9cc4f2750c465ce09af1e6ca BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4013 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide option to set readdir-size in entry-self-healPranith Kumar K2012-10-013-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Entry self-heal does lookups on all the entries that are read in readdir. More the size of readdir more number of lookups happen in parallel. It is observed that it leads to HUGE cpu spikes rendering everything else on the system unusable. Fix: Provided the option self-heal-readdir-size to configure the size. Default value is at 1KB. Tests: Checked that the readdirs are happening with the configured value in entry-self-heal. Change-Id: Icaa937ad88857e6f9a12375b1e7f6a49192bc8b1 BUG: 860895 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/4002 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Fixed some general typing errors.Varun Shastry2012-09-271-1/+1
| | | | | | | | | | | | Eg: changed recieved to received Change-Id: I360fcb99c97c8a0222e373fee20ea2fccfb938db BUG: 860543 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3998 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Trigger heal on local subvols on any child_upPranith Kumar K2012-09-251-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The index in the child that comes online is generally empty because the changes would have happened on the other child which has been up. So the sync begins when the other child's poll time-out happens (i.e. 10 minutes). The expectation is that the sync must be triggered as soon as the connection with any brick is established. Fix: Whenever any child_up happens trigger the index self-heal on all local children in the replicate subvolume. Tests: 1) Checked that the self-heal is triggered on all local children whenever any child comes online. 2) Checked that the volume heal commands are working fine. Change-Id: I4f64737866470a2f989349a889ea52782930e11d BUG: 852741 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/3972 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Wake up post-op on non-co-operative transactionPranith Kumar K2012-09-251-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The problem is observed when kernel untar is done. One file untar happens every second. The reason for this is, setattr lock is blocked on the prev fd data-transaction full-lock (because of eager-lock). Because of post-op-delay the post-op (xattrop + unlock) of the prev data-transaction happens after 1 sec. Until this the setattr is blocked resulting in performance problems in untar. Fix: Whenever an loc data, meta-data transaction comes, it should wakeup the prev-post-op on the same process' fd. Tests: The performance problem in untar went away. I put a breakpoint in client_finodelk for a 2G file dd and the inodelk is hit only 4 times. This confirms that the change does not affect post-op-delay in a -ve way. Change-Id: Ice3c2a1211f4dca6520a19bc4ba6cb9efb2902ad BUG: 845754 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/3975 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Clean up of typepunning errors ( Strict aliasing warnings )Varun Shastry2012-09-171-1/+3
| | | | | | | | | | | Change-Id: I48733967facc526fb523a8dc9bd068f8c5cc5971 BUG: 764282 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3950 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* All: License message changeVarun Shastry2012-09-138-56/+48
| | | | | | | | | | | | License message changed for server-side, dual license GPLV2 and LGPLv3+. Change-Id: Ia9e53061b9d2df3b3ef3bc9778dceff77db46a09 BUG: 852318 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3940 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: improve dht_fix_layout_of_directory for better re-assignmentAnand Avati2012-09-121-145/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jeff Darcy wrote: > AFAICT, the fix-layout code doesn't do the same rotation that the > new-directory code does. Therefore, the new bricks always claim > completely predictable hash ranges for every directory, leading to > either a 0-1-2-3 pattern or a 1-0-2-3 pattern. In other words, a > file whose hash falls into the second quarter of the range will always > be assigned to brick 2, and a file whose hash falls into the fourth > quarter will always be assigned to brick 3. The rest will be split > according to the original pattern. Put still another way, instead of > same-named files in different directories being spread across N bricks, > they might be spread across only two bricks (bad) or totally > concentrated on one brick (worse) regardless of N. The current dht_fix_layout_of_directory() code, in an attempt to maximize overlap of new layout with existing layout (to minimize movement of data) fails to do a good job of randomizing new assignment even when it could do a better job. In an example where we expand from 2 nodes to 4 nodes, the current possibilities are limited in the following way - (theoretical hash range: 00 - 99) OLD 1 ----- server1: 00 - 49 server2: 50 - 99 NEW 1 ----- server1: 00 - 24 server2: 50 - 74 server3: 25 - 49 server4: 75 - 99 OLD 2 ----- server1: 50 - 99 server2: 00 - 49 NEW 2 ------ server1: 50 - 74 server2: 00 - 24 server3: 25 - 49 server4: 75 - 99 The above shows that when add-brick from 2 bricks to 4 bricks, server3 and server4 always get the _same_ hash range no matter what the original hash range assignment was. The fix in this patch is first do the standard new directory assignment to a directory (with rotation etc.) and then do the reassignment to maximize overlap. This way newly added servers still get random ranges and existing servers have a probability of getting either of the quarters which were part of its half previously. The same principles hold for all add-brick from M to M+N. Change-Id: I0cbbf3bfa334645728072d66aaaa80120d0b295f BUG: 853258 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/3883 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: handle percent option for 'min-free-disk'Amar Tumballi2012-09-071-0/+11
| | | | | | | | | | | | | | | * with the init option cleanups, setting of 'conf->disk_unit' was reset, which made it not set the '%' in the option. * bring a global check, which makes the option assume its percent, as long as value is < 100. Change-Id: I00bd1395a309cdc596a2b2b80304c6d98696a24a Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 852889 Reviewed-on: http://review.gluster.org/3918 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: add option description of 'open'.Jules.Wang2012-09-061-0/+2
| | | | | | | | | Signed-off-by: Jules Wang <lancelotds@163.com> Change-Id: I6c7dd337c758e82e9d58d4d65f53b5aa72ac5dfb BUG: 764890 Reviewed-on: http://review.gluster.org/3895 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: remove gf_log() from statedump functionsAmar Tumballi2012-09-061-3/+0
| | | | | | | | | Change-Id: I83cccab6819d6a74e96c2717ca539fa1568cac89 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 843822 Reviewed-on: http://review.gluster.org/3912 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs/dict: make 'dict_t' a opaque objectAmar Tumballi2012-09-066-32/+26
| | | | | | | | | | | | | | | * ie, don't dereference dict_t pointer, instead use APIs everywhere * other than dict_t only 'data_t' should be the valid export from dict.h * added 'dict_foreach_fnmatch()' API * changed dict_lookup() to use data_t, instead of data_pair_t Change-Id: I400bb0dd55519a7c5d2a107e67c8e7a7207228dc Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 850917 Reviewed-on: http://review.gluster.org/3829 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Don't stop entry/data self-heal on metadata split-brainPranith Kumar K2012-08-292-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: Entry/Data self-heal is orthogonal to meta-data self-heal. meta-data split-brain should not affect entry/data self-heal. Fix: Prevented aborting rest of the self-heals when metadata split-brain happens. Tests: 1) Simulated meta-data split-brain then checked data-self-heal succeed on regular file, entry-self-heal succeed on dir. 2) Reset meta-data change-log on one of the subvols and checked that meta-data self-heal also completes. 3) Executed self-heal sanity script. Change-Id: I05ca222d855d3a6000703e3775471d0f874d35d6 BUG: 851451 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/3853 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <obdurodon@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht/rebalance: set the correct ownership on the dst file.shishir gowda2012-08-281-0/+8
| | | | | | | | | | | | | | | Currently, the dst file created has root:root ownership, till migration is completed. During this phase, open fails on the dst file if uid/gid is non-root. Setting the dst_file to the correct ownership fixes the issue Change-Id: Icfec89eb10dc866cdee38dab17695fe21174ef99 BUG: 852361 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3861 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* All: License message changeVarun Shastry2012-08-288-116/+42
| | | | | | | | | | | | | | | | | | The license message is changed to Copyright (c) 2008-2012 Red Hat, Inc. <http://www.redhat.com> This file is part of GlusterFS. This file is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation. Change-Id: I07d2b63ed5fbbbd1884f1e74f2dd56013d15b0f4 BUG: 852318 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3858 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: Avoid excessive logging in self-heal.Krishnan Parthasarathi2012-08-236-22/+25
| | | | | | | | | | | | | | - (Excessive) Logging has been very useful as 'bread-crumbs' in many a root-cause analyses. This patch aims at avoiding logging when the information could be reconstructed using the xattrs, statedump, and/or "volume heal" CLI commands. Change-Id: Iebc6b10ae18f0dd9704bdc6dd03bcfe0f2a09abd BUG: 844804 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/3805 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Self-heald: Prevent logging of errno ENOENTVenkatesh Somyajulu2012-08-201-4/+4
| | | | | | | | | Change-Id: Ie56228dfbdc7e519a344681487164a835488a470 BUG: 835423 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/3826 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: handle 'dataonly' flag in syncop_fsync()Amar Tumballi2012-08-201-1/+1
| | | | | | | | | | | | * and also in syncop_readv(), don't look at _cbk args if op_ret is < 0. Change-Id: I3ab2982bc6d186e75b6adb74c8981e4ff7058bbe Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 839950 Reviewed-on: http://review.gluster.org/3828 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: don't leak upon GF_REALLOC failureJim Meyering2012-08-191-4/+5
| | | | | | | | | Change-Id: I7dfabcc2981df5c5a1e1a54c3135400a60626cd1 BUG: 846755 Signed-off-by: Jim Meyering <meyering@redhat.com> Reviewed-on: http://review.gluster.com/3798 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Optimize readdirp calls in DHTshishir gowda2012-08-133-0/+35
| | | | | | | | | | | | | | | | | Bring in option which is supported by posix xlator to filter out directory's entries from being returned. DHT would now request non-first subvols to filter out directory entries. dht xlator-option readdir-optimize will enable this optimization Change-Id: I35224bc81c9657f54f952efac02790276c35ded5 BUG: 838199 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.com/3772 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Unwind with correct pre/post parent bufsPranith Kumar K2012-08-023-403/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA: In case of dir fops create, mknod, mkdir, link, symlink, rename if the fop fails on read-child then unwinds are happening with all-zero pre/post iatt-bufs. The bug occurs because the parent bufs are not saved if the response is not from read-child. Fix: Save the pre/post-bufs for the first response. If the response comes from read-child, overwrite whatever we have cached. Tests: Attached the mount process to gdb. Tested that the unwinds happen with proper pre/post iatt bufs in the following cases: 1) All success case 2) Failure on read-child 3) Failure on non-read-child 4) Failure on all children. Tested soft-link self-heal to test the change made in that. Tested errno ENOTEMPTY for rmdir, rename fops. Change-Id: I82882423d2d766b4f4a3044203bcb5dbcaee1755 BUG: 845242 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3775 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Handle child_up & fd not opened case in xactionPranith Kumar K2012-08-011-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA: When an fd is opened while a brick is down, after the brick comes back up afr issues open on the other brick. It can fail for a number of reasons (enoent etc). While the system is in that state, inode/entrylks pre-op happen only on the brick that is up and fd is opened for fd-fops. post-op should consider only the bricks where both pre-op and fop succeeded as success, rest of them as failures. Code now marks only the children that are down as failures as opposed to child_down & fd-not-opened. This makes change-log appear as success on the subvolume where we did not do any fop leading to no change-log but differences in data/metadata for reg-files. Fix: Mark non-participants of fop as failure. This is tracked in transaction.pre_op[]. Tests: Simulated the scenario using err-gen on top of one of the client xlator which fails all fops always. Performed fops and the changelog represented pending fops on the brick with err-gen loaded. Tested the case of brick down and perform entry/metadata/data operations to confirm they still work as expected. Change-Id: I41905936126b19abba56ca581c0301a894507e1a BUG: 844987 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3765 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Handle failures in fop_cbk gracefullyPranith Kumar K2012-07-311-31/+46
| | | | | | | | | | | | | | | | | | | | | | | | | RCA: Afr crashes when a last fop response fails and 'fop output' arguments are NULL. Afr does not handle these gracefully. Fix: Changed the fops to not access the 'fop output' arguments in case of failures. Tests: Changed afr wind_cbk code to fail the last response by setting op_ret as -1 and op_errno as ENOMEM and setting all other output variables as NULL to test the change. Removed the code to verify success cases. No crashes or errors seen. Change-Id: Iad9bc54db093a162f85bfb8dbeeda5b95acd21d8 BUG: 844689 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3760 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: update loc inode after inode_linkPranith Kumar K2012-07-311-12/+24
| | | | | | | | | | | | | | | | | | | | | | | | RCA: inode passed to inode_link is not assigned any gfid if the inode with that gfid is already linked, so loc for opendir does not have a valid inode Fix: Use the linked_inode returned by inode_link in the loc to perform further operations on the entry. Tests: Checked that opendir comes with an loc with valid inode. Checked that re-opendir happens successfully. Tested index, full self-heal work fine with the fix. Change-Id: Idf4ced4cc2320133744962059d363e373af0e5ec BUG: 826580 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3748 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/stripe: handle short writes and errors in writev callbackBrian Foster2012-07-303-38/+106
| | | | | | | | | | | | | | | | | | | | | | | cluster/stripe write callback handling is broken in the event of server side errors and short writes due to crudely summing up the return values from each node. This can produce incorrect results or cause an application to rewrite the wrong portions of a buffer in an attempt to handle this condition. Modify cluster/stripe writev handling to record the requested size of each write and use this data to return the number of consecutive bytes written from the original request. This allows an application to retry a write at the point of error (and potentially consume said error). BUG: 809975 Change-Id: Ic35cb1e092c29545205aa32e352485c507534ce0 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.com/3700 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <sgowda@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Modified split-brain handlingPranith Kumar K2012-07-265-57/+146
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA The bug is observed because the decision to mark a file in split-brain is taken outside appropriate locks. Lookup gathers xattrs outside any lock. The xattrs being in split-brain in lookup should only be taken as a hint. Appropriate inodelks should be taken before confirming a split-brain. Self-heal confirms this at the moment. If data/metadata self-heal is turned off, inspecting of xattrs could not be performed so split-brain behavior does not work correctly if the self-heal options are turned off. Fix Self-heals are launched to inspect xattrs even when the data/metadata self-heal options are turned off. The decision to heal data/metadata after the xattrs are inspected is based on whether the options are turned on/off. So decision to set/reset split-brain flag is taken inside appropriate locks. Testcases: tests 33-36 in https://github.com/pranithk/gluster-tests/blob/master/afr/self-heal.sh Change-Id: Ia8aeab08208b50c06609ad35a9d72f3d553ee343 BUG: 833727 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Filter O_TRUNC in afr-fix-openPranith Kumar K2012-07-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | RCA: When open was done while a brick is down, afr opens the file after the brick comes backup. If this happens after the self-heal on the file is completed by self-heald etc, the file will end up in truncated state. Fix: Filter O_TRUNC while afr-fix-open because afr_open turns O_TRUNC into truncate transaction, so there will be pending changelog for the subvolume on which open fails. Testing: Had to simulate the race by stopping fix-open until self-heald completes self-heal on the file after brick online. Change-Id: I32759cc37f4bb34f206d01606a279f17b246dba4 BUG: 841840 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3705 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster: fix crash on link of named pipe in stripe/replicate volBrian Foster2012-07-252-12/+11
| | | | | | | | | | | | | | | | | | | | | | | | A crash occurs when attempting to link a named pipe on a striped, replicated volume. The cause for this crash is attempting to deref a NULL inode pointer in stripe_link_cbk(). The RCA for this bug uncovered a couple of problems: - AFR ignores the inode pointer it receives on failure (returning NULL). - stripe assumes the inode pointer is valid on failure. Either one of these changes addresses the crash, but this patch includes both changes. AFR is modified to pass along the inode pointer it receives (which could still be NULL). stripe is modified to not assume the inode pointer is valid on fop failure. BUG: 842825 Change-Id: I9cb2cc918552620929c3ecbd69bc66d4635eafdc Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.com/3727 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Perform data self-heal for non regular filesPranith Kumar K2012-07-255-217/+162
| | | | | | | | | | | | | | | | | | | | | | | | | RCA: Data self-heal for non regular files open the files and then proceeds using that fd. This approach does not work for symlinks because open on symlink opens the file resolved by it. Fix: If the file is not a regular file then perform self-heal using loc. It needs to get 'big' lock and then perform lookup to get changelog then erase data part of chagelog, then unlock. Test cases: Automated at https://github.com/pranithk/gluster-tests/blob/master/afr/special-file-self-heal-test.sh Change-Id: I924a922f5135872efe2cccf2e712ada082c5689f BUG: 811317 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3724 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/stripe: don't fail if no fctx on a non-regular fileBrian Foster2012-07-251-10/+14
| | | | | | | | | | | | | cluster/stripe broke directory rename. Only check for fctx on regular files. BUG: 842652 Change-Id: I8a1e7ff30d57c994082cb10471f610023713ee53 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.com/3720 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: Suppress user xattr mismatch log messageshishir gowda2012-07-251-1/+1
| | | | | | | | | | | | | | | | | Changing the log-level to DEBUG. Xattr mismatch can occur when parallel setxattr's race, or when one of the bricks was down. A subsequent setxattr will fix the condition when all the subvols are up. In this case, the 'user.swift' xattr used by ufo was out of sync, but did not cause any other error. Change-Id: I6fdff78869b8ff72c305bbe122033e6c1d9d3cff BUG: 838197 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.com/3722 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Mohammed Junaid <junaid@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: pass back xdata in createBrian Foster2012-07-231-2/+5
| | | | | | | | | | | | | | | A striped, replicated volume spits an error on file creation because stripe requires xdata to process stripe information and AFR isn't passing it back. This fix was suggested by Amar Tumballi. BUG: 842373 Change-Id: Ia7063590ca5e873d4a4e155989cf067e8a07501f Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.com/3713 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* stripe: filter coalesce key in getxattr()/listxattr()Amar Tumballi2012-07-171-0/+3
| | | | | | | | | | | | as 'stripe-coalesce' is an internal key, no need to show it on top of the mount-point. Change-Id: Iab836e73d59c42774db8a2eee13fe3b0cd994bc9 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 801887 Reviewed-on: http://review.gluster.com/3680 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <sgowda@redhat.com>
* glusterfs_ctx_t: un-globalize the filesystem contextAnand Avati2012-07-171-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | So far there has been a global glusterfs_ctx_t object which represents the running instance of the filesystem (client or server). It contains the various graphs, connection to the management daemon over which new graphs are obtained, calls stacks issued on this filesystem, and a bunch of such things. With the introduction of libgfapi, it is no more true that there will be only one filesystem context in a process. Applications can be written to use libgfapi and obtain serveral instances of different filesystems/volumes in the same process. This involves messy untangling of assumptions inside libglusterfs that there would only be one global glusterfs_ctx_t and offload that assumption to glusterfsd/ and cli/ (where it is true). Change-Id: Ifd7d1259428c26076140a5764a2dc7361694139c BUG: 839950 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.com/3678 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* remove useless if-before-free (and free-like) functionsJim Meyering2012-07-1320-180/+88
| | | | | | | | | | | | See comments in http://bugzilla.redhat.com/839925 for the code to perform this change. Signed-off-by: Jim Meyering <meyering@redhat.com> BUG: 839925 Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a Reviewed-on: http://review.gluster.com/3661 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: remove the unused files - round 2Amar Tumballi2012-07-126-6034/+0
| | | | | | | | | BUG: 764890 Change-Id: I3eb626eeaa2a09f0e248444f560c2a0eaf46c642 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.com/3660 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* xlator options: remove overwritten data-self-heal initializerJim Meyering2012-07-111-1/+0
| | | | | | | | | | | | | In the struct volume_options, the "data-self-heal" .default_value = "" setting appeared before a setting of .default_value = "on". Remove the former. Change-Id: Ieddcc18f61581f9448d806cd8bf8eefaaf0118b9 BUG: 789278 Signed-off-by: Jim Meyering <meyering@redhat.com> Reviewed-on: http://review.gluster.com/3589 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: post-op-delay supportAnand Avati2012-07-046-2/+175
| | | | | | | | | | | | | | | | post-op-delay introduces an artificial delay between the OP and POST-OP-CHANGELOG phases of a write transaction to increase the probability of changelog-piggyback and eager-locking to work more efficiently. Also enable eager-locking by default. Change-Id: I865ca4b68512c44818719c7e388952f15d53e6c2 BUG: 836033 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.com/3621 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
* cluster/afr: cleanup lk_owner and PID messAnand Avati2012-07-043-41/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically PID (frame->root->pid) was used by the locks translator to identify a locker (and make decisions about which locks contend or cooperate/merge). Since the introduction of lock_owner parameter the usage of PID (for locks) was deprecated and is now unused. This patch nukes the usage of PID in AFR The usage of lk_owner has also ended up being a mess, because of the differentiation required between ->lk() and ->inodelk(), (->lk() needs to be identified by the process (roughly) and ->inodelk() needs to be identified by the transaction) and also because of optimizations like eager locking (locks are no more identified by the transaction as they now get inherited by the next transaction). The scheme (and technique) now is: - All FOPs (the third phase of the transaction) happen with the lk_owner which is set by the topmost layer (FUSE, NFS etc.) - All entrylks are issued with lk_owner set to the frame->root address. - Inodelks which will not be subject to eager locking are issued with lk_owner set to frame->root. - Inodelks which are subject to eager locking are issued with lk_owner set to the address of fd_t (which are the only type of frames which get subject to the eager locking optimization) - At the start of the transaction, the transaction frame's lk_owner is set to the either frame->root or fd_t (and never unmodified) depending on the type of transaction. - Just before the third phase (FOP phase) the set lk_owner is "saved" away and overwritten by the lk_owner submitted by the top layer (FUSE or NFS) - Right after the third phase, the saved lk_owner is "restored" to resume the transaction into the POST-OP and eventually UNLOCK using the same lk_owner which was used during the LOCK phase. Change-Id: I6ab8e4d6b65ae4185fa85ad3fded8e9188b2f929 BUG: 836033 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.com/3620 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>