summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
...
* features/quota: Make dht_statfs_cbk more fool proof from quota_deem_statfsVarun Shastry2014-06-182-16/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The function depends on the fact that if quota-deem-statfs option is enabled, all of the subvolumes send their xdata with quota-deem-statfs flag ON. But, this may not be true in case of errors in some of the subvolumes. There is a decision/policy made which assumes quota-deem-statfs to be ON if at least ONE of the subvolumes sends the flag ON. By this, df reports quota modified statfs values if *at least ONE* of the bricks sends the quota-deem-statfs flag ON. This can be visualized with the below "Transition Diagram/State Machine". Event: Each Quota deem statfs status from the individual bricks Action: Decision taken on the calculation of the statvfs received State: Whether quota deem statfs is ON or OFF (0: OFF, 1: ON) Input: Event from individual bricks ___ ___ / \ OFF* / \ (OFF|ON)* | | | | \ / ON \ / -----> 0 ----------------> 1 The below Transition Function depicts the relation between the statfs calculation based on the events received. State Event action ------------------------------------- OFF OFF OFF OFF ON REPLACE ON OFF NEGLECT ON ON COMPARE Change-Id: I0e8fb7d3945a3ca3dde0bb99de6cd397e27a3162 BUG: 1048786 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6652 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Do layout self healing of directory for nameless lookupVenkatesh Somyajulu2014-06-174-1/+226
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently in the nameless lookup code path, if at the end of the lookup, even if it detects that layout anamolies are there, layout healing will not be done as there is no code to heal it. So there can be race between mkdir and lookup. Assume mkdir is going on from some other mount point, Say, M1. Directories are created on some nodes but layout is not set yet. Now from M2, nameless lookup goes, lookup will be success full as the directory is present on some of the nodes, but it won't heal layout. Now if create goes after lookup fop, because layout is absent, file creation will fail. Fix: Included the code of layout self-heal in the nameless lookup path. At the end of lookup, layout will be computed as it would have been in the named lookup, but it will be set to those node only, where directory is present. So after that if create fop goes, the probabiliy to get the subvolume with proper hash-range is high now, so reduces the race window. Other: Whenever a directory is created, we have to choose a brick from which we start allocating layout in a circular fashion. To calculate this starting brick, I have changed the candidate from name of the directory to gfid of the directory But to compute where a given file belongs, we will still use the name of the file. Hash computed from the name of the file should belong to any one of the directory-hash-range Calculation of hash for a file is acting as a consumer and the setting of directory layout based on gfid is acting as a producer, which are independent from each other. Change-Id: I3808c55082cd1b5c72d2c77cbbc063f55aa38bee BUG: 1095888 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/7493 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Bring option to choose gfid or name based hashingVenkatesh Somyajulu2014-06-173-5/+32
| | | | | | | | | Change-Id: I11794eb2adceb88e75864aede450e904431a6273 BUG: 1095888 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/8049 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: make loc->inode instead of loc->parentVenkatesh Somyajulu2014-06-161-1/+1
| | | | | | | | | | | | | parent's inode should be taken from loc->inode. Change-Id: I979b7333efa93b1e8f4c73ccf048d48e308f9289 BUG: 1104653 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/8073 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Cluster/DHT: New logging frameworkNithya Balachandran2014-06-1617-753/+1621
| | | | | | | | | | | | Moved all relevant DHT gf_log calls to the new logging framework. Change-Id: I3af3cfe0416e332774a6c4ff6a091d006c400af2 BUG: 1075611 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/7929 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* DHT/readdirp: Directory not shown/healed on mount point if existsSusant Palai2014-06-163-5/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | on single brick(non first up subvolume). Problem: If snapshot is taken, when mkdir has succeeded only on hashed_subvolume, then after restoring snapshot the directory is not shown on mount point. Why: dht_readdirp takes only those directory entries in to account, which are present on first_up_subvolume. Hence, if the "hashed subvolume" is not same as first_up_subvolume, it wont be listed on mount point and also not healed. Solution: Case 1: (Rebalance not running)If hashed subvolume is NULL or down then filter in first_up_subvolume. Other wise the corresponding hashed subvolume will take care of the directory entry. Case 2: If readdirp_optimize option is turned on then read from first_up_subvol Change-Id: Idaad28f1c9f688dbfb1a8a3ab8b244510c02365e BUG: 1092433 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/7599 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht/rebalance: Do not allow rebalance when gfid mismatch foundVenkatesh Somyajulu2014-06-141-1/+17
| | | | | | | | | | | | | | | | | | | | | Due to race condition, it may so happen that, gfid obtained in readdirp and gfid found by lookup are different for a given name. in that case do no allow the rebalance. Readdirp of an entry will bring the gfid, which will be stored in the inode through inode_link, and when lookup is done and gfid brought by lookup is different from the one stored in the inode, client3_3_lookup_cbk will return ESATLE and error will be captured by rebalance process. Change-Id: Iad839177ef9b80c1dd0e87f3406bcf4cb018e6fa BUG: 1104653 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/7973 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Fix resolution issues across fuse/server/afrPranith Kumar K2014-06-142-10/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problems with fuse/server: Fuse loc touch up sets loc->name even when pargfid is not known. Server lookup does (pargfid, name) based lookup when name is set ignoring the gfid. Because of this server resolver finds that the lookup came on (null-pargfid, name) and fails the lookup with EINVAL. Fix: Don't set loc->name in loc_touchup if the pargfid is not known. Did the same even for server-resolver Problem with afr: Lets say there is a directory hierarchy a/b/c/d on the mount and the user is cd'ed into the directory. Bring down one of the bricks of replica and remove all directories/files to simulate disk replacement on that brick. Now this brick is brought back up. Creates on the cd'ed directory fail with ESTALE. Basically before sending a create of 'f' inside 'd', fuse sends a lookup to make sure the file is not present. On one of the bricks 'd' is present and 'f' is not so it sends ENOENT as response. On the new brick 'd' itself is not present. So it sends ESTALE. In afr ESTALE is considered to be special errno on witnessing which lookup has to fail. And ESTALE is given more priority than ENOENT. Due to these reasons lookup fails with ESTALE rather than ENOENT. Since lookup didn't fail with ENOENT, 'create' can't be issued so the command is failed with ESTALE. Solution: Afr needs to consider ESTALE errno normally and ENOENT needs to be given more priority so that operations like create can proceed even when only one of the brick is up and running. Whenever client xlator identifies that gfid-changed, it sets that information in lookup xdata. Afr uses this information to fail the lookup with ESTALE so that top xlator can send fresh lookup. Change-Id: Ica6ce01baef08620154050a635e6f97d51029ef6 BUG: 1106408 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8015 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Core: Fix issues reported by CppcheckLalatendu Mohanty2014-06-121-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed in this patch: [glusterfs/extras/geo-rep/gsync-sync-gfid.c:105]: (error) Resource leak: fp [glusterfs/libglusterfs/src/xlator.c:651]: (error) Uninitialized variable: gfid [glusterfs/libglusterfs/src/xlator.c:652]: (error) Uninitialized variable: gfid [glusterfs/xlators/cluster/ha/src/ha.c:2699]: (error) Possible null pointer dereference: priv [glusterfs/xlators/features/changelog/src/changelog.c:1464]: (error) Possible null pointer dereference: priv [glusterfs/xlators/mgmt/glusterd/src/glusterd-mgmt-handler.c:865]: (error) Possible null pointer dereference: ctx [glusterfs/xlators/mgmt/glusterd/src/glusterd-mgmt-handler.c:194]: (error) Possible null pointer dereference: ctx [glusterfs/xlators/mgmt/glusterd/src/glusterd-syncop.c:1408]: (error) Possible null pointer dereference: this [glusterfs/xlators/mgmt/glusterd/src/glusterd-utils.c:7002]: (error) Possible null pointer dereference: path_tokens Fixed in 3.4 and 3.5 branch (http://review.gluster.org/#/c/7583/ , http://review.gluster.org/#/c/7605/ will be backported in a separate patch) [glusterfs/xlators/mount/fuse/src/fuse-bridge.c:4688]: (error) Uninitialized variable: finh [glusterfs/xlators/mount/fuse/src/fuse-bridge.c:3081]: (error) Possible null pointer dereference: state [glusterfs/xlators/cluster/dht/src/dht-rebalance.c:1719]: (error) Possible null pointer dereference: ctx [glusterfs/xlators/cluster/stripe/src/stripe.c:4940]: (error) Possible null pointer dereference: local [glusterfs/xlators/mgmt/glusterd/src/glusterd-replace-brick.c:915]: (error) Resource leak: file [glusterfs/xlators/mgmt/glusterd/src/glusterd-replace-brick.c:999]: (error) Resource leak: file [glusterfs/xlators/mgmt/glusterd/src/glusterd-sm.c:248]: (error) Possible null pointer dereference: new_ev_ctx [glusterfs/xlators/mgmt/glusterd/src/glusterd-utils.c:5297]: (error) Possible null pointer dereference: this [glusterfs/xlators/mgmt/glusterd/src/glusterd-utils.c:6273]: (error) Possible null pointer dereference: this [glusterfs/xlators/performance/quick-read/src/quick-read.c:586]: (error) Possible null pointer dereference: iobuf [glusterfs/xlators/nfs/server/src/nfs-common.c:89]: (error) Dangerous usage of 'volname' (strncpy doesn't always null-terminate it). False positives [glusterfs/geo-replication/src/gsyncd.c:99]: (error) Memory leak: str [glusterfs/geo-replication/src/gsyncd.c:395]: (error) Memory leak: argv [glusterfs/xlators/nfs/server/src/nlm4.c:1199]: (error) Possible null pointer dereference: fde [glusterfs/xlators/mgmt/glusterd/src/glusterd-geo-rep.c:1659]: (error) Possible null pointer dereference: command [glusterfs/xlators/mgmt/glusterd/src/glusterd-utils.c:7001]: (error) Possible null pointer dereference: path_tokens Insignificant/Don't care [glusterfs/contrib/uuid/gen_uuid.c:369]: (warning) %ld in format string (no. 2) requires 'long *' but the argument type is 'unsigned long *'. [glusterfs/contrib/uuid/gen_uuid.c:369]: (warning) %ld in format string (no. 3) requires 'long *' but the argument type is 'unsigned long *'. [glusterfs/extras/test/test-ffop.c:27]: (error) Buffer overrun possible for long command line arguments. [glusterfs/xlators/cluster/afr/src/afr-self-heal-common.c:138]: (error) Possible null pointer dereference: __ptr [glusterfs/xlators/cluster/afr/src/afr-self-heal-common.c:140]: (error) Possible null pointer dereference: __ptr [glusterfs/xlators/cluster/afr/src/afr-self-heal-common.c:331]: (error) Possible null pointer dereference: __ptr Change-Id: I7696ed1a2a9553b79f9714e10210a8d563a5abd8 BUG: 1091677 Signed-off-by: Lalatendu Mohanty <lmohanty@redhat.com> Reviewed-on: http://review.gluster.org/7693 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Rebalance: Avoid setting other component's xattrsSusant Palai2014-06-081-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem 1: In "gf_defrag_handle_hardlink" we used to do setxattr on internal afr keys. Which lead to afr aborting the op saying "operation not supported". Solution : Sending a new xattr with only required keys. Problem 2: Hardlink migration tries to create linkto files for 2nd to (n-1)th hardlink of a file on their respective hashed_subvolumes. It may so happen that the linkto file already exists on the hashed subvolume may be due to an earlier lookup or hashed subvolume on the older graph is same as that on the new graph. Hence any new link call may fail with EEXIST. Solution: Will log the message with DEBUG level for EEXIST . Otherwise will log with ERROR level. Change-Id: I51f9bfc8cf5b9d8e94a9d614391662fddc0874d4 BUG: 1066798 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/7943 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Fix min-free-disk calculations when quota-deem-statfs is onKrutika Dhananjay2014-06-021-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: As part of file creation, DHT sends a statfs call to all of its sub-volumes and expects in return the local space consumption and availability on each one of them. This information is used by DHT to ensure that atleast min-free-disk amount of space is left on each sub-volume in the event that there ARE other sub-volumes with more space available. But when quota-deem-statfs is enabled, quota xlator on every brick unwinds the statfs call with volume-wide consumption of disk space. This leads to miscalculation in min-free-disk algo, thereby misleading DHT at some point, into thinking all sub-volumes have equal available space, in which case DHT keeps sending new file creates to subvol-0, causing it to become 100% full at some point although there ARE other subvols with ample space available. FIX: The fix is to make quota_statfs() behave as if quota xlator weren't enabled, thereby making every brick return only its local consumption and disk space availability. Change-Id: I211371a1eddb220037bd36a128973938ea8124c2 BUG: 1099890 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/7845 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Add descriptions to message-ids.Ravishankar N2014-05-301-0/+75
| | | | | | | | | | | | Change-Id: I7ec2821c96d4cbff78b4959d9b07019896cc9a2c BUG: 1075611 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/7840 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* DHT: Wrong error handling in gf_defrag_migrate_dataSusant Palai2014-05-281-9/+10
| | | | | | | | | | | | | | | | | | | | | | Problem: Because of the condition (err = op_errno), err was set to zero always and ENOSPC error will be logged always. "dht_check_free_ space" was returning 1 and it was mapped to EPERM in "rebalance_task _completion". Solution: Changed the return value in dht_check_free_space to -1 as in rebalance_task_completion op_ret value -1 is mapped to ENOSPC. And fixed the wrong error condition after syncop_setxattr in gf_defrag_migrate_data. Change-Id: I474ea1bef3b1e34c89814ed0091dd02bd5b63737 BUG: 1054703 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6727 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* DHT/Mkdir: Fail mkdir with ENOENT, if parent is not availableSusant Palai2014-05-281-3/+3
| | | | | | | | | | Change-Id: I54227bcafb6d0d8cf716a679d2a34be7fc916898 BUG: 1078847 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/7306 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Don't support heal info healed/heal-failed commandsPranith Kumar K2014-05-272-32/+21
| | | | | | | | | | Change-Id: Iecfd3150e4f4e795e3403bcb1ac56340759a37d0 BUG: 1098027 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7766 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: move messages to new logging frameworkRavishankar N2014-05-1711-21/+93
| | | | | | | | | | | | | | Change important (from a diagnostics point of view) log messages to use the gf_msg() framework. Change-Id: I0a58184bbb78989db149e67f07c140a21c781bc2 BUG: 1075611 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/7784 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix bugs in quorum implementationPranith Kumar K2014-05-148-102/+164
| | | | | | | | | | | | | - Have common place to perform quorum fop wind check - Check if fop succeeded in a way that matches quorum to avoid marking changelog in split-brain. BUG: 1066996 Change-Id: Ibc5b80e01dc206b2abbea2d29e26f3c60ff4f204 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7600 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Remove stale index in self-heal codepathPranith Kumar K2014-05-082-5/+7
| | | | | | | | | Change-Id: I635fc0fa955b33590f1c5b4dfec22d591ea8575c BUG: 1032894 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6592 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Fix inode_forget assert failurePranith Kumar K2014-04-282-28/+47
| | | | | | | | | | | | | | | | | | | | | | Problem: If two self-heals are triggered on same inode in parallel then one inode will be linked and the other inode will not be linked as an inode with that gfid is already linked in inode table. Calling inode-forget on that inode leads to assert failure. Fix: Always use linked inode for performing self-heal. Added inode-forgets in other places as well even though its not really a memory leak. Change-Id: Ib84bf080c8cb6a4243f66541ece587db28f9a052 BUG: 1091597 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7567 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: MacOSX Porting fixesHarshavardhana2014-04-2410-27/+18
| | | | | | | | | | | | | | | | | | | | | git@forge.gluster.org:~schafdog/glusterfs-core/osx-glusterfs Working functionality on MacOSX - GlusterD (management daemon) - GlusterCLI (management cli) - GlusterFS FUSE (using OSXFUSE) - GlusterNFS (without NLM - issues with rpc.statd) Change-Id: I20193d3f8904388e47344e523b3787dbeab044ac BUG: 1089172 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Signed-off-by: Dennis Schafroth <dennis@schafroth.com> Tested-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Dennis Schafroth <dennis@schafroth.com> Reviewed-on: http://review.gluster.org/7503 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: force set dir inode ctx cached time in setattr()Anand Avati2014-04-173-1/+34
| | | | | | | | | | | | | | | | | In setattr, the inode times may have been explicitly set "back in time". In such cases, if the inode ctx times are not force set, then they continue to be higher and continue serving the higher/older value in future calls to dht_inode_ctx_time_update() Change-Id: I9cbfa7cf7c4069b0106d1f462de08c5d59bc91b5 BUG: 1083324 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/7378 Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Set right argument order for STACK_WIND_COOKIEVijay Bellur2014-04-101-1/+1
| | | | | | | | | | Change-Id: Ia26e17a7147ed825319c7c29880b9cf4ae80a48c BUG: 1085259 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/7416 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Mem leak fixes found in valgrind for iozonePranith Kumar K2014-04-091-0/+4
| | | | | | | | | | Change-Id: I869d191dc3470b2208c17343bbf772f01ef744cb BUG: 1085511 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7424 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Init local on txn-frame for zerofillPranith Kumar K2014-04-081-1/+1
| | | | | | | | | | Change-Id: I516f4fb0237dd0b3e512117bf987cea69f8678b8 BUG: 1084485 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7407 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: Simple 1-liner fix for crash in RackspaceBrian Foster2014-04-041-1/+1
| | | | | | | | BUG: 1084485 Change-Id: I89ddf10add041638ef70baebbce0ec2807ef4b6d Signed-off-by: Justin Clift <justin@gluster.org> Reviewed-on: http://review.gluster.org/7402 Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Remove eager-lock stub on finodelk failurePranith Kumar K2014-04-023-6/+23
| | | | | | | | | | | | | | | | | | | | Problem: For write fops afr's transaction eager-lock init adds transactions that can share eager-lock to fdctx list. But if eager-lock finodelk fop fails the stub remains in the list. This could later lead to corruption of the list and lead to infinite loop on the list leading to a mount hang. Fix: Remove the stub when finodelk fails. Change-Id: I0ed4bc6b62f26c5e891c1181a6871ee6e4f4f5fd BUG: 1063190 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6944 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* DHT/Rebalance : Hard link Migration FailureSusant Palai2014-03-301-6/+56
| | | | | | | | | | | | | | | | | | | Probelm : __is_file_migratable used to return ENOTSUP for all the cases. Hence, it will add to the failure count. And the remove-brick status will show failure for all the files. Solution : Added 'ret = -2' to gf_defrag_handle_hardlink to be deemed as success. Otherwise dht_migrate_file will try to migrate each of the hard link, which not intended. Change-Id: Iff74f6634fb64e4b91fc5d016e87ff1290b7a0d6 BUG: 1066798 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/7124 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Sparse file self-heal cangesPranith Kumar K2014-03-261-9/+28
| | | | | | | | | | | | | - Fix boundary condition for offset - Honour data-self-heal-algorithm option - Added tests for sparse file self-healing Change-Id: I14bb1c9d04118a3df4072f962fc8f2f197391d95 BUG: 1080707 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7339 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: refactorAnand Avati2014-03-2229-18979/+8734
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Remove client side self-healing completely (opendir, openfd, lookup) - Re-work readdir-failover to work reliably in case of NFS - Remove unused/dead lock recovery code - Consistently use xdata in both calls and callbacks in all FOPs - Per-inode event generation, used to force inode ctx refresh - Implement dirty flag support (in place of pending counts) - Eliminate inode ctx structure, use read subvol bits + event_generation - Implement inode ctx refreshing based on event generation - Provide backward compatibility in transactions - remove unused variables and functions - make code more consistent in style and pattern - regularize and clean up inode-write transaction code - regularize and clean up dir-write transaction code - regularize and clean up common FOPs - reorganize transaction framework code - skip setting xattrs in pending dict if nothing is pending - re-write self-healing code using syncops - re-write simpler self-heal-daemon Change-Id: I1e4080c9796c8a2815c2dab4be3073f389d614a8 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6010 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* build: Remove cmockery2 from repoLuis Pabon2014-03-172-36/+0
| | | | | | | | | | | | | | | | While we wait for cmockery2 to be available from Fedora, we can remove cmockery2 from the repo. BUG: 1077011 Change-Id: I75d462c607cd376a5d838ea83f4d12eb59757e73 Signed-off-by: Luis Pabon <lpabon@redhat.com> Reviewed-on: http://review.gluster.org/7281 Reviewed-by: Justin Clift <justin@gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: GlusterFS Unit Test FrameworkLuis Pabon2014-03-064-0/+222
| | | | | | | | | | | | | | | | | | | | This patch will allow for developers to create unit tests for their code. Documentation has been added to the patch and is available here: doc/hacker-guide/en-US/markdown/unittest.md Also, unit tests are run when RPM is created. BUG: 1067059 Change-Id: I95cf8bb0354d4ca4ed4476a0f2385436a17d2369 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Signed-off-by: Luis Pabon <lpabon@redhat.com> Reviewed-on: http://review.gluster.org/7145 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Justin Clift <justin@gluster.org> Tested-by: Justin Clift <justin@gluster.org>
* core: add @xdata parameter to syncop_[f]removexattr()Anand Avati2014-02-132-4/+4
| | | | | | | | | | | To be used in afr metadata self-heal Change-Id: I8dac4b19d61e331702427eeb5b606aab3d20b328 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6941 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/stripe: Fix the possible resource leaks.Poornima2014-02-121-4/+5
| | | | | | | | | Change-Id: Ic6fbc8c843f80edd7458d15229eb72a5609973a5 BUG: 789278 Signed-off-by: Poornima <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/6986 Reviewed-by: Amar Tumballi <amarts@gmail.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/stripe: Fix the resource leaks.Poornima2014-02-121-0/+2
| | | | | | | | | Change-Id: Ieb1fe112686f4932a6272a0117c1373e736d5b4e BUG: 789278 Signed-off-by: Poornima <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/6951 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: Modified dht-statedump to print all subvolume_statusVenkatesh Somyajulu2014-02-111-12/+33
| | | | | | | | | | | Change-Id: I1aae33472bd15fc2bd7a170544f2994534fdf246 BUG: 1058204 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6800 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: goto statements may cause exit before memory is freed.Christopher R. Hertel2014-02-101-9/+28
| | | | | | | | | | | | | | | | | | | | | | | Memory is allocated for pump_priv and for pump_priv->resume_path, but if an error is detected the references to that memory go out of scope and the memory is never freed. This patch assures that the memory is freed on error. Patchset 2: These are Kaleb's recommended changes which, compared to my original fix, are more comprehensive and provide a more complete resolution to the memory leakage bugs in this function. The bug reported by Coverity was limited to a single memory allocation. BUG: 789278 CID: 1124737 Change-Id: Ie239e3b5d28d97308bf948efec6a92f107bc648b Signed-off-by: Christopher R. Hertel <crh@redhat.com> Reviewed-on: http://review.gluster.org/6929 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: If hashed_subvol is NULL, do not failSusant Palai2014-02-081-3/+3
| | | | | | | | | | | | | | | | | | | | | Problem: With the current implementation we are allowing unlink of a file if hashed subvol is down and cached subvol is up. For the above op to work we should have the info of hashed_subvol. But incase we do remount of the volume we will have a zeroed layout for the disconnected subvol(start=0, stop=0, err=ENOTCONN) which will result into hashed_subvol being NULL and failing unlink op. Solution: Dont fail if hashed_subvol is NULL. Check cached subvol and unlink in cached subvol. The linkto file in the hashed subvol can be remove later. Change-Id: Ic1982c15c8942a1adcb47ed0017d2d5ace5c9241 BUG: 983416 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6851 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Fix memory leak.Poornima2014-02-081-0/+1
| | | | | | | | | Change-Id: I811d104684905a5a9a794cde8e925bd1a97f6546 BUG: 789278 Signed-off-by: Poornima <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/6906 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Set restrictive open flags for files under rebalanceShyamsundarR2014-02-041-2/+10
| | | | | | | | | | | | | | | | | | | Files that are being rebalanced are created in the new volume and access path needs to open these files to write changing data in parallel to both the old and new locations. While opening the file in the new location, we need to restrict the open flags to not use truncate or create and fail if exist flags, to prevent open failures or inadvertently truncate the file under rebalance. Change-Id: I12130e0377adc393f1925c45585200ad991fd0d5 BUG: 1058569 Signed-off-by: ShyamsundarR <srangana@redhat.com> Reviewed-on: http://review.gluster.org/6830 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Fix layout sortingPoornima G2014-02-031-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The layout was not being sorted in the ascending order leading to the wrong detection of holes/overlaps. From looking at the previous git commits it appears that the initial version itself had the err comparison code. Deductions from the current dht_layout_sort(): 1. The zero'ed out layouts should be in the from of list, if needed 2. The layout should be sorted in the ascending order of layout error value. 3. The layout should be sorted in the ascending order of the layout 'start'. But In some cases, with the err comparison code its not sorted in the ascending order. Example: If the input is as below for dht_layout_sort(), the sorting doesn't happen in ascending order. Input: 0-1 err:0 2-3 err:0 6-7 err:0 0-0 err:20 4-5 err:0 With the current sort, Output: 4-5 err:0 0-0 err:0 0-1 err:0 2-3 err:0 6-7 err:0 Expected: 0-0 err:20 0-1 err:0 2-3 err:0 4-5 err:0 6-7 err:0 Looking at dht_layout_anomalies() it appears that, it doesn't require the layout to be sorted based on error value. The other solution was to replace line 468 with: if ((layout->list[i].err || layout->list[j].err) && (layout->list[i].start > layout->list[j].start)) Since dht_layout_anomalies() didn't expect the layout to be sorted based on the error, removed the err comparison. Change-Id: I1215f6cd53efc7dba01c0958ba6cc7609dab6ff5 BUG: 1056406 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/6757 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* cluster/dht: Abandoned memory if a call failsChristopher R. Hertel2014-02-031-0/+3
| | | | | | | | | | | | | | | | | | | | | If the call to dict_set_dynstr() fails, the memory indicated by xattr_buf will not have been stored in the dictionary, so it must be freed. Patch set 2: Added a missed call to GF_FREE(). Fixed a formatting consistency issue. Patch set 3: Cleaned a minor style nit. BUG: 789278 CID: 1124786 Change-Id: Id1f85bd2cbfac0b8727a3f6901f0a50ba921817d Signed-off-by: Christopher R. Hertel <crh@redhat.com> Reviewed-on: http://review.gluster.org/6826 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: do not remove linkfile if file exist in cached sub volumeVijaykumar M2014-02-021-19/+109
| | | | | | | | | | | | | | | | Currently with rmdir, if a directory contains only the linkfiles we remove all the linkfiles and this is causing the problem when the cached sub volume is down and end-up with duplicate files showing on the mount point. Solution: Before removing a linkfile check if the files exists in cached subvolume. Change-Id: Iedffd0d9298ec8bb95d5ce27c341c9ade81f0d3c BUG: 1042725 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6500 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Change default_value for option self-heal-daemonVijay Bellur2014-01-271-1/+1
| | | | | | | | | Change-Id: Ic3c8e179a63e82a4e416aea620796f8bb3236c7c BUG: 1052759 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6706 Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: set op_errno correctly during migration.Raghavendra G2014-01-251-3/+18
| | | | | | | | | | Change-Id: I65acedf92c1003975a584a2ac54527e9a2a1e52f BUG: 1010241 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6219 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: goto statements may cause loop exit before memory is freed.Christopher R. Hertel2014-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | Memory is allocated at the top of the while loop via a call to gf_strdup(), but there are several goto calls that exit the loop, and the memory is not freed before each of those calls to goto. This fix moves the final call to GF_FREE() higher in the loop so that the memory is correctly freed. Two variables, dup_str and str_tmp1, point to portions of the allocated memory. Neither are used past the final call to GF_FREE( dup_str ). BUG: 789278 CID: 1124780 Change-Id: Id24b80cdbfd8b8855c80fffec63d7fce98cbed4a Signed-off-by: Christopher R. Hertel <crh@redhat.com> Reviewed-on: http://review.gluster.org/6771 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/stripe: Remove redundant code blocksChristopher R. Hertel2014-01-241-24/+0
| | | | | | | | | | | | | | | | | This appears to have been a cut&paste error. The same set of 12 lines was repeated three times, causing a pointer to allocated memory to be overwritten twice resulting in a memory leak. This patch removes the redundant code. BUG: 789278 CID: 1128915 Change-Id: I3e4a3703b389c00e2a4e99e0a7368c5a3dda74d0 Signed-off-by: Christopher R. Hertel <crh@redhat.com> Reviewed-on: http://review.gluster.org/6769 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Set quota limit key in dht_selfheal of dirs.Varun Shastry2014-01-222-6/+29
| | | | | | | | | | | | | | Also fixed check in dht_is_subvol_in_layout to check if the layouts are zero'ed out. Change-Id: I4bf8ebf66d3ef1946309b6c9aac9e79bf8a6d495 BUG: 969461 Signed-off-by: shishir gowda <sgowda@redhat.com> Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6392 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* quota: filter glusterfs quota xattrsSusant Palai2014-01-221-0/+6
| | | | | | | | | Change-Id: I86ebe02735ee88598640240aa888e02b48ecc06c BUG: 1040423 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6490 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* syncop: Change return value of syncopPranith Kumar K2014-01-195-100/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: We found a day-1 bug when syncop_xxx() infra is used inside a synctask with compilation optimization (CFLAGS -O2). Detailed explanation of the Root cause: We found the bug in 'gf_defrag_migrate_data' in rebalance operation: Lets look at interesting parts of the function: int gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc, dict_t *migrate_data) { ..... code section - [ Loop ] while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL, &entries)) != 0) { ..... code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a thread) /* Need to keep track of ENOENT errno, that means, there is no need to send more readdirp() */ readdir_operrno = errno; ..... code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread) ret = syncop_getxattr (this, &entry_loc, &dict, GF_XATTR_LINKINFO_KEY); code section - [ ERRNO-2] (checking for failures of syncop_getxattr(). This may not always be executed in same thread which executed [SYNCOP-1]) if (ret < 0) { if (errno != ENODATA) { loglevel = GF_LOG_ERROR; defrag->total_failures += 1; ..... } the function above could be executed by thread(t1) till [SYNCOP-1] and code from [ERRNO-2] can be executed by a different thread(t2) because of the way syncop-infra schedules the tasks. when the code is compiled with -O2 optimization this is the assembly code that is generated: [ERRNO-1] 1165 readdir_operrno = errno; <<---- errno gets expanded as *(__errno_location()) 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0 <address@hidden> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------ Address returned by __errno_location() is stored in a special location in stack for later use. 0x00007fd149d48b77 <+519>: mov (%rax),%eax 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp) .... [ERRNO-2] 1281 if (errno != ENODATA) { 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<----- Because it already stored the address returned by __errno_location(), it just dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE EXECUTED BY SAME THREAD!!! 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp 0x00007fd149d492b8 <+2376>: mov (%rax),%edi 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi The problem is that __errno_location() value of t1 and t2 are different. So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is executing [ERRNO-2] code section. When code is compiled without any optimization for [ERRNO-2]: 1281 if (errno != ENODATA) { 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300 <address@hidden><<--- As it is calling __errno_location() again it gets the location from t2 so it works as intended. 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1 <gf_defrag_migrate_data+2287> Fix: Make syncop_xxx() return (-errno) value as the return value in case of errors and all the functions which make syncop_xxx() will need to use (-ret) to figure out the reason for failure in case of syncop_xxx() failures. Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57 BUG: 1040356 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6475 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: Ignore directory with missing xattrs, which have err == 0, and start == ↵Vijaykumar M2014-01-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | stop From the history (Patch: http://review.gluster.org/4668/) When subvols-per-directory is < available subvols, then there are layouts which are not populated. This leads to incorrect identification of holes or overlaps. We need to ignore layouts, which have err == 0, and start == stop. In the current scenario (start == stop == 0). Additionally, in layout-merge, treat missing xattrs as err = 0. In case of missing layouts, anomalies will reset them. For any other valid subvoles, err != 0 in case of layouts being zeroed out. Also reverted back dht_selfheal_dir_xattr, which does layout calculation only on subvols which have errors. Change-Id: Idb72a869f1a6f103046bb7e6fe0019f6ac853fd4 BUG: 1047331 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6618 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>