summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht
Commit message (Collapse)AuthorAgeFilesLines
* build: MacOSX Porting fixesHarshavardhana2014-04-243-6/+4
| | | | | | | | | | | | | | | | | | | | | git@forge.gluster.org:~schafdog/glusterfs-core/osx-glusterfs Working functionality on MacOSX - GlusterD (management daemon) - GlusterCLI (management cli) - GlusterFS FUSE (using OSXFUSE) - GlusterNFS (without NLM - issues with rpc.statd) Change-Id: I20193d3f8904388e47344e523b3787dbeab044ac BUG: 1089172 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Signed-off-by: Dennis Schafroth <dennis@schafroth.com> Tested-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Dennis Schafroth <dennis@schafroth.com> Reviewed-on: http://review.gluster.org/7503 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: force set dir inode ctx cached time in setattr()Anand Avati2014-04-173-1/+34
| | | | | | | | | | | | | | | | | In setattr, the inode times may have been explicitly set "back in time". In such cases, if the inode ctx times are not force set, then they continue to be higher and continue serving the higher/older value in future calls to dht_inode_ctx_time_update() Change-Id: I9cbfa7cf7c4069b0106d1f462de08c5d59bc91b5 BUG: 1083324 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/7378 Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* DHT/Rebalance : Hard link Migration FailureSusant Palai2014-03-301-6/+56
| | | | | | | | | | | | | | | | | | | Probelm : __is_file_migratable used to return ENOTSUP for all the cases. Hence, it will add to the failure count. And the remove-brick status will show failure for all the files. Solution : Added 'ret = -2' to gf_defrag_handle_hardlink to be deemed as success. Otherwise dht_migrate_file will try to migrate each of the hard link, which not intended. Change-Id: Iff74f6634fb64e4b91fc5d016e87ff1290b7a0d6 BUG: 1066798 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/7124 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: refactorAnand Avati2014-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Remove client side self-healing completely (opendir, openfd, lookup) - Re-work readdir-failover to work reliably in case of NFS - Remove unused/dead lock recovery code - Consistently use xdata in both calls and callbacks in all FOPs - Per-inode event generation, used to force inode ctx refresh - Implement dirty flag support (in place of pending counts) - Eliminate inode ctx structure, use read subvol bits + event_generation - Implement inode ctx refreshing based on event generation - Provide backward compatibility in transactions - remove unused variables and functions - make code more consistent in style and pattern - regularize and clean up inode-write transaction code - regularize and clean up dir-write transaction code - regularize and clean up common FOPs - reorganize transaction framework code - skip setting xattrs in pending dict if nothing is pending - re-write self-healing code using syncops - re-write simpler self-heal-daemon Change-Id: I1e4080c9796c8a2815c2dab4be3073f389d614a8 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6010 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* build: Remove cmockery2 from repoLuis Pabon2014-03-172-36/+0
| | | | | | | | | | | | | | | | While we wait for cmockery2 to be available from Fedora, we can remove cmockery2 from the repo. BUG: 1077011 Change-Id: I75d462c607cd376a5d838ea83f4d12eb59757e73 Signed-off-by: Luis Pabon <lpabon@redhat.com> Reviewed-on: http://review.gluster.org/7281 Reviewed-by: Justin Clift <justin@gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: GlusterFS Unit Test FrameworkLuis Pabon2014-03-064-0/+222
| | | | | | | | | | | | | | | | | | | | This patch will allow for developers to create unit tests for their code. Documentation has been added to the patch and is available here: doc/hacker-guide/en-US/markdown/unittest.md Also, unit tests are run when RPM is created. BUG: 1067059 Change-Id: I95cf8bb0354d4ca4ed4476a0f2385436a17d2369 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Signed-off-by: Luis Pabon <lpabon@redhat.com> Reviewed-on: http://review.gluster.org/7145 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Justin Clift <justin@gluster.org> Tested-by: Justin Clift <justin@gluster.org>
* core: add @xdata parameter to syncop_[f]removexattr()Anand Avati2014-02-131-1/+1
| | | | | | | | | | | To be used in afr metadata self-heal Change-Id: I8dac4b19d61e331702427eeb5b606aab3d20b328 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6941 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* dht: Modified dht-statedump to print all subvolume_statusVenkatesh Somyajulu2014-02-111-12/+33
| | | | | | | | | | | Change-Id: I1aae33472bd15fc2bd7a170544f2994534fdf246 BUG: 1058204 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6800 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: If hashed_subvol is NULL, do not failSusant Palai2014-02-081-3/+3
| | | | | | | | | | | | | | | | | | | | | Problem: With the current implementation we are allowing unlink of a file if hashed subvol is down and cached subvol is up. For the above op to work we should have the info of hashed_subvol. But incase we do remount of the volume we will have a zeroed layout for the disconnected subvol(start=0, stop=0, err=ENOTCONN) which will result into hashed_subvol being NULL and failing unlink op. Solution: Dont fail if hashed_subvol is NULL. Check cached subvol and unlink in cached subvol. The linkto file in the hashed subvol can be remove later. Change-Id: Ic1982c15c8942a1adcb47ed0017d2d5ace5c9241 BUG: 983416 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6851 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Set restrictive open flags for files under rebalanceShyamsundarR2014-02-041-2/+10
| | | | | | | | | | | | | | | | | | | Files that are being rebalanced are created in the new volume and access path needs to open these files to write changing data in parallel to both the old and new locations. While opening the file in the new location, we need to restrict the open flags to not use truncate or create and fail if exist flags, to prevent open failures or inadvertently truncate the file under rebalance. Change-Id: I12130e0377adc393f1925c45585200ad991fd0d5 BUG: 1058569 Signed-off-by: ShyamsundarR <srangana@redhat.com> Reviewed-on: http://review.gluster.org/6830 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Fix layout sortingPoornima G2014-02-031-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The layout was not being sorted in the ascending order leading to the wrong detection of holes/overlaps. From looking at the previous git commits it appears that the initial version itself had the err comparison code. Deductions from the current dht_layout_sort(): 1. The zero'ed out layouts should be in the from of list, if needed 2. The layout should be sorted in the ascending order of layout error value. 3. The layout should be sorted in the ascending order of the layout 'start'. But In some cases, with the err comparison code its not sorted in the ascending order. Example: If the input is as below for dht_layout_sort(), the sorting doesn't happen in ascending order. Input: 0-1 err:0 2-3 err:0 6-7 err:0 0-0 err:20 4-5 err:0 With the current sort, Output: 4-5 err:0 0-0 err:0 0-1 err:0 2-3 err:0 6-7 err:0 Expected: 0-0 err:20 0-1 err:0 2-3 err:0 4-5 err:0 6-7 err:0 Looking at dht_layout_anomalies() it appears that, it doesn't require the layout to be sorted based on error value. The other solution was to replace line 468 with: if ((layout->list[i].err || layout->list[j].err) && (layout->list[i].start > layout->list[j].start)) Since dht_layout_anomalies() didn't expect the layout to be sorted based on the error, removed the err comparison. Change-Id: I1215f6cd53efc7dba01c0958ba6cc7609dab6ff5 BUG: 1056406 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/6757 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* cluster/dht: Abandoned memory if a call failsChristopher R. Hertel2014-02-031-0/+3
| | | | | | | | | | | | | | | | | | | | | If the call to dict_set_dynstr() fails, the memory indicated by xattr_buf will not have been stored in the dictionary, so it must be freed. Patch set 2: Added a missed call to GF_FREE(). Fixed a formatting consistency issue. Patch set 3: Cleaned a minor style nit. BUG: 789278 CID: 1124786 Change-Id: Id1f85bd2cbfac0b8727a3f6901f0a50ba921817d Signed-off-by: Christopher R. Hertel <crh@redhat.com> Reviewed-on: http://review.gluster.org/6826 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: do not remove linkfile if file exist in cached sub volumeVijaykumar M2014-02-021-19/+109
| | | | | | | | | | | | | | | | Currently with rmdir, if a directory contains only the linkfiles we remove all the linkfiles and this is causing the problem when the cached sub volume is down and end-up with duplicate files showing on the mount point. Solution: Before removing a linkfile check if the files exists in cached subvolume. Change-Id: Iedffd0d9298ec8bb95d5ce27c341c9ade81f0d3c BUG: 1042725 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6500 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: set op_errno correctly during migration.Raghavendra G2014-01-251-3/+18
| | | | | | | | | | Change-Id: I65acedf92c1003975a584a2ac54527e9a2a1e52f BUG: 1010241 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6219 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: goto statements may cause loop exit before memory is freed.Christopher R. Hertel2014-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | Memory is allocated at the top of the while loop via a call to gf_strdup(), but there are several goto calls that exit the loop, and the memory is not freed before each of those calls to goto. This fix moves the final call to GF_FREE() higher in the loop so that the memory is correctly freed. Two variables, dup_str and str_tmp1, point to portions of the allocated memory. Neither are used past the final call to GF_FREE( dup_str ). BUG: 789278 CID: 1124780 Change-Id: Id24b80cdbfd8b8855c80fffec63d7fce98cbed4a Signed-off-by: Christopher R. Hertel <crh@redhat.com> Reviewed-on: http://review.gluster.org/6771 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Set quota limit key in dht_selfheal of dirs.Varun Shastry2014-01-222-6/+29
| | | | | | | | | | | | | | Also fixed check in dht_is_subvol_in_layout to check if the layouts are zero'ed out. Change-Id: I4bf8ebf66d3ef1946309b6c9aac9e79bf8a6d495 BUG: 969461 Signed-off-by: shishir gowda <sgowda@redhat.com> Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6392 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* quota: filter glusterfs quota xattrsSusant Palai2014-01-221-0/+6
| | | | | | | | | Change-Id: I86ebe02735ee88598640240aa888e02b48ecc06c BUG: 1040423 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6490 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* syncop: Change return value of syncopPranith Kumar K2014-01-193-85/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: We found a day-1 bug when syncop_xxx() infra is used inside a synctask with compilation optimization (CFLAGS -O2). Detailed explanation of the Root cause: We found the bug in 'gf_defrag_migrate_data' in rebalance operation: Lets look at interesting parts of the function: int gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc, dict_t *migrate_data) { ..... code section - [ Loop ] while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL, &entries)) != 0) { ..... code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a thread) /* Need to keep track of ENOENT errno, that means, there is no need to send more readdirp() */ readdir_operrno = errno; ..... code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread) ret = syncop_getxattr (this, &entry_loc, &dict, GF_XATTR_LINKINFO_KEY); code section - [ ERRNO-2] (checking for failures of syncop_getxattr(). This may not always be executed in same thread which executed [SYNCOP-1]) if (ret < 0) { if (errno != ENODATA) { loglevel = GF_LOG_ERROR; defrag->total_failures += 1; ..... } the function above could be executed by thread(t1) till [SYNCOP-1] and code from [ERRNO-2] can be executed by a different thread(t2) because of the way syncop-infra schedules the tasks. when the code is compiled with -O2 optimization this is the assembly code that is generated: [ERRNO-1] 1165 readdir_operrno = errno; <<---- errno gets expanded as *(__errno_location()) 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0 <address@hidden> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------ Address returned by __errno_location() is stored in a special location in stack for later use. 0x00007fd149d48b77 <+519>: mov (%rax),%eax 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp) .... [ERRNO-2] 1281 if (errno != ENODATA) { 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<----- Because it already stored the address returned by __errno_location(), it just dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE EXECUTED BY SAME THREAD!!! 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp 0x00007fd149d492b8 <+2376>: mov (%rax),%edi 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi The problem is that __errno_location() value of t1 and t2 are different. So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is executing [ERRNO-2] code section. When code is compiled without any optimization for [ERRNO-2]: 1281 if (errno != ENODATA) { 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300 <address@hidden><<--- As it is calling __errno_location() again it gets the location from t2 so it works as intended. 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1 <gf_defrag_migrate_data+2287> Fix: Make syncop_xxx() return (-errno) value as the return value in case of errors and all the functions which make syncop_xxx() will need to use (-ret) to figure out the reason for failure in case of syncop_xxx() failures. Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57 BUG: 1040356 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6475 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: Ignore directory with missing xattrs, which have err == 0, and start == ↵Vijaykumar M2014-01-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | stop From the history (Patch: http://review.gluster.org/4668/) When subvols-per-directory is < available subvols, then there are layouts which are not populated. This leads to incorrect identification of holes or overlaps. We need to ignore layouts, which have err == 0, and start == stop. In the current scenario (start == stop == 0). Additionally, in layout-merge, treat missing xattrs as err = 0. In case of missing layouts, anomalies will reset them. For any other valid subvoles, err != 0 in case of layouts being zeroed out. Also reverted back dht_selfheal_dir_xattr, which does layout calculation only on subvols which have errors. Change-Id: Idb72a869f1a6f103046bb7e6fe0019f6ac853fd4 BUG: 1047331 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6618 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: Improve the description in volume set help outputVarun Shastry2014-01-121-1/+2
| | | | | | | | | Change-Id: I785648970f53033a69922c23110b5eea9e47feb3 BUG: 1046030 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6573 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* pathinfo: Provide user namespace access.Vijaykumar M2013-12-161-2/+2
| | | | | | | | | | | | | | | | | Locality can be now queried by unprivileged users with key "glusterfs.pathinfo". Setting both "glusterfs.pathinfo" and "trusted.glusterfs.pathinfo" on disk is prevented with this patch. Original Author: Vijay Bellur <vbellur@redhat.com> Change-Id: I4f7a0db8ad59165c4aeda04b23173255157a8b79 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/5101 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: handle ESTALE/ENOENT in dht_accessAnand Avati2013-12-131-1/+1
| | | | | | | | | | | Had misssed out dht_access in the previous round of cleanup Change-Id: Ib255b9ad13ca62a8bc2eea225c46632aff8e820f BUG: 1032894 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6496 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com>
* glusterd/geo-rep: more glusterd and cli fixes for geo-rep.Ajeet Jha2013-12-121-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | -> handle option validation cases in reset case. -> Creating valid conf path when glusterd restarts. -> Reading the gsyncd worker thread status and displaying it. -> Displaying status-detail per worker. -> Fetch checkpoint info in geo-rep status. -> use-tarssh value validation added. misc: misc geo-rep fixes based on cluster, logrotate etc.. -> cluster/dht: fix 'stime' getxattr getting overwritten. -> cluster/afr: return max of 'stime' values in subvol. -> geo-rep-logrotate: Sending SIGHUP to geo-rep auxiliary. -> cluster/dht: fix convoluted logic while aggregating. -> cluster/*: fix 'stime' min/max fetch logic. Change-Id: I811acea0bbd6194797a3e55d89295d1ea021ac85 BUG: 1036552 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6405 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Make sure gf_defrag_migrate_data is not optimizedPranith Kumar K2013-12-111-0/+7
| | | | | | | | | | | | | | | | | | | Problem: Whenever there syncop_xxx() is used inside a synctask and gcc optimizes it when compiled with -O2 there is a problem where 'errno' would not work as expected. Fix: Until http://review.gluster.com/6475 is reviewed and merged we are making sure the function is not going to be optimized. Change-Id: I504c18c8a7789f0c776a56f0aa60db3618b21601 BUG: 1040356 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6481 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: Set status to FAILED when rebalance stops due to brick going downKaushal M2013-12-103-5/+9
| | | | | | | | | | Change-Id: I98da41342127b1690d887a5bc025e4c9dd504894 BUG: 1038452 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6435 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <gowda.shishir@gmail.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: handle NULL check before strlen/strcmp in fgetxattrAnand Avati2013-12-021-0/+1
| | | | | | | | | | | | @key can legally be NULL. Handle that case without crashing. Change-Id: Iaae293caa7eeb24afc9cd2580799173e2ce00911 BUG: 1036879 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6395 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Handle Link-info getxattr failure in rebalancePranith Kumar K2013-11-261-7/+15
| | | | | | | | | | | | | | When getxattr fails with errno other than ENODATA fail rebalance on that file. Log the reason for error. Change-Id: Ia519870b88e6e6dd464d1c0415411aa999f80bc9 BUG: 1032927 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6341 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <sgowda@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: set layout in inode ctx even if linkfile failsAnand Avati2013-11-261-1/+1
| | | | | | | | | | | | | | Creating linkfile could have failed, but we dont care about linkfile for setting layout in the inode ctx (could be EEXIST etc.) So ignore @inode in cbk and pick it up from local->loc.inode Change-Id: I2952799d7ae0d3441b84b2ca2981afd75d7576e2 BUG: 1032859 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6319 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: fix errno for non-existent GFIDAnand Avati2013-11-265-11/+14
| | | | | | | | | | | | | | | | | | | When clients refer to a GFID which does not exist, the errno to be returned in ESTALE (and not ENOENT). Even though ENOENT might look "proper" most of the time, as the application eventually expects ENOENT even if a parent directory does not exist, not returning ESTALE results in resolvers (FUSE and GFAPI) to not retry resolution in uncached mode. This can result in spurious ENOENTs during concurrent path modification operations. Change-Id: I7a06ea6d6a191739f2e9c6e333a1969615e05936 BUG: 1032894 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6318 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: instruct marker whenever it shouldn't do accountingRaghavendra G2013-11-261-12/+97
| | | | | | | | | | | | | | | | | | This is needed for two reasons: * since dht-linkfiles are internal, they shouldn't be accounted. * hardlink handling in marker is broken. link/unlink of hardlinks present in same directory can break marker accounting. Hence, if src and dst are in same directory in case of rename, dht - if it breaks rename into link/unlink operations - should instruct marker to not to do accounting. Change-Id: I9c9f7384569f75a2792f6450ee7a5279bf751ae7 BUG: 1022995 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6203 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: add dht_is_linkfile helper procedure.Raghavendra G2013-11-262-7/+3
| | | | | | | | | | | | | | components other than distribute (like marker to exclude linkfiles from being accounted) also need awareness of what constitutes a linkfile. Hence its good to separate out this functionality into core. Change-Id: Ib944eeacc991bb1de464c9e73ee409fc7a689ff1 BUG: 1022995 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6152 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/quota: Improvements to quotaRaghavendra G2013-11-261-4/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Two stages of quota enforcement is done: Soft and hard quota Upon reaching soft quota limit on the directory it logs/alerts in the quota daemon log (ie DEFAULT_LOG_DIR/quotad.log) and no more writes allowed after hard quota limit. After reaching the soft-limit the daemon alerts the user/admin repeatively for every 'alert-time', which is configurable. * Quota enforcer is moved to server-side. It takes care of enforcing quota. Since enforcer doesn't have the cluster view, it relies on another service called quota-aggregator. Aggregator, on query can return the size of a directory based on the cluster view. Enforcer is always loaded in the server graph and is by passed if the feature is not enabled. Options specific to enforcer: server-quota - Specifies whether the feature is on/off. It is used to by pass the quota if turned off. deem-statfs - If set to on, it takes quota limits into consideration while estimating fs size. (df command). The algorithm followed is, i. Adjust statvfs based on limit configured on root. ii. If limit is set on the inode passed, use size/limits on that inode to populate statvfs. Otherwise, use size/limits configured on root. iii. Upon statvfs, update the ctx->size on the inode. iv. Don't let DHT aggregate, instead take the maximum of the usages from the subvols of the DHT, since each of it contains the complete information. Enforcer also makes use of gfid-to-path conversion functionality to work correctly when a client like nfs predominently relies on nameless lookups. * Quota Aggregator acts as a thin client to provide cluster view Its a lightweight *gluster client* process with no mount point, started upon enabling quota or restarting the volume. This is a single process run on each brick, which can answer queries on all volumes in the cluster. Its volfile stored in GLUSTERD_DEFAULT_WORKING_DIR/quotad/quotad.vol. Credits: Raghavendra Bhat <rabhat@redhat.com> Varun Shastry <vshastry@redhat.com> Shishir Gowda <sgowda@redhat.com> Kruthika Dhananjay <kdhananj@redhat.com> Brian Foster <bfoster@redhat.com> Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: Id1cb25b414951da34c665a55f77385d482e0f9de BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5952 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Have #include <signal.h> for kill(2)Emmanuel Dreyfus2013-11-182-0/+2
| | | | | | | | | BUG: 764655 Change-Id: I4d18c9a6c00cb4696645fcb437398562f00b9d24 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6284 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_tBharata B Rao2013-11-142-2/+2
| | | | | | | | | | | | | | glfs_zerofill() can be potentially called to zero-out entire file and hence allow for bigger value of length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht - rebalance: handle the rebalance @ inode level (!fd level)Amar Tumballi2013-11-134-128/+148
| | | | | | | | | | | | | * migrate all the fd's on an inode to newer subvol after rebalance * use the migration in progress flag in inode, so all the operations on the inode can make use of it Change-Id: Ib807a46e927a1062688fc15119c916797c52a350 BUG: 1013456 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5891 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add BD support to other xlatorsM. Mohan Kumar2013-11-131-15/+16
| | | | | | | | | | | | | | | | Make changes to distributed xlator to work with BD xlator. Unlike files, a block device can't be removed when its opened. So some part of the code were moved down to avoid this situation. Also before truncating a BD file its BD_XATTR should be set otherwise truncate will result in truncating posix file. So file is created with needed BD_XATTR and truncate is invoked. Also enables BD xlator in stripe volume type. Change-Id: If127516e261fac5fc5b137e7fe33e100bc92acc0 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5235 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterfs: zerofill supportM. Mohan Kumar2013-11-103-0/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero filled VM disk image provisioning or during scrubbing of VM disk images). Client/application can issue this FOP for zeroing out. Gluster server will zero out required range of bytes ie server offloaded zeroing. In the absence of this fop, client/application has to repetitively issue write (zero) fop to the server, which is very inefficient method because of the overheads involved in RPC calls and acknowledgements. WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks and this write is handled completely within the storage and hence is known as offload . Linux ,now has support for SCSI WRITESAME command which is exposed to the user in the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device , making it highly efficient. The fop takes two arguments offset and size. It zeroes out 'size' number of bytes in an opened file starting from 'offset' position. This patch adds zerofill support to the following areas: - libglusterfs - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi Client applications can exloit this fop by using glfs_zerofill introduced in libgfapi.FUSE support to this fop has not been added as there is no system call for this fop. Changes from previous version 3: * Removed redundant memory failure log messages Changes from previous version 2: * Rebased and fixed build error Changes from previous version 1: * Rebased for latest master TODO : * Add zerofill support to trace xlator * Expose zerofill capability as part of gluster volume info Here is a performance comparison of server offloaded zeofill vs zeroing out using repeated writes. [root@llmvm02 remote]# time ./offloaded aakash-test log 20 real 3m34.155s user 0m0.018s sys 0m0.040s [root@llmvm02 remote]# time ./manually aakash-test log 20 real 4m23.043s user 0m2.197s sys 0m14.457s [root@llmvm02 remote]# time ./offloaded aakash-test log 25; real 4m28.363s user 0m0.021s sys 0m0.025s [root@llmvm02 remote]# time ./manually aakash-test log 25 real 5m34.278s user 0m2.957s sys 0m18.808s The argument log is a file which we want to set for logging purpose and the third argument is size in GB . As we can see there is a performance improvement of around 20% with this fop. Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18 BUG: 1028673 Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5327 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: dht_lookup_dir_cbk should set op_errno as local->op_errnoKaushal M2013-10-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Two glusterfs clients return inconsistent errnos when the bricks of the volume were down. Consider two gluster mounts. Mount 1 was done when the bricks were online. Mount 2 was done after the bricks were killed, (using the 'glusterfs' command instead of the mount script). For any request, mount 1 will return ENOTCONN, where as mount 2 will return ENOENT. This happens because for the 2nd mount, a fuse would send a lookup on '/' for any request, as it hadn't been done yet. The client xlator returns ENOTCONN, but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when aggregating. So, fuse returned ENOENT, even though the errno should have been ENOTCONN. Change-Id: I4b7a6d84ce5153045a807fccc01485afe0377117 BUG: 1019095 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6072 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* gNFS: Incorrect NFS ACL encoding for XFSSantosh Kumar Pradhan2013-09-292-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Incorrect NFS ACL encoding causes "system.posix_acl_default" setxattr failure on bricks on XFS file system. XFS (potentially others?) doesn't understand when the 0x10 prefix is added to the ACL type field for default ACLs (which the Linux NFS client adds) which causes setfacl()->setxattr() to fail silently. NFS client adds NFS_ACL_DEFAULT(0x1000) for default ACL. FIX: Mask the prefix (added by NFS client) OFF, so the setfacl is not rejected when it hits the FS. Original patch by: "Richard Wareing" Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396 BUG: 1009210 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5980 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: block unused signals in created threadsAnand Avati2013-09-251-2/+2
| | | | | | | | | | | | | | | Block all signal except those which are set for explicit handling in glusterfs_signals_setup(). Since thread spawning code in libglusterfs and xlators can get called from application threads when used through libgfapi, it is necessary to do this blocking. Change-Id: Ia320f80521a83d2edcda50b9ad414583a0175281 BUG: 1011662 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5995 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Revert "cluster/dht: Return success in dht_discover if layout issues"shishir gowda2013-09-243-70/+35
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit a3e593f9f17cb1e68db97bb5a0d8074793a33964 which was bought into fix dht_layout_anomalies error handling. We still see applications error'ing out due to graph switches. To fix the above issue - We cannot heal in dht_discover, as it is a gfid based lookup, and not path based. So, returning error here would lead to app's to see failure. Also, update the layout in inode_ctx even if it has anomalies. Let subsequent heals fix the issue. Conflicts: xlators/cluster/dht/src/dht-common.c Signed-off-by: shishir gowda <sgowda@redhat.com> Change-Id: I68c1056c3587e04a02344548546ddd06034489c5 BUG: 960348 Reviewed-on: http://review.gluster.org/5443 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Fix anomaly checkshishir gowda2013-09-231-3/+11
| | | | | | | | | | | | | We were wrongly detecting holes/overlaps for already accounted errors. Additionally, sort should also handle zero'ed out layout Change-Id: Ic3d13e1d735b914f9acc01fe919bc90656baea48 BUG: 1003851 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5762 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* distribute: Rebalance should provide even disk space distributionHarshavardhana2013-09-191-15/+30
| | | | | | | | | | | | | | | | | | | | | | | | | Earlier disk space check had an issue which didn't provide the needed functionality to avoid migration when the destination had lesser available space, scenario we need to avoid is stated below : During rebalance `migrate-data` - Destination subvol experiences a `reduction` in 'blocks' of free space, at the same time source subvol gains certain 'blocks' of free space. A valid check is necessary here to avoid errorneous move to destination where the space could be scantily available. This patch provides a proper fix in place by subtracting necessary file blocks from destination and adding those blocks to source. Change-Id: I9c7840716a4256ef614ffc0fbfd9f2b456ac28c8 BUG: 982919 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/5961 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <sgowda@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli/glusterd: improve rebalance fix-layout status reportingRavishankar N2013-09-192-0/+6
| | | | | | | | | | | | | | | | | | | Problem: Currenly the CLI rebalance status command output does not indicate the 'type' of rebalance, i.e. whether a full rebalance or only a fix-layout was carried out. Fix: After the rebalance status of all peers is received by the originator glusterd, alter it to reflect the type of rebalance before passing it on to the CLI process. Change-Id: I1940ffda0d36e25e5b33c84a0ea210394cc9e1d3 BUG: 1004744 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/5826 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Revert "cluster/distribute: Rebalance should also verify free inodes"Anand Avati2013-09-171-22/+5
| | | | | | | | | | | | | | | This reverts commit 215fea41a96479312a5ab8783c13b30ab9fe00fa Realized soon after merging, that this patch is actually useless. Checking for inode utilization on dst relative to src is of no use because dst is already storing linkfile and inode is already consumed no matter what. We might as well perform the rebalance and save on an inode on the src node. Change-Id: I7b8b8d35d9063a710e1bee1c7c6c7739fcc45ad4 Reviewed-on: http://review.gluster.org/5960 Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Anand Avati <avati@redhat.com>
* cluster/distribute: Rebalance should also verify free inodesHarshavardhana2013-09-171-5/+22
| | | | | | | | | | | | | | | | | | | | | Currently during `MIGRATE_DATA` we never verified about the total inode usage among new and old bricks. Such checks are available for disk space usage but it is also needed for inodes during data migration. Such a check leads to uniform outcome of file distribution upon rebalance. Patch provides: - Check dst_inodes < src_inodes, a friendly `warning` message to indicate we have to ignore such an attempt - Rename __dht_check_free_space() --> __dht_check_free_space_and_inodes() Change-Id: I7bc4dd8b507883f0fb836300e99f0bb083493f5f BUG: 982919 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/5948 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: assign layout onto missing directories tooAnand Avati2013-09-091-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | The current self-healing algorithm is ignoring missing directories for assigning new layout. When lookup() is racing against mkdir() or when self-healing a half-done mkdir(), the layout assignment split must happen based on the final number of directories, and not the currently existing number of directories (because we finish mkdir() of missing directories before hash layout assignment). Without this fix, concurrent mkdir() and lookup() will step on each others feet, create a messed up layout on disk, and end up with different in-memory layouts. Once two clients have different in-memory layouts, creation of subdirectory will not arbitrate on the same hashed subvolume and will result in GFID mismatch of the sub-directory. Change-Id: Ia47acad67c265060405984c822b4d37512b9dbb3 BUG: 907072 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5849 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Peter Portante <pportant@redhat.com> Tested-by: Peter Portante <pportant@redhat.com>
* core: remove GLUSTERFS_CREATE_MODE_KEY usageAmar Tumballi2013-08-231-9/+0
| | | | | | | | | Change-Id: I23b8cb7223b91a55af1cd4214f61bbe0e87351f6 BUG: 952029 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5683 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: changes to support gfid-accessAmar Tumballi2013-08-211-6/+0
| | | | | | | | | Change-Id: I38d2fdc47e4b805deafca6805e54807976ffdb7e Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 952029 Reviewed-on: http://review.gluster.org/5496 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Revert "fuse: auxiliary gfid mount support"Amar Tumballi2013-08-211-0/+6
| | | | | | | | | | | | | | | | | This reverts commit 4c0f4c8a89039b1fa1c9c015fb6f273268164c20. Conflicts: xlators/mount/fuse/src/fuse-bridge.c For build issues added CREATE_MODE_KEY definition in: libglusterfs/src/glusterfs.h Change-Id: I8093c2a0b5349b01e1ee6206025edbdbee43055e BUG: 952029 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5495 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>