glusterfs-snapshot.git/xlators/cluster/dht/src, branch gluster-snap

cluster/dht: Set restrictive open flags for files under rebalance

2014-02-04T17:56:57+00:00

Files that are being rebalanced are created in the new volume, and the
access path needs to open these files so that changing data is written
in parallel to both the old and new locations. While opening the file in
the new location, we need to restrict the open flags so that truncate and
create/fail-if-exists are not used. This prevents open failures and
inadvertent truncation of the file under rebalance.
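
A minimal sketch of the flag restriction described above. The helper name
rebalance_safe_open_flags() is hypothetical and is not taken from the DHT
sources; only the standard O_* flags from <fcntl.h> are assumed.

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical helper, not the actual DHT code: derive the flags to use
 * when opening the copy of a file that is under rebalance. */
static int
rebalance_safe_open_flags (int flags)
{
        /* O_CREAT | O_EXCL would fail because the file already exists at
         * the new location; O_TRUNC would discard data that has already
         * been migrated. Everything else is passed through. */
        int safe = flags & ~(O_CREAT | O_EXCL | O_TRUNC);

        /* Sanity check: the access mode must be unchanged. */
        assert ((safe & O_ACCMODE) == (flags & O_ACCMODE));

        return safe;
}
```

For example, a request with O_WRONLY | O_CREAT | O_EXCL | O_TRUNC would be
reduced to plain O_WRONLY for the open at the new location.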

Change-Id: I12130e0377adc393f1925c45585200ad991fd0d5
BUG: 1058569
Signed-off-by: ShyamsundarR 
Reviewed-on: http://review.gluster.org/6830
Reviewed-by: Raghavendra G 
Reviewed-by: Krutika Dhananjay 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat 
Reviewed-by: Vijay Bellur

cluster/dht: Fix layout sorting

2014-02-04T01:24:13+00:00

The layout was not being sorted in ascending order, leading
to incorrect detection of holes/overlaps.

From looking at the previous git commits, it appears that the initial version itself had the err comparison code.
Deductions from the current dht_layout_sort():
1. The zero'ed out layouts should be at the front of the list, if needed.
2. The layout should be sorted in ascending order of the layout error value.
3. The layout should be sorted in ascending order of the layout 'start'.
But in some cases, with the err comparison code, it is not sorted in ascending order. Example: if the input to dht_layout_sort() is as below, the sorting doesn't happen in ascending order.
Input:
0-1 err:0    2-3 err:0    6-7 err:0    0-0 err:20   4-5 err:0
With the current sort, Output:
4-5 err:0    0-0 err:0    0-1 err:0    2-3 err:0    6-7 err:0
Expected:
0-0 err:20   0-1 err:0    2-3 err:0    4-5 err:0    6-7 err:0
Looking at dht_layout_anomalies(), it appears that it doesn't require the layout to be sorted based on the error value.
The other solution was to replace line 468 with:
 if ((layout->list[i].err || layout->list[j].err) && (layout->list[i].start > layout->list[j].start))
Since dht_layout_anomalies() doesn't expect the layout to be sorted based on the error, the err comparison was removed instead.
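
As an illustration only, sorting purely on 'start' can be sketched as below.
The type layout_entry_t and the function layout_sort_by_start() are
hypothetical simplifications and are not the actual dht_layout_sort() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one DHT layout entry (hypothetical type). */
typedef struct {
        uint32_t start;
        uint32_t stop;
        int      err;
} layout_entry_t;

/* Sort entries in ascending order of 'start' only; 'err' is ignored. */
static void
layout_sort_by_start (layout_entry_t *list, size_t cnt)
{
        assert (list != NULL || cnt == 0);

        for (size_t i = 0; i + 1 < cnt; i++) {
                for (size_t j = i + 1; j < cnt; j++) {
                        if (list[i].start > list[j].start) {
                                layout_entry_t tmp = list[i];
                                list[i] = list[j];
                                list[j] = tmp;
                        }
                }
        }
}
```

With the input from the example above, the 'start' values come out in
non-decreasing order. The two entries that start at 0 may appear in either
order, since this sketch compares nothing but 'start'.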

Change-Id: I1215f6cd53efc7dba01c0958ba6cc7609dab6ff5
BUG: 1056406
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/6757
Reviewed-by: Anand Avati 
Tested-by: Anand Avati

cluster/dht: Abandoned memory if a call fails

2014-02-04T01:22:15+00:00

If the call to dict_set_dynstr() fails, the memory indicated by
xattr_buf will not have been stored in the dictionary, so it must be
freed.
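
The ownership rule can be sketched as follows. tiny_dict_t and
tiny_dict_set_dynstr() are hypothetical stand-ins written for this example;
they only mimic the "dictionary owns the string on success" behaviour
described above and are not the GlusterFS dict API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dictionary that holds one dynamic string. */
typedef struct {
        char *value;
} tiny_dict_t;

/* Hypothetical stand-in for a "set dynamic string" call: on success (0)
 * the dictionary takes ownership of 'str'; on failure (-1) the caller
 * still owns it. */
static int
tiny_dict_set_dynstr (tiny_dict_t *d, char *str)
{
        if (d == NULL || str == NULL)
                return -1;

        free (d->value);
        d->value = str;
        return 0;
}

/* Copy 'src' into a fresh buffer and hand the buffer to the dictionary. */
static int
store_xattr_copy (tiny_dict_t *d, const char *src)
{
        char   *xattr_buf = NULL;
        size_t  len       = 0;

        assert (src != NULL);

        len = strlen (src) + 1;
        xattr_buf = malloc (len);
        if (xattr_buf == NULL)
                return -1;
        memcpy (xattr_buf, src, len);

        if (tiny_dict_set_dynstr (d, xattr_buf) != 0) {
                /* The buffer was not stored, so it must be freed here. */
                free (xattr_buf);
                return -1;
        }

        return 0;
}
```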

Patch set 2: Added a missed call to GF_FREE().  Fixed a formatting
             consistency issue.

Patch set 3: Cleaned a minor style nit.

BUG: 789278
CID: 1124786

Change-Id: Id1f85bd2cbfac0b8727a3f6901f0a50ba921817d
Signed-off-by: Christopher R. Hertel 
Reviewed-on: http://review.gluster.org/6826
Reviewed-by: Shyamsundar Ranganathan 
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

dht: do not remove linkfile if file exists in cached sub volume

2014-02-03T07:10:38+00:00

Currently, with rmdir, if a directory contains only linkfiles we remove
all of them. This causes a problem when the cached sub volume is down:
we end up with duplicate files showing on the mount point.

Solution: Before removing a linkfile, check whether the
file exists in the cached subvolume.

Change-Id: Iedffd0d9298ec8bb95d5ce27c341c9ade81f0d3c
BUG: 1042725
Signed-off-by: Vijaykumar M 
Reviewed-on: http://review.gluster.org/6500
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/dht: set op_errno correctly during migration.

2014-01-25T08:03:26+00:00

Change-Id: I65acedf92c1003975a584a2ac54527e9a2a1e52f
BUG: 1010241
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/6219
Reviewed-by: Shyamsundar Ranganathan 
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/dht: goto statements may cause loop exit before memory is freed.

2014-01-24T09:36:31+00:00

Memory is allocated at the top of the while loop via a call to
gf_strdup(), but there are several goto statements that exit the loop,
and the memory is not freed before each of them.  This fix moves the
final call to GF_FREE() higher in the loop so that the memory is
correctly freed.

Two variables, dup_str and str_tmp1, point to portions of the allocated
memory.  Neither is used past the final call to GF_FREE( dup_str ).
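
The pattern can be sketched as below. This is a hypothetical example (the
function sum_values() and its "name:number" input format are invented for
illustration) and uses malloc()/free() instead of gf_strdup()/GF_FREE().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: add up the numbers in items such as "a:1" and
 * leave the loop with goto when an item has no ':' separator. */
static int
sum_values (const char *const *items, size_t cnt, long *sum)
{
        int ret = -1;

        assert (items != NULL && sum != NULL);
        *sum = 0;

        for (size_t i = 0; i < cnt; i++) {
                char *dup_str = NULL;
                char *sep     = NULL;
                long  value   = 0;
                int   ok      = 0;

                dup_str = malloc (strlen (items[i]) + 1);
                if (dup_str == NULL)
                        goto out;
                strcpy (dup_str, items[i]);

                sep = strchr (dup_str, ':');   /* points into dup_str */
                if (sep != NULL) {
                        value = strtol (sep + 1, NULL, 10);
                        ok = 1;
                }

                /* dup_str and sep are not used below this point, so the
                 * copy is freed before any goto can leave the loop. */
                free (dup_str);

                if (!ok)
                        goto out;

                *sum += value;
        }

        ret = 0;
out:
        return ret;
}
```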

BUG: 789278
CID: 1124780

Change-Id: Id24b80cdbfd8b8855c80fffec63d7fce98cbed4a
Signed-off-by: Christopher R. Hertel 
Reviewed-on: http://review.gluster.org/6771
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

cluster/dht: Set quota limit key in dht_selfheal of dirs.

2014-01-23T05:39:57+00:00

Also fixed check in dht_is_subvol_in_layout to check if the
layouts are zero'ed out.

Change-Id: I4bf8ebf66d3ef1946309b6c9aac9e79bf8a6d495
BUG: 969461
Signed-off-by: shishir gowda 
Signed-off-by: Varun Shastry 
Reviewed-on: http://review.gluster.org/6392
Reviewed-by: Raghavendra G 
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

quota: filter glusterfs quota xattrs

2014-01-23T04:43:04+00:00

Change-Id: I86ebe02735ee88598640240aa888e02b48ecc06c
BUG: 1040423
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/6490
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra G

syncop: Change return value of syncop

2014-01-20T07:05:15+00:00

Problem:
We found a day-1 bug that shows up when the syncop_xxx() infra is used inside
a synctask and the code is compiled with optimization (CFLAGS -O2).

Detailed explanation of the root cause:
We found the bug in 'gf_defrag_migrate_data' in the rebalance operation.

Let's look at the interesting parts of the function:

int
gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc,
                        dict_t *migrate_data)
{
.....
code section - [ Loop ]
        while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
                                       &entries)) != 0) {
.....
code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a
thread)
                /* Need to keep track of ENOENT errno, that means, there is no
                   need to send more readdirp() */
                readdir_operrno = errno;
.....
code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
                        ret = syncop_getxattr (this, &entry_loc, &dict,
                                               GF_XATTR_LINKINFO_KEY);
code section - [ ERRNO-2]   (checking for failures of syncop_getxattr(). This
may not always be executed in same thread which executed [SYNCOP-1])
                        if (ret < 0) {
                                if (errno != ENODATA) {
                                        loglevel = GF_LOG_ERROR;
                                        defrag->total_failures += 1;
.....
}

The function above could be executed by thread (t1) up to [SYNCOP-1], and the
code from [ERRNO-2] onwards can be executed by a different thread (t2) because
of the way the syncop infra schedules the tasks.

When the code is compiled with -O2 optimization, this is the assembly code
that is generated:
 [ERRNO-1]
1165                        readdir_operrno = errno; <<---- errno gets expanded
as *(__errno_location())
   0x00007fd149d48b60 <+496>:        callq  0x7fd149d410c0 
   0x00007fd149d48b72 <+514>:        mov    %rax,0x50(%rsp) <<------ Address
returned by __errno_location() is stored in a special location in stack for
later use.
   0x00007fd149d48b77 <+519>:        mov    (%rax),%eax
   0x00007fd149d48b79 <+521>:        mov    %eax,0x78(%rsp)
....
 [ERRNO-2]
1281                                        if (errno != ENODATA) {
   0x00007fd149d492ae <+2366>:        mov    0x50(%rsp),%rax <<-----  Because
it already stored the address returned by __errno_location(), it just
dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE
EXECUTED BY SAME THREAD!!!
   0x00007fd149d492b3 <+2371>:        mov    $0x9,%ebp
   0x00007fd149d492b8 <+2376>:        mov    (%rax),%edi
   0x00007fd149d492ba <+2378>:        cmp    $0x3d,%edi

The problem is that the addresses returned by __errno_location() for t1 and
t2 are different. So [ERRNO-2] ends up reading the errno of t1 instead of the
errno of t2, even though t2 is executing the [ERRNO-2] code section.

When code is compiled without any optimization for [ERRNO-2]:
1281                                        if (errno != ENODATA) {
   0x00007fd58e7a326f <+2237>:        callq  0x7fd58e797300
<<--- As it is calling __errno_location() again it gets the
location from t2 so it works as intended.
   0x00007fd58e7a3274 <+2242>:        mov    (%rax),%eax
   0x00007fd58e7a3276 <+2244>:        cmp    $0x3d,%eax
   0x00007fd58e7a3279 <+2247>:        je     0x7fd58e7a32a1


Fix:
Make syncop_xxx() return (-errno) as the return value in case of errors.
All the functions which call syncop_xxx() will need to use (-ret) to figure
out the reason for the failure in case of syncop_xxx() failures.
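
A minimal sketch of the calling convention described in the fix.
fake_syncop_getxattr() and is_real_failure() are hypothetical functions
written for this example; they are not part of the syncop API. The point is
that the caller inspects the return value and never reads errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a syncop_xxx() call: returns 0 on success or
 * the negated errno value on failure. 'simulate_errno' only exists to
 * drive the example. */
static int
fake_syncop_getxattr (int simulate_errno)
{
        assert (simulate_errno >= 0);

        if (simulate_errno != 0)
                return -simulate_errno;

        return 0;
}

/* Caller side: decide from (-ret) whether a failure should be counted.
 * Returns 1 for a failure other than ENODATA, 0 otherwise. */
static int
is_real_failure (int ret)
{
        if (ret >= 0)
                return 0;

        return (-ret != ENODATA);
}
```

Because the reason for the failure travels in the return value, it no longer
matters which thread resumes the task after the call returns.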

Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57
BUG: 1040356
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/6475
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

dht: Ignore directory with missing xattrs, which have err == 0, and start == stop

2014-01-16T01:20:09+00:00

From the history (Patch: http://review.gluster.org/4668/)

When subvols-per-directory is < available subvols, there are layouts
which are not populated. This leads to incorrect identification of holes or
overlaps. We need to ignore layouts which have err == 0 and start == stop.
In the current scenario, start == stop == 0.
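
The rule above can be illustrated with a simplified hole/overlap counter.
This is a sketch under stated assumptions, not the dht_layout_anomalies()
code: layout_range_t and count_holes_and_overlaps() are hypothetical, the
list is assumed to be sorted by 'start' already, and entries with err != 0
are simply skipped here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one layout entry (hypothetical type). */
typedef struct {
        uint32_t start;
        uint32_t stop;
        int      err;
} layout_range_t;

/* Count holes and overlaps in a list that is sorted by 'start'. */
static void
count_holes_and_overlaps (const layout_range_t *list, size_t cnt,
                          int *holes, int *overlaps)
{
        uint64_t next = 0;   /* lowest hash value not yet covered */

        assert (holes != NULL && overlaps != NULL);
        assert (list != NULL || cnt == 0);

        *holes = 0;
        *overlaps = 0;

        for (size_t i = 0; i < cnt; i++) {
                if (list[i].err != 0)
                        continue;   /* out of scope for this sketch */

                if (list[i].start == list[i].stop)
                        continue;   /* unpopulated layout: ignore it */

                if (list[i].start > next)
                        (*holes)++;
                else if (list[i].start < next)
                        (*overlaps)++;

                if ((uint64_t) list[i].stop + 1 > next)
                        next = (uint64_t) list[i].stop + 1;
        }

        if (next <= UINT32_MAX)
                (*holes)++;   /* the tail of the hash range is uncovered */
}
```

Without the start == stop check, an unpopulated 0-0 entry followed by a
range that starts at 0 would be reported as an overlap in this sketch.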

Additionally, in layout-merge, treat missing xattrs as err = 0. In case of
missing layouts, anomalies will reset them.

For any other valid subvols, err != 0 in case of layouts being zeroed out.
Also reverted dht_selfheal_dir_xattr, which does layout calculation only
on subvols which have errors.

Change-Id: Idb72a869f1a6f103046bb7e6fe0019f6ac853fd4
BUG: 1047331
Signed-off-by: Vijaykumar M 
Reviewed-on: http://review.gluster.org/6618
Reviewed-by: Krishnan Parthasarathi 
Tested-by: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan 
Reviewed-by: Anand Avati