glusterfs.git/xlators/storage, branch v3.6.0beta3

Fix invalid seekdir() usage

2014-09-30T16:50:26+00:00

According to POSIX, seekdir() should only be given offset obtained from
telldir() on the same DIR *
http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html

Code from afr-self-heald.c and index.c is operating outside of the
specification, by doing using seekdir() with offset from a previously
open/close/re-open directory. This seems to work on Linux (although with
no guarantee it will always in the future). On NetBSD the seekdir()
with a in invalid offset is a nilpotent operation, and causes an infinite
loop, since index_fill_readdir() always restart from the beginning of the
directory.

The situation is fixed by using a non anonymous fd in afr-self-heald.c:
we explicitely open the directory so that it remains open on the brick
side during the timeframe where we want to reuse offsets in seekdir().
This requires adding an opendir fop in index xlator.

If the brick was not updated, the opendir will fail and we fallback
to the standard violating approach for backward compatibility on Linux.
On other systems we fail since it never worked.

While there, add tests to check seekdir() success in index and posix
xlators, so that incorrect usage from calling code produce an explicit
error instead of an infinite loop. We can only do it on non Linux systems,
for the sake of backward compatibility when the brick was updated but
not the client.

Backport of I88ca90acfcfee280988124bd6addc1a1893ca7ab

BUG: 1138897
Change-Id: I5446a9a17d5451ec5aab8fbd10d381da9a0a23ad
Signed-off-by: Emmanuel Dreyfus 
Reviewed-on: http://review.gluster.org/8860
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Vijay Bellur

glusterd/quota: Heal pgfid xattr on existing data when the quota is enable

2014-09-30T16:42:40+00:00

This is a backport of http://review.gluster.org/#/c/8878/

The pgfid extended attributes are used to construct the ancestry path
(from the file to the volume root) for nameless lookups on files.
As NFS relies on nameless lookups heavily, quota enforcement through NFS
would be inconsistent if quota were to be enabled on a volume with
existing data.

Solution is to heal the pgfid extended attributes as a part of lookup
perfomed by quota-crawl process. In a posix lookup check for pgfid xattr
and if it is missing set the xattr.

BUG: 1147953
Change-Id: I707d91a056e07452bfd1e070af5eddaa752a84ac
Signed-off-by: vmallika 
Reviewed-on: http://review.gluster.org/8890
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

Do not forbid fallocate on non Linux systems

2014-09-26T12:24:04+00:00

Linux fallocate() differs from posix_fallocate() by
an extra flag that can have the FALLOC_FL_KEEP_SIZE value;

Do not test FALLOC_FL_KEEP_SIZE existence to enable fallocate()
in posix xlator, as sys_fallocate() in libglusterfs provides
support for both implementations.

Backport of Idf41a0396028a15e81281791bf6912d7fd674e3f
BUG: 1138897
Signed-off-by: Emmanuel Dreyfus 

Change-Id: Ie6e5ea923561630c52a6db5c7f83313cfdc34811
Reviewed-on: http://review.gluster.org/8862
Tested-by: Gluster Build System 
Reviewed-by: Kaleb KEITHLEY 
Reviewed-by: Vijay Bellur

storage/posix: Log when mkdir is on an existing gfid but non-existent

2014-09-19T16:46:11+00:00

path.

consider following steps on a distribute volume

1. rename (src, dst) on hashed subvolume
2. snapshot taken
3. restore snapshots and do stat on src and dst

Now, we end up with two directories src and dst having same gfid,
because of distribute creating directories on non-existent subvolumes
as part of directory healing.

This can happen even with race between rename and directory healing in
dht-lookup. This can lead to undefined behaviour while accessing any
of both directories. Hence, we are logging paths of both
directories, so that a sysadmin can take some corrective action when
(s)he sees this log. One of the corrective action can be to copy
contents of both directories from backend into a new directory and
delete both directories.

Since effort involved to fix this issue is non-trivial, giving this
workaround till we come up with a fix.

Change-Id: I38f4520e6787ee33180a9cd1bf2f36f46daea1ea
BUG: 1144485
Signed-off-by: Raghavendra G 
Reviewed-on-master: http://review.gluster.org/8008
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Vijay Bellur 
Tested-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8783
Tested-by: Gluster Build System

storage/posix: Don't unlink .glusterfs-hardlink before linkto check

2014-09-12T14:06:16+00:00

BUG: 1138385
Change-Id: I90a10ac54123fbd8c7383ddcbd04e8879ae51232
Signed-off-by: Venkatesh Somyajulu 
Reviewed-on-master: http://review.gluster.org/8559
Tested-by: Gluster Build System 
Reviewed-by: N Balachandran 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8612

storage/posix: Prefer gfid links for inode-handle

2014-09-12T09:55:51+00:00

        Backport of http://review.gluster.org/8575

Problem:
File path could change by other entry operations in-flight so if renames are in
progress at the time of other operations like open, it may lead to failures.
We observed that this issue can also happen while renames and readdirps/lookups
are in progress because dentry-table is going stale sometimes.

Fix:
Prefer gfid-handles over paths for files. For directory handles prefering
gfid-handles hits performance issues because it needs to resolve paths
traversing up the symlinks.
Tests which test if files are opened should check on gfid path after this change.
So changed couple of tests to reflect the same.

Note:
This patch doesn't fix the issue for directories. I think a complete fix is to
come up with an entry operation serialization xlator. Until then lets live with
this.

BUG: 1136821
Change-Id: If93e46d542a4e96a81a0639b5210330f7dbe8be0
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/8594
Reviewed-by: Vijay Bellur 
Tested-by: Gluster Build System

storage/posix: removing deleting entries in case of creation failures

2014-09-10T16:11:20+00:00

The code is not atomic enough to not to delete a dentry created by a
prallel dentry creation operation.

Change-Id: I9bd6d2aa9e7a1c0688c0a937b02a4b4f56d7aa2e
BUG: 1138387
Signed-off-by: Raghavendra G 
Reviewed-on-master: http://review.gluster.org/8327
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8693
Tested-by: Vijay Bellur

cluster/dht: Modified logic of linkto file deletion on non-hashed

2014-09-10T15:48:56+00:00

Currently whenever dht_lookup_everywhere gets called, if in
dht_lookup_everywhere_cbk, a linkto file is found on non-hashed
subvolume, file is unlinked. But there are cases when this file
is under migration. Under such condition, we should avoid deletion
of file.

When  some other rebalance process changes the layout of parent
such that dst_file (w.r.t. migration) falls on non-hashed node,
then may be lookup could have found it as linkto file but just
before unlink, file  is under migration or already migrated
In such cased unlink can be avoided.

Race:
-------
If we have two bricks (brick-1 and brick-2) with initial file "a"
under BaseDir which is hashed as well as cached on (brick-1).

Assume "a"  hashing gives 44.

                              Brick-1              Brick-2

Initial Setup:               BaseDir/a             BaseDir
                             [1-50]                [51-100]

Now add new-brick Brick-3.

1. Rebalance-1 on node Node-1 (Brick-1 node) will reset
the BaseDir Layout.

2. After that it will perform
a)  Create linkto file on  new-hashed (brick-2)
b)  Perform file migration.

1.Rebalance-1 Fixes the base-layout:
                 Brick-1             Brick-2           Brick-3
                 ---------         ----------         ------------
                 BaseDir/a            BaseDir           BaseDir
                  [1-33]              [34-66]           [67-100]

2. Only a) is     BaseDir/a          BaseDir/a(linkto)   BaseDir
   performed                         Create linktofile

Now rebalance 2 on node-2 jumped in and it will perform
step 1 and 2-a.

After (rebal-2, step-1), it changes the layout of the BaseDir.
                    BaseDir/a     BaseDir/a(link)    BaseDir
                    [67-100]           [1-33]        [34-66]

For  (rebale-2, step-2), It will perform lookup at Brick-3 as w.r.t new
layout 44 falls for brick-3. But lookup will fail.
So  dht_lookup_everywhere gets called.

NOTE: On brick-2 by rebalance-1, a linkto file was created.

Currently that linkto files gets deleted by rebalance-2 lookup as it
is considered as stale linkto file.  But  with patch if rebalance is
already in progress or rebalance is over,  linkto file will not be
unlinked. If rebalance is in progress fd will be  open and if rebalance
is over then linkto file wont be set.

Change-Id: I3fee0d28de3c76197325536a9e30099d2413f07d
BUG: 1138385
Signed-off-by: Venkatesh Somyajulu 
Reviewed-on-master: http://review.gluster.org/8345
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra G 
Reviewed-by: Shyamsundar Ranganathan 
Reviewed-by: Vijay Bellur

cluster/dht: Added code to capture races in dht-lookup path

2014-09-09T19:20:07+00:00

Change-Id: I9270d2d40ebd4b113ff961583dfda7754741f151
BUG: 1138385
Signed-off-by: Venkatesh Somyajulu 
Reviewed-on-master: http://review.gluster.org/8430
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur 
Reviewed-by: Jeff Darcy 
Reviewed-on: http://review.gluster.org/8668
Tested-by: Vijay Bellur

cluster/dht: Fix races to avoid deletion of linkto file

2014-09-09T17:44:55+00:00

Explanation of Race between rebalance processes:
https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4

STATE 1:                          BRICK-1
only one brick                   Cached File
in the system

STATE 2:
Add brick-2                       BRICK-1                BRICK-2

STATE 3:                                       Lookup of File on brick-2
                                               by this node's rebalance
                                               will fail because hashed
                                               file is not created yet.
                                               So dht_lookup_everywhere is
                                               about to get called.

STATE 4:                         As part of lookup
                                 link file at brick-2
                                 will be created.

STATE 5:                         getxattr to check that
                                 cached file belongs to
                                 this node is done

STATE 6:

                                            dht_lookup_everywhere_cbk detects
                                            the link created by rebalance-1.
                                            It will unlink it.

STATE 7:                        getxattr at the link
                                file with "pathinfo" key
                                will be called will fail
                                as the link file is deleted
                                by rebalance on node-2

Fix:
So in the STATE 6, we should avoid the deletion of link file. Every time
dht_lookup_everywhere gets called, lookup will be performed on all the nodes.
So to avoid STATE 6, if linkto file is found, it is not deleted until valid
case is found in dht_lookup_everywhere_done.

Case 1: if linkto file points to cached node, and cached file exists,
        uwind with success.

Case 2: if linkto does not point to current cached node, and cached file
        exists:
        a) Unlink stale link file
        b) Create new link file

Case 3: Only linkto file exists:
        Delete linkto file

Case 4: Only cached file
        Create link file (Handled event without patch)

Case 5: Neither cached nor hashed file is present
        Return with ENOENT (handled even without patch)

Change-Id: Ibf53671410d8d613b8e2e7e5d0ec30fc7dcc0298
BUG: 1138385
Signed-off-by: Venkatesh Somyajulu 
Reviewed-on-master: http://review.gluster.org/8231
Reviewed-by: Vijay Bellur 
Tested-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8603
Reviewed-by: Jeff Darcy 
Tested-by: Gluster Build System