glusterfs.git/xlators/cluster/dht, branch release-3.9

cluster/dht: Do rename cleanup as root

2017-01-20T07:18:58+00:00

Problem:
Rename linkfile cleanup is done as non-root, which may not have privileges to
do the rename, so it fails with EACCES. A future MKDIR on that name will then
have a hole on this subvolume. It is not easy to hit on fuse mounts because vfs
does the permission checks even before the rename fop is wound. But with
nfs-ganesha mounts it happens.

Fix:
Do rename cleanup as root
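
A minimal sketch of the idea, assuming a simplified frame that carries the
caller's credentials. The struct and function names below are hypothetical
and are not the DHT API; they only show the save, switch to root, restore
pattern around the cleanup.

```c
#include <sys/types.h>

/* Hypothetical, simplified stand-in for a call frame's credentials. */
struct sketch_frame {
    uid_t uid;        /* credentials the next fop is wound with */
    gid_t gid;
    uid_t saved_uid;  /* caller's credentials, kept for the restore */
    gid_t saved_gid;
};

/* Switch the frame to root before winding the cleanup fop. */
void sketch_su_do(struct sketch_frame *f)
{
    f->saved_uid = f->uid;
    f->saved_gid = f->gid;
    f->uid = 0;
    f->gid = 0;
}

/* Restore the caller's credentials once the cleanup has returned. */
void sketch_su_undo(struct sketch_frame *f)
{
    f->uid = f->saved_uid;
    f->gid = f->saved_gid;
}
```

The cleanup fop would be wound between sketch_su_do() and sketch_su_undo(),
so that only the cleanup, and not the rename itself, runs as root.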

 >BUG: 1409727
 >Change-Id: I414c1eb6dce76b4516a6c940557b249e6c3f22f4
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/16317
 >Smoke: Gluster Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Raghavendra G 
 >Reviewed-by: N Balachandran 
 >NetBSD-regression: NetBSD Build System 

BUG: 1413061
Change-Id: If94121275b141c5f52084b8aafac86451e667d3d
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/16412
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: N Balachandran

dht/md-cache: Filter invalidate if the file is made a linkto file

2017-01-05T06:10:12+00:00

Backport of http://review.gluster.org/15789

Upcall, as a part of setattr, sends an invalidation, and the
invalidation carries the resulting stat value. When a file
is converted to a linkto file, an invalidation is still
sent, and as a result the mountpoint shows the sticky
bit in the stat of the file.
eg: ---------T. 945 root root 0 Nov  8 10:14 hardlink.999

Fix:
When dht receives a notification of a sticky bit change, it updates
the flag to tell md-cache to send the subsequent lookup.

>Reviewed-on: http://review.gluster.org/15789
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Niels de Vos 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Susant Palai 
>Reviewed-by: Rajesh Joseph 
>(cherry picked from commit 4536f7bdf16f8286d67598eda9a46c029f0c0bf4)

Change-Id: Ic2fd7a5b196db0754f9b97072e644e6bf69da606
BUG: 1401376
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/16022
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

cluster/dht: Fix memory corruption while accessing regex stored in private

2017-01-03T11:36:10+00:00

If reconfigure is executed in parallel (with itself or with dht_init),
there are races that can corrupt memory. One such race is modification
of regexes stored in conf (conf->rsync_regex_valid and
conf->extra_regex_valid) through dht_init_regex. With change [1], the
reconfigure codepath can be executed in parallel (with itself or with
dht_init), so this fix is needed.

Also, a reconfigure can race with any thread doing dht_layout_search,
resulting in dht_layout_search accessing regex freed up by reconfigure
(like in bz 1399134).

[1] http://review.gluster.org/15046
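
A minimal sketch of one way to close both races, assuming a mutex that is
held while the regex is replaced and while it is matched. The struct and
function names are hypothetical, and this message does not say which locking
primitive the fix uses.

```c
#include <pthread.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the regex state kept in the private conf. */
struct sketch_conf {
    pthread_mutex_t lock;
    regex_t         re;
    bool            re_valid;
};

void sketch_conf_init(struct sketch_conf *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->re_valid = false;
}

/* Replace the pattern (the reconfigure path). Returns 0 on success. */
int sketch_set_regex(struct sketch_conf *c, const char *pattern)
{
    int ret;

    pthread_mutex_lock(&c->lock);
    if (c->re_valid) {
        regfree(&c->re);
        c->re_valid = false;
    }
    ret = regcomp(&c->re, pattern, REG_EXTENDED | REG_NOSUB);
    if (ret == 0)
        c->re_valid = true;
    pthread_mutex_unlock(&c->lock);
    return ret;
}

/* Match a name (the lookup path). Returns true on a match. */
bool sketch_match(struct sketch_conf *c, const char *name)
{
    bool matched = false;

    pthread_mutex_lock(&c->lock);
    if (c->re_valid)
        matched = (regexec(&c->re, name, 0, NULL, 0) == 0);
    pthread_mutex_unlock(&c->lock);
    return matched;
}
```

Because the matcher takes the same lock, it can never see a regex that a
concurrent sketch_set_regex() has already freed.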

>Change-Id: I039422a65374cf0ccbe0073441f0e8c442ebf830
>BUG: 1399134
>Signed-off-by: Raghavendra G 
>Reviewed-on: http://review.gluster.org/15945
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: N Balachandran 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Shyamsundar Ranganathan 

Change-Id: I039422a65374cf0ccbe0073441f0e8c442ebf830
BUG: 1399422
Signed-off-by: Raghavendra G 
(cherry picked from commit 64451d0f25e7cc7aafc1b6589122648281e4310a)
Reviewed-on: http://review.gluster.org/15949
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UP

2016-12-13T09:46:14+00:00

Backport of http://review.gluster.org/#/c/15764/

Currently there are a few events related to child_up/down:
GF_EVENT_CHILD_UP :  Issued when any of the protocol client
connects.
GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec
GF_EVENT_CHILD_DOWN : Issued when any of the protocol client
disconnects.
These events get modified at the dht/afr/ec layers. Here is a
brief on the same.

DHT:
- All the subvolumes reported once, and at least one child came
  up: GF_EVENT_CHILD_UP is issued
- connect: GF_EVENT_CHILD_UP is issued
- disconnect: GF_EVENT_CHILD_MODIFIED is issued
- All the subvolumes disconnected: GF_EVENT_CHILD_DOWN is issued

AFR:
- First subvolume came up, then GF_EVENT_CHILD_UP is issued
- Subsequent subvolumes coming up, results in GF_EVENT_CHILD_MODIFIED
- Any of the subvolumes go down, then GF_EVENT_SOME_CHILD_DOWN is issued
- Last up subvolume goes down, then GF_EVENT_CHILD_DOWN is issued

Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP,
GF_EVENT_CHILD_MODIFIED was issued by afr/dht when any of the subvolumes
go up or down.

Now with md-cache changes, there is a necessity to differentiate between
child up and down. Hence, introducing GF_EVENT_SOME_DESCENDENT_DOWN/UP and
getting rid of GF_EVENT_CHILD_MODIFIED.
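
A rough sketch of how a cluster xlator could pick the event to propagate once
up and down are reported separately. The enum and function are hypothetical,
and the mapping is an assumption modelled on the AFR list above (first child
up, last child down, anything in between); it is not taken from the patch.

```c
/* Hypothetical event names, mirroring the ones discussed above. */
enum sketch_event {
    SKETCH_CHILD_UP,
    SKETCH_CHILD_DOWN,
    SKETCH_SOME_DESCENDENT_UP,
    SKETCH_SOME_DESCENDENT_DOWN
};

/* up_before / up_after: number of subvolumes up before and after the
 * change. Assumed mapping, see the note above. */
enum sketch_event sketch_propagate(int up_before, int up_after)
{
    if (up_after == 0)
        return SKETCH_CHILD_DOWN;           /* last subvolume went down */
    if (up_before == 0)
        return SKETCH_CHILD_UP;             /* first subvolume came up */
    return up_after > up_before ? SKETCH_SOME_DESCENDENT_UP
                                : SKETCH_SOME_DESCENDENT_DOWN;
}
```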

[1] http://review.gluster.org/12573

>Reviewed-on: http://review.gluster.org/15764
>CentOS-regression: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Smoke: Gluster Build System 
>Reviewed-by: N Balachandran 
>Reviewed-by: Pranith Kumar Karampuri  
>Reviewed-by: Rajesh Joseph 
(cherry picked from commit f7ab6c45963fa0da68acedfb14281cd2456abc68)

Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58
BUG: 1396880
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15890
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/dht: Check for null inode

2016-12-07T14:03:53+00:00

Check for NULL inode before attempting to
set dht inode ctx.
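
A minimal sketch of the guard, using a hypothetical inode type and setter
rather than the real dht inode ctx functions:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified inode carrying one context pointer. */
struct sketch_inode {
    void *ctx;
};

/* Refuse to touch the context when there is no inode. */
int sketch_inode_ctx_set(struct sketch_inode *inode, void *ctx)
{
    if (inode == NULL)
        return -EINVAL;
    inode->ctx = ctx;
    return 0;
}
```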

> Change-Id: I7693c18445f138221d8417df5e95b118cedb818a
> BUG: 1395261
> Signed-off-by: N Balachandran 
> Reviewed-on: http://review.gluster.org/15847
> Smoke: Gluster Build System 
> Reviewed-by: Shyamsundar Ranganathan 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
(cherry picked from commit 8313d53accaa22feb14d284fb91245be0a32e16e)

Change-Id: Id8c7bfe181bb40a02cd49b0f5fc3b45cabf5afa6
BUG: 1395517
Signed-off-by: N Balachandran 
Reviewed-on: http://review.gluster.org/15851
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri

dht/rename : In case of failure remove linkto file properly

2016-12-07T02:06:35+00:00

Generally the linkto file is created as the root user. Consider the
following case: a user tries to rename a file that they are not permitted
to rename. The rename fails with EACCES, and when rename then tries to
clean up the linkto file, the cleanup fails too.

The above issue happens when the rename/00.t test is executed on nfs-ganesha
clients.
Steps executed in the script:
* create a file "abc" as root
* rename the file "abc" to "xyz" as a non-root user; it fails with EACCES
* delete "abc"
* create a directory "abc" as root
* try again to rename "abc" to "xyz" as a non-root user; the test hangs here,
which slowly leads to an OOM kill of the ganesha process

RCA put forward by Du for the OOM kill of ganesha:
Note that when we hit this bug, we've a scenario of a dentry being
present as:
    * a linkto file on one subvol
    * a directory on rest of subvols

When a lookup happens on the dentry in such a scenario, the control flow
goes into an infinite loop of:

    dht_lookup_everywhere
    dht_lookup_everywhere_cbk
    dht_lookup_unlink_cbk
    dht_lookup_everywhere_done
    dht_lookup_directory (as local->dir_count > 0)
    dht_lookup_dir_cbk (sets to local->need_selfheal = 1 as the entry is a linkto file on one of the subvol)
    dht_lookup_everywhere (as need_selfheal = 1).

This infinite loop can cause increased consumption of memory because:
1) dht_lookup_directory assigns a new layout to local->layout unconditionally
2) Most of the functions in this loop do a stack_wind of various fops.

This results in a growing call stack (note that the call stack is destroyed
only after the lookup response is received by fuse, which never happens in
this case).

Thanks Du for root causing the oom kill and Sushant for suggesting the fix

Upstream reference :
>Change-Id: I1e16bc14aa685542afbd21188426ecb61fd2689d
>BUG: 1397052
>Signed-off-by: Jiffin Tony Thottan 
>Reviewed-on: http://review.gluster.org/15894
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Smoke: Gluster Build System 
>Reviewed-by: Raghavendra G 
>(cherry picked from commit 57d59f4be205ae0c7888758366dc0049bdcfe449)

Change-Id: I1e16bc14aa685542afbd21188426ecb61fd2689d
BUG: 1401023
Signed-off-by: Jiffin Tony Thottan 
Reviewed-on: http://review.gluster.org/16014
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Kaleb KEITHLEY

cluster/dht: A hard link is lost during rebalance + lookup

2016-11-29T10:21:03+00:00

Problem: A hard link is lost during rebalance + lookup. Rebalance skips
         a file if it has a hard link. In dht_migrate_file, the
         __is_file_migratable () function checks whether a file has a hard
         link; if it does, the file is not migrated. But if a link is
         created after this function is called, the link is lost.

Solution: Call __check_file_has_hardlink to check for a hard link
          after the (S+T) bits are set in the migration process. If the
          file has a hard link, skip it in the rebalance migration.

> BUG: 1396048
> Change-Id: Ia53c07ef42f1128c2eedf959a757e8df517b9d12
> Signed-off-by: Mohit Agrawal 
> (cherry picked from commit 4b8ccbed28837bd78894cb5ce3cf15bc8f364a93)

BUG: 1399430
Change-Id: Idc869f2cf2355dacf54c36008840092b8e77acb9
Signed-off-by: Mohit Agrawal 
Reviewed-on: http://review.gluster.org/15955
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

cluster/tier: handle fast demotions

2016-11-28T16:51:36+00:00

Demote files on priority if hi-watermark has been breached and continue
to demote until the watermark drops below hi-watermark.

Monitor watermark more frequently.
Trigger demotion as soon as hi-watermark is breached.
Add cluster.tier-query-limit option to limit number
of files returned from the database query for every iteration of
tier_migrate_using_query_file(). If watermark hasn't dropped below
hi-watermark during the first iteration, the next iteration will be
triggered approximately 1 second after tier_demote() returns to the
main tiering loop.
Update changetimerecorder xlator to handle query for emergency demote
mode.
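
A rough model of the demotion loop described above. Everything here is a
hypothetical simplification: space is counted in abstract units, each demoted
file frees one unit, and an iteration stops early once usage has dropped
below hi-watermark. None of these names come from the tier code.

```c
/* Hypothetical model of the hot tier. */
struct sketch_tier {
    int used;          /* units used on the hot tier */
    int hi_watermark;  /* keep demoting while used >= this */
    int pending;       /* files eligible for demotion */
};

/* One iteration: demote at most query_limit files.
 * Returns the number of files demoted. */
int sketch_demote_once(struct sketch_tier *t, int query_limit)
{
    int n = 0;

    while (n < query_limit && t->pending > 0 &&
           t->used >= t->hi_watermark) {
        t->used--;
        t->pending--;
        n++;
    }
    return n;
}

/* Repeat iterations until usage is below hi-watermark or nothing is
 * left to demote. Returns the number of iterations run. */
int sketch_demote_until_below(struct sketch_tier *t, int query_limit)
{
    int iterations = 0;

    while (t->used >= t->hi_watermark && t->pending > 0) {
        if (sketch_demote_once(t, query_limit) == 0)
            break; /* no progress, e.g. query_limit is 0 */
        iterations++;
        /* the text above describes a pause of roughly 1 second here */
    }
    return iterations;
}
```

A small query limit keeps each iteration short, at the cost of more
iterations before usage drops below hi-watermark.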

Add tier-ctr-interface.h:
Move tier and ctr interface specific macros and struct definition from
libglusterfs/src/gfdb/gfdb_data_store.h to new header
libglusterfs/src/tier-ctr-interface.h

> Reviewed-on: http://review.gluster.org/15158
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Dan Lambright 
(cherry picked from commit 460016428cf27484c333227f534c2e2f73a37fb1)

Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811
BUG: 1394482
Signed-off-by: Milind Changire 
Reviewed-on: http://review.gluster.org/15835
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Dan Lambright

events: Add FMT_WARN for gf_event

2016-11-24T05:44:51+00:00

Raghavendra G found that posix is trying to print %s
but passing an int when HEALTH_CHECK fails in posix.
This is the kind of bug that should be caught
at compile time.
Also fixed the problematic gf_event() callers.
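
A minimal sketch of how a printf-style format check can be attached to a
variadic function with the GCC/Clang format attribute. The macro and function
names are hypothetical and are not the ones used in the tree.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical macro; expands to the GCC/Clang printf-format check. */
#if defined(__GNUC__)
#define SKETCH_FMT_WARN(fmt_idx, first_arg_idx) \
    __attribute__((__format__(__printf__, fmt_idx, first_arg_idx)))
#else
#define SKETCH_FMT_WARN(fmt_idx, first_arg_idx)
#endif

/* Hypothetical event function: argument 2 is the format string and the
 * variadic arguments start at position 3. */
int sketch_event(int event_id, const char *fmt, ...) SKETCH_FMT_WARN(2, 3);

int sketch_event(int event_id, const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    (void)event_id;
    return n; /* length the formatted message would have */
}
```

With -Wformat enabled (it is part of -Wall), a call such as
sketch_event(1, "%s", 42) then produces a compile-time warning.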

 >BUG: 1386097
 >Change-Id: Id7bd6d9a9690237cec3ca1aefa2aac085e8a1270
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/15671
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >Reviewed-by: Atin Mukherjee 
 >CentOS-regression: Gluster Build System 

BUG: 1396778
Change-Id: Idf8e1f427578d02dccd2a8165884a5cf086eb07e
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15884
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

cluster/dht Set layout after mkdir as root

2016-11-23T09:34:24+00:00

DHT does not set the layout for newly created
directories as root. This causes EPERM failures
when a non-root user with insufficient permissions
creates directories.

credit: srangana@redhat.com for RCA

> Change-Id: Ia646e41665ce172c43c5f01d2707455e8eb374ed
> BUG: 1392772
> Signed-off-by: N Balachandran 
> Reviewed-on: http://review.gluster.org/15794
> Reviewed-by: Susant Palai 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Smoke: Gluster Build System 
> Reviewed-by: Raghavendra G 
> Reviewed-by: Jeff Darcy 
(cherry picked from commit 3e405b546e8b9fe15ae477613474e9cd2d2df4e7)

Change-Id: Ib792d4018e528b5805ec7cff4988fada17fff0da
BUG: 1397252
Signed-off-by: N Balachandran 
Reviewed-on: http://review.gluster.org/15898
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G