glusterfs.git/xlators/features, branch release-3.9

features/shard: Fix EIO error on add-brick

2017-02-27T05:45:54+00:00

        Backport of: https://review.gluster.org/14419

DHT seems to link the inode during lookup even before initializing
the inode ctx with layout information, which happens only after
directory healing.

Consider two parallel writes. As part of the first write,
shard sends lookup on .shard which in its return path would
cause DHT to link .shard inode. Now at this point, when a
second write is wound, inode_find() of .shard succeeds and
as a result of this, shard goes to create the participant
shards by issuing MKNODs under .shard. Since the layout is
yet to be initialized, mknod fails in dht call path with EIO,
leading to VM pauses.

The fix involves shard maintaining a flag to denote whether
a fresh lookup on .shard completed one network trip. If it
didn't, all inode_find()s in fop path will be followed by a
lookup before proceeding with the next stage of the fop.
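
The flag-based approach above can be illustrated with a minimal Python
sketch. This is a toy model, not the shard translator's code: ShardState,
dot_shard_refreshed, inode_find() and lookup() here are hypothetical
stand-ins, and the real code path is asynchronous C.

```python
class ShardState:
    """Toy model of the state shard keeps for the .shard directory."""

    def __init__(self):
        self.inode_table = {}             # name -> linked inode object
        self.dot_shard_refreshed = False  # True once a lookup completed a network trip
        self.lookups_sent = 0

    def inode_find(self, name):
        return self.inode_table.get(name)

    def lookup(self, name):
        # Stand-in for a network lookup; assume the layout is set on return.
        self.lookups_sent += 1
        self.inode_table.setdefault(name, object())
        self.dot_shard_refreshed = True
        return self.inode_table[name]


def resolve_dot_shard(state):
    """Return the .shard inode, forcing a lookup if it was never refreshed."""
    inode = state.inode_find(".shard")
    if inode is None or not state.dot_shard_refreshed:
        # A linked inode alone is not trusted; follow it with a lookup.
        inode = state.lookup(".shard")
    return inode
```

In this model, an inode that was linked early still triggers exactly one
lookup; later calls reuse it without another network trip.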

Big thanks to Raghavendra G and Pranith Kumar K for the RCA
and subsequent inputs and feedback on the patch.

Change-Id: Id0d160157ad8f6bcd52801a2173c5869517d0a96
BUG: 1426512
Signed-off-by: Krutika Dhananjay 
Reviewed-on: https://review.gluster.org/16752
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

features/shard: Put onus of choosing the inode to resolve on individual fops

2017-02-27T05:00:35+00:00

        Backport of: https://review.gluster.org/16709

... as opposed to adding checks in "common" functions to choose the inode
to resolve based on local->fop, which is rather ugly and prone to errors.

Change-Id: I84c5b26160150f2fd87e7f245190c500a4b36bd8
BUG: 1426512
Signed-off-by: Krutika Dhananjay 
Reviewed-on: https://review.gluster.org/16751
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

upcall: Resolve dict leak from up_(f)removexattr in upcall code path

2017-01-31T19:51:59+00:00

Problem: In up_(f)removexattr() dict_for_key_value() is used to create a
         new dict. This dict is not correctly unref'd and gets leaked.

Solution: To avoid the leak, up_(f)removexattr() now also does a
          dict_unref() on the newly created dict.

While reviewing the code in up_(f)setxattr() for a similar problem, it
was noticed that there is an extra dict created. There is no need for
this copy, upcall_local_init() can just take the dict that was passed as
argument to the FOP.
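
The leak pattern can be sketched with a toy reference-counted object in
Python. RefDict is a hypothetical stand-in that only loosely mirrors
dict_t, and the sketch assumes the creating call hands back exactly one
reference that the caller must drop.

```python
class RefDict:
    """Toy reference-counted dictionary (hypothetical, not dict_t)."""

    live = 0  # number of RefDict objects not yet released

    def __init__(self):
        self.refcount = 1  # assumption: creation returns one reference
        self.data = {}
        RefDict.live += 1

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            RefDict.live -= 1


def removexattr_leaky(name):
    xattrs = RefDict()
    xattrs.data[name] = b""
    # ... notify clients using xattrs ...
    # No unref() here, so the object is never released.


def removexattr_fixed(name):
    xattrs = RefDict()
    xattrs.data[name] = b""
    try:
        pass  # ... notify clients using xattrs ...
    finally:
        xattrs.unref()  # drop the reference taken at creation
```

Calling removexattr_leaky() leaves RefDict.live one higher than before,
while removexattr_fixed() leaves it unchanged.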

> BUG: 1412917
> Change-Id: I5bb9a7d99f5087af11c19ae722de62bdb5ad1498
> Signed-off-by: Mohit Agrawal 
> Reviewed-on: http://review.gluster.org/16392
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Niels de Vos 
> Smoke: Gluster Build System 
> (cherry picked from commit afdd83a9b69573b854e732795c0bcba0a00d6c0f)

Change-Id: I0a53545528c43c09b88d360d3a12c460476647ba
BUG: 1417606
Signed-off-by: Mohit Agrawal 
Reviewed-on: https://review.gluster.org/16480
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos 
Smoke: Gluster Build System

features/changelog: Fix htime xattr during brick crash

2017-01-31T09:31:53+00:00

The htime file contains the paths of all the changelogs
that have been rolled over so far. It also maintains an xattr
that tracks the latest rolled-over changelog file and the
number of changelogs. The path and the xattr are updated
in two different system calls. If the brick crashes
between them, the xattr value becomes stale and can cause
gf_history_changelog to fail. To detect this, the total
number of changelogs is calculated from the htime file
size and the record length. The calculated value is used
in case of a mismatch.
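
The consistency check amounts to simple arithmetic, sketched below in
Python. The function name and the example sizes are hypothetical; the
only facts taken from the description are that the count is derived from
the file size and the record length, and that the derived value wins on
mismatch.

```python
def reconcile_changelog_count(htime_size, record_len, xattr_count):
    """Return (count_to_use, xattr_was_stale) for an htime file.

    htime_size:  size of the htime file in bytes
    record_len:  length of one changelog path record in bytes
    xattr_count: number of changelogs recorded in the xattr
    """
    computed = htime_size // record_len
    stale = computed != xattr_count
    # On mismatch the value derived from the file size is used.
    return computed, stale
```

For example, with 100-byte records and a 300-byte file, an xattr count
of 2 is treated as stale and 3 is used instead.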

> Change-Id: Ia1c3efcfda7b74227805bb2eb933c9bd4305000b
> BUG: 1413967
> Signed-off-by: Kotresh HR 
> Reviewed-on: http://review.gluster.org/16420
> NetBSD-regression: NetBSD Build System 
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Aravinda VK 

Change-Id: Ia1c3efcfda7b74227805bb2eb933c9bd4305000b
BUG: 1415065
Signed-off-by: Kotresh HR 
(cherry picked from commit 6f4811ca9331eee8c00861446f74ebe23626bbf8)
Reviewed-on: https://review.gluster.org/16438
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Aravinda VK

Upcall: Fix possible memleak when inode_ctx_set fails

2017-01-24T10:15:49+00:00

In __upcall_inode_ctx_set(), if inode_ctx_set fails, we should
free the memory allocated for ctx. This patch does that.

This is a backport of the mainline fix below.
http://review.gluster.org/16381

>Change-Id: Iafb42787151a579caf6f396c9b414ea48d16e6b4
>BUG: 1412489
>Reported-by: Nithya Balachandran 
>Signed-off-by: Soumya Koduri 
>Reviewed-on: http://review.gluster.org/16381
>Reviewed-by: N Balachandran 
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Jeff Darcy 
>(cherry picked from commit 84271e12efb783bfc83133329b0fd18aba729c84)

Change-Id: Ia258f3fb12b92795aa7546708c6da5c91f70a08a
BUG: 1414654
Reported-by: Nithya Balachandran 
Signed-off-by: Soumya Koduri 
Reviewed-on: https://review.gluster.org/16430
Reviewed-by: N Balachandran 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

upcall: Fix 'use after free' in a log message

2016-12-19T13:02:40+00:00

There is a chance of accessing a freed pointer in a log message at TRACE
level while cleaning up expired client entries.

Cherry picked from commit 212c7600d2070a4414bc89fd7d2c186b5994cd54:
> Change-Id: I06b4dad755df63978ab04ca52442bfd4600d139a
> BUG: 1404168
> Reported-by: Ravishankar N 
> Signed-off-by: Soumya Koduri 
> Reviewed-on: http://review.gluster.org/16117
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Niels de Vos 
> Smoke: Gluster Build System 

Change-Id: I06b4dad755df63978ab04ca52442bfd4600d139a
BUG: 1404581
Signed-off-by: Niels de Vos 
Reviewed-on: http://review.gluster.org/16127
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: soumya k

md-cache, afr: Reduce the window of stale read

2016-12-02T05:45:41+00:00

Problem:
Consider a replica setup where one mount writes data to a
file and another mount reads the file. In afr, read operations
are not transaction based. A brick (the read subvolume) is chosen as
part of lookup or other operations, and reads are always wound only
to the read subvolume, even if a write from a different client
failed on this brick. This stale read continues until there is
a lookup or any write operation from the mount point. Currently, this
is not a major issue, as a lookup is issued before every read and it
switches the read subvolume to a correct one. But with the plan of
increasing the md-cache timeout to 600s, the stale read problem will be
more pronounced, i.e. a stale read can continue for 600s (or more if
cascaded with readdirp), as there will be no lookups.

Solution:
Afr doesn't have a built-in solution for stale reads that does not affect
performance. The solution that came up was to use upcall. When a file
on any brick is marked bad for the first time, upcall sends a notification
to all the clients that had recently accessed the file. The solution has
2 parts:
- Identifying when a file is marked bad, on any of the bricks,
  for the first time
- Client side actions on receiving the notifications

Identifying when a file is marked bad on any of the bricks for the first time:
-----------------------------------------------------------------------------
The idea is to track xattrop in upcall. xattrop currently comes with 2 afr
xattrs - afr dirty bit and afr pending xattrs.
   Dirty xattr is set to 1 before every write, and is unset if the write succeeds.
In certain scenarios, dirty xattr can be 0 and the file could still be a bad
copy. Hence do not track dirty xattr.
   Pending xattr is set on the good copy, indicating the other bricks that have
a bad copy. It is still not as simple as notifying when any of the pending xattrs
changes. That could lead to a flood of notifications in case the other brick is
completely down or consistently failing. Hence it is important to notify only
once, the first time a good copy is marked bad.

Client side actions on receiving pending xattr change notification:
-------------------------------------------------------------------
md-cache will invalidate the cache of that file, so that a further lookup is
passed down to afr and hence updates the read subvolume. Invalidating only
md-cache is not enough. Consider the following order of operations:
- pending xattr invalidation - invalidate md-cache
- readdirp on the bad read subvolume - fill md-cache
- lookup (served from md-cache)
- read - wound to the old read subvol.
Hence, along with invalidating md-cache, it is very important to reset the
read subvolume for that file, in afr.
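
The sequence above can be replayed with a toy client model in Python.
Everything here is a hypothetical simplification: the Client class, the
brick names, and the assumption that afr picks a fresh read subvolume on
a read when none is recorded.

```python
class Client:
    """Toy model of md-cache stacked on afr for a single mount."""

    def __init__(self):
        self.md_cache = {}     # path -> cached metadata
        self.read_subvol = {}  # path -> brick chosen for reads

    def lookup(self, path, good_brick):
        if path in self.md_cache:
            return  # served from md-cache, afr never sees it
        self.md_cache[path] = "stat"
        self.read_subvol[path] = good_brick  # afr re-picks the read subvolume

    def readdirp(self, path):
        # Fills md-cache but leaves the read subvolume untouched.
        self.md_cache[path] = "stat"

    def on_upcall(self, path, reset_afr):
        self.md_cache.pop(path, None)
        if reset_afr:
            self.read_subvol.pop(path, None)

    def read(self, path, good_brick):
        # Assumption: with no recorded read subvolume, afr picks a good one.
        return self.read_subvol.setdefault(path, good_brick)
```

With reset_afr=False, the invalidation, readdirp, lookup, read sequence
still reads from the old brick; with reset_afr=True the read lands on
the good brick.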

Design Credit: Anuradha Talur, Ravishankar N

1. xattrop doesn't carry info saying post op/pre op.
2. Pre xattrop will have 0 value for all pending xattrs;
   the cbk of pre xattrop carries the on-disk xattr value.
   A non-zero value indicates that healing is required.
3. Post xattrop will have non zero value for any of the
   pending xattrs, if the fop failed on any of the bricks.
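
One possible reading of the three points above, sketched in Python: an
xattrop is treated as "first time marked bad" when it carries a non-zero
pending value and nothing was pending on disk before. The function name,
the dictionary representation and the key names are assumptions made for
illustration; the actual decision logic in upcall may differ.

```python
def should_notify(requested, on_disk_before):
    """Decide whether an xattrop marks a copy bad for the first time.

    requested:      pending-xattr values carried by the xattrop
    on_disk_before: pending-xattr values on disk before the xattrop
    """
    # Point 3: a post xattrop after a failed fop has a non-zero value.
    is_post_op_failure = any(v != 0 for v in requested.values())
    # Point 2: a non-zero on-disk value means healing was already required.
    already_bad = any(v != 0 for v in on_disk_before.values())
    return is_post_op_failure and not already_bad
```

Under this reading, a brick that keeps failing produces a notification
only for the first failed write, which avoids the flood described
earlier.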

>Reviewed-on: http://review.gluster.org/15398
>Reviewed-by: Pranith Kumar Karampuri 
>Tested-by: Pranith Kumar Karampuri 
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Signed-off-by: Poornima G 

Change-Id: I469cbc111714c433984fe1c922be2ef113c25804
BUG: 1399450
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15958
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/tier: handle fast demotions

2016-11-28T16:51:36+00:00

Demote files on priority if hi-watermark has been breached and continue
to demote until the watermark drops below hi-watermark.

Monitor watermark more frequently.
Trigger demotion as soon as hi-watermark is breached.
Add cluster.tier-query-limit option to limit the number
of files returned from the database query for every iteration of
tier_migrate_using_query_file(). If the watermark hasn't dropped below
hi-watermark during the first iteration, the next iteration will be
triggered approximately 1 second after tier_demote() returns to the
main tiering loop.
Update changetimerecorder xlator to handle query for emergency demote
mode.
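
The iteration behaviour can be sketched as a loop in Python. This is a
simplified model under stated assumptions: usage is tracked as an
integer percentage, each candidate file frees a known amount, and the
function and parameter names are hypothetical. The 1 second delay
between iterations is omitted.

```python
def demote_until_below(used_pct, hi_watermark, files, query_limit):
    """Demote files in batches until usage drops below hi_watermark.

    files:       list of (name, pct_freed) candidates, in query order
    query_limit: maximum number of files handled per iteration
    Returns (final_used_pct, demoted_names, iterations).
    """
    demoted = []
    iterations = 0
    pending = list(files)
    # The watermark is re-checked between iterations, not between files.
    while used_pct >= hi_watermark and pending:
        batch, pending = pending[:query_limit], pending[query_limit:]
        for name, pct_freed in batch:
            used_pct -= pct_freed
            demoted.append(name)
        iterations += 1
    return used_pct, demoted, iterations
```

A smaller query limit means the watermark is re-checked more often, at
the cost of more iterations to reach the same usage.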

Add tier-ctr-interface.h:
Move tier and ctr interface specific macros and struct definition from
libglusterfs/src/gfdb/gfdb_data_store.h to new header
libglusterfs/src/tier-ctr-interface.h

> Reviewed-on: http://review.gluster.org/15158
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Dan Lambright 
(cherry picked from commit 460016428cf27484c333227f534c2e2f73a37fb1)

Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811
BUG: 1394482
Signed-off-by: Milind Changire 
Reviewed-on: http://review.gluster.org/15835
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Dan Lambright

features/index: Delete granular entry indices of already healed directories during crawl

2016-11-26T11:10:52+00:00

        Backport of: http://review.gluster.org/15880

If granular name indices already exist for a volume and granular
entry heal is disabled before they are healed, a crawl on
indices/xattrop will clear the changelogs on these directories. When
their corresponding entry-changes indices are crawled subsequently
and it is found that the directories don't need heal anymore, the
granular indices are not cleaned up.
This patch fixes that problem by ensuring that the zero-xattrop
also deletes the stale indices at the level of index translator.

Change-Id: If4a2f14e33a78f2217e9fea8733ebb552af56059
BUG: 1398500
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15926
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

marker: Fix inode value in loc, in setxattr fop

2016-11-22T05:43:26+00:00

Backport of http://review.gluster.org/15826

On receiving a rename fop, marker_rename() stores the
oldloc and newloc in its 'local' struct. Once the rename
is done, the xtime marker (last updated time) is set on
the file by sending a setxattr fop. When upcall
receives the setxattr fop, loc->inode is NULL and
it crashes. loc->inode can be NULL in only one valid
case, i.e. rename, where the inode of the new loc
can be NULL. Hence, marker should have filled the inode
of the new_loc before issuing a setxattr.
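
A minimal Python sketch of the idea follows. Loc and
prepare_xtime_setxattr() are hypothetical names, and taking the inode
from oldloc is an assumption; the description does not say where the
actual patch gets the inode from.

```python
class Loc:
    """Toy stand-in for loc_t with only the fields needed here."""

    def __init__(self, path, inode=None):
        self.path = path
        self.inode = inode


def prepare_xtime_setxattr(oldloc, newloc):
    """Return the loc to use for the post-rename setxattr."""
    if newloc.inode is None:
        # Assumption: after a successful rename the new name refers
        # to the inode that oldloc pointed at.
        newloc.inode = oldloc.inode
    return newloc
```

With the inode filled in before the setxattr is wound, translators
further down never see a NULL loc->inode for this fop.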

> Reviewed-on: http://review.gluster.org/15826
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Kotresh HR 
> Smoke: Gluster Build System 
> Reviewed-by: Rajesh Joseph 
(cherry picked from commit 46e5466850311ee69e6ae9a11c2bba2aabadd5de)

Change-Id: Id638f678c3daaf4a5c29b970b58929d377ae8977
BUG: 1396414
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15877
Reviewed-by: Rajesh Joseph 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System