glusterfs.git/xlators/cluster/ec/src/ec-heal.c, branch v3.13.0rc0

cluster/ec: FORWARD_NULL coverity fix

2017-11-01T13:28:51+00:00

Problem: cbk could be NULL.

Solution: Assigned appropriate value to cbk.

BUG: 789278
Change-Id: I2e4bba9a54f965c6a7bccf0b0cb6c5f75399f6e6
Signed-off-by: Sunil Kumar Acharya

cluster/ec: MISSING_BREAK coverity fix

2017-10-31T22:16:14+00:00

Problem: switch case syntax issue.

Solution: syntax fixed.

BUG: 789278
Change-Id: I76da72c3ab6ffc5db671686a71d6a596beaf496e
Signed-off-by: Sunil Kumar Acharya

cluster/ec: Coverity Fix UNUSED_VALUE in ec_create_name

2017-10-16T08:17:59+00:00

Problem: The value returned by cluster_mkdir is assigned to ret at
ec-heal.c:1076. But this value is overwritten before it can be
used.

Solution: The return value of cluster_mkdir is ignored. It is not
assigned to ret.

Change-Id: Iee6b8d8b04e0bd800dd30d2c24cab755b9e63443
BUG: 789278
Signed-off-by: Kamal Mohanan

cluster/ec: Improve heal info command to handle obvious cases

2017-10-16T02:40:01+00:00

Problem:
1 - If a brick is down and we see an index entry in
.glusterfs/indices, we should show it in heal info
output as it most certainly needs heal.

2 - The first problem is also not getting handled after
ec_heal_inspect. Even if in ec_heal_inspect, lookup will
mark need_heal as true, we don't handle it properly in
ec_get_heal_info and continue with locked inspect which
takes lot of time.

Solution:
1 - In first case we need not to do any further invstigation.
As soon as we see that a brick is down, we should say that
this index entry needs heal for sure.

2 - In second case, if we have need_heal as _gf_true after
ec_heal_inspect, we should show it as heal requires.

Change-Id: Ibe7f9d7602cc0b382ba53bddaf75a2a2c3326aa6
BUG: 1476668
Signed-off-by: Ashish Pandey

cluster/ec: add functions for stripe alignment

2017-10-13T08:17:27+00:00

This patch removes old functions to align offsets and sizes
to stripe size boundaries and adds new ones to offer more
possibilities.

The new functions are:

 * ec_adjust_offset_down()
     Aligns a given offset to a multiple of the stripe size
     equal or smaller than the initial one. It returns the
     size of the gap between the aligned offset and the given
     one.

 * ec_adjust_offset_up()
     Aligns a given offset to a multiple of the stripe size
     equal or greater than the initial one. It returns the
     size of the skipped region between the given offset and
     the aligned one. If an overflow happens, the returned
     valid has negative sign (but correct value) and the
     offset is set to the maximum value (not aligned).

 * ec_adjust_size_down()
     Aligns the given size to a multiple of the stripe size
     equal or smaller than the initial one. It returns the
     size of the missed region between the aligned size and
     the given one.

 * ec_adjust_size_up()
     Aligns the given size to a multiple of the stripe size
     equal or greater than the initial one. It returns the
     size of the gap between the given size and the aligned
     one. If an overflow happens, the returned value has
     negative sign (but correct value) and the size is set
     to the maximum value (not aligned).

These functions have been defined in ec-helpers.h as static
inline since they are very small and compilers can optimize
them (specially the 'scale' argument).

Change-Id: I4c91009ad02f76c73772034dfde27ee1c78a80d7
Signed-off-by: Xavier Hernandez

cluster/ec: Improve performance with xattrop update

2017-10-06T01:42:06+00:00

Existing EC code updates the xattr on the subvolume
in a sequential pattern resulting in very poor performance.

With this fix EC now updates the xattr on the subvolume
in parallel which improves the xattr update performance.

BUG: 1445663
Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486
Signed-off-by: Sunil Kumar Acharya

cluster/ec: Update xattr and heal size properly

2017-06-06T14:41:52+00:00

Problem-1 : Recursive healing of same file is happening
when IO is going on even after data heal completes.

Solution:
RCA: At the end of the write, when ec_update_size_version
gets called, we send it only on good bricks and not
on healing brick. Due to this, xattr on healing brick
will always remain out of sync and when the background
heal check source and sink, it finds this brick to be
healed and start healing from scratch. That involve
ftruncate and writing all of the data again.

To solve this, send xattrop on all the good bricks as
well as healing bricks.

Problem-2: The above fix exposes the data corruption
during heal. If the write on a file is going on and
heal finishes, we find that the file gets corrupted.

RCA:
The real problem happens in ec_rebuild_data(). Here we receive the
'size' argument which contains the real file size at the time of
starting self-heal and it's assigned to heal->total_size.

After that, a sequence of calls to ec_sync_heal_block() are done. Each
call ends up calling ec_manager_heal_block(), which does the actual work
of healing a block.

First a lock on the inode is taken in state EC_STATE_INIT using
ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is
called. This function calls ec_set_inode_size() to store the real size
of the inode (it uses heal->total_size).

The next step is to read the block to be healed. This is done using a
regular ec_readv(). One of the things this call does is to trim the
returned size if the file is smaller than the requested size.

In our case, when we read the last block of a file whose size was = 512
mod 1024 at the time of starting self-heal, ec_readv() will return only
the first 512 bytes, not the whole 1024 bytes.

This isn't a problem since the following ec_writev() sent from the heal
code only attempts to write the amount of data read, so it shouldn't
modify the remaining 512 bytes.

However ec_writev() also checks the file size. If we are writing the
last block of the file (determined by the size stored on the inode that
we have set to heal->total_size), any data beyond the (imposed) end of
file will be cleared with 0's. This causes the 512 bytes after the
heal->total_size to be cleared. Since the file was written after heal
started, the these bytes contained data, so the block written to the
damaged brick will be incorrect.

Solution:
Align heal->total_size to a multiple of the stripe size.

Thanks "Xavier Hernandez" 
to find out the root cause and to fix the issue.

Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
BUG: 1428673
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/16985
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Xavier Hernandez

cluster/ec: Implement self-heal-window_size option

2017-04-25T06:36:27+00:00

Fix implements the heal window size option for
EC. This option control the maximum size of
read/write operation carried out in self-heal
process.

BUG: 1441491
Change-Id: I6c0ef65c9ca18b0828f91b319d4f52ac5b77d0d8
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17098
Reviewed-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: Metadata healing fails to update the version

2017-03-21T07:08:24+00:00

During meatadata heal, we were not updating the version
though all the inode attributes were in sync.

Updated the code to adjust version when all the inode
attributes are in sync.

BUG: 1425703
Change-Id: I6723be3c5f748b286d4efdaf3c71e9d2087c7235
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/16772
Smoke: Gluster Build System 
Reviewed-by: Xavier Hernandez 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System

cluster/ec: Don't trigger data/metadata heal on Lookups

2017-02-27T03:06:55+00:00

Problem-1
If Lookup which doesn't take any locks observes version mismatch it can't be
trusted. If we launch a heal based on this information it will lead to
self-heals which will affect I/O performance in the cases where Lookup is
wrong. Considering self-heal-daemon and operations on the inode from client
which take locks can still trigger heal we can choose to not attempt a heal on
Lookup.

Problem-2:
Fixed spurious failure of
tests/bitrot/bug-1373520.t
For the issues above, what was happening was that ec_heal_inspect()
is preventing 'name' heal to happen

Problem-3:
tests/basic/ec/ec-background-heals.t
To be honest I don't know what the problem was, while fixing
the 2 problems above, I made some changes to ec_heal_inspect() and
ec_need_heal() after which when I tried to recreate the spurious
failure it just didn't happen even after a long time.

BUG: 1414287
Signed-off-by: Pranith Kumar K 
Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834
Reviewed-on: https://review.gluster.org/16468
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez 
CentOS-regression: Gluster Build System 
Reviewed-by: Ashish Pandey