glusterfs.git/xlators/cluster/afr/src/afr-common.c, branch v3.4.0alpha

cluster/afr: wakeup delayed post op on fsync

2013-01-29T20:33:05+00:00

Change-Id: I5d84ef72615f9d71b4af210976e2449de6e02326
BUG: 888174
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4446
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

afr: Modified book-keeping structures for entrylks

2013-01-23T17:17:00+00:00

* Up to 3 entry lockees may be needed to perform
  entrylk'ing in posix dir-write operations.

* For example, rmdir ("/a/b") needs to acquire locks on two entities,
  - entrylk ("/a", "b")
  - entrylk ("/a/b", null)

* Changed existing entrylk/rename/selfheal (entrylk) transactions
  to use the new book-keeping structures

* Fixed a few issues in afr_trace_entry_lk{in,out} functions. Tracing is now
  aware of the new entry lockee structure.

Implementation notes:
* Changed 'cookie' sent in stack_wind to encode lockee_entity_no
  and subvol_no.

  cookie is a non-negative integer. With a single lock,
  0 <= cookie < replica_count. When more than one lock is being
  acquired across the subvolumes,
  cookie % replica_count gives the subvol_no
  cookie / replica_count gives the lockee_entity_no.
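
  The encoding above can be sketched as follows. This is a minimal
  illustration, not the code from the patch: the helper names
  cookie_encode, cookie_to_subvol and cookie_to_lockee are hypothetical.
  With lockee_count lockees, the packed value falls in the range
  0 <= cookie < lockee_count * replica_count.

```c
/* Hypothetical helpers; names are illustrative, not taken from afr. */

/* Pack a lockee index and a subvolume index into a single cookie. */
static long
cookie_encode (int lockee_no, int subvol_no, int replica_count)
{
        return (long) lockee_no * replica_count + subvol_no;
}

/* Recover the subvolume index: cookie % replica_count. */
static int
cookie_to_subvol (long cookie, int replica_count)
{
        return (int) (cookie % replica_count);
}

/* Recover the lockee index: cookie / replica_count. */
static int
cookie_to_lockee (long cookie, int replica_count)
{
        return (int) (cookie / replica_count);
}
```

  For instance, with replica_count = 3, lockee 1 on subvolume 2 packs
  to cookie 5, which decodes back to (1, 2).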

Change-Id: Idbf41803387a7d59a0f7fcb1453d91cea74da153
BUG: 765564
Signed-off-by: Krishnan Parthasarathi 
Reviewed-on: http://review.gluster.org/2828
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: Fail readv on data-split-brain

2013-01-18T21:20:52+00:00

Problem:
Afr prevents opens on a file in split-brain, but an
fd that is already open can still perform
both reads and writes on the file.

Fix:
Fail readvs on a file in data split-brain with EIO.
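
A rough sketch of the check, assuming the split-brain type is already
available for the inode. The enum spb_state and the function
readv_precheck are hypothetical names for illustration, not the
identifiers used in afr.

```c
#include <errno.h>

/* Hypothetical per-inode split-brain states (illustrative only). */
enum spb_state { SPB_NONE, SPB_DATA, SPB_METADATA };

/*
 * Returns 0 if the read may proceed. Returns -1 and sets *op_errno
 * to EIO when the file is in data split-brain.
 */
static int
readv_precheck (enum spb_state state, int *op_errno)
{
        if (state == SPB_DATA) {
                *op_errno = EIO;
                return -1;
        }
        return 0;
}
```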

Change-Id: I8e07f24c36fab800499b36ab374f984b743332cd
BUG: 873962
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4199
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy 
Reviewed-by: Anand Avati

afr: conditionally prioritize EIO errors over ENOENT

2013-01-18T17:32:21+00:00

The most important errno logic historically only prioritized ESTALE
over ENOENT. Commit c8c0942d added EIO prioritization over ENOENT
to ensure that split-brain was reported when it occurs in
conjunction with bricks missing the file entry. The unintended side
effect of this change is that (non split-brain) EIO errors reported
from the bricks themselves are now reported to the client when the
expectation is that afr should squash said errors in favor of
marking the file inconsistent.

The high-level problem is that EIO is overloaded with different
meanings from different contexts. This commit adds an eio parameter
to the errno priority logic to conditionally flag when EIO is of
higher priority and should be propagated to the client.
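
The conditional priority described above might look roughly like the
sketch below. It is an illustration under assumptions, not the patch
itself: the function name most_important_error and its exact signature
are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Return the higher priority errno of the two. ESTALE always wins.
 * EIO outranks ENOENT only when the caller passes eio = true.
 * Otherwise the latest error (new_errno) is returned.
 */
static int
most_important_error (int old_errno, int new_errno, bool eio)
{
        if (old_errno == ESTALE || new_errno == ESTALE)
                return ESTALE;
        if (eio && (old_errno == EIO || new_errno == EIO))
                return EIO;
        if (old_errno == ENOENT || new_errno == ENOENT)
                return ENOENT;
        return new_errno;
}
```

With eio set to false, an EIO coming from a brick no longer masks
ENOENT; a caller that detects split-brain would pass true so that EIO
reaches the client.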

BUG: 892730
Change-Id: Ib692a8a1f1737ef190d57894f392ec53ffb33aab
Signed-off-by: Brian Foster 
Reviewed-on: http://review.gluster.org/4376
Reviewed-by: Jeff Darcy 
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

afr: replace afr_more_important_error with afr_most_important_error

2013-01-17T16:56:18+00:00

afr_more_important_error() is written to return whether a new errno
should override an existing errno for high-level operations that
could span multiple sub-operations. It specifically prioritizes
ESTALE over EIO over ENOENT, and otherwise defaults to the latest
error passed having priority.

This change preserves current behavior, but rewrites the logic to
return the higher priority error of the existing and new errno. The
purpose of the change is to make the logic a bit more clear and set
the stage for future changes to make the logic flexible based on
context.
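
To illustrate the difference in shape, the sketch below contrasts a
predicate that answers "should the new errno override the old one?"
with a selector that returns the winning errno directly. Both
functions are hypothetical reconstructions from the description above
(should_override and pick_errno are made-up names), not the afr
sources.

```c
#include <errno.h>
#include <stdbool.h>

/* Older shape: true if new_errno should replace old_errno. */
static bool
should_override (int old_errno, int new_errno)
{
        if (old_errno == ESTALE)
                return false;
        if (new_errno == ESTALE)
                return true;
        if (old_errno == EIO)
                return false;
        if (new_errno == EIO)
                return true;
        if (old_errno == ENOENT)
                return false;
        return true; /* default: the latest error has priority */
}

/* Newer shape: return the higher priority errno of the two. */
static int
pick_errno (int old_errno, int new_errno)
{
        if (old_errno == ESTALE || new_errno == ESTALE)
                return ESTALE;
        if (old_errno == EIO || new_errno == EIO)
                return EIO;
        if (old_errno == ENOENT || new_errno == ENOENT)
                return ENOENT;
        return new_errno;
}
```

For any pair of errors, should_override (old, new) ? new : old yields
the same value as pick_errno (old, new), which is the sense in which
behavior is preserved.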

BUG: 892730
Change-Id: Id1aa48855dfb0507abc9d1ef22f2259b30472576
Signed-off-by: Brian Foster 
Reviewed-on: http://review.gluster.org/4375
Reviewed-by: Jeff Darcy 
Tested-by: Gluster Build System

cluster/afr: Remember type of split-brain in inode-ctx

2012-12-12T00:23:53+00:00

Along with this change, fixed a race in which the split-brain
status was set in inode-ctx after the fop had been unwound from
self-heal, in the case of background self-heal.

Change-Id: Ifc829300df485f50f139443802e8b6dc7038b4ad
BUG: 873962
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4198
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: mark new entry changelog for create/mknod failures

2012-12-04T22:50:52+00:00

Problem:
When create/mknod fails on some of the nodes, the appropriate pending
data/metadata changelogs are not assigned. This was not considered
an issue because entry self-heal would assign the appropriate
changelog after creating the new entries. But by combining
rebalance and remove brick we can construct a case where a file
with the same name and gfid is created in a dir with different
data and link-to xattr, without any changelog.

Fix:
When a create/mknod failure is observed, mark the appropriate
changelog on the newly created file.

Change-Id: I4c32cbf5594a13fb14deaf97ff30b2fff11cbfd6
BUG: 858212
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4207
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

afr: make flush non-transactional

2012-12-04T22:42:58+00:00

Flush is historically a transaction to ensure all previous writes
were complete. This is no longer required as write-behind has
learned to make flush a barrier operation (re: conversation w/
Avati).

Flush taking a full file lock causes VMs running on afr volumes
to stall when a migration occurs and self-heal is in progress.
Make afr_flush() a non-transactional operation.

BUG: 874045
Change-Id: If2db83823e280c86b1b29b41361eed7081601632
Signed-off-by: Brian Foster 
Reviewed-on: http://review.gluster.org/4261
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: Provide option to disable readdir failover

2012-12-03T08:11:02+00:00

Unlike files, directories in a replica pair may not have their
content in the same order, so readdir for the same (offset, size)
may not return the same entries on both subvolumes of the pair.
Switching over from one subvolume to another can therefore lead
to duplicate entries, missing entries, or both. This patch provides
a way to disable readdir-failover so that applications like
rebalance can retry if they want to.

Change-Id: I2b23eb224a2e84016a561362932613ac824c11a0
BUG: 859387
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4159
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

afr: handle short writes in afr_writev_wind and self-heal to avoid corruption

2012-11-29T17:00:28+00:00

The current failure to handle short writes on writev fops leaves
us open to file corruption. A short write on a user request is
ignored and leaves replicas in an inconsistent state. A short write
during a self-heal is ignored and incorrectly marks the files as
consistent if the heal completes.

Modify user writev handling to return the best case return value
from each of the replicas. Short writes that occur relative to this
value are marked as failed and will require a heal. Modify
self-heal to set an error on a short write and abort the heal.
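
The user writev handling described above can be sketched as follows.
This is a simplified illustration under assumptions, not the patch
itself: reconcile_writev and its array-based interface are
hypothetical.

```c
#include <stddef.h>

/*
 * op_ret[i] is the number of bytes written on replica i, or -1 on
 * failure. Returns the best case (largest) return value and sets
 * failed[i] to 1 for every replica that errored or wrote fewer
 * bytes than the best case, so that it can be marked for heal.
 */
static long
reconcile_writev (const long *op_ret, int *failed, size_t count)
{
        long   best = -1;
        size_t i;

        for (i = 0; i < count; i++) {
                if (op_ret[i] > best)
                        best = op_ret[i];
        }

        for (i = 0; i < count; i++)
                failed[i] = (op_ret[i] < 0 || op_ret[i] < best);

        return best;
}
```

For example, with replica results {4096, 1024, -1} the call returns
4096 and flags the second and third replicas as failed.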

BUG: 853690
Change-Id: I18b30f58702326249230eeebb361b29e40b535f5
Signed-off-by: Brian Foster 
Reviewed-on: http://review.gluster.org/4150
Reviewed-by: Jeff Darcy 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System