glusterfs.git/xlators/cluster/afr/src/afr-dir-write.c, branch v3.4.1qa3

afr: Modified book-keeping structures for entrylks

2013-01-23T17:17:00+00:00

* Up to 3 entry lockees may be needed to perform
  entrylk'ing in posix dir-write operations.

* For example, rmdir ("/a/b") needs to acquire locks on two entities,
  - entrylk ("/a", "b")
  - entrylk ("/a/b", null)

* Changed existing entrylk/rename/selfheal (entrylk) transactions
  to use the new book-keeping structures

* Fixed a few issues in afr_trace_entry_lk{in,out} functions. Tracing is now
  aware of the new entry lockee structure.

Implementation notes:
* Changed 'cookie' sent in stack_wind to encode lockee_entity_no
  and subvol_no.

  cookie is a non-negative integer. With a single lock,
  0 <= cookie < replica_count. When more than one lock is being
  acquired across the subvolumes,
  cookie % replica_count gives the subvol_no
  cookie / replica_count gives the lockee_entity_no.
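
The cookie arithmetic described above can be sketched as follows. This is
only an illustration: the helper names are hypothetical and are not taken
from the afr sources, and the encoding direction (lockee_no * replica_count
+ subvol_no) is inferred from the two decoding rules in the note.

```c
/* Hypothetical helpers; names are illustrative, not afr's own. */

/* Pack a lockee index and a subvolume index into one cookie value. */
static long
cookie_encode (int lockee_no, int subvol_no, int replica_count)
{
        return (long) lockee_no * replica_count + subvol_no;
}

/* Recover the subvolume index: cookie % replica_count. */
static int
cookie_subvol_no (long cookie, int replica_count)
{
        return (int) (cookie % replica_count);
}

/* Recover the lockee index: cookie / replica_count. */
static int
cookie_lockee_no (long cookie, int replica_count)
{
        return (int) (cookie / replica_count);
}
```

With replica_count = 3, lockee 1 on subvolume 2 encodes to cookie 5, and
decoding 5 gives back subvol_no 2 and lockee_entity_no 1.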

Change-Id: Idbf41803387a7d59a0f7fcb1453d91cea74da153
BUG: 765564
Signed-off-by: Krishnan Parthasarathi 
Reviewed-on: http://review.gluster.org/2828
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: mark new entry changelog for create/mknod failures

2012-12-04T22:50:52+00:00

Problem:
When create/mknod fails on some of the nodes, appropriate pending
data/metadata changelogs are not assigned. This was not considered
to be an issue because entry self-heal would do the assigning of
appropriate changelog after creating new entries. But using
the combination of rebalance and remove brick we can construct a
case where a file with same name and gfid can be created in a dir
with different data and link-to xattr without any changelog.

Fix:
When a create/mknod failure is observed mark the appropriate
changelog on the new file created.

Change-Id: I4c32cbf5594a13fb14deaf97ff30b2fff11cbfd6
BUG: 858212
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4207
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

Prevent client crash when GF_CALLOC calls fail

2012-10-12T03:59:45+00:00

GF_CALLOC calls can, on rare occasions, fail, in which case the glusterfs
client crashes with a segmentation fault. Return early when the variables
of the transaction's local cannot be allocated.
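
A minimal sketch of the early-return pattern, assuming plain calloc(3) as a
stand-in for GF_CALLOC. The struct and function names are hypothetical and
do not mirror the real afr local structure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-transaction state. */
struct txn_local {
        int            child_count;
        unsigned char *child_up;
        int           *child_errno;
};

/* Returns 0 on success, -ENOMEM if any allocation fails. */
static int
txn_local_init (struct txn_local *local, int child_count)
{
        local->child_count = child_count;
        local->child_up    = calloc (child_count, sizeof (*local->child_up));
        local->child_errno = calloc (child_count,
                                     sizeof (*local->child_errno));

        if (!local->child_up || !local->child_errno) {
                /* Release whatever was allocated and report the failure
                 * instead of letting the caller dereference NULL. */
                free (local->child_up);
                free (local->child_errno);
                local->child_up    = NULL;
                local->child_errno = NULL;
                return -ENOMEM;
        }

        return 0;
}
```

The caller checks the return value and fails the fop with ENOMEM rather
than continuing with a partially initialised local.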

Change-Id: Ia3798b8349d832b23c7825e64dbad93ebe29cd1b
BUG: 861335
Signed-off-by: linbaiye 
Reviewed-on: http://review.gluster.org/4005
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: Unwind with correct pre/post parent bufs

2012-08-02T20:12:21+00:00

RCA:
For the dir fops create, mknod, mkdir, link, symlink and rename, if the
fop fails on the read-child, the unwind happens with all-zero pre/post
iatt-bufs. The bug occurs because the parent bufs are not saved unless
the response is from the read-child.

Fix:
Save the pre/post-bufs for the first response. If the response
comes from read-child, overwrite whatever we have cached.
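
The caching rule in the fix can be sketched as below. The types are
hypothetical stand-ins for struct iatt and the afr local state, and the
sketch assumes the helper is only called for responses that carry valid
parent bufs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct iatt. */
struct parent_buf {
        unsigned long ia_ino;
        unsigned long ia_size;
};

/* Hypothetical stand-in for the per-fop local state. */
struct dir_write_state {
        int               read_child;
        bool              have_bufs;
        struct parent_buf preparent;
        struct parent_buf postparent;
};

static void
save_parent_bufs (struct dir_write_state *st, int child_index,
                  const struct parent_buf *pre,
                  const struct parent_buf *post)
{
        /* Keep the first response seen; a response from the read-child
         * always overwrites whatever is cached. */
        if (!st->have_bufs || child_index == st->read_child) {
                st->preparent  = *pre;
                st->postparent = *post;
                st->have_bufs  = true;
        }
}
```

If the read-child fails, the unwind still has the bufs from the first
responding child instead of zeroed structures.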

Tests:
Attached the mount process to gdb.
Tested that the unwinds happen with proper pre/post iatt bufs in
the following cases:
1) All success case
2) Failure on read-child
3) Failure on non-read-child
4) Failure on all children.

Tested soft-link self-heal to test the change made in that.
Tested errno ENOTEMPTY for rmdir, rename fops.

Change-Id: I82882423d2d766b4f4a3044203bcb5dbcaee1755
BUG: 845242
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.com/3775
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: Handle failures in fop_cbk gracefully

2012-08-01T06:47:23+00:00

RCA:
Afr crashes when the last fop response fails and the
'fop output' arguments are NULL. Afr does not handle
this case gracefully.

Fix:
Changed the fops to not access the 'fop output' arguments
in case of failures.

Tests:
Changed afr wind_cbk code to fail the last response by setting
op_ret as -1 and op_errno as ENOMEM and setting all other output
variables as NULL to test the change. Removed that code afterwards to
verify the success cases. No crashes or errors seen.

Change-Id: Iad9bc54db093a162f85bfb8dbeeda5b95acd21d8
BUG: 844689
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.com/3760
Tested-by: Gluster Build System 
Reviewed-by: Amar Tumballi 
Reviewed-by: Anand Avati

cluster: fix crash on link of named pipe in stripe/replicate vol

2012-07-25T22:03:57+00:00

A crash occurs when attempting to link a named pipe on a striped,
replicated volume. The cause for this crash is attempting to deref
a NULL inode pointer in stripe_link_cbk(). The RCA for this bug
uncovered a couple of problems:

- AFR ignores the inode pointer it receives on failure (returning
  NULL).
- stripe assumes the inode pointer is valid on failure.

Either one of these changes addresses the crash, but this patch
includes both changes. AFR is modified to pass along the inode
pointer it receives (which could still be NULL). stripe is
modified to not assume the inode pointer is valid on fop failure.

BUG: 842825
Change-Id: I9cb2cc918552620929c3ecbd69bc66d4635eafdc
Signed-off-by: Brian Foster 
Reviewed-on: http://review.gluster.com/3727
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy 
Reviewed-by: Anand Avati

afr: pass back xdata in create

2012-07-23T18:48:29+00:00

A striped, replicated volume spits an error on file creation because
stripe requires xdata to process stripe information and AFR isn't
passing it back.

This fix was suggested by Amar Tumballi.

BUG: 842373
Change-Id: Ia7063590ca5e873d4a4e155989cf067e8a07501f
Signed-off-by: Brian Foster 
Reviewed-on: http://review.gluster.com/3713
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

remove useless if-before-free (and free-like) functions

2012-07-13T21:03:42+00:00

See comments in http://bugzilla.redhat.com/839925 for
the code to perform this change.
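
For illustration, the kind of simplification involved, shown with plain
free(3), which the C standard defines as a no-op when passed NULL. The
struct and function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical structure holding two heap-allocated strings. */
struct name_entry {
        char *name;
        char *path;
};

static void
name_entry_destroy (struct name_entry *e)
{
        /* Before: if (e->name) free (e->name);
         * The check is redundant because free(NULL) does nothing. */
        free (e->name);
        free (e->path);
        e->name = NULL;
        e->path = NULL;
}
```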

Signed-off-by: Jim Meyering 
BUG: 839925
Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a
Reviewed-on: http://review.gluster.com/3661
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

replicate: add hashed read-child method.

2012-06-01T00:29:01+00:00

Both the first-to-respond method and the round-robin method are susceptible
to clients repeatedly choosing the same servers across a series of opens,
creating hot spots.  Also, the code to handle a replica being down will
ignore both methods and just choose the first remaining (which is not an
issue for two-way but can be otherwise).  The hashed method more reliably
avoids such hot spots.  There are three values/modes.

0: use the old (broken) methods.

1: select a read-child based on a hash of the file's GFID, so all clients
   will choose the same subvolume for a file (ensuring maximum consistency)
   but will distribute load for a set of files.

2: select a read-child based on a hash of the file's GFID plus the client's
   PID, so different children will distribute load even for one file.
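
A rough sketch of modes 1 and 2. The commit message does not say which hash
function afr uses, so an FNV-1a style hash with the PID mixed into the
starting value is assumed here purely for illustration. The function names
are hypothetical, and handling of children that are down is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a style hash; 'seed' is mixed into the initial value.
 * Assumed for illustration only. */
static uint32_t
hash_bytes (const unsigned char *data, size_t len, uint32_t seed)
{
        uint32_t h = 2166136261u ^ seed;
        size_t   i;

        for (i = 0; i < len; i++) {
                h ^= data[i];
                h *= 16777619u;
        }
        return h;
}

/* Returns a child index in [0, child_count), or -1 for mode 0,
 * where the caller falls back to the older methods. */
static int
pick_read_child (int mode, const unsigned char gfid[16], uint32_t pid,
                 int child_count)
{
        uint32_t h;

        if (mode == 1)
                h = hash_bytes (gfid, 16, 0);    /* GFID only */
        else if (mode == 2)
                h = hash_bytes (gfid, 16, pid);  /* GFID plus client PID */
        else
                return -1;

        return (int) (h % (uint32_t) child_count);
}
```

In mode 1 the PID is ignored, so every client computes the same child for
a given GFID. In mode 2 the result also depends on the PID.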

Mode 2 will probably be optimal for most cases.  Using response time when we
open the file is problematic, both because a single sample might not have
been representative even then and because load might have shifted in the
hours or days since (for long-lived files).  Trying to use more current load
information can lead to "herd following" behavior which is just as bad.
Pseudo-random distribution is likely to be the best we can reasonably do,
just as it is for DHT.

Change-Id: I798c2760411eacf32e82a85f03bb7b08a4a49461
BUG: 802513
Signed-off-by: Jeff Darcy 
Reviewed-on: http://review.gluster.com/2926
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

Thou shalt not free(3) memory dirname(3) returned

2012-05-21T20:51:44+00:00

On Linux basename() and dirname() return a pointer within the string
passed as argument. On BSD flavors, basename() and dirname() return
static storage, or pthread specific storage. Both behaviours are
compliant, but calling free on the result in the second case is a bug.

BUG: 764655
Change-Id: Ic82414aff1f8db2a7544b16315761ce1c05276c4
Signed-off-by: Emmanuel Dreyfus 
Reviewed-on: http://review.gluster.com/3377
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri