<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/basic, branch v3.10.0rc0</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>libglusterfs: make memory pools more thread-friendly</title>
<updated>2017-02-03T00:44:09+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2016-10-14T14:04:07+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=1ed73ffa16cb7fe4415acbdb095da6a4628f711a'/>
<id>1ed73ffa16cb7fe4415acbdb095da6a4628f711a</id>
<content type='text'>
Early multiplexing tests revealed *massive* contention on certain
pools' global locks - especially for dictionaries and secondarily for
call stubs.  For the thread counts that multiplexing can create, a
more lock-free solution is clearly needed.  Also, the current mem-pool
implementation does a poor job releasing memory back to the system,
artificially inflating memory usage to match whatever the worst case
was since the process started.  This is bad in general, but especially
so for multiplexing where there are more pools and a major point of
the whole exercise is to reduce memory consumption.

The basic ideas for the new design are these:

  There is one pool, globally, for each power-of-two size range.
  Every attempt to create a new pool within this range will instead
  add a reference to the existing pool.

  Instead of adding pools for each translator within each multiplexed
  brick (potentially infinite and quite possibly thousands), we
  allocate one set of size-based pools per *thread* (hundreds at
  worst).

  Each per-thread pool is divided into hot and cold lists.  Every
  allocation first attempts to use the hot list, then the cold list.
  When objects are freed, they always go on the hot list.

  There is one global "pool sweeper" thread, which periodically
  reclaims everything in each pool's cold list and then "demotes" the
  current hot list to be the new cold list.

  For normal allocation activity, only a per-thread lock need be
  taken, and even that only to guard against very rare contention from
  the pool sweeper.  When threads start and stop, a global lock must
  be taken to add them to the pool sweeper's list.  Lock contention is
  therefore extremely low, and the hot/cold lists also provide good
  locality.

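As a rough illustration of the hot/cold scheme described above, a
minimal C sketch follows.  All names and types here are illustrative
assumptions, not the actual libglusterfs code:

  /* One per-thread, per-size pool.  Objects are assumed to be at
   * least sizeof (pool_obj_t); lock initialization is omitted. */
  #include &lt;pthread.h&gt;
  #include &lt;stdlib.h&gt;

  typedef struct pool_obj {
          struct pool_obj *next;
  } pool_obj_t;

  typedef struct per_thread_pool {
          pthread_spinlock_t lock;  /* contended only by the sweeper */
          pool_obj_t        *hot;   /* recently freed objects */
          pool_obj_t        *cold;  /* reclamation candidates */
          size_t             obj_size;
  } per_thread_pool_t;

  /* Allocation: try the hot list, then the cold list, then malloc. */
  static void *
  pool_alloc (per_thread_pool_t *pool)
  {
          pool_obj_t *obj = NULL;

          pthread_spin_lock (&amp;pool-&gt;lock);
          if (pool-&gt;hot) {
                  obj = pool-&gt;hot;
                  pool-&gt;hot = obj-&gt;next;
          } else if (pool-&gt;cold) {
                  obj = pool-&gt;cold;
                  pool-&gt;cold = obj-&gt;next;
          }
          pthread_spin_unlock (&amp;pool-&gt;lock);

          return obj ? (void *)obj : malloc (pool-&gt;obj_size);
  }

  /* Frees always go on the hot list. */
  static void
  pool_free (per_thread_pool_t *pool, void *ptr)
  {
          pool_obj_t *obj = ptr;

          pthread_spin_lock (&amp;pool-&gt;lock);
          obj-&gt;next = pool-&gt;hot;
          pool-&gt;hot = obj;
          pthread_spin_unlock (&amp;pool-&gt;lock);
  }

  /* One sweeper pass: reclaim the cold list, demote hot to cold. */
  static void
  pool_sweep (per_thread_pool_t *pool)
  {
          pool_obj_t *reclaim, *next;

          pthread_spin_lock (&amp;pool-&gt;lock);
          reclaim = pool-&gt;cold;
          pool-&gt;cold = pool-&gt;hot;
          pool-&gt;hot = NULL;
          pthread_spin_unlock (&amp;pool-&gt;lock);

          while (reclaim) {
                  next = reclaim-&gt;next;
                  free (reclaim);
                  reclaim = next;
          }
  }
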
A more complete explanation (of a similar earlier design) can be found
here:

 http://www.gluster.org/pipermail/gluster-devel/2016-October/051160.html

Backport of:
&gt; Change-Id: I5bc8a1ba57cfb553998f979a498886e0d006e665
&gt; BUG: 1385758
&gt; Reviewed-on: https://review.gluster.org/15645

BUG: 1418091
Change-Id: Id09bbea41f65fcd245822607bc204f3a34904dc2
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16531
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
</entry>
<entry>
<title>core: run many bricks within one glusterfsd process</title>
<updated>2017-02-02T00:54:58+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2017-01-31T19:49:45+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=83803b4b2d70e9e6e16bb050d7ac8e49ba420893'/>
<id>83803b4b2d70e9e6e16bb050d7ac8e49ba420893</id>
<content type='text'>
This patch adds support for multiple brick translator stacks running in
a single brick server process.  This reduces our per-brick memory usage
by approximately 3x, and our appetite for TCP ports even more.  It also
creates potential to avoid process/thread thrashing, and to improve QoS
by scheduling more carefully across the bricks, but realizing that
potential will require further work.

Multiplexing is controlled by the "cluster.brick-multiplex" global
option.  By default it's off, and bricks are started in separate
processes as before.  If multiplexing is enabled, then *compatible*
bricks (mostly those with the same transport options) will be started in
the same process.

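(For example, since this is a global option, enabling it presumably
looks like "gluster volume set all cluster.brick-multiplex on"; the
exact invocation is an assumption based on the option name above.)
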
Backport of:
&gt; Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
&gt; BUG: 1385758
&gt; Reviewed-on: https://review.gluster.org/14763

Change-Id: I4bce9080f6c93d50171823298fdf920258317ee8
BUG: 1418091
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16496
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
</entry>
<entry>
<title>md-cache: Cache security.ima xattrs</title>
<updated>2017-01-30T14:25:03+00:00</updated>
<author>
<name>Poornima G</name>
<email>pgurusid@redhat.com</email>
</author>
<published>2016-12-26T08:58:22+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=9e661d9496125d03c668353a9c718e8812f3fe05'/>
<id>9e661d9496125d03c668353a9c718e8812f3fe05</id>
<content type='text'>
Backport of http://review.gluster.org/16296

On kernel versions 3.x and greater, creating a file
results in a removexattr call on the security.ima xattr. But
this xattr is not set on the file unless the IMA feature
is active. With this patch, the removexattr call returns
ENODATA if the xattr is not found in the cache.

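As a toy model of that behaviour (all names here are illustrative,
not the actual md-cache source):

  /* removexattr on an xattr the cache knows is absent returns
   * ENODATA instead of winding the FOP to the brick. */
  #include &lt;errno.h&gt;
  #include &lt;stdio.h&gt;
  #include &lt;string.h&gt;

  static const char *cached_xattrs[] = { "user.foo", "security.selinux" };

  static int
  cache_has_xattr (const char *name)
  {
          size_t i;
          for (i = 0; i &lt; sizeof (cached_xattrs) / sizeof (*cached_xattrs); i++)
                  if (strcmp (cached_xattrs[i], name) == 0)
                          return 1;
          return 0;
  }

  static int
  mdc_removexattr (const char *name)
  {
          if (!cache_has_xattr (name))
                  return -ENODATA;  /* absent: fail fast, skip the brick */
          return 0;                 /* present: would wind the FOP down */
  }

  int
  main (void)
  {
          /* prints -61 (ENODATA) on Linux */
          printf ("%d\n", mdc_removexattr ("security.ima"));
          return 0;
  }
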
&gt; Change-Id: I8136096598a983aebc09901945eba1db1b2f93c9
&gt; Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
&gt; Reviewed-on: http://review.gluster.org/16296
&gt; Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt; NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
&gt; CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt; Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
&gt; (cherry picked from commit ac629e574935a8aed6526936bc83b1c6d295ae67)

Change-Id: I27abc23024c8fcf07389608df61ef6e64736d414
BUG: 1415918
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16460
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
</entry>
<entry>
<title>core: remove experimental xlators and associated tests</title>
<updated>2017-01-24T13:25:58+00:00</updated>
<author>
<name>Kaleb S. KEITHLEY</name>
<email>kkeithle@redhat.com</email>
</author>
<published>2017-01-20T16:11:46+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=4231c40973c60999f5ef759db450d25e129ef6ba'/>
<id>4231c40973c60999f5ef759db450d25e129ef6ba</id>
<content type='text'>
Remove the experimental xlators and their associated tests, as they
are not included in 3.10.

Change-Id: I547480ee5e7912664784643e436feb198b6d16d0
BUG: 1415866
Signed-off-by: Kaleb S. KEITHLEY &lt;kkeithle@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16447
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
</entry>
<entry>
<title>tier : Tier as a service</title>
<updated>2017-01-17T04:49:47+00:00</updated>
<author>
<name>hari gowtham</name>
<email>hgowtham@redhat.com</email>
</author>
<published>2016-07-12T11:10:28+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=3263d1c4f4b7efd1a018c17e1ba4dd9245094f48'/>
<id>3263d1c4f4b7efd1a018c17e1ba4dd9245094f48</id>
<content type='text'>
tierd is implemented by separating it from the rebalance process.

The commands affected:

1) Attach tier will trigger this process instead of the old one.
2) tier start and tier start force will also trigger this process.
3) volume status [tier] will show the tier daemon as a process instead
of a task, and both normal tier status and tier detach status work.
4) tier stop is implemented.
5) detach tier is implemented separately, along with the new detach
tier status.
6) volume tier volname status will work using these changes.
7) volume set works.

This patch has separated the tier translator from the legacy
DHT rebalance code. It now sends the RPCs from the CLI
to glusterd separately from the DHT rebalance code.
The daemon is now a service, similar to the snapshot daemon,
and can be viewed using the volume status command.

The code for the validation and commit phases is the same
as the earlier tier validation code in DHT rebalance.

The “brickop” phase has been changed so that the status
command can use this framework.

The service management framework is now used.
DHT rebalance does not use this framework.

This service framework takes care of:

*) spawning the daemon, killing it, and other such operations.
*) volume set options, which are written to the volfile.
*) restart and reconfigure functions. Restart is used to restart
the daemon at two points:
        1) after gluster goes down and comes up;
        2) to stop detach tier.
*) reconfigure is used to make immediate volfile changes without
restarting the daemon. It also has the code to rewrite the volfile
for topological changes (which come into play during add and
remove brick).

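As a rough sketch of the shape of such a framework (purely
illustrative; the actual glusterd service types differ in detail):

  /* One set of operations per managed daemon (tierd, snapd, ...). */
  typedef struct gf_service {
          const char *name;                          /* e.g. "tierd" */
          int (*start)   (struct gf_service *svc);   /* spawn the daemon */
          int (*stop)    (struct gf_service *svc);   /* kill the daemon */
          int (*restart) (struct gf_service *svc);   /* after glusterd comes
                                                        back up, or to stop
                                                        detach tier */
          int (*reconfigure) (struct gf_service *svc); /* rewrite the volfile
                                                          without a restart */
  } gf_service_t;
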
With this patch, the log, pid, and volfile are separated
and placed into their respective directories.

Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
BUG: 1313838
Signed-off-by: hari gowtham &lt;hgowtham@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13365
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: hari gowtham &lt;hari.gowtham005@gmail.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Dan Lambright &lt;dlambrig@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
</entry>
<entry>
<title>readdir-ahead : Perform STACK_UNWIND outside of mutex locks</title>
<updated>2017-01-10T04:52:28+00:00</updated>
<author>
<name>Poornima G</name>
<email>pgurusid@redhat.com</email>
</author>
<published>2017-01-09T04:25:26+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=c89a065af2adc11d5aca3a4500d2e8c1ea02ed28'/>
<id>c89a065af2adc11d5aca3a4500d2e8c1ea02ed28</id>
<content type='text'>
Currently STACK_UNWIND is performed while holding ctx-&gt;lock.
If readdir-ahead is loaded as a child of dht, then there
can be scenarios where the function calling STACK_UNWIND
becomes re-entrant. It is good practice not to call
STACK_WIND/STACK_UNWIND while holding local mutexes.

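The general shape of the fix, sketched with a plain pthread mutex
(names are illustrative; readdir-ahead uses its own ctx-&gt;lock):

  #include &lt;pthread.h&gt;

  struct rda_ctx {
          pthread_mutex_t lock;
          int             op_ret;
          int             op_errno;
  };

  typedef void (*unwind_fn) (int op_ret, int op_errno);

  /* Copy state out under the lock, then invoke the callback (the
   * analogue of STACK_UNWIND) only after the lock is released, so a
   * re-entrant callback cannot deadlock on the same mutex. */
  static void
  finish_request (struct rda_ctx *ctx, unwind_fn unwind)
  {
          int op_ret, op_errno;

          pthread_mutex_lock (&amp;ctx-&gt;lock);
          op_ret   = ctx-&gt;op_ret;
          op_errno = ctx-&gt;op_errno;
          pthread_mutex_unlock (&amp;ctx-&gt;lock);

          unwind (op_ret, op_errno);
  }
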
Change-Id: If4e869849d99ce233014a8aad7c4d5eef8dc2e98
BUG: 1401812
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16068
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Remove rpm test</title>
<updated>2017-01-06T18:03:28+00:00</updated>
<author>
<name>Nigel Babu</name>
<email>nigelb@redhat.com</email>
</author>
<published>2016-12-27T05:26:57+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=9061ccf0a906925f54f853f92ad609d8269232a2'/>
<id>9061ccf0a906925f54f853f92ad609d8269232a2</id>
<content type='text'>
This is already tested by our smoke test. Testing it again is redundant.

Change-Id: Icae4e8650002cd847dcb7ea76fd0447df7e72816
BUG: 1408755
Signed-off-by: Nigel Babu &lt;nigelb@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16287
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Anoop C S &lt;anoopcs@redhat.com&gt;
Reviewed-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd: Fail add-brick on replica count change, if brick is down</title>
<updated>2017-01-06T09:19:38+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2017-01-05T08:36:21+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=c916a2ffc257b0cfa493410e31b6af28f428c53a'/>
<id>c916a2ffc257b0cfa493410e31b6af28f428c53a</id>
<content type='text'>
Problem:
1. Have a replica 2 volume with bricks b1 and b2.
2. Before setting the layout, b1 goes down.
3. Set the layout and write some data, which gets populated on b2.
4. b2 goes down, then b1 comes up.
5. Add another brick b3; heal will take place from b1 to b3, which
   basically has no data.
6. Write some data. Both b1 and b3 will mark b2 for pending writes.
7. b1 goes down, and b2 comes up.
8. b2 gets healed from b1. During heal, it removes the data already
   present on b2, considering it stale. This leads to data loss.

Solution:
1. In glusterd stage-op, while adding bricks, check whether the replica
   count is being increased
2. If yes, then check whether any of the bricks are down at that time
3. If yes, then fail the add-brick to avoid such data loss
4. Else, continue with the normal operation.

This check will also work when we convert a plain distribute volume to replicate.

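A minimal sketch of that check (names are illustrative, not the
actual glusterd stage-op code):

  #include &lt;stdbool.h&gt;
  #include &lt;stddef.h&gt;

  struct brick { bool online; };

  /* Reject add-brick when the replica count grows while any existing
   * brick is down, since healing from an incomplete brick can discard
   * good data (step 8 in the problem above). */
  static bool
  add_brick_allowed (const struct brick *bricks, size_t nbricks,
                     int old_replica_count, int new_replica_count)
  {
          size_t i;

          if (new_replica_count &lt;= old_replica_count)
                  return true;       /* replica count unchanged: no check */

          for (i = 0; i &lt; nbricks; i++)
                  if (!bricks[i].online)
                          return false;   /* a brick is down: fail */

          return true;
  }
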
Test:
1. Create a replica 2 volume
2. Kill one brick from the volume
3. Try adding a brick to the volume
4. It should fail with an "all bricks are not up" error.
5. Create a distribute volume and kill one of the bricks.
6. Try to convert it to a replicate volume by adding bricks.
7. This should also fail.

Change-Id: I9c8d2ab104263e4206814c94c19212ab914ed07c
BUG: 1406411
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16330
Tested-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Fix split-brain-favorite-child-policy.t failures</title>
<updated>2017-01-03T06:13:51+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2016-12-29T12:10:00+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=76fff8cb2a164b596ca67e65c99623f5b68361fd'/>
<id>76fff8cb2a164b596ca67e65c99623f5b68361fd</id>
<content type='text'>
Problem:
In CentOS-7, the file was receiving an extra removexattr(security.ima)
FOP which changed its ctime, breaking the assumption that a particular
brick had the latest ctime based on the writevs done in the .t.

Fix:
1. Compare the ctime of both files in the backend and pick the one
with the latest ctime for the fav-child policy. Also unmount the
volume before comparing, to avoid any further FOPs on the file that
could modify the timestamps.

2. Added floating-point handling in the stat function. Thanks to
Pranith for helping debug the regex.

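The comparison itself, sketched in C (the .t does this in shell; this
just illustrates picking the file with the later ctime at full
timestamp resolution):

  #include &lt;sys/stat.h&gt;

  /* Returns 1 if path_a has the later ctime, 0 otherwise, -1 on error. */
  static int
  later_ctime (const char *path_a, const char *path_b)
  {
          struct stat sa, sb;

          if (stat (path_a, &amp;sa) != 0 || stat (path_b, &amp;sb) != 0)
                  return -1;

          if (sa.st_ctim.tv_sec != sb.st_ctim.tv_sec)
                  return sa.st_ctim.tv_sec &gt; sb.st_ctim.tv_sec;
          return sa.st_ctim.tv_nsec &gt; sb.st_ctim.tv_nsec;
  }
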
Change-Id: I06041a0f39a29d2593b867af8685d65c7cd99150
BUG: 1408757
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16288
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
</entry>
<entry>
<title>Repair EC tests for FB environment (/mnt/gvfs is teh sux00r)</title>
<updated>2016-12-30T05:47:13+00:00</updated>
<author>
<name>Kevin Vigor</name>
<email>kvigor@fb.com</email>
</author>
<published>2016-12-17T16:31:24+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=14ae0c6b14c9a32f15cdb3c94edbf08bb2e708b6'/>
<id>14ae0c6b14c9a32f15cdb3c94edbf08bb2e708b6</id>
<content type='text'>
Summary:

Don't blindly 'df', target the volume in question.

(This is because the FB build environment has a /mnt/gvfs on which
df fails.)

Test Plan:

runtests.sh

Change-Id: Ic2c5883dd102835db64be9594657257e20711ba0
BUG: 1406878
Signed-off-by: Kevin Vigor &lt;kvigor@fb.com&gt;
Reviewed-on-3.8-fb: http://review.gluster.org/16182
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: Shreyas Siravara &lt;sshreyas@fb.com&gt;
Reviewed-by: Shreyas Siravara &lt;sshreyas@fb.com&gt;
Reviewed-on: http://review.gluster.org/16226
Reviewed-by: Kaleb KEITHLEY &lt;kkeithle@redhat.com&gt;
</content>
</entry>
</feed>
