glusterfs.git/xlators/cluster/afr/src/afr.c, branch v5.11

afr: thin-arbiter 2 domain locking and in-memory state

2018-12-12T14:26:42+00:00

2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on the TA takes the AFR_TA_DOM_NOTIFY range lock and
the AFR_TA_DOM_MODIFY full lock, does an xattrop on the TA, releases the
AFR_TA_DOM_MODIFY lock, and records in memory which brick is bad.

- All further write txn failures are handled based on this in-memory
value without querying the TA.

- When shd heals the files, it does so by requesting a full lock on the
AFR_TA_DOM_NOTIFY domain. The client uses this as a cue (via upcall),
releases its AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write txn failure is wound on the TA
to update the in-memory state again.

- Any write txns that are still incomplete when the AFR_TA_DOM_NOTIFY
upcall release request arrives are completed before the lock is released.

- Any write txns received after the release request are kept in a ta_waitq.

- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.

- For fops that come in parallel while the in-memory bad brick is still
unknown, only one is wound to the TA on the wire. The others are kept
in a ta_onwireq, which is processed once the response from the TA
arrives.
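A minimal C sketch of the client-side bookkeeping described above, for illustration only. The names ta_state, ta_on_write_failure(), ta_on_reply(), ta_on_release_request() and ta_on_release_done(), and the fixed-size queues, are hypothetical stand-ins and not the code in afr.c; the locks, the xattrop itself and the completion of in-flight txns are reduced to comments.

```c
#include <assert.h>
#include <stdbool.h>

#define TA_BRICK_UNKNOWN (-1)
#define TA_QMAX 16

/* Hypothetical fixed-size FIFO of fop ids (stand-in for a real list). */
struct fop_queue {
    int ids[TA_QMAX];
    int len;
};

static void q_push(struct fop_queue *q, int id)
{
    assert(q->len < TA_QMAX);
    q->ids[q->len++] = id;
}

/* Hypothetical per-client thin-arbiter state. */
struct ta_state {
    int bad_brick;            /* TA_BRICK_UNKNOWN until the TA answers */
    bool on_wire;             /* one post-op to the TA is in flight */
    bool release_pending;     /* shd asked us to release the NOTIFY lock */
    struct fop_queue onwireq; /* fops waiting for the in-flight TA reply */
    struct fop_queue waitq;   /* fops that arrived after the release request */
    int wound_to_ta;          /* post-ops actually sent to the TA */
    int handled_locally;      /* failures resolved from bad_brick */
};

enum ta_action { TA_HANDLED, TA_WOUND, TA_QUEUED };

/* A write txn failed on one data brick. */
static enum ta_action ta_on_write_failure(struct ta_state *s, int fop_id)
{
    if (s->release_pending) { /* wait until the release completes */
        q_push(&s->waitq, fop_id);
        return TA_QUEUED;
    }
    if (s->bad_brick != TA_BRICK_UNKNOWN) { /* no TA query needed */
        s->handled_locally++;
        return TA_HANDLED;
    }
    if (s->on_wire) { /* another fop is already querying the TA */
        q_push(&s->onwireq, fop_id);
        return TA_QUEUED;
    }
    /* Real code would take the NOTIFY range lock and the MODIFY full
     * lock here and do the xattrop on the TA. */
    s->on_wire = true;
    s->wound_to_ta++;
    return TA_WOUND;
}

/* Splice a queue to a private copy, then process it one by one. */
static void ta_drain(struct ta_state *s, struct fop_queue *q)
{
    struct fop_queue local = *q;
    q->len = 0;
    for (int i = 0; i < local.len; i++)
        ta_on_write_failure(s, local.ids[i]);
}

/* TA replied (MODIFY lock already released): remember the bad brick. */
static void ta_on_reply(struct ta_state *s, int bad_brick)
{
    s->bad_brick = bad_brick;
    s->on_wire = false;
    ta_drain(s, &s->onwireq);
}

/* Upcall: shd wants a full lock on AFR_TA_DOM_NOTIFY. */
static void ta_on_release_request(struct ta_state *s)
{
    s->release_pending = true;
}

/* NOTIFY lock released: forget the bad brick and replay ta_waitq. */
static void ta_on_release_done(struct ta_state *s)
{
    s->release_pending = false;
    s->bad_brick = TA_BRICK_UNKNOWN;
    ta_drain(s, &s->waitq);
}
```

For example, three parallel failures while the bad brick is unknown produce one post-op on the TA and two entries in ta_onwireq; once the reply arrives, both queued fops are resolved from the in-memory value. After a release, the first replayed fop from ta_waitq goes to the TA again and the rest wait in ta_onwireq.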

Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1648205
Signed-off-by: Ravishankar N 
Signed-off-by: Ashish Pandey

cluster/afr: Use 2 domain locking in SHD for thin-arbiter

2018-11-29T15:33:37+00:00

With this change, when SHD starts the index crawl it requests
all the clients to release the AFR_TA_DOM_NOTIFY lock, so that
clients know their in-memory state is no longer valid and
any new operation needs to query the thin-arbiter if required.

When SHD completes healing all the files without any failure, it
takes the AFR_TA_DOM_NOTIFY lock again and gets the xattrs on the
TA to see whether any new failures have happened in the meantime.
If new failures are marked on the TA, SHD starts the crawl
immediately to heal those failures as well. If there are no new
failures, SHD takes the AFR_TA_DOM_MODIFY lock and unsets
the xattrs on the TA, so that both data bricks are considered
good thereafter.
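The SHD flow above can be sketched as a loop. This is a simplified model, not the shd code: the TA's pending xattrs are reduced to one counter, SHD is assumed to detect new failures by comparing the value seen before the crawl with the value read after retaking the AFR_TA_DOM_NOTIFY lock, every crawl is assumed to heal without failure, and lock handling is only shown as comments. shd_sim and shd_heal_loop() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical model: the TA's pending xattrs as one counter, plus the
 * number of new failures clients mark on the TA during each crawl. */
struct shd_sim {
    int ta_xattr;        /* 0 means both data bricks are good */
    const int *arrivals; /* new failures marked during crawl i */
    int n_arrivals;
    int crawls;          /* index crawls performed so far */
};

static void shd_heal_loop(struct shd_sim *s)
{
    assert(s != 0 && s->n_arrivals >= 0);
    for (;;) {
        /* 1. Ask every client to release AFR_TA_DOM_NOTIFY, so their
         *    in-memory bad-brick state is invalidated. */
        int seen = s->ta_xattr; /* assumed snapshot of the TA xattrs */

        /* 2. Index crawl heals the files (assumed to succeed). Clients
         *    whose writes fail meanwhile must mark the TA themselves. */
        if (s->crawls < s->n_arrivals)
            s->ta_xattr += s->arrivals[s->crawls];
        s->crawls++;

        /* 3. Take the AFR_TA_DOM_NOTIFY lock and read the TA xattrs. */
        if (s->ta_xattr != seen)
            continue; /* new failures: crawl again immediately */

        /* 4. No new failures: take AFR_TA_DOM_MODIFY, unset the xattrs
         *    and release the lock. Both bricks are good from now on. */
        s->ta_xattr = 0;
        return;
    }
}
```

With new failures arriving during the first two crawls (for example 2, then 1, then none), the sketch crawls three times before it unsets the xattrs.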

>Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
>fixes: bz#1579788
>Signed-off-by: karthik-us 
(cherry picked from commit 5784a00f997212d34bd52b2303e20c097240d91c)

Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
fixes: bz#1648205

Land part 2 of clang-format changes

2018-09-12T12:22:45+00:00

Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu

multiple files: calloc -> malloc

2018-09-04T05:09:09+00:00

xlators/cluster/stripe/src/stripe-helpers.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible

xlators/cluster/dht/src/tier.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-layout.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-helper.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-common.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/afr/src/afr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/afr/src/afr-inode-read.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/bugs/replicate/bug-1250170-fsync.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/basic/gfapi/gfapi-async-calls-test.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/basic/ec/ec-fast-fgetxattr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/xdr/src/glusterfs3.h: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/rpc-transport/socket/src/socket.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/rpc-lib/src/rpc-clnt.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
extras/geo-rep/gsync-sync-gfid.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-xml-output.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-rpc-ops.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-volume.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-system.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-snapshot.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-peer.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-global.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible

It doesn't make sense to calloc (allocate and clear) memory
when the code immediately fills that memory with data.
The compiler may already optimize this; at best the change gives
a microscopic performance improvement.

In some cases, also changed the allocation size to sizeof some
struct or type instead of a pointer, which is easier to read.
In some cases, removed redundant strlen() calls by saving the result
into a variable.

1. Only done for the straightforward cases. There's room for improvement.
2. Please review carefully, especially string allocations and the
terminating NUL character.
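An illustration of the pattern, not code taken from the patch: plain malloc() stands in for GF_MALLOC() below, and dup_string() and make_pair() are hypothetical helpers. Every byte of each buffer is written right after allocation, including the terminating NUL, so zeroing it first with calloc() would be wasted work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy a C string. malloc() is enough: every byte, including the
 * terminating NUL, is written right away. strlen() is called once
 * and the result reused. */
static char *dup_string(const char *src)
{
    size_t len = strlen(src);
    char *dst = malloc(len + 1); /* +1 for the terminating NUL */
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);   /* copies the NUL too */
    assert(dst[len] == '\0');
    return dst;
}

struct pair {
    int a;
    int b;
};

/* Size taken from the struct type, not from a pointer; all fields are
 * assigned immediately, so no zeroing is needed. */
static struct pair *make_pair(int a, int b)
{
    struct pair *p = malloc(sizeof(struct pair));
    if (p == NULL)
        return NULL;
    p->a = a;
    p->b = b;
    return p;
}
```

Forgetting the "+ 1" or copying only len bytes is exactly the kind of string-allocation mistake the review note above warns about.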

Only compile-tested!

updates: bz#1193929
Original-Author: Yaniv Kaul 
Signed-off-by: Yaniv Kaul 
Signed-off-by: Amar Tumballi 

Change-Id: I16274dca4078a1d06ae09a0daf027d734b631ac2

afr: common thin-arbiter functions

2018-08-23T06:37:27+00:00

...that can be used by client and self-heal daemon, namely:

afr_ta_post_op_lock()
afr_ta_post_op_unlock()

Note: These are not yet consumed. They will be used in the write txn
changes patch which will introduce 2 domain locking.

updates: bz#1579788
Change-Id: I636d50f8fde00736665060e8f9ee4510d5f38795
Signed-off-by: Ravishankar N

All: run codespell on the code and fix issues.

2018-07-22T14:40:16+00:00

Please review; it's not always just the comments that were fixed.
Of course, I had to revert all calls to creat() that were changed
to create() ...

Only compile-tested!

Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5
updates: bz#1193929
Signed-off-by: Yaniv Kaul

afr: Add lease() fop

2018-05-05T11:53:39+00:00

Change-Id: Ied047dd5ee44e9d5a5d3db214826f7df30332ef9
updates: #350
BUG: 1319992
Signed-off-by: Poornima G 
Signed-off-by: Jiffin Tony Thottan

afr: initial changes for thin arbiter

2018-04-30T06:41:11+00:00

1. Create thin arbiter index file during mount.
2. Set pending marker in thin arbiter id file in case of failure.

Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82
updates: #352
Signed-off-by: Ravishankar N

cluster/afr: Keep child-up until ping-event

2018-04-25T05:47:48+00:00

Problem:
Suppose we have 2 bricks, brick-A and brick-B, with brick-A within halo-max-latency
and brick-B beyond halo-max-latency, and both halo-min and halo-max replicas are
set to '1'. Brick-A comes online first and its ping-latency gets updated.
When brick-B comes online, we have 2 up-bricks, so the code tries to find the brick
with the worst latency to mark it down. Since brick-B has just come online, its
latency is still '0', so brick-A was marked offline and brick-B ended up being the
one online even though brick-A is better suited.

Fix:
Treat the latency of a just-up child as HALO_MAX_LATENCY, so that until its
ping-latency is known, the just-up brick is picked as the worst child. Also keep
ping-latency at -1 until child-up during initialization.
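A sketch of the worst-child selection with this fix. The arrays, the helper names, the millisecond unit and the numeric value given to HALO_MAX_LATENCY are assumptions for illustration; only the idea of treating an unmeasured (-1) just-up child as HALO_MAX_LATENCY comes from the commit message.

```c
#include <assert.h>

/* Placeholder value for illustration; the real constant may differ. */
#define HALO_MAX_LATENCY 99999

/* -1 means no ping latency measured yet (child just came up). */
static int halo_effective_latency(int latency_ms)
{
    return latency_ms < 0 ? HALO_MAX_LATENCY : latency_ms;
}

/* Index of the up child with the worst effective latency, or -1. */
static int halo_find_worst_child(const int *latency_ms, const int *up, int n)
{
    assert(latency_ms != 0 && up != 0);
    int worst = -1;
    int worst_lat = -1;
    for (int i = 0; i < n; i++) {
        if (!up[i])
            continue;
        int lat = halo_effective_latency(latency_ms[i]);
        if (lat > worst_lat) {
            worst_lat = lat;
            worst = i;
        }
    }
    return worst;
}
```

With latencies {5, -1} (brick-A measured, brick-B just up), brick-B (index 1) is picked as the worst child and marked down until its ping latency is known. With the old behaviour, brick-B's latency of 0 would have made brick-A the one marked down.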

BUG: 1567881
fixes bz#1567881
Change-Id: I148262fe505468190f0eb99225d0f6d57cdb6f04
Signed-off-by: Pranith Kumar K

cluster/afr: Need heal-timeout to be configured as low as 5 seconds

2018-04-20T05:41:20+00:00

In Halo replication, there are pending heals more often than not.
It makes sense to give users the capability to configure it as low
as 5 seconds.

BUG: 1569489
fixes bz#1569489
Change-Id: I451c1975827f66398b903f659c981ef3121d5376
Signed-off-by: Pranith Kumar K