glusterfs.git/xlators/mgmt/glusterd/src/glusterd-syncop.c, branch v3.7.0beta1

glusterd: Remove direct references to peerinfo in frame cookies

2015-04-28T12:49:05+00:00

RCU protection requires that we don't have direct references to
protected data structures outside read-critical sections.

This change was developed on the git branch at [1]. This commit is a
combination of the following commits on the development branch.
  82ebfdd Remove direct references to peerinfo in frame cookies
  dec4bec Remove incorrect and unneeded code from
          gd_syncop_mgmt_v3_unlock_cbk_fn
  7aced7b Use stack allocated uuid for frame cookie.
  38e4124 Address comments from 10192/2

[1]: https://github.com/kshlm/glusterfs/tree/urcu

Change-Id: Ic50e5fca0be72af5090f4cf318efa55d29075de9
BUG: 1205186
Signed-off-by: Kaushal M 
Reviewed-on: http://review.gluster.org/10399
Reviewed-by: Krishnan Parthasarathi 
Tested-by: Krishnan Parthasarathi

glusterd: Replace transaction peers lists

2015-04-13T06:30:02+00:00

Transaction peer lists were used in GlusterD to identify the peers
belonging to a transaction. This was needed to prevent newly added peers
performing partial transactions, which could be incorrect.

This was accomplished by creating a separate transaction peers list at
the beginning of every transaction. A transaction peers list referenced
the peerinfo data structures of the peers which were present at the
beginning of the transaction. RCU protection of peerinfos referenced by
the transaction peers list is a hard problem and difficult to do
correctly.

To have proper RCU protection of peerinfos, the transaction peers lists
have been replaced by an alternative method to identify peers that
belong to a transaction. The alternative method is to use the global
peers list along with generation numbers to identify peers that should
belong to a transaction.

This change introduces a global peer list generation number, and a
generation number for each peerinfo object. Whenever a peerinfo object
is created, the global generation number is bumped, and the peerinfo's
generation number is set to the bumped global generation.

With the above changes, the algorithm to identify peers belonging to a
transaction with RCU protection is as follows,
- At the beginning of a transaction, the current global generation
  number is saved
- To identify the peers belonging to the transaction,
  - Start a RCU read critical section
  - For each peer in the global peers list,
    - If the peer's generation number is not greater than the saved
      generation number, continue with the action on the peer
  - End the RCU read critical section

The above algorithm guarantees that,
- The peer list is not modified when a transaction is iterating through
  it
- The transaction actions are only done on peers that were present when
  the transaction started

But, as a transaction could iterate over the peers list multiple times,
the algorithm cannot guarantee that the same set of peers will be
selected every time. A peer could get deleted between two iterations of
the list within a transaction. This problem existed with transaction
peers lists as well, but unlike before, it will no longer lead to invalid
memory access and potential crashes. This problem will be addressed
separately.
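
The generation-number check described above can be sketched as follows.
This is a minimal illustration, not the glusterd code: the struct and
function names are hypothetical, a plain singly linked list stands in
for the global peers list, and gen_read_lock()/gen_read_unlock() are
placeholders for liburcu's rcu_read_lock()/rcu_read_unlock().

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholders for liburcu's rcu_read_lock()/rcu_read_unlock(); they
 * only track nesting so the sketch builds without liburcu. */
static int gen_read_depth;
static void gen_read_lock(void)   { gen_read_depth++; }
static void gen_read_unlock(void) { gen_read_depth--; }

/* Hypothetical, trimmed-down peer object. */
struct peer {
    uint32_t     generation;  /* global generation when the peer was created */
    int          acted_on;    /* set when a transaction acts on the peer */
    struct peer *next;
};

/* Hypothetical global peers list with its generation counter. */
struct peer_list {
    uint32_t     generation;
    struct peer *head;
};

/* Creating a peer bumps the global generation and stamps the peer. */
static void peer_add(struct peer_list *list, struct peer *p)
{
    p->generation = ++list->generation;
    p->acted_on = 0;
    p->next = list->head;
    list->head = p;
}

/* Called once at the beginning of a transaction. */
static uint32_t txn_begin(const struct peer_list *list)
{
    return list->generation;
}

/* Act on every peer that already existed when the transaction began.
 * Returns the number of peers acted on. */
static int txn_for_each_peer(struct peer_list *list, uint32_t txn_generation)
{
    int count = 0;

    gen_read_lock();
    for (struct peer *p = list->head; p != NULL; p = p->next) {
        if (p->generation > txn_generation)
            continue;         /* peer joined after the transaction began */
        p->acted_on = 1;
        count++;
    }
    gen_read_unlock();

    return count;
}
```

A peer added after txn_begin() gets a larger generation number than the
saved one, so every later iteration in the same transaction skips it.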

This change was developed on the git branch at [1]. This commit is a
combination of the following commits on the development branch.
  52ded5b Add timespec_cmp
  44aedd8 Add create timestamp to peerinfo
  7bcbea5 Fix some silly mistakes
  13e3241 Add start time to opinfo
  17a6727 Use timestamp comparisions to identify xaction peers instead
          of a xaction peer list
  3be05b6 Correct check for peerinfo age
  70d5b58 Use read-critical sections for peer list iteration
  ba4dbca Use peerinfo timestamp checks in op-sm instead of xaction peer
          list
  d63f811 Add more peer status checks when iterating peers list in
          glusterd-syncop
  1998a2a Timestamp based peer list traversal of mgmtv3 xactions
  f3c1a42 Remove transaction peer lists
  b8b08ee Remove unused labels
  32e5f5b Remove 'npeers' usage
  a075fb7 Remove 'npeers' from mgmt-v3 framework
  12c9df2 Use generation number instead of timestamps.
  9723021 Remove timespec_cmp
  80ae2c6 Remove timespec.h include
  a9479b0 Address review comments on 10147/4

[1]: https://github.com/kshlm/glusterfs/tree/urcu

Change-Id: I9be1033525c0a89276f5b5d83dc2eb061918b97f
BUG: 1205186
Signed-off-by: Kaushal M 
Reviewed-on: http://review.gluster.org/10147
Tested-by: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Anand Nekkunti 
Reviewed-by: Krishnan Parthasarathi 
Tested-by: Krishnan Parthasarathi

Avoid conflict between contrib/uuid and system uuid

2015-04-04T17:48:35+00:00

glusterfs relies on the Linux uuid implementation, whose
API is incompatible with most other systems' uuid. As
a result, libglusterfs has to embed contrib/uuid,
which is the Linux implementation, on non-Linux systems.
This implementation is incompatible with the system's
built-in one, but the symbols have the same names.

Usually this is not a problem because when we link
with -lglusterfs, libc's symbols are trumped. However,
there is a problem when a program not linked with
-lglusterfs calls dlopen() on a glusterfs component. In
such a case, libc's uuid implementation is already
loaded in the calling program, and it will be used
instead of libglusterfs's implementation, causing
crashes.

A possible workaround is to pre-load libglusterfs
in the calling program (using LD_PRELOAD on NetBSD for
instance), but such a mechanism is not portable, nor
is it flexible. A much better approach is to rename
libglusterfs's uuid_* functions to gf_uuid_* to avoid
any possible conflict. This is what this change attempts.
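
A minimal sketch of the renaming idea: callers use gf_uuid_*-prefixed
helpers, so no call site can bind to a libc uuid_* symbol that happens
to be loaded already. The type gf_sketch_uuid_t and the function bodies
below are assumptions made for illustration, not the libglusterfs
implementation.

```c
#include <string.h>

/* Hypothetical 16-byte UUID type used only in this sketch. */
typedef unsigned char gf_sketch_uuid_t[16];

/* Prefixed helpers: the gf_ prefix keeps these names distinct from any
 * uuid_copy()/uuid_compare()/uuid_is_null() exported by the system. */
static void gf_uuid_copy(gf_sketch_uuid_t dst, const gf_sketch_uuid_t src)
{
    memcpy(dst, src, sizeof(gf_sketch_uuid_t));
}

static int gf_uuid_compare(const gf_sketch_uuid_t a, const gf_sketch_uuid_t b)
{
    return memcmp(a, b, sizeof(gf_sketch_uuid_t));
}

static int gf_uuid_is_null(const gf_sketch_uuid_t u)
{
    static const gf_sketch_uuid_t zero = {0};

    return memcmp(u, zero, sizeof(zero)) == 0;
}
```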

BUG: 1206587
Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a
Signed-off-by: Emmanuel Dreyfus 
Reviewed-on: http://review.gluster.org/10017
Tested-by: Gluster Build System 
Reviewed-by: Niels de Vos

glusterd: group server-quorum related code together

2015-04-01T13:07:14+00:00

Server-quorum implementation was spread in many files.  This patch
brings them all together into a single file, namely
glusterd-server-quorum.c. All exported functions are available via
glusterd-server-quorum.h

Change-Id: I8fd77114b5bc6b05127cb8a6a641e0295f0be7bb
BUG: 1205592
Signed-off-by: Krishnan Parthasarathi 
Reviewed-on: http://review.gluster.org/9492
Reviewed-by: Atin Mukherjee 
Tested-by: Gluster Build System 
Reviewed-by: Kaushal M

glusterd: Maintain local xaction_peer list for op-sm

2015-03-26T07:10:45+00:00

http://review.gluster.org/9269 addresses maintaining local xaction_peers in
syncop and mgmt_v3 framework. This patch is to maintain local xaction_peers list
for op-sm framework as well.

Change-Id: Idd8484463fed196b3b18c2df7f550a3302c6e138
BUG: 1204727
Signed-off-by: Atin Mukherjee 
Reviewed-on: http://review.gluster.org/9972
Reviewed-by: Anand Nekkunti 
Tested-by: Gluster Build System 
Reviewed-by: Krishnan Parthasarathi 
Tested-by: Krishnan Parthasarathi

glusterd: Do not use global opinfo in syncop

2015-03-24T06:22:40+00:00

Global opinfo should not be referred to by the syncop framework, as it
uses a local txn_opinfo for every transaction. There is one place in the
codebase where the global opinfo is set with the local txn_opinfo, which
can lead to an incorrect opinfo for an on-going op-sm transaction that
refers to the same global opinfo.

Change-Id: Ida63a8871b8d03fe646146eddfd3f2473f1b1d7c
BUG: 1202745
Signed-off-by: Atin Mukherjee 
Reviewed-on: http://review.gluster.org/9908
Tested-by: Gluster Build System 
Reviewed-by: Anand Nekkunti 
Reviewed-by: Krishnan Parthasarathi 
Tested-by: Krishnan Parthasarathi

features/quota : Introducing inode quota

2015-03-19T01:24:12+00:00

==========================================================================
                             Inode quota
==========================================================================
= Currently, the only way to retrieve the number of files/objects in a   =
= directory or volume is to do a crawl of the entire directory/volume.   =
= This is expensive and is not scalable.                                 =
=                                                                        =
= The proposed mechanism will provide an easier alternative to determine =
= the count of files/objects in a directory or volume.                   =
=                                                                        =
= The new mechanism proposes to store count of objects/files as part of  =
= an extended attribute of a directory. Each directory's extended        =
= attribute value will indicate the number of files/objects present      =
= in a tree with the directory being considered as the root of the tree. =
=                                                                        =
= The count value can be accessed by performing a getxattr().            =
= Cluster translators like afr, dht and stripe will perform aggregation  =
= of count values from various bricks when getxattr() happens on the key =
= associated with file/object count.                                     =

A new interface is introduced:
------------------------------
        limit-objects  : limit the number of inodes at directory level
        list-objects   : list the directories where the limit is set
        remove-objects : remove the limit from the directory

==========================================================================

CLI COMMAND:
gluster volume quota  limit-objects   []

*  is a hard-limit for number of objects limitation for path ""
  If hard-limit is exceeded, creation of file/directory is no longer
permitted.

*  is a soft-limit for number of objects creation for path ""
  If soft-limit is exceeded, a warning is issued for each creation.

CLI COMMAND:
gluster volume quota  remove-objects [path]

==========================================================================

CLI COMMAND:
gluster volume quota  list-objects [path] ...

Sample output:
------------------
  Path  Hard-limit  Soft-limit  Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
  ------------------------------------------------------------------------------------------
  /dir  10          80%         10    0          Yes                   Yes

==========================================================================

[root@snapshot-28 dir]# ls
a  b  file11  file12  file13  file14  file15  file16  file17
[root@snapshot-28 dir]# touch a1
touch: cannot touch `a1': Disk quota exceeded
* Nine files are created in directory "dir", and the directory itself is
  included in the count too. Hence the limit "10" is reached and further
  file creation fails.
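
The sample row above can be reproduced with a small calculation. This is
a sketch under stated assumptions: the struct and function names are
hypothetical, the soft limit is taken as a percentage of the hard limit,
and a limit is treated as exceeded once usage reaches it (the ">="
comparison is inferred from the sample row, where Used equals the
hard-limit and both columns read "Yes").

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one list-objects row. */
struct object_quota_status {
    uint64_t available;      /* objects that can still be created */
    bool     soft_exceeded;
    bool     hard_exceeded;
};

/* Derive the Available and "exceeded?" columns from the limits and the
 * current object count. */
static struct object_quota_status
object_quota_check(uint64_t hard_limit, unsigned soft_percent, uint64_t used)
{
    struct object_quota_status st;
    uint64_t soft_limit = hard_limit * soft_percent / 100;

    st.available = used < hard_limit ? hard_limit - used : 0;
    st.soft_exceeded = used >= soft_limit;   /* assumption: ">=" */
    st.hard_exceeded = used >= hard_limit;   /* matches the sample row */

    return st;
}
```

With hard_limit = 10, soft_percent = 80 and used = 10 (nine files plus
the directory itself), this gives available = 0 and both flags set.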

==========================================================================

Note: We have also done some re-factoring in cli for volume name
validation. New function cli_validate_volname is created

==========================================================================

Change-Id: I1823497de4f790a2a20ebb1770293472ea33ee2b
BUG: 1190108
Signed-off-by: Sachin Pandit 
Signed-off-by: vmallika 
Reviewed-on: http://review.gluster.org/9769
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

glusterd: Protect the peer list and peerinfos with RCU.

2015-03-16T09:19:14+00:00

The peer list and the peerinfo objects are now protected using RCU.
Design patterns described in Paul McKenney's RCU dissertation [1]
(sections 5 and 6) have been used to convert existing non-RCU protected
code to RCU protected code.

Currently, we only aim to guarantee the existence of the
peerinfo objects, i.e., we are only looking to protect deletes, not all
updaters. We chose this, as protecting all updates is a much more
complex task.

The steps used to accomplish this are,

1. Remove all long lived direct references to peerinfo objects (apart
from the peerinfo list). This includes references in glusterd_peerctx_t
(RPC), glusterd_friend_sm_event_t (friend state machine) and others.
This way no one holds a reference to a deleted peerinfo object.

2. Replace the direct references with indirect references, ie., use
peer uuid and peer hostname as indirect references to the peerinfo
object. Any reader or updater now uses the indirect references to get to
the actual peerinfo object, using glusterd_peerinfo_find. Cases where a
peerinfo cannot be found are handled gracefully.

3. The readers get and use the peerinfo object only within a RCU read
critical section. This prevents the object from being deleted/freed when
in actual use.

4. The deletion of a peerinfo object is done in an ordered manner
(glusterd_peerinfo_destroy). The object is first removed from the
peerinfo list using an atomic list remove, but the list head is not
reset to allow existing list readers to complete correctly. We wait for
readers to complete, before resetting the list head. This removes the
object from the list completely. After this no new readers can get a
reference to the object, and it can be freed.
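
Steps 2 and 3 can be sketched as below: a long-lived context stores only
the peer's uuid, and the peerinfo is looked up and used inside a read
critical section. This is an illustration under assumptions, not the
glusterd code: the struct and function names are hypothetical, a plain
linked list stands in for the peerinfo list, and ref_read_lock() and
ref_read_unlock() are placeholders for liburcu's rcu_read_lock() and
rcu_read_unlock().

```c
#include <stddef.h>
#include <string.h>

/* Placeholders for liburcu's rcu_read_lock()/rcu_read_unlock(); they
 * only track nesting so the sketch builds without liburcu. */
static int ref_read_depth;
static void ref_read_lock(void)   { ref_read_depth++; }
static void ref_read_unlock(void) { ref_read_depth--; }

/* Hypothetical, trimmed-down peerinfo object. */
struct peerinfo {
    unsigned char    uuid[16];
    int              connected;
    struct peerinfo *next;
};

/* A long-lived context keeps the peer's uuid, never a peerinfo pointer. */
struct peer_ctx {
    unsigned char peer_uuid[16];
};

/* Resolve a uuid to a peerinfo. Must be called inside a read critical
 * section; the result is only valid until that section ends. */
static struct peerinfo *peerinfo_find(struct peerinfo *head,
                                      const unsigned char *uuid)
{
    for (struct peerinfo *p = head; p != NULL; p = p->next) {
        if (memcmp(p->uuid, uuid, sizeof(p->uuid)) == 0)
            return p;
    }
    return NULL;
}

/* Returns 0 on success, -1 if the peer no longer exists. */
static int peer_mark_connected(struct peerinfo *head,
                               const struct peer_ctx *ctx)
{
    int ret = -1;

    ref_read_lock();
    struct peerinfo *p = peerinfo_find(head, ctx->peer_uuid);
    if (p != NULL) {
        p->connected = 1;     /* use the object only inside the section */
        ret = 0;
    }
    ref_read_unlock();

    return ret;               /* a missing peer is handled gracefully */
}
```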

This change was developed on the git branch at [2]. This commit is a
combination of the following commits on the development branch.
  d7999b9 Protect the glusterd_conf_t->peers_list with RCU.
  0da85c4 Synchronize before INITing peerinfo list head after removing
          from list.
  32ec28a Add missing rcu_read_unlock
  8fed0b8 Correctly exit read critical section once peer is found.
  63db857 Free peerctx only on rpc destruction
  56eff26 Cleanup style issues
  e5f38b0 Indirection for events and friend_sm
  3c84ac4 In __glusterd_probe_cbk goto unlock only if peer already
          exists
  141d855 Address review comments on 9695/1
  aaeefed Protection during peer updates
  6eda33d Revert "Synchronize before INITing peerinfo list head after
          removing from list."
  f69db96 Remove unneeded line
  b43d2ec Address review comments on 9695/4
  7781921 Address review comments on 9695/5
  eb6467b Add some missing semi-colons
  328a47f Remove synchronize_rcu from
          glusterd_friend_sm_transition_state
  186e429 Run part of glusterd_friend_remove in critical section
  55c0a2e Fix gluster (peer status/ pool list) with no peers
  93f8dcf Use call_rcu to free peerinfo
  c36178c Introduce composite struct, gd_rcu_head

[1]: http://www.rdrop.com/~paulmck/RCU/RCUdissertation.2004.07.14e1.pdf
[2]: https://github.com/kshlm/glusterfs/tree/urcu

Change-Id: Ic1480e59c86d41d25a6a3d159aa3e11fbb3cbc7b
BUG: 1191030
Signed-off-by: Kaushal M 
Reviewed-on: http://review.gluster.org/9695
Tested-by: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Anand Nekkunti 
Reviewed-by: Krishnan Parthasarathi 
Tested-by: Krishnan Parthasarathi

mgmt/glusterd: Changes required for disperse volume heal commands

2015-03-10T09:22:43+00:00

- Include xattrop64-watchlist for index xlator for disperse volumes.
- Change the functions that exist to consider disperse volumes also
  for sending commands to disperse xls in self-heal-daemon.

Change-Id: Iae75a5d3dd5642454a2ebf5840feba35780d8adb
BUG: 1177601
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/9793
Tested-by: Gluster Build System 
Reviewed-by: Kaushal M

glusterd: Replace libglusterfs lists with liburcu lists

2015-03-04T07:50:22+00:00

This patch replaces usage of the libglusterfs lists data structures and
API in glusterd with the lists data structures and API from liburcu. The
liburcu data structures and APIs are a drop-in replacement for
libglusterfs lists.

All usages have been changed to keep the code consistent, and free from
confusion.

NOTE: glusterd_conf_t->xprt_list still uses the libglusterfs data
structures and API, as it holds rpc_transport_t objects, which is not a
part of glusterd and is not being changed in this patch.

This change was developed on the git branch at [1]. This commit is a
combination of the following commits on the development branch.
  6dac576 Replace libglusterfs lists with liburcu lists
  a51b5ab Fix compilation issues
  d98a06f Fix merge issues
  a5d918e Remove merge remnant
  1cca113 More style cleanup
  1917be3 Address review comments on 9624/1
  8d10f13 Use cds_lists for glusterd_svc_t
  524ad5d Add rculist header in glusterd-conn-helper.c
  646f294 glusterd: add list_add_order API honouring rcu

[1]: https://github.com/kshlm/glusterfs/tree/urcu

Change-Id: Ic613c5b6e496a677b9d3de15fc042a0492109fb0
BUG: 1191030
Signed-off-by: Kaushal M 
Signed-off-by: Krishnan Parthasarathi 
Reviewed-on: http://review.gluster.org/9624
Tested-by: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Gaurav Kumar Garg 
Reviewed-by: Anand Nekkunti