path: root/xlators/mgmt/glusterd/src
Commit message | Author | Age | Files | Lines
...
* snapshot: cleanup snaps during unprobe (Mohammed Rafi KC, 2015-08-26; 4 files, -22/+104)

  When doing an unprobe, any volume that does not contain a brick on the
  detached node is deleted. The snapshots associated with such a volume
  should be deleted as well.

  Change-Id: I9f3d23bd11b254ebf7d7722cc1e12455d6b024ff
  BUG: 1203185
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/9930
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd: Don't allow remove-brick start/commit if glusterd is down on the host of the brick (Atin Mukherjee, 2015-08-26; 2 files, -31/+89)

  remove-brick stage blindly starts the remove-brick operation even if the
  glusterd instance of the node hosting the brick is down. Operationally this
  is incorrect, and it could result in an inconsistent rebalance status
  across the nodes: the originator of the command will keep the rebalance
  status at 'DEFRAG_NOT_STARTED', while the glusterd instances on the other
  nodes, once they come back up, will trigger the rebalance and mark the
  status completed when it finishes.

  This patch fixes two things:
  1. Adds a validation in remove-brick to check whether all the peers hosting
     the bricks to be removed are up (see the sketch after this entry).
  2. Doesn't copy volinfo->rebal.dict from the stale volinfo during restore,
     as this might end up in an inconsistent node_state.info file, resulting
     in volume status command failure.

  Change-Id: Ia4a76865c05037d49eec5e3bbfaf68c1567f1f81
  BUG: 1245045
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/11726
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
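  A minimal sketch of the kind of peer-availability check added in the stage
  phase. The helper and field names (glusterd_volume_brickinfo_get_by_brick,
  glusterd_peerinfo_find_by_uuid, peerinfo->connected) follow common glusterd
  patterns of that release but are assumptions, not the verbatim patch:

      /* Hedged sketch: reject remove-brick if the peer hosting any
       * brick to be removed is down. */
      static int
      check_remove_brick_peers (glusterd_volinfo_t *volinfo,
                                char **bricks, int count,
                                char *errstr, size_t len)
      {
              glusterd_brickinfo_t *brickinfo = NULL;
              glusterd_peerinfo_t  *peerinfo  = NULL;
              int                   i         = 0;

              for (i = 0; i < count; i++) {
                      /* resolve the brick string to its brickinfo */
                      if (glusterd_volume_brickinfo_get_by_brick (bricks[i],
                                                                  volinfo,
                                                                  &brickinfo))
                              return -1;

                      if (!gf_uuid_compare (brickinfo->uuid, MY_UUID))
                              continue;  /* local brick, glusterd is up */

                      peerinfo = glusterd_peerinfo_find_by_uuid (brickinfo->uuid);
                      if (!peerinfo || !peerinfo->connected) {
                              snprintf (errstr, len, "Host node of the "
                                        "brick %s is down", bricks[i]);
                              return -1;
                      }
              }
              return 0;
      }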
* glusterd: Display status of Self Heal Daemon for disperse volumes (Ashish Pandey, 2015-08-25; 1 file, -16/+22)

  Problem: the status of the Self Heal Daemon is not displayed in
  "gluster volume status".

  As disperse volumes are self-heal compatible, show the status of the
  self-heal daemon in the gluster volume status command.

  Change-Id: I83d3e6a2fd122b171f15cfd76ce8e6b6e00f92e2
  BUG: 1217311
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: http://review.gluster.org/10764
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: stop all the daemon services on peer detach (Gaurav Kumar Garg, 2015-08-24; 2 files, -15/+41)

  Currently glusterd does not stop all the daemon services on peer detach.
  With this fix, peer detach cleanup is done properly and all the daemons
  that were running on the node before the detach are stopped.

  Change-Id: Ifed403ed09187e84f2a60bf63135156ad1f15775
  BUG: 1255386
  Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Reviewed-on: http://review.gluster.org/11509
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* snapshot: Log deletion of snapshot, during auto-delete (Avra Sengupta, 2015-08-23; 2 files, -2/+8)

  When auto-delete is enabled and the soft-limit is reached, the oldest
  snapshot for a volume is deleted on creation of a new snapshot. Display a
  warning log before deleting the oldest snapshot.

  Change-Id: I75f0366935966a223b63a4ec5ac13f9fe36c0e82
  BUG: 1255310
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/11963
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* tiering/glusterd: start tier daemon during volume start (Mohammed Rafi KC, 2015-08-17; 3 files, -1/+76)

  The tier daemon should always run with a tier volume. If the volume is
  stopped and started again, the tier daemon currently has to be started
  manually; with this patch the tier process is triggered automatically
  along with volume start.

  A snapshot-restored volume will not have node_state_info, so we need to
  create and store it dynamically.

  Change-Id: I659387c914bec7a1b6929ee5cb61f7b406402075
  BUG: 1238593
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11525
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* rpc: add owner xlator argument to rpc_clnt_new (Krishnan Parthasarathi, 2015-08-12; 2 files, -2/+2)

  The @owner argument tells the RPC layer the xlator that owns the
  connection, and to which THIS needs to be set during network notifications
  like CONNECT and DISCONNECT.

  Code paths that originate from the head of a (volume) graph and use
  STACK_WIND ensure that the RPC local endpoint has the right xlator saved
  in the frame of the call (callback pair). This guarantees that the
  callback is executed in the right xlator context.

  The client handshake process, which includes fetching of brick ports from
  glusterd and setting lk-version on the brick for the session, doesn't have
  the correct xlator set in its frames. The problem lies with RPC
  notifications: they have no provision to set THIS to the xlator that is
  registered with the corresponding RPC programs. For example, the
  RPC_CLNT_CONNECT event received by protocol/client doesn't have THIS set
  to its xlator. This implies that call(-callback)s originating from this
  thread don't have the right xlator set either.

  The fix is to save the xlator registered with the RPC connection during
  rpc_clnt_new. For example, protocol/client's xlator would be saved with
  the RPC connection that it 'owns'. RPC notifications such as CONNECT and
  DISCONNECT inherit THIS from the RPC connection's xlator.

  Change-Id: I9dea2c35378c511d800ef58f7fa2ea5552f2c409
  BUG: 1235582
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/11436
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
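  A sketch of the dispatch idea, assuming the saved xlator lives in an
  'owner' field on the rpc_clnt (the field name is an assumption, not the
  verbatim patch):

      /* Sketch: inherit THIS from the RPC connection's owner xlator
       * while dispatching CONNECT/DISCONNECT, then restore it. */
      static void
      notify_with_owner (struct rpc_clnt *rpc, rpc_clnt_event_t event,
                         void *data)
      {
              xlator_t *old_THIS = THIS;

              if (rpc->owner)            /* saved at rpc_clnt_new() time */
                      THIS = rpc->owner;

              /* ... invoke the registered notify function for @event ... */

              THIS = old_THIS;
      }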
* glusterd: log improvement in glusterd_peer_rpc_notify (Atin Mukherjee, 2015-08-11; 1 file, -4/+9)

  If ping timeout is enabled, glusterd can receive a disconnect event from a
  peer which has already been deleted, resulting in a critical log message
  being printed. This patch ensures that the critical message is logged only
  when it is a connect event.

  Change-Id: I67d9aa3f60195e08af7dfc8a42683422aaf90a00
  BUG: 1212437
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/10272
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd/rebalance: trusted rebalance volfile (N Balachandran, 2015-08-11; 1 file, -9/+11)

  Creating the client volfiles with GF_CLIENT_OTHER overwrites the trusted
  rebalance volfile and causes rebalance to fail if auth.allow is set. Now
  we always set the value of trusted-client to GF_CLIENT_TRUSTED for
  rebalance volfiles.

  Change-Id: I95eb510256d18dfa9048f96a1aeb71cca4811811
  BUG: 1248415
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/11819
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Set nfs.disable to "on" when global NFS-Ganesha key is enabled (Meghana M, 2015-08-10; 1 file, -0/+24)

  "nfs.disable" gets set to "on" for all the existing volumes when the
  command "gluster nfs-ganesha enable" is executed. However, when a new
  volume is created, it gets exported via Gluster-NFS on the nodes outside
  the NFS-Ganesha cluster. To fix this, the "nfs.disable" key is set to "on"
  before starting the volume whenever the global option is set to "enable".

  Change-Id: I7ce58928c36eadb8c122cded5bdcea271a0a4ffa
  BUG: 1251857
  Signed-off-by: Meghana M <mmadhusu@redhat.com>
  Reviewed-on: http://review.gluster.org/11871
  Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* quota: volume-reset shouldn't remove quota-deem-statfs (Manikandan Selvaganesh, 2015-08-07; 4 files, -2/+56)

  Volume-reset shouldn't remove quota-deem-statfs, unless explicitly
  specified, when quota is enabled.

  1) glusterd_op_stage_reset_volume ()
     'gluster volume set/reset <VOLNAME>' for features.quota or
     features.inode-quota should not be allowed, as it is deprecated.
     Setting and resetting the quota/inode-quota features should be allowed
     only through 'gluster volume quota <VOLNAME> enable/disable'.

  2) glusterd_enable_default_options ()
     The option 'features.quota-deem-statfs' should not be turned off with
     'gluster volume reset <VOLNAME>', since quota features can be set or
     reset only with 'gluster volume quota <VOLNAME> enable/disable'. But
     'gluster volume set features.quota-deem-statfs' can be turned on/off
     when quota is enabled.

  Change-Id: Ib5aa00a4d8c82819c08dfc23e2a86f43ebc436c4
  BUG: 1250582
  Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
  Reviewed-on: http://review.gluster.org/11839
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
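  A hedged sketch of the stage-phase guard described in point 1; the exact
  body of glusterd_op_stage_reset_volume() may differ, and errstr/ret are
  the usual locals of such stage functions:

      /* Sketch: disallow set/reset of the deprecated quota keys so quota
       * can only be toggled via 'gluster volume quota <VOL> enable/disable'. */
      if ((strcmp (key, "features.quota") == 0) ||
          (strcmp (key, "features.inode-quota") == 0)) {
              snprintf (errstr, sizeof (errstr),
                        "'gluster volume set/reset <VOLNAME> %s' is "
                        "deprecated. Use 'gluster volume quota <VOLNAME> "
                        "enable/disable' instead.", key);
              ret = -1;
              goto out;
      }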
* glusterd: Stop/restart/notify daemons (svcs) during reset/set on a volume (anand, 2015-08-06; 14 files, -171/+420)

  Problem: reset/set commands were not working properly. The reset command
  returns success but does not send a notification to the svcs if the
  corresponding graph was modified.

  Fix: whenever a reset/set command is issued, generate the temp graph,
  compare it with the original graph, and take the following actions:
  1) If both graphs are identical, nothing needs to be done with the svcs.
  2) If the graph topology changed, restart/stop the service by calling the
     svc manager.
  3) If options changed, send a notify signal by calling
     glusterd_fetchspec_notify.

  Change-Id: I852c4602eafed1ae6e6a02424814fe3a83e3d4c7
  BUG: 1209329
  Signed-off-by: anand <anekkunt@redhat.com>
  Reviewed-on: http://review.gluster.org/10850
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* SSL improvements: ECDH, DH, CRL, and accessible options (Emmanuel Dreyfus, 2015-08-05; 3 files, -77/+90)

  - Introduce the ssl.dh-param option to specify a file containing DH
    parameters. If it is provided, EDH ciphers are available.
  - Introduce the ssl.ec-curve option to specify an elliptic curve name. If
    unspecified, ECDH ciphers are available using the prime256v1 curve.
  - Introduce the ssl.crl-path option to specify the directory where the CRL
    hash file can be found. Setting it to NULL disables CRL checking, just
    like the default.
  - Make all ssl.* options accessible through gluster volume set.
  - In the default cipher list, exclude weak ciphers instead of listing the
    strong ones.
  - Enforce server cipher preference.
  - Introduce the RPC_SET_OPT macro to factor repetitive code in
    glusterd-volgen.c (see the sketch after this entry).
  - Add the ssl-ciphers.t test to check all the features touched by this
    change.

  Change-Id: I7bfd433df6bbf176f4a58e770e06bcdbe22a101a
  BUG: 1247152
  Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
  Reviewed-on: http://review.gluster.org/11735
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
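  A simplified illustration of the factoring idea behind RPC_SET_OPT; the
  real macro in glusterd-volgen.c carries more error handling, and the
  helper names (dict_get_str, xlator_set_option) are the usual volgen
  conventions rather than a quote of the patch:

      /* Sketch: copy an ssl.* CLI option into the generated volfile,
       * running ERROR_CMD if the set fails. */
      #define RPC_SET_OPT(XL, CLI_OPT, XLATOR_OPT, ERROR_CMD)               \
              do {                                                          \
                      char *_value = NULL;                                  \
                                                                            \
                      if (dict_get_str (set_dict, CLI_OPT, &_value) == 0) { \
                              if (xlator_set_option (XL, XLATOR_OPT,        \
                                                     _value)) {             \
                                      gf_log ("glusterd", GF_LOG_WARNING,   \
                                              "failed to set %s",           \
                                              XLATOR_OPT);                  \
                                      ERROR_CMD;                            \
                              }                                             \
                      }                                                     \
              } while (0)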
* glusterd: fix op-version bump up flow (Atin Mukherjee, 2015-08-03; 1 file, -10/+16)

  If a cluster is upgraded from 3.5 to the latest version, 'gluster volume
  set all cluster.op-version <VERSION>' will throw an error message back to
  the user saying unlocking failed. This is because the unlock phase tries
  to release a volume-wise lock while the lock was taken cluster-wide. The
  problem surfaced because the op-version is updated in the commit phase,
  after which unlocking works in the v3 framework where it should have used
  the cluster unlock.

  The fix is to decide which lock/unlock is to be followed before invoking
  the lock phase.

  Change-Id: Iefb271a058431fe336a493c24d240ed833f279c5
  BUG: 1248298
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/11798
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: Do not log failure if glusterd_get_txn_opinfo fails in gluster volume status (Atin Mukherjee, 2015-08-02; 2 files, -15/+15)

  The first RPC call of gluster volume status fetches the list of volume
  names from GlusterD, and at that time, since no volume name is set in the
  dictionary, glusterd_get_txn_opinfo fails, resulting in a failure log
  which is annoying to the user, considering this command is triggered
  frequently.

  The fix is to have the callers log it depending on the need.

  Change-Id: Ib60a56725208182175513c505c61bcb28148b2d0
  BUG: 1238936
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/11520
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
* rebalance/glusterd: Refactor rebalance volfile (Mohammed Rafi KC, 2015-07-30; 1 file, -24/+74)

  The performance xlators loaded in the rebalance graph are dummy
  translators, since all fops start at the DHT level. Removing the
  performance xlators from the rebalance volfile helps minimize the chance
  of a graph switch.

  The new rebalance graph will look like:

                  (io-stats)
                      ||
                (----DHT----)
                //          \\
        (replica-1)   ...   (replica-n)
         //     \\           //     \\
      client   client     client   client

  Change-Id: I3808e3b48fd0cb3e60ef386b8ac9fd994e2831e3
  BUG: 1240621
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11565
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tiering: Error message change for detach-tier on a non-tier volume (Hari Gowtham, 2015-07-29; 1 file, -5/+13)

  Change-Id: Ib350b201df14b105e475426d2ec20ff5da39a8a1
  BUG: 1245935
  Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
  Reviewed-on: http://review.gluster.org/11745
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* glusterd: getting txn_id from frame->cookie in op_sm callback (anand, 2015-07-27; 1 file, -16/+61)

  RCA: if rebalance start is triggered from one node and another node in the
  cluster goes down simultaneously, we might end up in a case where the
  callback uses the txn_id from priv->global_txn_id, which is always zeros.
  Injecting an event with an incorrect txn_id results in the op-sm getting
  stuck.

  Fix: set txn_id in frame->cookie during submit_and_request, so that the
  txn_id is available in the callback functions.

  Change-Id: I519176c259ea9d37897791a77a7c92eb96d10052
  BUG: 1245142
  Signed-off-by: anand <anekkunt@redhat.com>
  Reviewed-on: http://review.gluster.org/11728
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
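  A sketch of the mechanism on the submit side, assuming the uuid is
  heap-allocated so it survives until the callback; the mem-type and the
  my_txn_id variable are illustrative:

      /* Sketch: carry the transaction id to the callback via
       * frame->cookie. */
      uuid_t *txn_id = GF_CALLOC (1, sizeof (uuid_t),
                                  gf_common_mt_uuid_t);
      if (!txn_id)
              goto out;

      gf_uuid_copy (*txn_id, my_txn_id);  /* this transaction's id */
      frame->cookie = txn_id;             /* read back in the op_sm
                                           * callback instead of
                                           * priv->global_txn_id */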
* glusterd: initialize the daemon services on demand (Atin Mukherjee, 2015-07-27; 20 files, -147/+190)

  As of now, all the daemon services are initialized in the glusterd init
  path. Since the socket file path of a per-node daemon demands the uuid of
  the node, the MY_UUID macro is invoked as part of the initialization.

  The above flow breaks the use cases where a gluster image is built from a
  template, be it a Dockerfile, a Vagrantfile, or any other kind of
  virtualization environment: bringing up instances of this image results
  in the same UUID for every node, causing peer probe failure.

  The solution is to lazily initialize the services on demand.

  Change-Id: If7caa533026c83e98c7c7678bded67085d0bbc1e
  BUG: 1238135
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/11488
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
* dict: dict_set_bin() should never free the pointer on error (Niels de Vos, 2015-07-24; 5 files, -3/+9)

  dict_set_bin() handles the pointer it is passed inconsistently. Depending
  on the errors that can occur, the pointer passed to the dict can be
  free'd, but there is no guarantee. It is cleaner to have the caller free
  the pointer it allocated when dict_set_bin() returns an error. When
  dict_set_bin() returns success, the given pointer will be free'd when
  dict_unref() calls data_destroy().

  Many callers of dict_set_bin() already take care of freeing the pointer
  on error. The ones that did not are corrected with this change too.

  Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966
  BUG: 1242280
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/11638
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
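  The caller-side ownership pattern this change standardizes, sketched
  ("some-key" and len are placeholders):

      /* Sketch: the caller keeps ownership of @data until dict_set_bin()
       * succeeds; on error the caller must free it. */
      char *data = GF_MALLOC (len, gf_common_mt_char);
      if (!data)
              goto out;

      ret = dict_set_bin (dict, "some-key", data, len);
      if (ret) {
              GF_FREE (data);   /* dict did not take ownership */
              data = NULL;
      }
      /* on success, data is freed when dict_unref() reaches
       * data_destroy() */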
* rpc,server,glusterd: Init transport list for accepted transport (Kaushal M, 2015-07-24; 1 file, -0/+8)

  GlusterD or a brick would crash when encrypted transport was enabled and
  an unencrypted client tried to connect to them. The crash occurred when
  GlusterD/server tried to remove the transport from their xprt_list due to
  a DISCONNECT event. Because the client transport's list head wasn't
  inited, the process would crash when list_del was performed. Initing the
  client transport's list head during acceptance prevents this crash.

  Also, an extra check has been added to the GlusterD and server
  notification handlers for client DISCONNECT events. The handlers now
  first check whether the client transport is a member of any list. The
  GlusterD and server DISCONNECT event handlers could be called without
  the ACCEPT handler, which adds the transport to the list, being called.
  This situation also occurs when an unencrypted client tries to establish
  a connection with an encrypted server.

  Change-Id: Icc24a08d60e978aaa1d3322e0cbed680dcbda2b4
  BUG: 1243774
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.org/11692
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
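  The two halves of the fix, sketched; the variable names (new_trans, xprt)
  follow the usual transport/xprt_list conventions and are illustrative:

      /* At accept time: init the new transport's list head so that a
       * later list_del on DISCONNECT operates on a valid list. */
      INIT_LIST_HEAD (&new_trans->list);

      /* In the DISCONNECT handler: only unlink if the ACCEPT handler
       * actually added the transport to the xprt_list. */
      if (!list_empty (&xprt->list))
              list_del_init (&xprt->list);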
* glusterd: Porting the left out gf_log_callingfns to new framework (Nandaja Varma, 2015-07-18; 5 files, -15/+52)

  Change-Id: I1b0ad54238895475ddbacc4fffacac8dc6e887fe
  BUG: 1235538
  Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com>
  Reviewed-on: http://review.gluster.org/11590
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: rebalance support for cluster.rc framework (anand, 2015-07-16; 1 file, -1/+20)

  Issue: rebalance is failing in the cluster framework (any simulated
  cluster environment on the same node).

  RCA:
  1. We always pass "localhost" as the volfile server for the rebalance
     xlator.
  2. Rebalance daemons overwrite each other's unix socket and log files
     (all rebalance processes create a socket with the same name).

  Fix: set the volfile server, unix socket, and log files properly.

  Change-Id: I6654461e00c2a164b2f1f1db24a316c4180dd8d5
  BUG: 1231437
  Signed-off-by: anand <anekkunt@redhat.com>
  Reviewed-on: http://review.gluster.org/11210
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: use 2 epoll worker threads by default (Krishnan Parthasarathi, 2015-07-16; 2 files, -0/+22)

  The number of epoll worker threads can be configured by adding the
  following option to glusterd.vol:

      option event-threads <NUM-OF-EPOLL-WORKERS>

  BUG: 1242421
  Change-Id: I2a9e2d81c64beaf54872081f9ce45355cf4dfca7
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/11630
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Pass NULL to glusterd_svc_manager in glusterd_restart_bricks (Gaurav Kumar Garg, 2015-07-14; 1 file, -1/+1)

  On restarting glusterd, the quota daemon is not started when more than one
  volume is configured and quota is enabled only on the second volume. This
  is because, while restarting, glusterd restarts all the bricks, and during
  brick restart it starts the respective daemons by passing the volinfo of
  the first volume. Passing a volinfo to glusterd_svc_manager implies that
  the daemon managers take action based on that volume's configuration,
  which is incorrect for per-node daemons.

  The fix is to pass a NULL volinfo while restarting bricks.

  Change-Id: I2602002a8ba7762fc1eb08123e79fbcf568ecab4
  BUG: 1242875
  Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Reviewed-on: http://review.gluster.org/11658
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd: daemons start/stop should be logged at INFO (Atin Mukherjee, 2015-07-14; 3 files, -3/+22)

  Change-Id: I81ceba4b7110140aec790659fcac90403c8e3869
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/11538
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* ec: Implement a check on the cluster.heal-timeout values for disperse volumes (Ashish Pandey, 2015-07-14; 1 file, -0/+6)

  Problem: users can set wrong values for cluster.heal-timeout using the
  cli. Any value like a string, a negative number, or 0 could be set.

  Solution: check the value given by the user. It should be numerical,
  non-negative, and within the range.

  Change-Id: I5184ef1a11bb2c225f42ac9ccb1ba680a86cfe09
  BUG: 1239037
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: http://review.gluster.org/11573
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
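  One way such a check is expressed, as a hedged sketch of an entry in the
  disperse xlator's options table; the bounds and default shown are
  illustrative, not the values from the patch:

      { .key = {"heal-timeout"},
        .type = GF_OPTION_TYPE_INT,
        .min = 60,                 /* rejects 0, negatives, out-of-range */
        .max = INT_MAX,            /* GF_OPTION_TYPE_INT rejects strings */
        .default_value = "600",
        .description = "interval for checking the need to self-heal"
      },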
* glusterd: Fix failure in replace-brick when src-brick is offline (Anuradha Talur, 2015-07-13; 1 file, -75/+0)

  Change-Id: I0fdb58e15da15c40c3fc9767f2fe4df0ea9d2350
  BUG: 1242609
  Signed-off-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/11651
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd: Fixing dereference after null check (arao, 2015-07-12; 1 file, -4/+4)

  CID: 1124557

  Check the pointer itself before dereferencing it to inspect the value it
  points to.

  Change-Id: Idcbb034e4c6d58501697e01e90647b6233a5e5ba
  BUG: 789278
  Signed-off-by: arao <arao@redhat.com>
  Reviewed-on: http://review.gluster.org/9661
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd: Send friend update even for EVENT_RCVD_ACC (Kaushal M, 2015-07-11; 1 file, -1/+1)

  In a multi-network cluster, a new peer being probed into the cluster will
  not get all the addresses of the peer that initiated the probe in some
  cases, as it doesn't receive friend updates from other peers in the
  cluster. This happens when the new peer establishes connections with the
  other peers before the other peers connect to the new peer.

  Assuming F is the initiator peer, O is one of the other peers in the
  cluster, and N is the new peer, the following series of events occurs on
  O when N establishes the connection first. N is already in the BEFRIENDED
  state on O, so the actions refer to the BEFRIENDED state table:

  EVENT_RCV_FRIEND_REQ -> results in handle_friend_add_req being called,
  which injects a LOCAL_ACC.
  EVENT_RCVD_LOCAL_ACC -> results in send_friend_update being called, which
  should have sent an update to N; but O has still not established a
  connection to N, so the update isn't sent.
  EVENT_CONNECTED -> O now connects to N; this results in O sending a
  friend_add req to N.
  EVENT_RCVD_ACC -> friend_add_cbk injects this event, but it results in a
  NOOP when in BEFRIENDED.

  As a result, O doesn't receive all the addresses of F. If the cluster
  contains any volumes with bricks attached to the missing addresses of F
  and O is restarted in this condition, GlusterD will fail to start, as it
  won't be able to resolve those bricks.

  This commit changes the EVENT_RCVD_ACC action for the BEFRIENDED state
  from a NOOP to send_friend_update. This makes sure that the new peer
  receives the updates from the other existing peers, irrespective of who
  establishes the connection first, thus solving the problem.

  Change-Id: Id807bc3032cf4cb13a5ba83819f2d50c96e76e96
  BUG: 1241882
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.org/11625
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
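  The essence of the change in the BEFRIENDED state table, sketched with
  designated indices for readability (the real table in glusterd-sm.c is a
  positional array; handler names follow its conventions):

      /* Before: RCVD_ACC was a no-op for an already-befriended peer. */
      [GD_FRIEND_EVENT_RCVD_ACC] = {GD_FRIEND_STATE_BEFRIENDED,
                                    glusterd_ac_none},

      /* After: send a friend update so the new peer learns all the
       * addresses, irrespective of which side connected first. */
      [GD_FRIEND_EVENT_RCVD_ACC] = {GD_FRIEND_STATE_BEFRIENDED,
                                    glusterd_ac_send_friend_update},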
* glusterd: use a real host name (instead of numeric) when we have one (Jeff Darcy, 2015-07-10; 1 file, -0/+6)

  Change-Id: Ie9cc201204d3d613e3e585cab066a07283db902c
  BUG: 1241274
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/11587
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* quota/marker: use smaller stacksize in synctask for marker updation (vmallika, 2015-07-09; 1 file, -1/+1)

  The default stack size that synctask uses is 2M; for marker we set it to
  16k. Also move the marker xlator close to io-threads to have a smaller
  stack.

  Change-Id: I8730132a6365cc9e242a3564a1e615d94ef2c651
  BUG: 1207735
  Signed-off-by: vmallika <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/11499
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
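  A hedged sketch of the call, assuming the syncop API variant that accepts
  an explicit stack size (synctask_new1); the task and callback names are
  illustrative, and the 16k figure comes from the commit message:

      /* Sketch: spawn the marker-updation synctask with a 16k stack
       * instead of the 2M default. */
      ret = synctask_new1 (this->ctx->env, 16 * 1024,
                           marker_update_task,   /* illustrative fn */
                           marker_update_done,   /* illustrative cbk */
                           frame, opaque);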
* glusterd: Fix management encryption issues with GlusterD (Kaushal M, 2015-07-09; 2 files, -8/+19)

  Management encryption was enabled incorrectly in GlusterD, leading to
  cluster deadlocks. This has been fixed with this commit. The fix is in
  two parts:

  1. Correctly enable encryption for the TCP listener in GlusterD and
     re-enable own-threads for encrypted connections. Without this,
     GlusterD could try to establish the blocking SSL connects in the epoll
     thread, for example when handling friend updates, which could lead to
     cluster deadlocks.

  2. Explicitly enable encryption for outgoing peer connections. Not
     enabling encryption explicitly for outgoing connections caused SSL
     socket events to be handled in the epoll thread. Some events, like
     disconnects during peer detach, could lead to connection attempts
     happening in the epoll thread, leading to deadlocks again.

  Change-Id: I438c2b43f7b1965c0e04d95c000144118d36272c
  BUG: 1240564
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.org/11559
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd/snapd: Stop snapd daemon when glusterd is restarting (Avra Sengupta, 2015-07-08; 1 file, -0/+6)

  Stop the snapd daemon when glusterd is coming back up, if uss is disabled
  or the volume is stopped.

  Change-Id: I4313ecaff19de30f3e9ea76881994509402ed5b0
  BUG: 1240952
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/11575
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/geo-rep: Fix failure of geo-rep pause (Kotresh HR, 2015-07-07; 1 file, -4/+17)

  Geo-replication pause fails if one or more of the nodes in the master
  cluster is not part of the master volume. If a node hosts no bricks of
  the master volume, it should be ignored. The check is added to fix the
  issue.

  Change-Id: Iba57d66b6db6919f42a95dd66e6db9ad1b21503b
  BUG: 1240229
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: http://review.gluster.org/11549
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd: Correction in error message for disperse volume create (Ashish Pandey, 2015-07-06; 1 file, -2/+7)

  Problem: if all the bricks are on the same server and a "disperse" volume
  is created without using "force", the failure message mentions
  "replicate" as the volume type.

  Solution: add a failure message for disperse volumes too.

  Change-Id: I9e466b1fe9dae8cf556903b1a2c4f0b270159841
  BUG: 1232183
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: http://review.gluster.org/11250
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared storage's brick (Avra Sengupta, 2015-07-06; 3 files, -16/+11)

  The brick path we use to create shared storage is
  /var/run/gluster/ss_brick. The problem with using this brick path is that
  /var/run/gluster is a tmpfs, and all the brick/shared-storage data will
  be wiped off when the node restarts. Hence we use
  /var/lib/glusterd/ss_brick as the brick path for the shared storage
  volume, as this brick and the shared storage volume are internally
  created by us (albeit on the user's request) and contain only internal
  state data and no user data.

  Change-Id: I808d1aa3e204a5d2022086d23bdbfdd44a2cfb1c
  BUG: 1218573
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/11533
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd: Get the local txn_info based on trans_id in op_sm callbacks (anand, 2015-07-06; 5 files, -70/+30)

  Issue: when two or more transactions run concurrently in op_sm, the
  global op_info might get corrupted.

  Fix: get the local txn_info based on trans_id instead of using the global
  txn_info for commands (rebalance, profile) that use op_sm on the
  originator.

  TODO: handle errors properly in the callbacks and completely remove the
  global op_info from op_sm.

  Change-Id: I9d61388acc125841ddc77e2bd560cb7f17ae0a5a
  BUG: 1229139
  Signed-off-by: anand <anekkunt@redhat.com>
  Reviewed-on: http://review.gluster.org/11120
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* cluster/ec: Make background healing optional behavior (Pranith Kumar K, 2015-07-06; 1 file, -0/+8)

  Provide options to control the active background heal count and queue
  length (qlen).

  Change-Id: Idc2419219d881f47e7d2e9bbc1dcdd999b372033
  BUG: 1237381
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/11473
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* tiering/quota: failed to match subvolume (Mohammed Rafi KC, 2015-06-30; 1 file, -2/+7)

  The quota daemon chooses the subvolume for a volume using the volume-id
  specified in the graph. For that it expects a subvolume (DHT) to be named
  with the volume-id. But the tiering translator sits above DHT, so the
  daemon failed to match the correct subvolume.

  Change-Id: I63d4b63cd8fb2806bc7b2b2f100dbef62202e6da
  BUG: 1236128
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11431
  Reviewed-by: Joseph Fernandes
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* glusterd: Porting left out log messages to new framework (Nandaja Varma, 2015-06-26; 25 files, -476/+1677)

  Change-Id: I70d40ae3b5f49a21e1b93f82885cd58fa2723647
  BUG: 1235538
  Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com>
  Reviewed-on: http://review.gluster.org/11388
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
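  The shape of this kind of port, sketched; the message id shown is one of
  the ids defined in glusterd-messages.h, and the log text is illustrative:

      /* Before: old gf_log framework. */
      gf_log (this->name, GF_LOG_ERROR, "Failed to get volume name");

      /* After: message-id based framework, carrying an errno value
       * and a structured message id. */
      gf_msg (this->name, GF_LOG_ERROR, 0, GD_MSG_DICT_GET_FAILED,
              "Failed to get volume name");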
* tier/ctr: Ignore creation of T file and Ctr lookup heal improvements (Joseph Fernandes, 2015-06-26; 2 files, -0/+32)

  1) Ignore creation of the T file in ctr_mknod.
  2) Ignore lookup of the T file in ctr_lookup.
  3) ctr_lookup:
     a. If the gfid and pgfid are empty, don't record.
     b. Decreased log level for multiple heal attempts.
     c. Inode/file heal happens after an expiry period, which is
        configurable.
     d. Hardlink heal happens after an expiry period, which is
        configurable.

  Change-Id: Id8eb5092e78beaec22d05f5283645081619e2452
  BUG: 1235269
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/11334
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* ganesha: volinfo is not persisted after modifying options (Rajesh Joseph, 2015-06-26; 1 file, -0/+8)

  ganesha disables Gluster NFS when it is enabled. Gluster NFS is disabled
  by storing nfs.disable as "on" in the volinfo of each volume in the
  cluster. But the volinfo is not persisted after the change, due to which
  wrong info is passed in the handshake, leading to a volume checksum
  mismatch.

  Bug: 1235751
  Change-Id: Icd642f5068cc934bb77676fb8ef71b958a7b7384
  Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-on: http://review.gluster.org/11412
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Meghana M <mmadhusu@redhat.com>
  Reviewed-by: soumya k <skoduri@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
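  A sketch of the fix: persist the volinfo right after flipping the key, so
  peers compute the same checksum during handshake. The helpers shown are
  the usual glusterd dict/store ones; treat the exact call site as an
  assumption:

      /* Sketch: record nfs.disable and persist the volinfo. */
      ret = dict_set_dynstr_with_alloc (volinfo->dict,
                                        "nfs.disable", "on");
      if (ret)
              goto out;

      ret = glusterd_store_volinfo (volinfo,
                                    GLUSTERD_VOLINFO_VER_AC_INCREMENT);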
* glusterd/afr: set afr pending xattrs on replace-brick (Anuradha, 2015-06-25; 6 files, -118/+270)

  This patch is part one of the change to prevent data loss in a replicate
  volume on doing a replace-brick commit force operation.

  Problem: after doing replace-brick commit force, there is a chance that
  self-heal happens from the replaced (sink) brick rather than the source
  brick, leading to data loss.

  Solution: during the commit phase of replace-brick, after the old brick
  is brought down, create a temporary mount and perform a setfattr
  operation (on a virtual xattr) indicating to AFR that the replaced brick
  should be marked as a sink.

  As a part of this change, the replace-brick command is also moved to the
  mgmt_v3 framework rather than the op-state-machine framework.

  Many thanks to Krishnan Parthasarathi for helping me out on this.

  Change-Id: If0d51b5b3cef5b34d5672d46ea12eaa9d35fd894
  BUG: 1207829
  Signed-off-by: Anuradha <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/10076
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* snapshot: Fix terminating slash in brick mount path (Avra Sengupta, 2015-06-25; 1 file, -1/+0)

  glusterd_find_brick_mount_path() returns the mount path with a
  terminating '/' at the end of the string in cases where the brick dir is
  a dir in the lvm root dir. Ignoring the terminating '/' fixes the issue.

  Change-Id: Ie7e63d37d48e2e03d541ae0076b8f143b8c9112f
  BUG: 1232430
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/11262
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd: Store peerinfo after updating hostnames (Kaushal M, 2015-06-24; 1 file, -0/+6)

  Change-Id: I1d36ac63de810061d60edb28b6f591ae45d5cd3a
  BUG: 1234842
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.org/11365
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: fix quorum calculation logic (Atin Mukherjee, 2015-06-24; 1 file, -6/+1)

  glusterd_get_quorum_cluster_counts() skips the quorum calculation if it
  finds any of its peers in the QUORUM_WAITING state. This means that if a
  peer probe has been triggered and a transaction is initiated at the same
  point in time, the transaction might pass the server quorum check when it
  should not.

  Change-Id: I44eda8905eab3349c9ebf2842e7131d4e758a528
  BUG: 1232686
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/11275
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* glusterd: use mkdir_p for creating rundir (Atin Mukherjee, 2015-06-24; 2 files, -8/+8)

  snapdsvc initialization is now dependent on all subsequent directory and
  volfile creations, as per commit 2b9efc9. However, this may not hold true
  for all cases: while importing a volume, we need to start the service
  before the store creation. To avoid this dependency, use mkdir_p instead
  of mkdir to create the rundir.

  Change-Id: Ib251043398c40f1b76378e3bc6d0c36c1fe4cca3
  BUG: 1234819
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/11364
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
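  The change in miniature, assuming libglusterfs's mkdir_p helper; the
  rundir variable and mode are illustrative:

      /* Sketch: create the daemon's rundir along with any missing
       * parent directories, unlike plain mkdir(2). */
      ret = mkdir_p (rundir, 0755, _gf_true);
      if (ret == -1 && errno != EEXIST)
              gf_log ("glusterd", GF_LOG_ERROR,
                      "Unable to create rundir %s", rundir);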
* glusterd/mgmt_v3: Print the node info with failure messages (Avra Sengupta, 2015-06-22; 1 file, -4/+2)

  While reporting multiple failure messages from different nodes, print the
  node ip and the failure stage.

  Change-Id: I657d3debf1b509e4a27baf9e4b580f1ee32e3c5f
  BUG: 1205596
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/11234
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/uss/snapshot: Initialise snapdsvc after volfiles are created (Avra Sengupta, 2015-06-22; 3 files, -22/+29)

  snapd svc should be initialised only after all relevant volfiles and
  directories are created.

  Change-Id: I96770cfc0b350599cd60ff74f5ecec08145c3105
  BUG: 1231197
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/11227
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>