summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* libglusterfs:Now mempool is added to ctx pool list under a lockRajesh Joseph2016-11-222-3/+5
| | | | | | | | | | | | | | | | | | | mempool is added to ctx pool list without any lock. This can cause undefined behaviour in case of multithreaded environment. Fix: modify the list only under ctx->lock Change-Id: I7bdbb3db48a899bb0e41427e149b13c0facaedba Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> BUG: 1394719 Reviewed-on: http://review.gluster.org/15842 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* glusterd: dump volinfo->dict in gluster get-stateAtin Mukherjee2016-11-221-2/+6
| | | | | | | | | | | | Change-Id: I7e60629fb8003c620847fa63441f6b098db59721 BUG: 1396807 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15889 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Rohan Kanade <rkanade@redhat.com>
* performance/io-threads: Exit threads in fini() as wellPranith Kumar K2016-11-212-11/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: io-threads starts the thread in 'init()' but doesn't clean them up on 'fini()'. It relies on PARENT_DOWN to exit threads but there can be cases where event before PARENT_UP the graph init code can think of issuing fini(). This code path is hit when glfs_init() is called on a volume that is in 'stopped' state. It leads to a crash in ganesha process, because the io-thread tries to access freed memory. Fix: Ideal fix would be to wait for all fops in io-thread list to be completed on PARENT_DOWN, and have fini() do cleanup of threads. Because there is no proper documentation about how PARENT_DOWN/fini are supposed to be used, we are getting different kinds of sequences in different higher level protocols. So for now cleaning up in both PARENT_DOWN and fini(). Fuse doesn't call fini() gfapi is not calling PARENT_DOWN in some cases, so for now I don't see another way out. BUG: 1396793 Change-Id: I9c9154e7d57198dbaff0f30d3ffc25f6d8088aec Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15888 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht Set layout after mkdir as rootN Balachandran2016-11-211-0/+3
| | | | | | | | | | | | | | | | | | | | DHT does not set the layout for newly created directories as root. This causes EPERM failures when a non-root user with insufficient permissions creates directories. credit: srangana@redhat.com for RCA Change-Id: Ia646e41665ce172c43c5f01d2707455e8eb374ed BUG: 1392772 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15794 Reviewed-by: Susant Palai <spalai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UPPoornima G2016-11-217-38/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently these are few events related to child_up/down: GF_EVENT_CHILD_UP : Issued when any of the protocol client connects. GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec GF_EVENT_CHILD_DOWN : Issued when any of the protocol client disconnects. These events get modified at the dht/afr/ec layers. Here is a brief on the same. DHT: - All the subvolumes reported once, and atleast one child came up, then GF_EVENT_CHILD_UP is issued - connect GF_EVENT_CHILD_UP is issued - disconnect GF_EVENT_CHILD_MODIFIED is issued - All the subvolumes disconnected, GF_EVENT_CHILD_DOWN is issued AFR: - First subvolume came up, then GF_EVENT_CHILD_UP is issued - Subsequent subvolumes coming up, results in GF_EVENT_CHILD_MODIFIED - Any of the subvolumes go down, then GF_EVENT_SOME_CHILD_DOWN is issued - Last up subvolume goes down, then GF_EVENT_CHILD_DOWN is issued Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP, GF_EVENT_CHILD_MODIFIED was issued by afr/dht when any of the subvolumes go up or down. Now with md-cache changes, there is a necessity to differentiate between child up and down. Hence, introducing GF_EVENT_SOME_DESCENDENT_DOWN/UP and getting rid of GF_EVENT_CHILD_MODIFIED. [1] http://review.gluster.org/12573 Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58 BUG: 1396038 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15764 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* snapshot/scheduler: Removing dependency of scheduler on eventingAvra Sengupta2016-11-201-16/+61
| | | | | | | | | | | | Change-Id: I7de156d8186c32092ec5e9d174d023f4782947c0 BUG: 1396364 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/15876 Reviewed-by: Aravinda VK <avishwan@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* events: Add FMT_WARN for gf_eventPranith Kumar K2016-11-188-21/+30
| | | | | | | | | | | | | | | | | Raghavendra G found that posix is trying to print %s but passing an int when HEALTH_CHECK fails in posix. These are the kind of bugs that should be caught at compilation itself. Also fixed the problematic gf_event() callers. BUG: 1386097 Change-Id: Id7bd6d9a9690237cec3ca1aefa2aac085e8a1270 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15671 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* ganesha/scripts : use export id for dbus signalsJiffin Tony Thottan2016-11-172-9/+9
| | | | | | | | | | | | | | | | | Currently for add export and update export parameter passed for executing those signal is "PATH". This is based on assumption that volume name and PATH will always be same. But it is wrong for subdir exports. The only reliable parameter in export configuration file is "Export_Id". Change-Id: Ic63ff44ac7736e14502034b74beaae27292eddf9 BUG: 1389746 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/15751 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cli: Print to screen frequentlyPranith Kumar K2016-11-171-1/+6
| | | | | | | | | | | | | | | | | Problem: CLI appears to be hung because XML document is not flushed periodically Fix: Flush the buffer as soon as we print something BUG: 1395993 Change-Id: Ic5f61d4c7d29ee162a124a049e60ceb810d6da6d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15863 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tools/glusterfind: xml parsing fix for tiered volumesMilind Changire2016-11-171-3/+12
| | | | | | | | | | | | | | | | | | | gluster volume info <vol> --xml for non-tiered volumes have 'bricks/brick' elements under the 'volInfo/volumes/volume' element. However, tiered volumes have a 'bricks/hotBricks/brick' and 'bricks/coldBricks/brick' elements under the 'volInfo/volumes/volume' element. Fix main.py::get_nodes() BUG: 1389481 Change-Id: I2f4465bfa8a55e7fa87917d3ec3e69b05d5241b9 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/15746 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* eventsapi: Auto reload Webhooks data when modifiedAravinda VK2016-11-171-1/+20
| | | | | | | | | | | | | | | | | | | glustereventsd depends on reload signal to reload the Webhooks configurations. But if reload signal missed, no events will be sent to newly added Webhook. Added auto reload based on webhooks file mtime. Before pushing events to Webhooks, reloads webhooks configurations if previously recorded mtime is different than current mtime. BUG: 1388862 Change-Id: I83a41d6a52d8fa1d70e88294298f4a5c396d4158 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15731 Reviewed-by: Prashanth Pai <ppai@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* marker: Fix inode value in loc, in setxattr fopPoornima G2016-11-172-0/+31
| | | | | | | | | | | | | | | | | | | | | | On recieving a rename fop, marker_rename() stores the, oldloc and newloc in its 'local' struct, once the rename is done, the xtime marker(last updated time) is set on the file, but sending a setxattr fop. When upcall receives the setxattr fop, the loc->inode is NULL and it crashes. The loc->inode can be NULL only in one valid case, i.e. in rename case where the inode of new loc can be NULL. Hence, marker should have filled the inode of the new_loc before issuing a setxattr. Change-Id: Id638f678c3daaf4a5c29b970b58929d377ae8977 BUG: 1394131 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15826 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* cli/rebalance: remove brick status is incorrectN Balachandran2016-11-173-5/+61
| | | | | | | | | | | | | | | | | | | | | | If a remove brick operation is preceded by a fix-layout, running remove-brick status on a node which does not contain any of the bricks that were removed displays fix-layout status. The defrag_cmd variable was not updated in glusterd for the nodes not hosting removed bricks causing the status parsing to go wrong. This is now updated. Also made minor modifications to the spacing in the fix-layout status output. Change-Id: Ib735ce26be7434cd71b76e4c33d9b0648d0530db BUG: 1389697 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15749 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* system/posix-acl: Log reason for EACCESPranith Kumar K2016-11-177-17/+176
| | | | | | | | | | | | | | | | It is becoming increasingly difficult to debug the reason why posix-acl decides to fail a fop with EACCES. This patch prints a big log everytime such a condition occurs giving out the details that may help in finding why the fop is errored out. Change-Id: I2505baaafb5d77ef6c187554ff027df9b20468db BUG: 1394548 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15837 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* protocol/client: Fix iobref and iobuf leaks in COMPOUND fopKrutika Dhananjay2016-11-161-1/+2
| | | | | | | | | | | Change-Id: I408879aa2bbd8ea176fbc0d0eba5567e5df1b2b3 BUG: 1395687 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15860 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* protocol/server: Print error-xlator namePranith Kumar K2016-11-162-182/+277
| | | | | | | | | | | | | | | | | | | | Problem: At the moment from which xlator the errors are stemming from is a mystery. Fix: With this patch we can find on the server side which xlator first gave the errno received by server xlator. I am not yet sure how to get this done for client side which has lot of copy_frame()s. May be another patch. Change-Id: Ie13307b965facf2a496123e81ce0bd6756f98ac9 BUG: 1394548 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15836 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* common-ha: nfs-grace-monitor timed out unknown error messagesKaleb S. KEITHLEY2016-11-161-3/+2
| | | | | | | | | | | | | | | grace_mon_monitor() occasionally returns OCF_ERR_GENERIC, but it ought to return OCF_NOT_RUNNING instead. Change-Id: I3d550e33cc3d0a8dce4333ced72db19b3b2f4f2e BUG: 1394224 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15831 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
* geo-rep/cli: Validate Checkpoint labelAravinda VK2016-11-161-0/+25
| | | | | | | | | | | | | | | | | | Checkpoint command accepts "now" or any other Time in "%Y-%m-%d %H:%M:%S" format as label. Validation added with this patch for the input label. Checkpoint set will fail for invalid label. BUG: 1388401 Change-Id: I23518c151ab4b294f64cae3b78baaacb3d8f7b82 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15721 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* Revert "rpc: Fix the race between notification and reconnection"Pranith Kumar Karampuri2016-11-161-4/+3
| | | | | | | | | | | | | | | | This reverts commit a6b63e11b7758cf1bfcb67985e25ec02845f0995. Nithya and Rajesh found that the mount fails sometimes after this patch was merged so reverting it. BUG: 1386626 Change-Id: I959a5b6c7da61368cf4c67c98193c6e8fdd1755d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15838 Reviewed-by: N Balachandran <nbalacha@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Check for null inodeN Balachandran2016-11-151-2/+5
| | | | | | | | | | | | | | | Check for NULL inode before attempting to set dht inode ctx. Change-Id: I7693c18445f138221d8417df5e95b118cedb818a BUG: 1395261 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15847 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* common-ha: remove /etc/corosync/corosync.conf in teardown/cleanupKaleb S. KEITHLEY2016-11-151-2/+5
| | | | | | | | | | | | | | | | | | | | In newer versions of corosync we observe that after tearing down an existing HA cluster, when trying to set up a new cluster, `pcs cluster start --all` will fail if corosync believes the nodes are already in the cluster based on the presence of, and the contents of /etc/corosync/corosync.conf So we summarily delete it. (An alternative/work-around is to use `pcs cluster start --force --all`) Change-Id: I225f4e35e3b605e860ec4f9537c40ed94ac68625 BUG: 1394881 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15843 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com>
* nfs/cli : add warning message while enabling gluster nfsJiffin Tony Thottan2016-11-151-0/+14
| | | | | | | | | | | | | Change-Id: Ice70003f7295d2d8af9877e1ecf6f3c81422b30c BUG: 1376693 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/15805 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Update copyright content for glusterfs binariesAnoop C S2016-11-102-21/+21
| | | | | | | | | | | | Change-Id: I2d5de7ae634d55ae32977e337f366586eab449e4 BUG: 1198849 Signed-off-by: Anoop C S <anoopcs@redhat.com> Reviewed-on: http://review.gluster.org/15819 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* nfs: revalidate lookup converted to fresh lookupMohammed Rafi KC2016-11-104-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | when an inode ctx is missing for a linked inode the revalidate lookups are converted to fresh. This could result in sending ESTALE when the gfid are recreated We are not able to reproduce the issue with normal setup, most part of RCA was done with code reading. Possible scenario in which this bug can reproduce, Delete a file and recreate a new file with same name, at the same time from another client process try to list/or access the file. In this case the second client may throw an ESTALE error for such files Thanks to Soumya and Pranith for doing the complete RCA Change-Id: I73992a65844b09a169cefaaedc0dcfb129d66ea1 BUG: 1379720 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/15580 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* glusterfsd: Continuous errors are getting in mount logs while glusterd is downMohit Agrawal2016-11-101-3/+6
| | | | | | | | | | | | | | | | | | | Problem: when glusterd is down, getting the continuous mgmt_rpc_notify errors messages in the volume mount log for every 3 seconds,it will consume disk space. Solution: To reduce the frequency of error messages use GF_LOG_OCCASIONALLY. BUG: 1388877 Change-Id: I6cf24c6ddd9ab380afd058bc0ecd556d664332b1 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15732 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* performance/open-behind: Avoid deadlock in statedumpPranith Kumar K2016-11-091-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: open-behind is taking fd->lock then inode->lock where as statedump is taking inode->lock then fd->lock, so it is leading to deadlock In open-behind, following code exists: void ob_fd_free (ob_fd_t *ob_fd) { loc_wipe (&ob_fd->loc); <<--- this takes (inode->lock) ....... } int ob_wake_cbk (call_frame_t *frame, void *cookie, xlator_t *this, int op_ret, int op_errno, fd_t *fd_ret, dict_t *xdata) { ....... LOCK (&fd->lock); <<---- fd->lock { ....... __fd_ctx_del (fd, this, NULL); ob_fd_free (ob_fd); <<<--------------- } UNLOCK (&fd->lock); ....... } ================================================================= In statedump this code exists: inode_dump (inode_t *inode, char *prefix) { ....... ret = TRY_LOCK(&inode->lock); <<---- inode->lock ....... fd_ctx_dump (fd, prefix); <<<----- ....... } fd_ctx_dump (fd_t *fd, char *prefix) { ....... LOCK (&fd->lock); <<<------------------ this takes fd-lock { ....... } Fix: Make sure open-behind doesn't call ob_fd_free() inside fd->lock BUG: 1393259 Change-Id: I4abdcfc5216270fa1e2b43f7b73445f49e6d6e6e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15808 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/afr: When failing fop due to lack of quorum, also log error stringKrutika Dhananjay2016-11-091-11/+12
| | | | | | | | | | | | Change-Id: I39de2bcfc660f23a4229a885a0e0420ca949ffc9 BUG: 1392865 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15800 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* upcall: Fix a log levelPoornima G2016-11-081-2/+7
| | | | | | | | | | | | | | | | In upcall_cache_invalidation(), the gfid can be NULL in certain valid test cases(eg: entry for ".." in readdirp), hence change the log level from WARNING to DEBUG. Change-Id: Ic90167a0e2076694e9131913114460df7b939b30 BUG: 1392167 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15777 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* jbr: Sending rollback from failed fop to fdlAvra Sengupta2016-11-0818-58/+347
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of a failed fop, the failure is detected by the leader in the jbr-server in two places. First during a quorum check of +ve responses when it receives responses from all the followers. At this point if the fop hasn't been successfully journaled at a quorum of followers (as in there is no merit in trying the fop in the leader as the quorum will never be met), then we fail the fop. Also if this quorum is met, then the fop is tried on the leader, and after the leader completes the fop a quorum check similar to the previous one is done again, this time including the leaders outcome. If quorum is not met, then we fail the fop. In both these cases, when the fop fails we send a -ve ack to the client. With this patch, now we will also send a rollback through a GF_FOP_IPC to all the followers(and also to the leader in the second case of failure). This rollback will contain the index and term number of the fop which failed. This will be recorded in the respective journals of the bricks and will be used to rollback the fop on that brick later. A subsequent write, and it's respective rollback would look something like the following in the journal. The trusted.jbr.term and trusted.jbr.index present in the dict of both the logs, relate them, and the presence of "rollback-fop" in the dict of IPC indicates that it is a rollback fop, and the value 13(stands for GF_FOP_WRITE) indicates what kind of rollback operation it is. === GF_FOP_WRITE fd = <gfid 77f12ea2-ca56-40e3-a46e-ba2308baa035> vector = <158 bytes> offset = 0 (0x0) flags = 32769 (0x8001) xdata = dict { trusted.jbr.term = 0 <2 bytes> trusted.jbr.index = 4 <2 bytes> } === GF_FOP_IPC xdata = dict { trusted.jbr.term = 0 <2 bytes> trusted.jbr.index = 4 <2 bytes> rollback-fop = 13 <3 bytes> } Change-Id: I70b6a143d20697153d58e2f719e34ecd1ed160a5 BUG: 1349385 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/14783 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* build: Update version check for resource-agents on RHELKaleb KEITHLEY2016-11-081-3/+3
| | | | | | | | | | | | | | | | | | | | | With bug1302545[1] and bug1303037[2], portblock resource agent is made available as part of resource-agents-3.9.5* package on RHEL 6.8 and RHEL 7.3 respectively. This change is to update the same in the glusterfs spec. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1302545 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1303037 Change-Id: I5917e5f22f07b4121d636b099dd8815847e1338f BUG: 1389293 Author: Kaleb KEITHLEY <kkeithle@redhat.com> Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/15803 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* features/shard: Fill loc.pargfid too for named lookups on individual shardsKrutika Dhananjay2016-11-084-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a sharded volume when a brick is replaced while IO is going on, named lookup on individual shards as part of read/write was failing with ENOENT on the replaced brick, and as a result AFR initiated name heal in lookup callback. But since pargfid was empty (which is what this patch attempts to fix), the resolution of the shards by protocol/server used to fail and the following pattern of logs was seen: Brick-logs: [2016-11-08 07:41:49.387127] W [MSGID: 115009] [server-resolve.c:566:server_resolve] 0-rep-server: no resolution type for (null) (LOOKUP) [2016-11-08 07:41:49.387157] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-rep-server: 91833: LOOKUP(null) (00000000-0000-0000-0000-000000000000/16d47463-ece5-4b33-9c93-470be918c0f6.82) ==> (Invalid argument) [Invalid argument] Client-logs: [2016-11-08 07:41:27.497687] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.497755] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.498500] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.499680] E [MSGID: 133010] Also, this patch makes AFR by itself choose a non-NULL pargfid even if its ancestors fail to initialize all pargfid placeholders. Change-Id: I5f85b303ede135baaf92e87ec8e09941f5ded6c1 BUG: 1392445 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15788 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* glusterd/quota: upgrade quota.conf file during an upgradeManikandan Selvaganesh2016-11-075-16/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem ======= When quota is enabled on 3.6, it will have quota conf version in quota.conf as v1.1. This node gets upgraded to 3.7 but it will still have quota conf version as v1.1 until a quota enable/disable/set limit is initiated. When this is not initiated and when this node tries to peer probe a node which is a fresh install of 3.7 (which will have quota conf version as v1.2), then this will result in "Peer rejected" state. This patch fixes the issue. Solution ======== When an upgrade happens from 3.6 to 3.7, quota.conf file needs to be modified as well. With 3.6, in quota.conf the version will be v1.1 and it needs to be changed to v1.2 from 3.7. This is because in 3.7, inode quota feature is introduced. So when an op-version bumpup happens quota.conf needs to be upgraded with quota conf version v1.2 and all the 16 byte uuid needs to be changed to 17 bytes uuid as well. Previously, when the cluster version is upgraded to 3.7, the quota.conf got upgraded as well. But, the upgradation was done only when quota enable/disable/set limit is done. With this patch, the upgradation is done during a cluster op version bump up as well. Change-Id: Idb5ba29d3e1ea0e45c85d87c952c75da9e0f99f0 BUG: 1371539 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/15352 Tested-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* gfapi: async fops should unref in callbacksRaghavendra Talur2016-11-053-18/+215
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If fd is unref'd at the end of async call then the unref in cbks would lead to double unref and possible crash. Removing duplicate unrefs. Added unref only in failure cases. A simple test case has been added to test async write case. Need to extend the same for other async APIs too. Details: All glfd based calls in libgfapi, except for glfs_open and glfs_close, behave in the same way. At the start of the operation, they take a ref on glfd and fd. At the end of the operation, they unref it. Async calls are a little different as they unref in the cbk function. A successfull open call does not unref either the glfd or fd, thereby functioning as a reference for a OPEN file object. glfs_close makes a syncop_flush call sandwiched between a fd ref and unref(this can be removed, more on this below), followed by a call to glfs_mark_glfd_for_deletion which unrefs glfd and also calls glfs_fd_destroy as a release function thereby doing a unref on fd too. Functionally, there is no problem with how everything works when as described above. However, it is a little non-intuitive that we need to perform a fd_unref as a consequence of a implicit fd_ref that happens within glfs_resolve_fd. As we perform a GF_REF_GET(glfd) at the start of every operation, it would be worthwhile to remove the fd_ref that glfs_resovle_fd takes and do away with explicit fd_unref()s at the end of every operation. This is the same reason why we don't need the fd_ref in glfs_close. This is however not in the scope of this patch. Change-Id: I86b1d3b2ad846b16ea527d541dc82b5e90b0ba85 BUG: 1391086 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/15768 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Prasanna Kumar Kalever <pkalever@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* posix-acl: check dictionary before using itRajesh Joseph2016-11-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | If extended attributes are not present in md-cache it returns NULL as xattr. posix acl xlator should check for NULL before using xattr. If normal and default ACLs are not set on file then md-cache will not contain system.posix_acl_access and system.posix_acl_default extended attributes in its cache. Therefore posix_acl_lookup_cbk should check xattr before using it, otherwise the logs will get filled with dictionary errors. Change-Id: Icebf73cf0b313bd3e82ca8cbda63786dd0fa47da BUG: 1391387 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/15769 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* gfapi/upcall: Fix mismatch in few upcall API SYMVERSoumya Koduri2016-11-031-3/+3
| | | | | | | | | | | | | | | | | There is mismatch in few of the upcall API routine definitions and their corresponding symbol version declarations. Fixed the same. Change-Id: I2edfd9546a4c6a9128757f3b68e3ae4edd2c7a79 BUG: 1344714 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/15760 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Kaleb KEITHLEY <kkeithle@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* xlators/trash : Remove upper limit for trash max file sizeJiffin Tony Thottan2016-11-032-18/+2
| | | | | | | | | | | | | | | | Currently file which size exceeds more than 1GB never moved to trash directory. This is due to the hard coded check using GF_ALLOWED_MAX_FILE_SIZE. Change-Id: I2ed707bfe1c3114818896bb27a9856b9a164be92 BUG: 1386766 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/15689 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Anoop C S <anoopcs@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* snapshot: Fix the failure to recreate clones with same nameAvra Sengupta2016-11-015-17/+57
| | | | | | | | | | | | | | | | | | | The brick path of snapshot clones contained the clonename, thereby failing to create newer clones with the same name after the original clone had been deleted. This fix creates the brick path with the clone's vol id instead of the clones name. Hence future clones with the same name will not have the namespace clash. Change-Id: I262712adc576122f051b5d1ce171d020efaefd1a BUG: 1387160 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/15683 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* performance/write-behind: fix flush stuck by former failed writesRyan Ding2016-11-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the issue is happened in this case: assume a file is opened with fd1 and fd2. 1. some WRITE opto fd1 got error, they were add back to 'todo' queue because of those error. 2. fd2 closed, a FLUSH op is send to write-behind. 3. FLUSH can not be unwind because it's not a legal waiter for those failed write(as func __wb_request_waiting_on() say). and those failed WRITE also can not be ended if fd1 is not closed. fd2 stuck in close syscall. to resolve this issue, we can change the way we determine 2 requests is 'conflict': flush/fsync is not conflict with those write that is not belonged to them. so __wb_pick_winds() can wind the FLUSH op. below is some information when the stuck issue happen: glusterdump logs: [xlator.performance.write-behind.wb_inode] path=/ltp-F9eG0ZSOME/rw-buffered-16436 inode=0x7fdbe8039b9c window_conf=1048576 window_current=249856 transit-size=0 dontsync=0 [.WRITE] request-ptr=0x7fdbe8020200 refcount=1 wound=no generation-number=4 req->op_ret=-1 req->op_errno=116 sync-attempts=3 sync-in-progress=no size=131072 offset=1220608 lied=-1 append=0 fulfilled=0 go=0 [.WRITE] request-ptr=0x7fdbe8068c30 refcount=1 wound=no generation-number=5 req->op_ret=-1 req->op_errno=116 sync-attempts=2 sync-in-progress=no size=118784 offset=1351680 lied=-1 append=0 fulfilled=0 go=0 [.FLUSH] request-ptr=0x7fdbe8021cd0 refcount=1 wound=no generation-number=6 req->op_ret=0 req->op_errno=0 sync-attempts=0 gdb detail about above 3 requests: (gdb) print *((wb_request_t *)0x7fdbe8021cd0) $2 = {all = {next = 0x7fdbe803a608, prev = 0x7fdbe8068c30}, todo = {next = 0x7fdbe803a618, prev = 0x7fdbe8068c40}, lie = {next = 0x7fdbe8021cf0, prev = 0x7fdbe8021cf0}, winds = {next = 0x7fdbe8021d00, prev = 0x7fdbe8021d00}, unwinds = {next = 0x7fdbe8021d10, prev = 0x7fdbe8021d10}, wip = { next = 0x7fdbe8021d20, prev = 0x7fdbe8021d20}, stub = 0x7fdbe80224dc, write_size = 0, orig_size = 0, total_size = 0, op_ret = 0, op_errno = 0, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_FLUSH, lk_owner = {len = 8, data = "W\322T\f\271\367y$", '\000' <repeats 1015 times>}, iobref = 0x0, gen = 6, fd = 0x7fdbe800f0dc, wind_count = 0, ordering = {size = 0, off = 0, append = 0, tempted = 0, lied = 0, fulfilled = 0, go = 0}} (gdb) print *((wb_request_t *)0x7fdbe8020200) $3 = {all = {next = 0x7fdbe8068c30, prev = 0x7fdbe803a608}, todo = {next = 0x7fdbe8068c40, prev = 0x7fdbe803a618}, lie = {next = 0x7fdbe8068c50, prev = 0x7fdbe803a628}, winds = {next = 0x7fdbe8020230, prev = 0x7fdbe8020230}, unwinds = {next = 0x7fdbe8020240, prev = 0x7fdbe8020240}, wip = { next = 0x7fdbe8020250, prev = 0x7fdbe8020250}, stub = 0x7fdbe8062c3c, write_size = 131072, orig_size = 4096, total_size = 0, op_ret = -1, op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>}, iobref = 0x7fdbe80311a0, gen = 4, fd = 0x7fdbe805c89c, wind_count = 3, ordering = {size = 131072, off = 1220608, append = 0, tempted = -1, lied = -1, fulfilled = 0, go = 0}} (gdb) print *((wb_request_t *)0x7fdbe8068c30) $4 = {all = {next = 0x7fdbe8021cd0, prev = 0x7fdbe8020200}, todo = {next = 0x7fdbe8021ce0, prev = 0x7fdbe8020210}, lie = {next = 0x7fdbe803a628, prev = 0x7fdbe8020220}, winds = {next = 0x7fdbe8068c60, prev = 0x7fdbe8068c60}, unwinds = {next = 0x7fdbe8068c70, prev = 0x7fdbe8068c70}, wip = { next = 0x7fdbe8068c80, prev = 0x7fdbe8068c80}, stub = 0x7fdbe806746c, write_size = 118784, orig_size = 4096, total_size = 0, op_ret = -1, op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>}, iobref = 0x7fdbe8052b10, gen = 5, fd = 0x7fdbe805c89c, wind_count = 2, ordering = {size = 118784, off = 1351680, append = 0, tempted = -1, lied = -1, fulfilled = 0, go = 0}} you can see they are all on 'todo' queue, and FLUSH op fd is not the same WRITE op fd. Change-Id: Id687f9cd3b9f281e1a97c83f1ce981ede272b8ab BUG: 1372211 Signed-off-by: Ryan Ding <ryan.ding@open-fs.com> Reviewed-on: http://review.gluster.org/15380 Tested-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* build: incorrect Requires for portblock resource agentKaleb S. KEITHLEY2016-10-281-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | was: Requires: /usr/lib/ocf/resource.d/portblock s/b: Requires: /usr/lib/ocf/resource.d/heartbeat/portblock or: Requires: resource-agents >= 3.9.6 Note: RHEL6.8 and RHEL7.2 have resource-agents-3.9.5 which does not contain the portblock resource agent. I'm not sure what the point is actually of: Requires: /usr/lib/ocf/resource.d/heartbeat/portblock as it will fail to install on RHEL whether you have the resource-agents package installed or not. Hence wrapping it in %if ( fedora ). Change-Id: Ia7d6a475464c7469018678c98fc710a3b3bfc553 BUG: 1389293 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15743 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: fix cleanup in bug-1110262.tJeff Darcy2016-10-281-7/+8
| | | | | | | | | | | | | | | | | The test wasn't cleaning up the user/group it had created if it terminated abnormally. We have a mechanism (push_trapfunc) to add cleanup actions in a way that ensures they'll be run when necessary, so I changed the test to use it. While I was there, I fixed it to use kill_brick instead of "ps|grep|kill" because that will be necessary for it to pass with brick multiplexing anyway. Change-Id: Ia515bd2420050f922970d28c5856c55df9b5247b Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/15744 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd/shared storage: Check for hook-script at stagingAvra Sengupta2016-10-272-6/+25
| | | | | | | | | | | | | | | | | Check if S32gluster_enable_shared_storage.sh is present at /var/lib/glusterd/hooks/1/set/post/ at staging before proceeding with the command. Fail the command with the appropriate error message in case it is not present. Change-Id: I84e3912f1cdffb927f8a40d74d52be43ee69388b BUG: 1388348 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/15718 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* md-cache: Invalidate cache entry for open() with O_TRUNCSoumya Koduri2016-10-261-0/+48
| | | | | | | | | | | | | | | | | When a file is opened with O_TRUNC flag set, its size gets set to '0'. This case needs to be handled in md-cache to avoid sending incorrect cached stat. Change-Id: I95d1f8a6634734898883ede010c3e7b0b7eb97d9 BUG: 1382266 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/15618 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Tested-by: jiffin tony Thottan <jthottan@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/tier: break the monolith processing functionMilind Changire2016-10-262-309/+431
| | | | | | | | | | | | | | Break tier_migrate_using_query_file() into a more manageable tier_migrate_link() and helpers. Change-Id: I5eb2d2cff9e7a2a2da567c3c4c2d53aab195f477 BUG: 1358296 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/14957 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* afr,ec: Heal device files with correct major, minor numbersPranith Kumar K2016-10-264-13/+27
| | | | | | | | | | | | | | Thanks a lot to xiaoping.wu@nokia.com from Nokia for the bug and the fix. BUG: 1384297 Change-Id: Ie443237e85d34633b5dd30f85eaa2ac34e45754c Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15728 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* tools/glusterfind: kill remote processes and separate run-time directoriesMilind Changire2016-10-254-16/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem #1: Hitting CTRL+C leaves stale processes on remote nodes if glusterfind pre has been initiated. Solution #1: Adding "-t -t" to ssh command-line forces pseudo-terminal to be assigned to remote process. When local process receives Keyboard Interrupt, SIGHUP is immediately conveyed to the remote terminal causing remote changelog.py process to terminate immediately. Problem #2: Concurrent glusterfind pre runs are not possible on the same glusterfind session in case of a runaway process. Solution #2: glusterfind pre runs now add random directory name to the working directory to store and manage temporary database and changelog processing. If KeyboardInterrupt is received, the function call run_cmd_nodes("cleanup", args, tmpfilename=gtmpfilename) cleans up the remote run specific directory. Patch: 7571380 cli/xml: Fix wrong XML format in volume get command broke "gluster volume get <vol> changelog.rollover-time --xml" Now fixed function utils.py::get_changelog_rollover_time() Fixed spurious trailing space getting written if second path is empty in main.py::write_output() Fixed repetitive changelog processing in changelog.py::get_changes() Change-Id: Ia8d96e2cd47bf2a64416bece312e67631a1dbf29 BUG: 1382236 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/15609 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* CLI/TIER: throw warning regarding the removal of the older commands.hari2016-10-251-0/+28
| | | | | | | | | | | | | | | | The older tier commands for attach tier and detach tier have to be removed from code. This patch sends a warning asking to use new command as older ones are depricated and will be removed. Change-Id: Ie1c62947bad6ff106f40331ff6134838a6c72a7a BUG: 1388062 Signed-off-by: hari <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/15713 Tested-by: hari gowtham <hari.gowtham005@gmail.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* glusterd: use GF_BRICK_STOPPING as intermediate brickinfo->status stateAtin Mukherjee2016-10-252-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | On a volume stop trigger glusterd issues a brick-op to terminate the brick process during brick-op phase , however in the commit-op glusterd once again tries to kill the same process if it exists and then mark the brickinfo->status flag to GF_BRICK_STOPPED. In the former case, if brick is successfully killed there is a possibility that GlusterD will receive RPC_CLNT_DISCONNECT from the said brick process before even the commit op phase is executed and hence by that time brickinfo->status will still be set to GF_BRICK_STARTED. BRICK_DISCONNECT event should be only sent if a brick has been killed and not through a volume stop/remove brick trigger, however due to this trace, this event is also sent out on a volume stop. Fix is to introduce an intermediate state GF_BRICK_STOPPING which can be used to mark the brick status at brick op phase of volume stop/remove brick to avoid sending spurious BRICK_DISCONNECT events on a volume stop trigger. Change-Id: Ieed4450e1c988715e0f9958be44faa6b14be81e1 BUG: 1387652 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15699 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaushal M <kaushal@redhat.com>
* cluster/dht: Incorrect volname in rebalance eventsN Balachandran2016-10-251-6/+50
| | | | | | | | | | | | | | | | The rebalance event code was using strtok to parse the volume name which is incorrect. Reworked the code to get the correct volume name using strstr. Change-Id: Ib5f3305a34e6bf1ecfef677d87c5aff96bdeb0e6 BUG: 1388010 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15712 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* snapshot: Fix for memory leaks in snapshot code pathAvra Sengupta2016-10-253-12/+47
| | | | | | | | | | | Change-Id: Idc2cb16574d166e3c0ee1f7c3a485f1acb19fc8c BUG: 1386088 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/15668 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* rpc: Fix the race between notification and reconnectionPranith Kumar K2016-10-241-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: There was a hang because unlock on an entry failed with ENOTCONN. Client thinks the connection is down where as server thinks the connection is up. This is the race we are seeing: 1) Connection from client to the brick disconnects. 2) Saved frames unwind is called which unwinds all frames that were wound before disconnect. 3) connection from client to the brick happens and setvolume. 4) Disconnect notification for the connection in 1) comes now and calls client_rpc_notify() which marks the connection to be offline even when the connection is up. This is happening because I/O can retrigger connection before disconnect notification is sent to the higher layers in rpc. Fix: Notify the higher layers that a disconnect happened and then go ahead with reconnect logic. For the logs which point to the information above check: https://bugzilla.redhat.com/show_bug.cgi?id=1386626#c1 Thanks to Raghavendra G for suggesting the correct fix. BUG: 1386626 Change-Id: I3c84ba1f17010bd69049fa88ec5f0ae431f8cda9 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15681 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>