summaryrefslogtreecommitdiffstats
path: root/rpc
Commit message (Collapse)AuthorAgeFilesLines
* Merge remote-tracking branch 'origin/release-3.8' into merge-3.8-againKevin Vigor2017-01-054-41/+62
|\ | | | | | | | | Change-Id: I844adf2aef161a44d446f8cd9b7ebcb224ee618a Signed-off-by: Kevin Vigor <kvigor@fb.com>
| * rpc: fix for race between rpc and protocol/clientRajesh Joseph2016-12-202-40/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible that the notification thread which notifies protocol/client layer about the disconnection is put to sleep and meanwhile, a fuse thread or a timer thread initiates and completes reconnection to the brick. The notification thread is then woken up and protocol/client layer updates its flags to indicate that network is disconnected. No reconnection is initiated because reconnection is rpc-lib layer's responsibility and its flags indicate that connection is connected. Fix: Serialize connect and disconnect notify > Credit: Raghavendra Talur <rtalur@redhat.com> > Reviewed-on: http://review.gluster.org/15916 > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra Talur <rtalur@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit aa22f24f5db7659387704998ae01520708869873) Change-Id: I8ff5d1a3283b47f5c26848a42016a40bc34ffc1d BUG: 1401534 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/16025 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
| * cluster/afr: CLI for granular entry heal enablement/disablementKrutika Dhananjay2016-11-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/15747 When there are already existing non-granular indices created that are yet to be healed, if granular-entry-heal option is toggled from 'off' to 'on', AFR self-heal whenever it kicks in, will try to look for granular indices in 'entry-changes'. Because of the absence of name indices, granular entry healing logic will fail to heal these directories, and worse yet unset pending extended attributes with the assumption that are no entries that need heal. To get around this, a new CLI is introduced which will invoke glfsheal program to figure whether at the time an attempt is made to enable granular entry heal, there are pending heals on the volume OR there are one or more bricks that are down. If either of them is true, the command will be failed with the appropriate error. New CLI: gluster volume heal <VOL> granular-entry-heal {enable,disable} Change-Id: I342e0390f847fcb015a50ef58aedfcbcb58f4ed3 BUG: 1398501 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15942 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| * rpc/socket.c : Modify socket_poller code in case of ENODATA error code.Mohit Agrawal2016-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Continuous warning message(ENODATA) are coming in socket_rwv while SSL is enabled. Solution: To avoid the warning message update one condition in socket_poller loop code before break from loop in case of error returned by poll functions. > BUG: 1386450 > Change-Id: I19b3a92d4c3ba380738379f5679c1c354f0ab9b1 > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > Reviewed-on: http://review.gluster.org/15677 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > (cherry picked from commit ec64ce2e1684003f4e7a20d4372e414bfbddb6fb) Change-Id: I70eaf8d454a1538e14b50c6fb1074f84dd10cdf5 BUG: 1387976 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15706 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* | Add halo-min-samples option, better swap logic, edge case fixesRichard Wareing2016-12-281-10/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - Changes halo-decision to be based on the lowest halo value observed - Adds halo-min-sample option to wait until N latency samples have been gathered prior to activating halos. - Fixed 3 edge cases where halo's weren't being correctly config'd, or not configured as quickly as is possible. Namely: 1. Don't mark a child down if there's no better alternative (and you'd no longer satisfy min/max replicas); fixes unneccessary flapping. 2. If a child goes down and this causes us to fall below max_replicas, swap in a warm child immediately if it is within our halo latency (don't wait around for the next "ping"); swaps in a new child immediately helping with resiliency. 3. If the child latency is within the halo, and it's currently marked up, mark it down if it's the highest latency child and the number of children is > max_replicas; this will allow us to support the SHD use-case where we can "beam" a single copy to a geo and have it replicate within the geo after that. - More commenting Test Plan: - Run halo prove tests - Pointed compiled code at gfsglobal.prn2, tested out an NFS daemon and FUSE mounts to ensure they worked as expected on a large scale cluster. Reviewers: dph, jackl, cjh, mmckeen Reviewed By: mmckeen FB-commit-id: 7e2e8ae6b8ec62a5e0b31c9fd6100c81795b3424 Change-Id: Iba2b2f1bc848b4546cb96117ff1895f83953a4f8 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16304 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
* | Make Halo calculate & use average latencies, not realtimeRichard Wareing2016-12-271-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - Realtime latencies in practice have far too much jitter under real loading conditions, instead let's use a running average which will get very "heavy" over time such that temp spikes in brick latency will not affect halo decisions. Test Plan: - Run prove tests Reviewed By: mmckeen Change-Id: I5ebf9bc93c67d9a226287796dd7ca5eeb7b1cfa5 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16301 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
* | Repair cluster prove tests for FB environmentKevin Vigor2016-12-212-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Several prove tests use the 'launch_cluster' function to set up a clustered volume. This replies on using multiple local IP addresses, one for each server. Since IPV6 provides only ::1 as a local address, as opposed to IPv4's complete 127.x.x.x subnet, this cannot work in a pure IPv6 environment. However, FB systems do at least have enough IPv4 stack to talk locally, so fix launch_cluster to work properly when default transport is IPv6. To do this: 1) explicitly set transport.address-family volume option to inet in launch_cluster(). 2) teach glusterd to honor transport.address-family when connecting to peer glusterds in glusterd_friend_rpc_create(). Previously transport.address-family was used only for binding local socket, not for communicating with peers. Test Plan: prove -f --timer ./tests/basic/glusterd/arbiter-volume-probe.t Reviewers: Subscribers: Tasks: Blame Revision: Change-Id: I077d8549dcdbe4919ac7df34856a4b2d1428cdb6 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16225 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* | Fix deadlock observed in T13390459Kevin Vigor2016-12-161-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Fix deadlock in ping timer callback. Test Plan: run, mount volume. Reviewers: rwareing Reviewed By: rwareing Differential Revision: https://phabricator.intern.facebook.com/D3744945 Signature: t1:3744945:1474061471:3e3d1a5cefc541d26973535887c1f08c017fc049 Change-Id: Iaf94eb4c3acaa8b3ceeeb6a273db4109eea29a7c Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16168 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* | Allow OS to assign us a portKevin Vigor2016-12-162-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Replace complex and slow port selection code with bind(0). Test Plan: runtests.sh Reviewers: sshreyas Subscribers: Tasks: Blame Revision: Change-Id: I408a8528e58e1aafcd32eba6a8f1a759e0bf274e Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16150 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* | socket: Keepalives should happen on IPv6 as well as IPv4Shreyas Siravara2016-12-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - Check for AF_INET *and* AF_INET6. - This is a cherry-pick of D3057373 to 3.8 Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Change-Id: I53eb79284eddfee6e13821c6570809f575b96769 Reviewed-on: http://review.gluster.org/16155 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kevin Vigor <kvigor@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* | Halo Replication feature for AFR translatorRichard Wareing2016-12-153-11/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster. Clients can also write synchronously to the cluster simply by specifying a halo-latency which is very large (e.g. 10seconds) which will include all bricks. In other words, it allows clients to decide at mount time if they desire synchronous or asynchronous IO into a cluster and the cluster can support both of these modes to any number of clients simultaneously. There are a few new volume options due to this feature: halo-shd-latency: The threshold below which self-heal daemons will consider children (bricks) connected. halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks) connected. halo-latency: The threshold below which all other clients will consider children (bricks) connected. halo-min-replicas: The minimum number of replicas which are to be enforced regardless of latency specified in the above 3 options. If the number of children falls below this threshold the next best (chosen by latency) shall be swapped in. New FUSE mount options: halo-latency & halo-min-replicas: As descripted above. This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities. Operational Notes: - Global consistency is gaurenteed for synchronous clients, this is provided by the existing entry-locking mechanism. - Asynchronous clients on the other hand and merely consistent to their region. Writes & deletes will be protected via entry-locks as usual preventing concurrent writes into files which are undergoing replication. Read operations on the other hand should never block. - Writes are allowed from _any_ region and propagated from the origin to all other regions. The take away from this is care should be taken to ensure multiple writers do not write the same files resulting in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size). Recommended method for preventing this is using the nfs-auth feature to define which region for each share has RW permissions, tiers not in the origin region should have RO perms. TODO: - Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads. Most of the plumbing is in place for this via the child_latency array. - Better GFID split brain handling & better dent type split brain handling (i.e. create a trash can and move the offending files into it). - Tagging in addition to latency as a means of defining which children you wish to synchronously write to Test Plan: - The usual suspects, clang, gcc w/ address sanitizer & valgrind - Prove tests Reviewers: jackl, dph, cjh, meyering Reviewed By: meyering Subscribers: ethanr Differential Revision: https://phabricator.fb.com/D1272053 Tasks: 4117827 Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16099 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
* | gluster: IPv6 single stack supportRichard Wareing2016-12-075-3/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - This diff changes all locations in the code to prefer inet6 family instead of inet. This will allow change GlusterFS to operate via IPv6 instead of IPv4 for all internal operations while still being able to serve (FUSE or NFS) clients via IPv4. - The changes apply to NFS as well. - This diff ports D1892990, D1897341 & D1896522 to the 3.8 branch. Test Plan: Prove tests! Reviewers: dph, rwareing Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Change-Id: I34fdaaeb33c194782255625e00616faf75d60c33 Reviewed-on: http://review.gluster.org/16059 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> Tested-by: Shreyas Siravara <sshreyas@fb.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* | [glusterfs] Allow to set dynamic library path from env variableazzolini2016-12-061-1/+14
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This allows to ship all glusterfs dependencies to hadoop machines in a tarball. Test Plan: - build tarball: https://phabricator.fb.com/P2848521 - scp to a machine with no gluster installed echo "Hellow world" | LD_LIBRARY_PATH=glusterfs_libs GLUSTER_LIBDIR=glusterfs_libs ./glfscat $(shuf -n 1 <(smcc ls storage.gluster.gfsops.frc1) | cut -d: -f 1) groot /gfsetlprocstore/adslearner/users/azzolini/hello_world.txt (code for glfscat follows in a separate diff) Reviewers: rwareing Reviewed By: rwareing Differential Revision: https://phabricator.fb.com/D1009665 Change-Id: I8812929fc127ca291aa66e2430b5633892235915 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16032 Reviewed-by: Shreyas Siravara <shreyas.siravara@gmail.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd/quota: upgrade quota.conf file during an upgradeManikandan Selvaganesh2016-11-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem ======= When quota is enabled on 3.6, it will have quota conf version in quota.conf as v1.1. This node gets upgraded to 3.7 but it will still have quota conf version as v1.1 until a quota enable/disable/set limit is initiated. When this is not initiated and when this node tries to peer probe a node which is a fresh install of 3.7 (which will have quota conf version as v1.2), then this will result in "Peer rejected" state. This patch fixes the issue. Solution ======== When an upgrade happens from 3.6 to 3.7, quota.conf file needs to be modified as well. With 3.6, in quota.conf the version will be v1.1 and it needs to be changed to v1.2 from 3.7. This is because in 3.7, inode quota feature is introduced. So when an op-version bumpup happens quota.conf needs to be upgraded with quota conf version v1.2 and all the 16 byte uuid needs to be changed to 17 bytes uuid as well. Previously, when the cluster version is upgraded to 3.7, the quota.conf got upgraded as well. But, the upgradation was done only when quota enable/disable/set limit is done. With this patch, the upgradation is done during a cluster op version bump up as well. > Reviewed-on: http://review.gluster.org/15352 > Tested-by: Atin Mukherjee <amukherj@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit 4b2cff614462508eef529c5d128e0974720e3f50) Change-Id: Idb5ba29d3e1ea0e45c85d87c952c75da9e0f99f0 BUG: 1392716 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/15791 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com> Tested-by: Manikandan Selvaganesh <manikandancs333@gmail.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* rpc/socket: Close pipe on disconnectionKaushal M2016-10-261-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Encrypted connections create a pipe, which isn't closed when the connection disconnects. This leaks fds, and gluster eventually ends up in a situation with fd starvation which leads to operation failures. > Change-Id: I144e1f767cec8c6fc1aa46b00cd234129d2a4adc > BUG: 1336371 > Signed-off-by: Kaushal M <kaushal@redhat.com> > Reviewed-on: http://review.gluster.org/14356 > Tested-by: MOHIT AGRAWAL <moagrawa@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I144e1f767cec8c6fc1aa46b00cd234129d2a4adc BUG: 1336376 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/15703 Tested-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* rpc/socket.c : Modify gf_log message in socket_poller code in case of errorMohit Agrawal2016-10-141-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In case of SSL after stopping the volume if client(mount point) is still trying to write the data on socket then it will throw an EIO error on that socket and given this log message is captured at every attempt this would flood the log file. Solution: To reduce the frequency of stored log message use GF_LOG_OCCASIONALLY instead of gf_log. > BUG: 1381115 > Change-Id: I66151d153c2cbfb017b3ebc4c52162278c0f537c > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > Reviewed-on: http://review.gluster.org/15605 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > (cherry picked from commit 070145750006c87099f945b4990a4460d814c21f) Change-Id: I08a8a8ae66b80bba2fdb5afbcab19a0950f85104 BUG: 1384356 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15631 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* socket: log the client identifier in ssl connectMohit Agrawal2016-10-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: client identifier is not logged in message in ssl_setup_connection Solutuion: In ssl_setup_connection xl_private is not available in rpc_transport so changed to this peerinfo.identifier. > BUG: 1380275 > Change-Id: I05006a3d63e46de8c388298c22faa9a3329eb6f3 > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > Reviewed-on: http://review.gluster.org/15596 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> > (cherry picked from commit 2e23c62cc50037c8e61bcd9c04348409e7627181) Change-Id: Iad08817ee2c2828a08bc22e78c273390562ae9fb BUG: 1383882 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15624 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* socket: pollerr event shouldn't trigger socket_connnect_finishAtin Mukherjee2016-09-302-1/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If connect fails with any other error than EINPROGRESS we cannot get the error status using getsockopt (... SO_ERROR ... ). Hence we need to remember the state of connect and take appropriate action in the event_handler for the same. As an added note, a event can come where poll_err is HUP and we have poll_in as well (i.e some status was written to the socket), so for such cases we need to finish the connect, process the data and then the poll_err as is the case in the current code. Special thanks to Kaushal M & Raghavendra G for figuring out the issue. >Signed-off-by: Shyam <srangana@redhat.com> >Reviewed-on: http://review.gluster.org/15440 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ic45ad59ff8ab1d0a9d2cab2c924ad940b9d38528 BUG: 1373723 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15532 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* rpc: increase RPC/XID with each callbackNiels de Vos2016-09-202-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RPC/XID for callbacks has been hardcoded to GF_UNIVERSAL_ANSWER. In Wireshark these RPC-calls are marked as "RPC retransmissions" because of the repeating RPC/XID. This is most confusing when verifying the callbacks that the upcall framework sends. There is no way to see the difference between real retransmissions and new callbacks. This change was verified by create and removal of files through different Gluster clients. The RPC/XID is increased on a per connection (or client) base. The expectations of the RPC protocol are met this way. > Change-Id: I2116bec0e294df4046d168d8bcbba011284cd0b2 > BUG: 1377097 > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/15524 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> (cherry picked from commit e9b39527d5dcfba95c4c52a522c8ce1f4512ac21) Change-Id: I2116bec0e294df4046d168d8bcbba011284cd0b2 BUG: 1377290 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/15528 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* socket: SSL_shutdown should be called before socket shutdownRajesh Joseph2016-08-311-13/+28
| | | | | | | | | | | | | | | | | | | | | | | | SSL_shutdown shuts down an active SSL connection. But we are calling this after underlying socket is disconnected. > Change-Id: Ia943179d23395f42b942450dbcf26336d4dfc813 > BUG: 1362602 > Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> > Reviewed-on: http://review.gluster.org/15072 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> (cherry picked from commit 79e006b31a1e6d71f1af02176f8e8acaed7f8cd2) Change-Id: I6ce58bd5606278880e44c96d386acaeb0fef6275 BUG: 1371650 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/15359 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaushal M <kaushal@redhat.com>
* snapshot/cli: Fix snapshot status xml outputAvra Sengupta2016-08-241-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/14018/ snap status --xml errors out if a brick is down and doesn't have pid. It is handled in the cli of the snap status where "N/A" is displayed in such a scenario. Handled the same in xml snap status <snapname> --xml fails as the writer is not initialised for the same. Using GF_SNAP_STATUS_TYPE_ITER instead of GF_SNAP_STATUS_TYPE_SNAP for all snap's status to differentiate between the two scenarios. Added testcase volume-snapshot-xml.t to check all snapshot commands xml outputs > Reviewed-on: http://review.gluster.org/14018 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Change-Id: I99563e8f3e84f1aaeabd865326bb825c44f5c745 BUG: 1369372 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/15291 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* rpc/socket.c : Modify socket_poller code in case of ENODATA error code.Mohit Agrawal2016-07-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Polling failure errors are coming till volume is not come while SSL is enabled. Solution: To avoid the message update one condition in socket_poller code It will not exit from thread in case of received ENODATA from ssl_do function. Backport of commit 84e9fc2fb5fabf9d1e553a420854a306cdb8a168 > Change-Id: Ia514e99b279b07b372ee950f4368ac0d9c702d82 > BUG: 1349709 > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > Reviewed-on: http://review.gluster.org/14786 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> > (cherry picked from commit 84e9fc2fb5fabf9d1e553a420854a306cdb8a168) BUG: 1359654 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: If1820c0b3d0cd976875137bc1175d4b2008779af Reviewed-on: http://review.gluster.org/14999 Tested-by: MOHIT AGRAWAL <moagrawa@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* changelog/rpc: Fix rpc_clnt_t mem leaksKotresh HR2016-07-242-6/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/13658 PROBLEM: 1. Freeing up rpc_clnt object might lead to crashes. Well, it was not a necessity to free rpc-clnt object till now because all the existing use cases needs to reconnect back on disconnects. Hence timer code was not taking ref on rpc-clnt object. Glusterd had some use-cases that led to crash due to ping-timer and they fixed only those code paths that involve ping-timer. Now, since changelog has an use-case where rpc-clnt need to be freed up, we need to fix timer code to take refs 2. In changelog, because of issue 1, only mydata was being freed which is incorrect. And there are races where rpc-clnt object would access the freed mydata which would lead to crashes. Since changelog xlator resides on brick side and is long living process, if multiple libgfchangelog consumers register to changelog and disconnect/reconnect mulitple times, it would result in leak of 'rpc-clnt' object for every connect/disconnect. SOLUTION: 1. Handle ref/unref of 'rpc_clnt' structure in timer functions properly. 2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT after disabling timers and free mydata on RPC_CLNT_DESTROY. RPC SETUP IN CHANGELOG: 1. changelog xlator initiates rpc server say 'changelog_rpc_server' 2. libgfchangelog initiates one rpc server say 'libgfchangelog_rpc_server' 3. libgfchangelog initiates rpc client and connects to 'changelog_rpc_server' 4. In return changelog_rpc_server initiates a rpc client and connects back to 'libgfchangelog_rpc_server' REF/UNREF HANDLING IN TIMER FUNCTIONS: Let's say rpc clnt refcount = 1 1. Take the ref before reigstering callback to timer queue >>>> rpc_clnt_ref (say ref count becomes = 2) 2. Register a callback to timer say 'callback1' 3. If register fails: >>>> rpc_clnt_unref (ref count = 1) 4. On timer expiration, 'callback1' gets called. So unref rpc clnt at the end in 'callback1'. This is corresponding to ref taken in step 1 >>>> rpc_clnt_unref (ref count = 1) 5. The cycle from step-1 to step-4 continues....until timer cancel event happens 6. timer cancel of say 'callback1' If timer cancel fails: Do nothing, Step-4 would have unrefd If timer cancel succeeds: >>>> rpc_clnt_unref (ref count = 1) Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09 BUG: 1359364 Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 637ce9e2e27e9f598a4a6c5a04cd339efaa62076) Reviewed-on: http://review.gluster.org/14994 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* rpc/socket: pthread resources are not cleaned upN Balachandran2016-07-221-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | A socket_connect failure creates a new pthread which is not a detached thread. As no pthread_join is called, the thread resources are not cleaned up causing a memory leak. Now, socket_connect creates a detached thread to handle failure. > Change-Id: Idbf25d312f91464ae20c97d501b628bfdec7cf0c > BUG: 1343374 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: http://review.gluster.org/14875 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> (cherry picked from commit 9886d568a7a8839bf3acc81cb1111fa372ac5270) Change-Id: I69ef46013c8dbc70cbda2695f12be1f6d3720055 BUG: 1354250 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/14979 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* rpc/socket.c: Modify approach to cleanup threads of socket_poller in ↵Mohit Agrawal2016-07-211-141/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | socket_spawn. Problem: Current approach to cleanup threads of socket_poller is not appropriate. Solution: Enable detach flag at the time of thread creation in socket_spawn. Fix: Write a new wrapper(gf_create_detach_thread) to create detachable thread instead of store thread ids in a queue. Test: Fix is verfied on gluster process, To test the patch followed below procedure Enable the client.ssl and server.ssl option on the volume Start the volume and count anon segment in pmap output for glusterd process pmap -x <glusterd-pid> | grep "\[ anon \]" | wc -l Stop the volume and check again count of anon segment it should not increase. Backport of commit 2ee48474be32f6ead2f3834677fee89d88348382 > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > Change-Id: Ib8f7ec7504ec8f6f74b45ce6719b6fb47f9fdc37 > BUG: 1336508 > Reviewed-on: http://review.gluster.org/14694 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> BUG: 1354395 Change-Id: Ibdbbae508d9dda2fd36220a9b1e47f7776336929 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/14891 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* socket: log the client identifier in ssl connectRaghavendra Bhat2016-07-161-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | Backport of commit d308fb5e152d8c908bf4f5da81f553fbe3d0400a > Change-Id: I4b463ecafb66de16cbe7ed23fae800bb1204f829 > BUG: 1333912 > Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> > Reviewed-on: http://review.gluster.org/14242 > Tested-by: Vijay Bellur <vbellur@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > (cherry picked from commit d308fb5e152d8c908bf4f5da81f553fbe3d0400a) Change-Id: Id007d3e28292f504913b7df8b8eb693c0427b22b BUG: 1351878 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/14845 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* rpc: invalid argument when function setsockopt sets option TCP_USER_TIMEOUTNiels de Vos2016-07-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | If option "transport.tcp-user-timeout" hasn't been setted, glusterd's priv->timeout will be -1, which will cause invalid argument when set TCP_USER_TIMEOUT. Cherry picked from commit b2c73cbf423de6201f956f522b7429615c88869d: > Change-Id: Ibc16264ceac0e69ab4a217ffa27c549b9fa21df9 > BUG: 1349657 > Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> > Reviewed-on: http://review.gluster.org/14785 > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: Ibc16264ceac0e69ab4a217ffa27c549b9fa21df9 BUG: 1354405 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14888 Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* xdr/nfs: free complete groupnode structureBipin Kunal2016-06-131-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | The groupnode->gr_next pointer is not traversed upon free. This is currently not a problem, because the pointer is never used. However the correct way to free a groupnode should check the ->gr_next pointer and free any of the groups that it encounters. This problem was identified while correcting a problem with the MOUNT protocol. The change "nfs: build exportlist with multiple groups" starts to use ->gr_next. This is backport of below mainline fix - http://review.gluster.org/#/c/14666/ Change-Id: I9d04eaf4c65bdb8db136321d60e70789da1739d7 BUG: 1343287 Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Bipin Kunal <bkunal@redhat.com> Reviewed-on: http://review.gluster.org/14699 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: bipin kunal <kunalbipin@gmail.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* protocol: Add framework to send transaction id with recallPoornima G2016-06-102-7/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/14647/ Issue: The upcall(cache invalidation/recall) event is sent from the bricks to clients. In AFR/EC setup, it can so happen that all the bricks will send the upcall for the same event, and if AFR/EC doesn't filter out these duplicate notifications, the logic above cluster xlators can fail. Solution: Use transaction id to filter out duplicate notifications. This patch adds framework for duplicate notifications. AFR/EC can build up on this patch for deduping the notifications Change-Id: I66b08e63b8799bc5932f2b2545376138a5701168 BUG: 1337638 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/14648 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* libglusterfs (timer): race conditions, illegal mem access, mem leakKaleb S KEITHLEY2016-06-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While investigating gfapi memory consumption with valgrind, valgrind reported several memory access issues. Also see the timer 'registry' being recreated (shortly) after being freed during teardown due to the way it's currently written. Passing ctx as data to gf_timer_proc() is prone to memory access issues if ctx is freed before gf_timer_proc() terminates. (And in fact this does happen, at least in valgrind.) gf_timer_proc() doesn't need ctx for anything, it only needs ctx->timer, so just pass that. Nothing ever calls gf_timer_registry_init(). Nothing outside of timer.c that is. Making it and gf_timer_proc() static. backport mainline: > http://review.gluster.org/14247 > BUG: 1333925 Change-Id: Ia28454dda0cf0de2fec94d76441d98c3927a906a BUG: 1342620 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14644 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* core: Honour mandatory lock flags during lock migrationAnoop C S2016-05-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lk_flags from posix_lock_t structure is the primary key used to differentiate locks as either advisory and mandatory type. During lock migration this field is not read in getactivelk() call path. So in order to copy the exact lock state from source to destination it is necessary to include lk_flags within lock_migration_info_t structure to maintain accurate state. This change also includes minor modifications to setactivelk() call to consider lk_flags during lock migration. > Reviewed-on: http://review.gluster.org/14189 > Smoke: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Susant Palai <spalai@redhat.com> > Reviewed-by: Poornima G <pgurusid@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit deaf8439fc42435988aae6a7b9ab681cc0d36b09) Change-Id: I20a7b6b6a0f3bdac5734cce8a2cd2349eceff195 BUG: 1337805 Signed-off-by: Anoop C S <anoopcs@redhat.com> Reviewed-on: http://review.gluster.org/14457 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* rpc: change client insecure port ceiling from 65535 to 49151Prasanna Kumar Kalever2016-05-192-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | current port allocation to various processes (clumsy): 1023 - 1 -> client ports range if bind secure is turned on 49151 - 1024 -> fall back to this, if in above case ports exhaust 65535 - 1024 -> client port range if bind insecure is on 49152 - 65535 -> brick port range now, we have segregated port ranges 0 - 65535 to below 3 ranges 1023 - 1 -> client ports range if bind secure is turned on 49151 - 1024 -> client port range if bind insecure is on (fall back to this, if in above case ports exhaust) 49152 - 65535 -> brick port range so now we have a clean segregation of port mapping Backport of: > Change-Id: Ie3b4e7703e0bbeabbe0adbdd6c60d9ef78ef7c65 > BUG: 1335776 > Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> > Reviewed-on: http://review.gluster.org/14326 > Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> > Reviewed-by: Raghavendra Talur <rtalur@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ie3b4e7703e0bbeabbe0adbdd6c60d9ef78ef7c65 BUG: 1337127 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/14413 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* socket: Fix incorrect handling of partial readsXavier Hernandez2016-05-111-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | The usage of function local variables in the protocol state machine caused an incorrect behaviour when a partial read from the socket forced the function to return and restart later when more data was available. At this point the local variables contained incorrect data. > Change-Id: I4db1f4ef5c46a3d2d7f7c5328e906188c3af49e6 > BUG: 1334285 > Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> > Reviewed-on: http://review.gluster.org/14270 > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Tested-by: Raghavendra G <rgowdapp@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Change-Id: I92b7c91b4c0dfc15224aea39308c93b27028dd4f BUG: 1334287 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/14293 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com>
* glusterd: add defence mechanism to avoid brick port clashesPrasanna Kumar Kalever2016-05-102-41/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intro: Currently glusterd maintain the portmap registry which contains ports that are free to use between 49152 - 65535, this registry is initialized once, and updated accordingly as an then when glusterd sees they are been used. Glusterd first checks for a port within the portmap registry and gets a FREE port marked in it, then checks if that port is currently free using a connect() function then passes it to brick process which have to bind on it. Problem: We see that there is a time gap between glusterd checking the port with connect() and brick process actually binding on it. In this time gap it could be so possible that any process would have occupied this port because of which brick will fail to bind and exit. Case 1: To avoid the gluster client process occupying the port supplied by glusterd : we have separated the client port map range with brick port map range more @ http://review.gluster.org/#/c/13998/ Case 2: (Handled by this patch) To avoid the other foreign process occupying the port supplied by glusterd : To handle above situation this patch implements a mechanism to return EADDRINUSE error code to glusterd, upon which a new port is allocated and try to restart the brick process with the newly allocated port. Note: Incase of glusterd restarts i.e. runner_run_nowait() there is no way to handle Case 2, becuase runner_run_nowait() will not wait to get the return/exit code of the executed command (brick process). Hence as of now in such case, we cannot know with what error the brick has failed to connect. This patch also fix the runner_end() to perform some cleanup w.r.t return values. Backport of: > Change-Id: Iec52e7f5d87ce938d173f8ef16aa77fd573f2c5e > BUG: 1322805 > Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> > Reviewed-on: http://review.gluster.org/14043 > Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Id7d8351a0082b44310177e714edc0571ad0f7195 BUG: 1333711 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/14235 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* rpc: define client port rangePrasanna Kumar Kalever2016-05-102-2/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: when bind-insecure is 'off', all the clients bind to secure ports, if incase all the secure ports exhaust the client will no more bind to secure ports and tries gets a random port which is obviously insecure. we have seen the client obtaining a port number in the range 49152-65535 which are actually reserved as part of glusterd's pmap_registry for bricks, hence this will lead to port clashes between client and brick processes. Solution: If we can define different port ranges for clients incase where secure ports exhaust, we can avoid the maximum port clashes with in gluster processes. Still we are prone to have clashes with other non-gluster processes, but the chances being very low, but that's a different story on its own, which will be handled in upcoming patches. Backportof: > Change-Id: Ib5ce05991aa1290ccb17f6f04ffd65caf411feaf > BUG: 1322805 > Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> > Reviewed-on: http://review.gluster.org/13998 > Smoke: Gluster Build System <jenkins@build.gluster.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I2ab9608ddbefcdf5987d817c23dd066010148e19 BUG: 1333711 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/14234 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* protocol: add setactivelk () fopSusant Palai2016-05-012-0/+13
| | | | | | | | | | | Change-Id: I60fe2d59c454095febce4c0fbef87a2dad9636e4 BUG: 1326085 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/14013 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* core: add setactivelk () fopSusant Palai2016-05-011-0/+1
| | | | | | | | | | | Change-Id: Ic2ba77a1fdd27801a6e579e04e6c0dd93cd7127b BUG: 1326085 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/14011 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* protocol: add getactivelk () fopSusant Palai2016-05-012-1/+20
| | | | | | | | | | | Change-Id: Ie38198db990f133fe163ba160cdf647e34f83f4f BUG: 1326085 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/13994 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* core: add getactivelk () fopSusant Palai2016-05-011-0/+1
| | | | | | | | | | | Change-Id: Ifd0ff278dcf43da064021f5c25e5dcd34347fcde BUG: 1326085 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/13970 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* protocol/client : Implementation of compound fopAnuradha Talur2016-04-301-0/+2
| | | | | | | | | | | Change-Id: Iade71daf3bc70e60833d693ac55151c9cf691381 BUG: 1303829 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/14114 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* Protocol: Add lease fopPoornima G2016-04-293-0/+82
| | | | | | | | | | | | Change-Id: I64c361d3e4ae86d57dc18bb887758d044c861237 BUG: 1319992 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/11597 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* rpc: added structures to support compound fopsAnuradha Talur2016-04-291-0/+119
| | | | | | | | | | | | | Change-Id: Ida81e7b3145fb09afa37353244ff8721a4dc4c6a BUG: 1303829 Signed-off-by: Anuradha Talur <atalur@redhat.com> [ndevos: move definitions around to align with changes for bug 1328502] Reviewed-on: http://review.gluster.org/13331 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* socket: Reap own-threadsKaushal M2016-04-292-0/+122
| | | | | | | | | | | | | | | | Dead own-threads are reaped periodically (currently every minute). This helps avoid memory being leaked, and should help prevent memory starvation issues with GlusterD. Change-Id: Ifb3442a91891b164655bb2aa72210b13cee31599 BUG: 1331289 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/14101 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Jeff Darcy <jdarcy@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* rpc: split FOPs enum from glusterfs.hNiels de Vos2016-04-288-1/+218
| | | | | | | | | | | | | | | | | | | | | | | | | | | Moving the enumeration of FOPs and some of the other parts that are defining the network protocol to the rpc/xdr/ section. These structures need some care when modifications are made, moving them out of the common glusterfs.h header helps with that. The protocol definition structures are generated in a new glusterfs-fops header. This file is present in rpc/xdr/src/ and libglusterfs/src/, it is a little ugly, but prevents the need to update all Makefile.am files with the additional -I option for finding the new header file. The generation of the .c and .h files from the .x descriptions needed small modifications to accommodate these changes. The build/xdrgen script was improved slightly for this. The .c and .h files are incorrectly in the $(top_srcdir), instead of $(top_builddir). This is an existing issue, and bug 1330604 has been filed to get that addressed. Change-Id: I98fc8cf7e4b631082c7b203b5a0a77111bec1fb9 BUG: 1328502 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14032 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: try to connect on GF_PMAP_PORT_FOREIGN aswellPrasanna Kumar Kalever2016-04-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | This patch fix couple of things mentioned below: 1. previously we use to try to connect on only GF_PMAP_PORT_FREE in the pmap_registry_alloc(), it could happen that some foreign process would have freed the port by this time ?, hence it is worth giving a try on GF_PMAP_PORT_FOREIGN ports as well instead of wasting them all. 2. fix pmap_registry_remove() to mark the port asGF_PMAP_PORT_FREE 3. added useful comments on gf_pmap_port_type enum members Change-Id: Id2aa7ad55e76ae3fdece21bed15792525ae33fe1 BUG: 1322805 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/14080 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* rpc: fix gf_process_reserved_portsPrasanna Kumar Kalever2016-04-272-15/+3
| | | | | | | | | | | | | | | this patch also does minor code cleanups. Change-Id: I0d005bd0f9baaaae498aa1df4faa6fcb65fa7a6e BUG: 1198849 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/13997 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* performance/decompounder: Introducing decompounder xlatorAnuradha Talur2016-04-251-0/+1
| | | | | | | | | | | | | | This xlator decompounds the compound fops received, and executes them serially. Change-Id: Ieddcec3c2983dd9ca7919ba9d7ecaa5192a5f489 BUG: 1303829 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/13577 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* rpc: check the right variable after gf_strdupZhou Zhengping2016-04-141-1/+1
| | | | | | | | | | | | | Change-Id: If4628bd37f2c85a070f6c3b3e0583d939100d7ec BUG: 1325683 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: http://review.gluster.org/13934 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anoop C S <anoopcs@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* socket: Don't cleanup encrypted transport in socket_connect()Kaushal M2016-04-071-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ..instead cleanup only in socket_poller() With commit d117466 socket_poller() wasn't launched from socket_connect (for encrypted connections), if connect() failed. This was done to prevent the socket private data from being double unreffed, from the cleanups in both socket_poller() and socket_connect(). This allowed future reconnects to happen successfully. If a socket reconnects is sort of decided by the rpc notify function registered. The above change worked with glusterd, as the glusterd rpc notify function (glusterd_peer_rpc_notify()) continuously allowed reconnects on failure. mgmt_rpc_notify(), the rpc notify function in glusterfsd, behaves differently. For a DISCONNECT event, if more volfile servers are available or if more addresses are available in the dns cache, it allows reconnects. If not it terminates the program. For a CONNECT event, it attempts to do a volfile fetch rpc request. If sending this rpc fails, it immediately terminates the program. One side effect of commit d117466, was that the encrypted socket was registered with epoll, unintentionally, on a connect failure. A weird thing happens because of this. The epoll notifier notifies mgmt_rpc_notify() of a CONNECT event, instead of a DISCONNECT as expected. This causes mgmt_rpc_notify() to attempt an unsuccessful volfile fetch rpc request, and terminate. (I still don't know why the epoll raises the CONNECT event) Commit 46bd29e fixed some issues with IPv6 in GlusterFS. This caused address resolution in GlusterFS to also request of IPv6 addresses (AF_UNSPEC) instead of just IPv4. On most systems, this causes the IPv6 addresses to be returned first. GlusterD listens on 0.0.0.0:24007 by default. While this attaches to all interfaces, it only listens on IPv4 addresses. GlusterFS daemons and bricks are given 'localhost' as the volfile server. This resolves to '::1' as the first address. When using management encryption, the above reasons cause the daemon processes to fail to fetch volfiles and terminate. Solution -------- The solution to this is simple. Instead of cleaning up the encrypted socket in socket_connect(), launch socket_poller() and let it cleanup the socket instead. This prevents the unintentional registration with epoll, and socket_poller() sends the correct events to the rpc notify functions, which allows proper reconnects to happen. Change-Id: Idb0c0a828743cccca51cfdd1aa6458cfa0a9d100 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/13926 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Jeff Darcy <jdarcy@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* rpc: assign port only if it is unreservedPrasanna Kumar Kalever2016-04-062-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current order: assign port; check for port; if reserved { port--; continue to i; } bind(); basically, we are assigning port first then checking if it is reserved Fix: get unreserved port; assign port; bind(); from now, we get unreserved port first and then assign it Change-Id: I004580c5215e7c9cae3594af6405b20fcd9fa4ad BUG: 1323659 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/13900 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>