glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	cluster/afr: Keep child-up until ping-event	Pranith Kumar K	2018-04-25	3	-25/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If we have 2 bricks, brick-A and brick-B with brick-A within halo-max-latency and brick-B more than halo-max-latency. If we set both halo-min, halo-max replicas as '1'. In this case, brick-A comes online and then ping-latency will be updated for it. When brick-B comes online, we have 2 up-bricks, so the code tries to find the brick with worst latency to mark it down. Since Brick-B just came online it always had '0' latency so brick-B used to be marked offline and Brick-B would eventually be the one to be online even when brick-A is more suited. Fix: Consider latency of just-up child as HALO_MAX_LATENCY so that worst-child until ping-latency is found as the just-up brick. Also keep ping-latency as -1 until child-up during initialization. BUG: 1567881 fixes bz#1567881 Change-Id: I148262fe505468190f0eb99225d0f6d57cdb6f04 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
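The selection rule described in this message can be sketched roughly as follows. This is a simplified model rather than the AFR code: the `Child` class, the helper names, and the numeric value of `HALO_MAX_LATENCY` are assumptions made for illustration.

```python
HALO_MAX_LATENCY = 99999  # assumed sentinel value, in milliseconds


class Child:
    """Hypothetical stand-in for one replica child (brick)."""

    def __init__(self, name):
        self.name = name
        self.up = False
        self.latency = -1  # stays -1 until the child comes up

    def on_child_up(self):
        # A just-up child has no ping measurement yet, so it is treated
        # as the worst candidate instead of the best (latency 0).
        self.up = True
        self.latency = HALO_MAX_LATENCY

    def on_ping(self, latency_msec):
        self.latency = latency_msec


def find_worst_up_child(children):
    """Return the up child with the highest latency, or None."""
    up_children = [c for c in children if c.up]
    return max(up_children, key=lambda c: c.latency, default=None)


brick_a, brick_b = Child("brick-A"), Child("brick-B")
brick_a.on_child_up()
brick_a.on_ping(5)
brick_b.on_child_up()  # no ping yet
worst = find_worst_up_child([brick_a, brick_b])
```

With a latency of 0 for the just-up child, the same `max` would have picked brick-A as the worst child, which is the behaviour the commit describes as the problem.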
*	libglusterfs/syncop: Handle barrier_{init/destroy} in error cases	Pranith Kumar K	2018-04-23	2	-4/+27
\| \| \| \| \| \| \|	BUG: 1568521 updates: bz#1568521 Change-Id: I53e60cfcaa7f8edfa5eca47307fa99f10ee64505 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	features/shard: Add option to barrier parallel lookup and unlink of shards	Krutika Dhananjay	2018-04-23	2	-28/+89
\| \| \| \| \| \| \| \| \|	Also move the common parallel unlink callback for GF_FOP_TRUNCATE and GF_FOP_FTRUNCATE into a separate function. Change-Id: Ib0f90a5f62abdfa89cda7bef9f3ff99f349ec332 updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	cluster/dht: Fix dht_rename lock order	N Balachandran	2018-04-23	1	-18/+47
\| \| \| \| \| \| \| \| \| \|	Fixed dht_order_rename_lock to use the same inodelk ordering as that of the dht selfheal locks (dictionary order of lock subvolumes). Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d fixes: bz#1568348 Signed-off-by: N Balachandran <nbalacha@redhat.com>
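The idea of a shared lock order can be illustrated with a small sketch. It only shows why sorting by subvolume name gives every caller the same acquisition sequence; the request layout and the subvolume names are hypothetical and do not mirror the DHT data structures.

```python
def order_lock_requests(requests):
    """Sort lock requests by subvolume name (dictionary order)."""
    return sorted(requests, key=lambda req: req["subvol"])


# Two operations that need the same pair of locks but list them in a
# different order end up acquiring them in the same sequence.
rename_locks = [{"subvol": "vol-client-1"}, {"subvol": "vol-client-0"}]
selfheal_locks = [{"subvol": "vol-client-0"}, {"subvol": "vol-client-1"}]

rename_order = [r["subvol"] for r in order_lock_requests(rename_locks)]
selfheal_order = [r["subvol"] for r in order_lock_requests(selfheal_locks)]
```

Because both paths take locks in one agreed order, neither can hold the lock the other is waiting for while waiting on the lock the other holds.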
*	server/auth: add option for strict authentication	Mohammed Rafi KC	2018-04-20	6	-12/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When this option is enabled, we will check for a matching username and password, if not found then the connection will be rejected. This also does a checksum validation of volfile The option is invalid when SSL/TLS is in use, at which point the SSL/TLS certificate user name is used to validate and hence authorize the right user. This expects TLS allow rules to be setup correctly rather than the default *. This option is not settable, as a result this cannot be enabled for volumes using the CLI. This is used with the shared storage volume, to restrict access to the same in non-SSL/TLS environments to the gluster peers only. Tested: ./tests/bugs/protocol/bug-1321578.t ./tests/features/ssl-authz.t - Ran tests on volumes with and without strict auth checking (as brick vol file needed to be edited to test, or rather to enable the option) - Ran tests on volumes to ensure existing mounts are disconnected when we enable strict checking Change-Id: I2ac4f0cfa5b59cc789cc5a265358389b04556b59 fixes: bz#1568844 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	shared storage: Prevent mounting shared storage from non-trusted client	Mohammed Rafi KC	2018-04-20	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gluster shared storage is a volume used for internal storage for various features including ganesha, geo-rep, snapshot. So this volume should not be exposed to the client, as it is a special volume for internal use. This fix won't generate a non-trusted volfile for the shared storage volume. Change-Id: I8ffe30ae99ec05196d75466210b84db311611a4c fixes: bz#1568844 BUG: 1568844 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	server: fix unresolved symbols by moving them to libglusterfs	Mohit Agrawal	2018-04-20	5	-104/+106
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: glusterd2 build is failed due to undefined symbol (xlator_mem_cleanup , glusterfsd_ctx) in server.so Solution: To resolve the same done below two changes 1) Move xlator_mem_cleanup code from glusterfsd-mgmt.c to xlator.c to be part of libglusterfs.so 2) replace glusterfsd_ctx to this->ctx because symbol glusterfsd_ctx is not part of server.so BUG: 1544090 Change-Id: Ie5e6fba9ed458931d08eb0948d450aa962424ae5 fixes: bz#1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	cluster/afr: Need heal-timeout to be configured as low as 5 seconds	Pranith Kumar K	2018-04-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	In Halo replication, there are pending heals more often than not. It makes sense to give users the capability to configure it as low as 5 seconds. BUG: 1569489 fixes bz#1569489 Change-Id: I451c1975827f66398b903f659c981ef3121d5376 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	features/bitrot: show the corresponding brick for the corrupted objects	Raghavendra Bhat	2018-04-20	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \|	Currently with "gluster volume bitrot <volume name> scrub status" command the corrupted objects of a node are shown. But to what brick that corrupted object belongs to is not shown. Showing the brick of the corrupted object will help in situations where a node hosts multiple bricks of a volume. Change-Id: I7fbdea1e0072b9d3487eb10757468bc02d24df21 fixes: bz#1569198 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
*	eventsapi: Handle Unicode string during signing	Aravinda VK	2018-04-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Python 2.7 HMAC does not support Unicode strings. Secret is read from file so it is possible that glustereventsd reads the content as Unicode. This patch converts the secret to `str` type before generating HMAC signature. Fixes: bz#1568820 Change-Id: I7daa64499ac4ca02544405af26ac8af4b6b0bd95 Signed-off-by: Aravinda VK <avishwan@redhat.com>
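A minimal sketch of the same idea, written for Python 3, where the HMAC key must be bytes (the Python 2 `str` type the commit refers to is a byte string). The function name and the choice of SHA-256 are assumptions for illustration, not the glustereventsd code.

```python
import hashlib
import hmac


def sign_message(secret, message):
    """Return a hex HMAC signature, accepting text or bytes inputs."""
    # Normalise text to bytes first; hmac.new() rejects a text key.
    if isinstance(secret, str):
        secret = secret.encode("utf-8")
    if isinstance(message, str):
        message = message.encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


signature = sign_message("secret-read-from-file", '{"event": "VOLUME_START"}')
```

Normalising at the signing boundary means it does not matter whether the secret file was read in text or binary mode.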
*	Make glusterfsd binary print statedump & xlator dir	Prashanth Pai	2018-04-19	5	-7/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The glusterd2 needs following options, some of which are provided by gluster CLI today: --print-xlatordir --print-statedumpdir --print-logdir However, the CLI package need not be present on the machine running glusterd2. This change adds the above CLI options to glusterfsd binary which glusterd2 depends on. Reverts 9a1ae47c8d60836ae0628a04a153f28c1085c0e8 Related changes: https://review.gluster.org/#/c/19882/ https://github.com/gluster/glusterd2/pull/663 Updates: bz#1193929 Change-Id: I18c123b0d3350d2bd4f2400783e3b94e402a4e29 Signed-off-by: Prashanth Pai <ppai@redhat.com>
*	gluster: Sometimes Brick process is crashed at the time of stopping brick	Mohit Agrawal	2018-04-19	20	-112/+365
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The brick process sometimes crashes when a brick is stopped while brick mux is enabled. Solution: The brick process crashed because RPC connections were not cleaned up properly while brick mux is enabled. With this patch, after sending the GF_EVENT_CLEANUP notification to the server xlator, the process waits for all RPC client connections of that xlator to be destroyed. Once the RPC connections of all clients associated with that brick are destroyed in server_rpc_notify, xlator_mem_cleanup is called for the brick xlator as well as all its child xlators. To avoid races during cleanup, two new flags are introduced on each xlator: cleanup_starting and call_cleanup. BUG: 1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Note: All test cases were run in a separate build (https://review.gluster.org/#/c/19700/) with the same patch after forcibly enabling brick mux; all test cases passed. Change-Id: Ic4ab9c128df282d146cf1135640281fcb31997bf updates: bz#1544090
*	glusterd: volume inode/fd status broken with brick mux	hari gowtham	2018-04-19	9	-87/+119
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The values for inode/fd was populated from the ctx received from the server xlator. Without brickmux, every brick from a volume belonged to a single brick from the volume. So searching the server and populating it worked. With brickmux, a number of bricks can be confined to a single process. These bricks can be from different volumes too (if we use the max-bricks-per-process option). If they are from different volumes, using the server xlator to populate causes problem. Fix: Use the brick to validate and populate the inode/fd status. Signed-off-by: hari gowtham <hgowtham@redhat.com> Change-Id: I2543fa5397ea095f8338b518460037bba3dfdbfd fixes: bz#1566067
*	features/shard: Make operations on internal directories generic	Krutika Dhananjay	2018-04-18	2	-92/+206
\| \| \| \| \| \|	Change-Id: Iea7ad2102220c6d415909f8caef84167ce2d6818 updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	fuse: do fd_resolve in fuse_getattr if fd is received	Susant Palai	2018-04-18	2	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	problem: With the current code, post graph switch the old fd is received for fuse_getattr and since it is associated with old inode, it does not have the inode ctx across xlators in new graph. Hence, dht errored out saying "no layout" for fstat call. Hence the EINVAL. Solution: if fd is passed, init and resolve fd to carry on getattr test case: - Created a single brick distributed volume - Started untar - Added a new-brick Without this fix, untar used to abort with ERROR. Change-Id: I5805c463fb9a04ba5c24829b768127097ff8b9f9 fixes: bz#1566207 Signed-off-by: Susant Palai <spalai@redhat.com>
*	glusterd: update listen-backlog value to 1024	Milind Changire	2018-04-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Update default value of listen-backlog to 1024 to reflect the changes in socket.c This keeps the actual implementation in socket.c and the help text in glusterd-volume-set.c consistent Change-Id: If04c9e0bb5afb55edcc7ca57bbc10922b85b7075 fixes: bz#1564600 Signed-off-by: Milind Changire <mchangir@redhat.com>
*	cluster/afr: Make sure latency-arg is passed to afr	Pranith Kumar K	2018-04-18	4	-3/+6
\| \| \| \| \| \| \| \| \| \| \|	xlator_notify doesn't pass the extra arguments that come in the input function, so XLATOR_NOTIFY macro should be used instead to pass the extra arguments to the function. BUG: 1567881 fixes bz#1567881 Change-Id: Ic15b6c446638cbacf3149693147a754219037c47 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	libglusterfs: fix comparison of a NULL dict with a non-NULL dict	Xavi Hernandez	2018-04-18	1	-8/+8
\| \| \| \| \| \| \| \| \| \|	Function are_dicts_equal() had a bug when the first argument was NULL and the second one wasn't NULL. In this case it incorrectly returned that the dicts were different when they could be equal. Fixes: bz#1566732 Change-Id: I0fc245c2e7d1395865a76405dbd05e5d34db3273 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
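The symmetry problem can be illustrated with a small sketch. This is not the libglusterfs implementation: it assumes, for illustration only, that a missing dict is equivalent to an empty one and that a caller-supplied `match` predicate selects which keys take part in the comparison.

```python
def dicts_equal(one, two, match=lambda key: True):
    """Compare two dicts on matching keys, treating None as empty."""
    one = one or {}
    two = two or {}
    left = {k: v for k, v in one.items() if match(k)}
    right = {k: v for k, v in two.items() if match(k)}
    return left == right


def ignore_internal(key):
    # Hypothetical predicate: keys named "internal" are not compared.
    return key != "internal"


same = dicts_equal(None, {"internal": 1}, match=ignore_internal)
```

Handling both arguments the same way keeps the result independent of which side happens to be missing.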
*	Add CLI option to print XLATORDIR	Prashanth Pai	2018-04-18	2	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	glusterfs gets the path to xlator dir from a compile time flag named XLATORDIR which gets passed through a -D flag to GCC. This path is used to find and load xlator shared objects. The XLATORDIR path isn't easily accessible to glusterd2. Glusterd2 currently uses the following command (hack) to get value of XLATORDIR: $ strings -d `which glusterfsd` \| awk '/glusterfs/*/xlator$/' This change introduces "print-xlatordir" CLI option to expose XLATORDIR. The option is intentionally not documented. Updates: bz#1193929 Change-Id: Ic7247457600f11cd8d68eb3d0ad2526fdfda0b02 Signed-off-by: Prashanth Pai <ppai@redhat.com>
*	afr: fixes to afr-eager locking	Ravishankar N	2018-04-18	2	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \|	1. If pre-op fails on all bricks,set lock->release to true in afr_handle_lock_acquire_failure so that the GF_ASSERT in afr_unlock() does not crash. 2. Added a missing 'return' after handling pre-op failure in afr_transaction_perform_fop(), fixing a use-after-free issue. Change-Id: If0627a9124cb5d6405037cab3f17f8325eed2d83 fixes: bz#1561129 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	Revert "storage/posix: add pgfid in readdirp if needed"	Nigel Babu	2018-04-18	1	-38/+8
\| \| \| \| \| \| \| \|	This reverts commit d206fab73f6815c927a84171ee9361c9b31557b1. Change-Id: I5b43fdcf916bc844437c9d60f6957bc40936e3c2 Updates: bz#1560319 Signed-off-by: Nigel Babu <nigelb@redhat.com>
*	build: exclude '--with-previous-options' to prevent infinite loop	Xie Changlong	2018-04-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Reproducible Steps: 1. cd glusterfs/; rm -rf *; git reset --hard #clean repo 2. cd extras/LinuxRPM/; ./make_glusterrpms #it's ok here 3. ./make_glusterrpms #infinite loop 4. cd ../../; make distclean #infinite loop Change-Id: I162953d4576cedea7c6f6c631a77163a5cca023e updates: #439 Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
*	maintainers: promote Deepshikha to maintainer	Nigel Babu	2018-04-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Deepshikha has been doing excellent work across the CI system. She is now ready to co-maintain the Continuous Integration module and be responsible for the CI ecosystem in its entirety. Fixes: bz#1567880 Change-Id: If204301d26731f93b2dccfe8b6571ee748a47b26 Signed-off-by: Nigel Babu <nigelb@redhat.com>
*	fuse: retire statvfs tweak	Csaba Henk	2018-04-16	1	-13/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fuse xlator used to override the filesystem block size of the storage backend to indicate its preferences. Now we retire this tweak and pass on what we get from the backend. This fixes the anomaly reported in the referred BUG. For more background, see the following email, which was sent out to gluster-devel and gluster-users mailing lists to gauge if anyone sees any use of this tweak: http://lists.gluster.org/pipermail/gluster-devel/2018-March/054660.html http://lists.gluster.org/pipermail/gluster-users/2018-March/033775.html No one vetoed its removal, and it received an endorsement: http://lists.gluster.org/pipermail/gluster-devel/2018-March/054686.html BUG: 1523219 Change-Id: I3b7111d3037a1b91a288c1589f407b2c48d81bfa Signed-off-by: Csaba Henk <csaba@redhat.com>
*	geo-rep: Fix syncing of symlink	Kotresh HR	2018-04-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If a symlink is created on master pointing to the current directory (e.g. symlink -> ".") with a non-root uid or gid, the geo-rep worker crashes with ENOTSUP. Cause: Geo-rep creates the symlink on slave and fixes the uid and gid using the chown cmd. os.chown dereferences the symlink, which is pointing to ".gfid", which is not supported. Note that geo-rep operates on aux-gfid-mount (e.g. "/mnt/.gfid/<gfid-of-symlink-file>"). Solution: The uid or gid change is actually on the symlink file. So use os.lchown, i.e., don't dereference. BUG: 1567209 Change-Id: I63575fc589d71f987bef1d350c030987738c78ad updates: bz#1567209 Signed-off-by: Kotresh HR <khiremat@redhat.com>
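The difference between the two calls can be sketched as below. The helper name and the temporary directory are illustrative assumptions; the example runs on an ordinary POSIX filesystem, not on an aux-gfid mount.

```python
import os
import tempfile


def fix_symlink_owner(path, uid, gid):
    """Set the owner of the link itself, never of its target."""
    # os.lchown does not follow the symlink; os.chown would resolve
    # "link -> ." and try to change the directory it points to.
    os.lchown(path, uid, gid)


with tempfile.TemporaryDirectory() as workdir:
    link = os.path.join(workdir, "symlink")
    os.symlink(".", link)  # same shape as the case in the message
    fix_symlink_owner(link, os.getuid(), os.getgid())
    still_a_link = os.path.islink(link)
```

In Python 3, `os.chown(path, uid, gid, follow_symlinks=False)` is an equivalent spelling on platforms that support it.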
*	extras: Disable choose-local in groups virt and gluster-block	Krutika Dhananjay	2018-04-13	2	-0/+2
\| \| \| \| \| \|	Change-Id: Icba68406d86623195d59d6ee668e0850c037c63a fixes: bz#1566386 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	rpc: set listen-backlog to high value	Milind Changire	2018-04-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: On node reboot, when glusterd starts volumes rapidly, there's a flood of connections from the bricks to glusterd and from the self-heal daemons to the bricks. This causes SYN Flooding and dropped connections when the listen-backlog is not enough to hold the pending connections to compensate for the rate at which connections are accepted by the RPC layer. Solution: Increase the listen-backlog value to 1024. This is a partial solution. Part of the solution is to rearm the listener socket early for quicker accept() of connections. See commit 6964640a977cb10c0c95a94e03c229918fa6eca8 (change 19833) Change-Id: I62283d1f4990dd43839f9a6932cf8a36effd632c fixes: bz#1564600 Signed-off-by: Milind Changire <mchangir@redhat.com>
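What the backlog value controls can be shown with a plain TCP listener. This is a generic socket sketch rather than the GlusterFS RPC transport; the helper name and the loopback address are assumptions.

```python
import socket

LISTEN_BACKLOG = 1024  # the value named in the commit message


def open_listener(host="127.0.0.1", port=0, backlog=LISTEN_BACKLOG):
    """Open a listening TCP socket with the given accept backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # The backlog bounds how many connections may wait in the accept
    # queue before accept() picks them up.
    sock.listen(backlog)
    return sock


listener = open_listener()
bound_port = listener.getsockname()[1]
listener.close()
```

On Linux the kernel silently caps the requested backlog at `net.core.somaxconn`, so that limit may also need raising for a larger value to take effect.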
*	cluster/dht: Handle file migrations when brick down	N Balachandran	2018-04-13	1	-5/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The decision as to which node would migrate a file was based on the gfid of the file. Files were divided among the nodes for the replica/disperse set. However, if a brick was down when rebalance started, the nodeuuids would be saved as NULL and a set of files would not be migrated. Now, if the nodeuuid is NULL, the first non-null entry in the set is the node responsible for migrating the file. Change-Id: I72554c107792c7d534e0f25640654b6f8417d373 fixes: bz#1564198 Signed-off-by: N Balachandran <nbalacha@redhat.com>
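The fallback rule can be sketched as follows. How the gfid is mapped onto the set is not stated in the message, so the modulo on a precomputed hash is an assumption, as are the function and variable names.

```python
def responsible_node(gfid_hash, nodeuuids):
    """Pick the node that should migrate a file.

    nodeuuids holds one entry per brick of the replica/disperse set;
    None models a brick that was down when rebalance started.
    """
    chosen = nodeuuids[gfid_hash % len(nodeuuids)]
    if chosen is not None:
        return chosen
    # Fallback from the commit message: the first non-null entry in
    # the set takes over, so the file is not skipped.
    return next((uuid for uuid in nodeuuids if uuid is not None), None)


nodes = [None, "node-b", "node-c"]
owner_for_down_brick = responsible_node(3, nodes)  # index 0 is None
owner_for_up_brick = responsible_node(5, nodes)    # index 2 is node-c
```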
*	core/build/various: python3 compat, prepare for python2 -> python3	Kaleb S. KEITHLEY	2018-04-12	59	-102/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Note 1) we're not supposed to be using #!/usr/bin/env python, see https://fedoraproject.org/wiki/Packaging:Guidelines?rd=Packaging/Guidelines#Shebang_lines Note 2) we're also not supposed to be using #!/usr/bin/python, see https://fedoraproject.org/wiki/Changes/Avoid_usr_bin_python_in_RPM_Build#Quick_Opt-Out The previous patch (https://review.gluster.org/19767) tried to do too much in one patch, so it was abandoned. This patch does two things: 1) minor cleanup of configure(.ac) to explicitly use python2 2) change all the shebang lines to #!/usr/bin/python2 and add them where they were missing based on warnings emitted during rpmbuild. In a follow-up patch python2 will eventually be changed to python3. Before that python2-isms (e.g. print, string.join(), etc.) need to be converted to python3. Some of those can be rewritten in version agnostic python. E.g. print statements become print() with "from __future__ import print_function". The python 2to3 utility will be used for some of those. Also Aravinda has given guidance in the comments to the first patch for changes. updates: #411 Change-Id: I471730962b2526022115a1fc33629fb078b74338 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	cluster/dht: Wind open to all subvols	N Balachandran	2018-04-11	1	-10/+5
\| \| \| \| \| \| \| \| \| \|	dht_opendir should wind the open to all subvols whether or not local->subvols is set. This is because dht_readdirp winds the calls to all subvols. Change-Id: I67a96b06dad14a08967c3721301e88555aa01017 updates: bz#1564198 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	xlators/performance: Add pass-through option	Varsha Rao	2018-04-11	9	-10/+139
\| \| \| \| \| \| \| \| \| \|	Add pass-through option in performance translators. Set the option in GF_OPTION_INIT() and GF_OPTION_RECONF() Updates: #304 Change-Id: If1537450147d154905831e36f7162a32866d7ad6 Signed-off-by: Varsha Rao <varao@redhat.com>
*	posix: reserve option behavior is not correct while using fallocate	Mohit Agrawal	2018-04-11	2	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The storage.reserve option does not work correctly when disk space is allocated through fallocate. Solution: posix_disk_space_check_thread_proc calls posix_disk_space_check every 5 seconds to monitor disk space and set the flag in posix priv. Within that 5-second window a user can create a big file with fallocate that reaches the posix reserve limit, and no error is shown on the terminal even though the limit has been reached. To resolve this, call posix_disk_space for every fallocate fop instead of relying on the thread's check every 5 seconds. BUG: 1560411 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I39ba9390e2e6d084eedbf3bcf45cd6d708591577
*	storage/posix: add pgfid in readdirp if needed	Kinglong Mee	2018-04-10	1	-8/+38
\| \| \| \| \| \|	Change-Id: I6745428fd9d4e402bf2cad52cee8ab46b7fd822f fixes: bz#1560319 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
*	posix: check file state before continuing with fops	Susant Palai	2018-04-10	5	-16/+756
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In context of Cloudsync: A scenario where a data modification fop, e.g. a write, lands in POSIX on the assumption that the file is local while the file is actually remote can be dangerous. Of course we don’t want to take inodelk for every read/write operation to check the archival status or coordinate with an upload or a download of a file. To avoid inodelk, we will check the status of the file in POSIX itself, before we resume the fop. This helps us avoid the races mentioned above. Now e.g. if a write reached POSIX for a file which was actually remote, it can check the status of the file and will get to know that the file is remote. It can error out with this status “remote” and cloudsync xlator will retry the same operation, once it has finished downloading the file. This patch includes the setxattr changes to do the post processing of upload, i.e. truncate and setting the remote xattr "trusted.glusterfs.cs.remote" to indicate the file is REMOTE. Each file will have no xattr if the file is LOCAL, one remote xattr if the file is REMOTE and a combination of REMOTE and DOWNLOADING xattr if the file is getting downloaded. There is healing logic for these xattrs to recover from crash inconsistencies. Fixes: #387 Change-Id: Ie93c2d41aa8d6a798a39bdbef9d1669f057e5fdb Signed-off-by: Susant Palai <spalai@redhat.com>
*	cluster/dht: act as passthrough for renames on single child DHT	Raghavendra G	2018-04-10	1	-7/+15
\| \| \| \| \| \| \| \| \| \|	Various synchronization present in dht_rename while handling directories and files is necessary only if we have more than only one child. Change-Id: Ie21ad419125504ca2f391b1ae2e5c1d166fee247 fixes: bz#1563511 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	experimental/cloudsync: Download xlator for archival feature	Susant Palai	2018-04-10	21	-4/+2468
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	spec-files: https://review.gluster.org/#/c/18854/ Overview: * Cloudsync maintains three file states in it's inode-ctx i.e 1 - LOCAL, 2 - REMOTE, 3 - DOWNLOADING. * A data modifying fop is allowed only if the state is LOCAL. If the state is REMOTE or DOWNLOADING, client will download or wait for the download to finish initiated by other client. * Multiple download and upload from different clients are synchronized by inodelk. * In POSIX a state check is done (part of different commit)before allowing the fop to continue. If the state is remote/downloading the fop is unwound with EREMOTE. The client will then download the file and continue with the fop again. * Basic Algo for fop (let's say write fop): - If LOCAL -> resume fop - If REMOTE -> - INODELK - STAT (this gets state and heal the state if needed) - DOWNLOAD - resume fop Note: * Developers will need to write plugins for download, based on the remote store they choose. In phase-1, support will be added for one remote store per volume. In future, more options for multiple remote stores will be explored. TODOs: - Implement stat/lookup/readdirp to return size info from xattr - Make plugins configurable - Implement unlink fop - Add metrics collection - Add sharding support Design Contributions: Aravinda V K <avishwan@redhat.com> Amar Tumballi <amarts@redhat.com> Ram Ankireddypalle <areddy@commvault.com> Susant Palai <spalai@redhat.com> updates: #387 Change-Id: Iddf711ee7ab4e946ae3e472ff62791a7b85e6d4b Signed-off-by: Susant Palai <spalai@redhat.com>
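The "Basic Algo" for a write fop can be sketched as a small state machine. This is an in-memory model under several assumptions: `threading.Lock` stands in for inodelk, `download` is a hypothetical plugin hook, re-reading the state under the lock stands in for the STAT step, and none of the names come from the cloudsync xlator.

```python
import threading
from enum import Enum


class State(Enum):
    LOCAL = 1
    REMOTE = 2
    DOWNLOADING = 3


class CloudFile:
    """Hypothetical in-memory model of a file tracked by cloudsync."""

    def __init__(self, state, data=b""):
        self.state = state
        self.data = bytearray(data)
        self.inodelk = threading.Lock()  # stand-in for inodelk


def download(cloud_file):
    """Hypothetical plugin hook that fetches the remote contents."""
    cloud_file.state = State.DOWNLOADING
    cloud_file.data = bytearray(b"remote-contents")
    cloud_file.state = State.LOCAL


def write(cloud_file, payload):
    """Write fop: resume at once if LOCAL, otherwise download first."""
    if cloud_file.state is not State.LOCAL:
        with cloud_file.inodelk:
            # Re-check under the lock: another client may already have
            # finished the download while this one was waiting.
            if cloud_file.state is not State.LOCAL:
                download(cloud_file)
    cloud_file.data += payload  # resume the fop
    return len(payload)


remote_file = CloudFile(State.REMOTE)
written = write(remote_file, b"!")
```

Checking the state again after taking the lock is what lets several clients race for the same file while only one of them performs the download.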
*	quota: allow writes when with EINVAL on pgfid isnot exist	Kinglong Mee	2018-04-09	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NFS client gets "Invalid argument" when writing file through nfs-ganesha. 1. With quota disabled; nfs client mount nfs-ganesha share, and do 'll' in the testing directory. 2. Enable quota; getfattr: Removing leading '/' from absolute path names trusted.gfid=0xe2edaac0eca8420ebbbcba7e56bbd240 trusted.gfid2path.b3250af8fa558e66=0x39663134343566662d653530332d343831352d396635312d3236633565366332633137642f7465737466696c653932 trusted.glusterfs.quota.9f1445ff-e503-4815-9f51-26c5e6c2c17d.contri.3=0x00000000000002000000000000000001 Notice: testfile92 without trusted.pgfid xattr. 3. restart glusterfs volume by "gluster volume stop/start gvtest" 4. echo somedata > testfile92 5. ll testfile92 -rw-r--r-- 1 root root 0 Mar 6 21:43 testfile92 BUG: 1560319 Change-Id: Iaa4dd1e891c99069fb85b7b11bb0482cbf2303b1 fixes: bz#1560319 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
*	rpc: rearm listener socket early	Milind Changire	2018-04-07	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: On node reboot, when glusterd starts volumes, a setup with a large number of bricks might cause SYN Flooding and connections to be dropped if the connections are not accepted quickly enough. Solution: accept() the connection and rearm the listener socket early to receive more connection requests as soon as possible. Change-Id: Ibed421e50284c3f7a8fcdb4de7ac86cf53d4b74e fixes: bz#1564600 Signed-off-by: Milind Changire <mchangir@redhat.com>
*	features/index: Choose different base file on EMLINK error	Pranith Kumar K	2018-04-06	2	-18/+61
\| \| \| \| \| \| \|	Change-Id: I4648816af908539efdc2528608aa2ebf7f0d0e2f fixes: bz#1559004 BUG: 1559004 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	doc: Update the admin guide link	Varsha Rao	2018-04-06	1	-1/+1
\| \| \| \| \| \| \| \|	Update the existing admin guide link as it is incorrect. Change-Id: I05669192623aeac287dfa9002caa0f390ea79499 Updates: bz#1193929 Signed-off-by: Varsha Rao <varao@redhat.com>
*	cluster/ec: Turn ON the stripe-cache option by default	Ashish Pandey	2018-04-06	1	-1/+1
\| \| \| \| \| \|	Change-Id: I0a290396c30c635b13ee73004d20259efb76a954 fixes: bz#1563945 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	gfapi: fix a couple of minor issues	Kaleb S. KEITHLEY	2018-04-05	3	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Duplication of exported functions in gfapi.map. Only the newest one is needed. Both the legacy and current symbols are exported. glfs_io_cbk34 typedef should not be in a public header file. The old application was compiled with the original glfs_io_cbk. Outside of libgfapi, nothing now uses/needs this old typedef, move it into the C file that needs it. Similarly glfs_realpath34() decl should not be in glfs.h. Period. Old applications were compiled with the then glfs_realpath() decl and linked with glfs_realpath@@GFAPI_3_4.0. New applications should only call glfs_realpath() and it will be linked to the new/current glfs_realpath(). Change-Id: Icd5b0c9e9b68f0c133f14447b09ace35f33dbab2 fixes: bz#1564235 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	glusterd: show brick online after port registration	Atin Mukherjee	2018-04-05	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gluster-block project needs a dependency check to see if all the bricks are online before bringing up the relevant gluster-block services. The patch https://review.gluster.org/#/c/19785/ attempts to write that script, but a brick should be marked as online only when the pmap_signin has completed. This is perfectly fine without brick multiplexing, but with brick multiplexing this patch still doesn't eliminate the race completely, as the attach_req call is asynchronous and glusterd immediately marks the port as registered. Change-Id: I81db54b88f7315e1b24e0234beebe00de6429f9d Fixes: bz#1563273 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	afr: add quorum checks in pre-op	Ravishankar N	2018-04-05	1	-33/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: We seem to be winding the FOP if pre-op did not succeed on quorum bricks and then failing the FOP with EROFS since the fop did not meet quorum. This essentially masks the actual error due to which pre-op failed. (See BZ). Fix: Skip FOP phase if pre-op quorum is not met and go to post-op. Fixes: 1561129 Change-Id: Ie58a41e8fa1ad79aa06093706e96db8eef61b6d9 fixes: bz#1561129 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	glusterd: mark port_registered to true for all running bricks with brick mux	Atin Mukherjee	2018-04-05	3	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	glusterd maintains a boolean flag 'port_registered' which is used to determine if a brick has completed its portmap sign in process. This flag is (re)set in pmap_signin and pmap_signout events. In case of brick multiplexing this flag is the identifier to determine if the very first brick with which the process is spawned up has completed its sign in process. However, on glusterd restart, when a brick is already identified as running, glusterd does a pmap_registry_bind to ensure its portmap table is updated, but this flag isn't updated. That is fine without brick multiplexing, but with it, if the very first brick which came as part of the process is replaced, the subsequent brick attach will fail. One way to validate this is to create and start a volume, remove the first brick and then add-brick a new one. The add-brick operation will take a very long time, and after that the volume status will show all other brick status apart from the new brick as down. Solution is to set brickinfo->port_registered to true for all the running bricks when brick multiplexing is enabled. Change-Id: Ib0662d99d0fa66b1538947fd96b43f1cbc04e4ff Fixes: bz#1560957 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	features/changelog: Update option levels	Aravinda VK	2018-04-05	1	-0/+7
\| \| \| \| \| \| \| \|	Options levels for Changelog Xlator Change-Id: Idd246717e38096c44258a990a0939f82e5fc9654 Updates: #430 Signed-off-by: Aravinda VK <avishwan@redhat.com>
*	cluster/dht: enable lookup-optimize by default	N Balachandran	2018-04-04	3	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Lookup-optimize has been shown to improve create performance. The code has been in the project for several years and is considered stable. Enabling this by default in order to test this in the upstream regression runs. Change-Id: Iab792979ee34f0af4713931e0b5b399c23f65313 updates: bz#1557435 BUG: 1557435 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	glusterd: fix txn_opinfo memory leak	Atin Mukherjee	2018-04-04	3	-9/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	For transactions where there's no volname involved (eg : gluster v status), the originator node initiates with staging phase and what that means in op-sm there's no unlock event triggered which resulted into a txn_opinfo dictionary leak. Credits : cynthia.zhou@nokia-sbell.com Change-Id: I92fffbc2e8e1b010f489060f461be78aa2b86615 Fixes: bz#1550339 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	glusterd: honour localtime-logging for all the daemons	Atin Mukherjee	2018-04-03	5	-0/+30
\| \| \| \| \| \|	Change-Id: I97a70d29365b0a454241ac5f5cae56d93eefd73a Fixes: bz#1563334 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	cluster/afr: Prevent ping-event handling on shd	Pranith Kumar K	2018-04-03	1	-0/+2
\| \| \| \| \| \| \| \| \|	On shd, we shouldn't treat any brick down based on latency, otherwise self-heal will never happen fixes: bz#1562717 Change-Id: Ica07fcc4fae91a6bfd9c9a670e2be464704d94b7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>