glusterfs.git -

	Commit message	Author	Age	Files	Lines
...
*	cluster/ec: Change handling of heal failure to avoid crash	Ashish Pandey	2019-11-04	2	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: ec_getxattr_heal_cbk was called with NULL as its second argument when heal failed. This function dereferenced the "cookie" argument, which caused a crash. Solution: The cookie is changed to carry the value that was supposed to be stored in fop->data, so even when fop is NULL in the error case, there won't be any NULL dereference. Thanks to Xavi for the suggestion about the fix. Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c fixes: bz#1729085
*	afr: lock healing changes	Ravishankar N	2019-10-30	7	-36/+849
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implements lock healing for the gluster-block fencing use case. If mandatory locking is enabled: - Add domain lock/unlock to afr_lk fop. - Maintain a list of locks to be healed in afr_private_t. - Add lock to the list if afr_lk(F_SETLK or F_SETLKW) was successful. - Remove it from the list during afr_lk(F_UNLCK). - On child_down, mark lock as needing heal on that child. If the lock is lost on a quorum number of bricks, remove it from the list and mark the fd bad. - For fds marked as bad, fail the subsequent fd-based fops. - On parent up, traverse the list and heal the locks IFF the client is the lk owner and has quorum. (shd does not heal any locks). updates: #613 Change-Id: I03c46ceaea30f5e6236d5ec13f71d843d827f1bc Signed-off-by: Ravishankar N <ravishankar@redhat.com>
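A minimal sketch of the child_down bookkeeping described above, assuming each healable lock records in a bitmask which children currently hold it. The struct and function names (`lk_heal_entry`, `lk_mark_child_down`) are hypothetical rather than AFR's real types, and quorum is modeled as a strict majority, which is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-lock record; AFR keeps its real list in afr_private_t. */
struct lk_heal_entry {
    uint32_t held_on;    /* bit i set: lock currently held on child i */
    uint32_t needs_heal; /* bit i set: lock must be re-acquired on child i */
    bool fd_bad;         /* set once the lock is lost on a quorum of children */
};

/* Count set bits without relying on compiler builtins. */
static int count_bits(uint32_t v)
{
    int n = 0;
    while (v) {
        v &= v - 1; /* clear lowest set bit */
        n++;
    }
    return n;
}

/* Called when child `child` (assumed < 32) goes down. Returns true if the
 * entry should be dropped from the heal list because the lock is no longer
 * held on a strict majority of the `child_count` children. */
bool lk_mark_child_down(struct lk_heal_entry *e, int child, int child_count)
{
    uint32_t bit = 1u << child;

    if (e->held_on & bit) {
        e->held_on &= ~bit;
        e->needs_heal |= bit;
    }

    int lost = child_count - count_bits(e->held_on);
    if (lost > child_count / 2) {
        e->fd_bad = true; /* later fd-based fops on this fd should fail */
        return true;
    }
    return false;
}
```

With replica 3, losing one child only marks that child for healing; losing a second one crosses the majority threshold and marks the fd bad.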
*	cluster/afr: Take a copy of xattr-req	Pranith Kumar K	2019-10-30	1	-2/+7
\| \| \| \| \| \| \| \| \| \|	AFR adds its own xattrs to the request, so it should take a copy of the dictionary to prevent the parent xlator from reusing the modified xattr-req when winding to another subvolume. fixes: bz#1765155 Change-Id: I268e2dbd1b12323135d369e90a22a8bdde2cf7c2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	dht: Rebalance causing IO Error - File descriptor in bad state	Mohit Agrawal	2019-10-15	5	-17/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem : When a file is migrated, dht attempts to re-open all open fds on the new cached subvol. Earlier, if dht had not opened the fd, the client xlator would be unable to find the remote fd and would fall back to using an anon fd for the fop. That behavior changed with https://review.gluster.org/#/c/glusterfs/+/15804, causing fops to fail with EBADFD if the fd was not available on the cached subvol. The client xlator returns EBADFD if the remote fd is not found but dht only checks for EBADF before re-opening fds on the new cached subvol. Solution: Handle EBADFD at dht code path to avoid the issue Change-Id: I43c51995cdd48d05b12e4b2889c8dbe2bb2a72d8 Fixes: bz#1758579
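The errno check at the heart of this fix can be sketched as follows, assuming a helper named `fd_needs_reopen()` (hypothetical; the real dht code paths are structured differently). `EBADFD` is Linux-specific, hence the `#ifdef`.

```c
#include <errno.h>
#include <stdbool.h>

/* Decide whether dht should re-open fds on the new cached subvol.
 * Before the fix only EBADF was checked, but the client xlator reports a
 * missing remote fd as EBADFD, so both are treated as "fd needs reopen". */
bool fd_needs_reopen(int op_errno)
{
#ifdef EBADFD
    if (op_errno == EBADFD)
        return true;
#endif
    return op_errno == EBADF;
}
```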
*	cluster/afr: Add afr_seek to fops table	Pranith Kumar K	2019-10-14	2	-0/+4
\| \| \| \| \| \|	fixes: bz#1760189 Change-Id: Iffbf8d6f4c50b8e2de8364658697bdbe96549f5d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	Multiple files: make root gfid a static variable	Yaniv Kaul	2019-10-14	4	-15/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	In many places we use it, compare to it, etc. It could be a static variable, as it really doesn't change. I think it's better than initializing to 0 and then doing gfid[15] = 1 or other tricks. I think there are additional opportunities to make more variables static. This is an attempt at an easy one. Change-Id: I7f23a30a94056d8f043645371ab841cbd0f90d19 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
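A sketch of the idea: the root gfid (all zeros except a final byte of 1, matching the `gfid[15] = 1` trick mentioned above) is declared once as a static constant. The names `root_gfid_sketch` and `gfid_is_root` are illustrative, not the identifiers used in glusterfs.

```c
#include <stdbool.h>
#include <string.h>

/* Root gfid: 00000000-0000-0000-0000-000000000001, built once instead of
 * memset() followed by gfid[15] = 1 at every use site. */
static const unsigned char root_gfid_sketch[16] = {0, 0, 0, 0, 0, 0, 0, 0,
                                                   0, 0, 0, 0, 0, 0, 0, 1};

bool gfid_is_root(const unsigned char gfid[16])
{
    return memcmp(gfid, root_gfid_sketch, sizeof(root_gfid_sketch)) == 0;
}
```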
*	afr: align structs	Yaniv Kaul	2019-10-14	2	-138/+138
\| \| \| \| \| \| \| \| \| \|	Squash more than 50 struct-padding warnings in afr structures. The warnings were found by manually adding '-Wpadded' to the GCC command line. Change-Id: I961fbdeb33715cedf3dd10db8e4f8ef40cd3e867 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
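To illustrate the kind of reordering that silences `-Wpadded`, here is a hypothetical pair of structs with the same fields in two orders; they are not actual afr structures.

```c
#include <stddef.h>

/* Order that triggers -Wpadded on a typical LP64 target: padding after `a`
 * to align `p`, and tail padding after `b`. */
struct padded_sketch {
    char a;
    void *p;
    char b;
};

/* Same fields, largest alignment first: the two chars share one tail pad. */
struct reordered_sketch {
    void *p;
    char a;
    char b;
};

size_t padded_size(void) { return sizeof(struct padded_sketch); }
size_t reordered_size(void) { return sizeof(struct reordered_sketch); }
```

On x86-64 Linux these are typically 24 and 16 bytes respectively; exact sizes depend on the ABI.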
*	cluster/afr: Heal entries when there is a source & no healed_sinks	karthik-us	2019-10-09	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame anything for entry heal, heal will not complete even though we have clear source and sinks. This will happen because while doing afr_selfheal_find_direction() only the bricks which are blamed by non-accused bricks are considered as sinks. Later in __afr_selfheal_entry_finalize_source() when it tries to mark all the non-sources as sinks it fails to do so because there won't be any healed_sinks marked, no witness present and there will be a source. Fix: If there is a source and no healed_sinks, then reset all the locked sources to 0 and healed sinks to 1 to do conservative merge. Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548 Fixes: bz#1749322 Signed-off-by: karthik-us <ksubrahm@redhat.com>
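One reading of the fix, sketched with per-brick boolean arrays: when a source exists but no healed sink was marked, every locked brick is turned from source into sink so a conservative merge happens. The function name and array layout are hypothetical, not AFR's actual signatures.

```c
#include <stdbool.h>

/* If there is a source but no healed sink, reset every locked brick to
 * "not a source, healed sink" so the entry heal does a conservative merge.
 * Arrays are indexed by brick. */
void finalize_entry_sources(bool *sources, bool *healed_sinks,
                            const bool *locked_on, int brick_count)
{
    bool have_source = false;
    bool have_sink = false;

    for (int i = 0; i < brick_count; i++) {
        have_source = have_source || sources[i];
        have_sink = have_sink || healed_sinks[i];
    }

    if (!have_source || have_sink)
        return; /* normal case: nothing to adjust */

    for (int i = 0; i < brick_count; i++) {
        if (locked_on[i]) {
            sources[i] = false;
            healed_sinks[i] = true;
        }
    }
}
```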
*	afr: support split-brain CLI for replica 3	Ravishankar N	2019-10-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ever since we added quorum checks for lookups in afr via commit bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution commands would not work for replica 3 because there would be no readables for the lookup fop. The argument was that split-brains do not occur in replica 3 but we do see (data/metadata) split-brain cases once in a while which indicate that there are a few bugs/corner cases yet to be discovered and fixed. Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD, split-brain resolution commands will work for replica 3 volumes too. Likewise, the check is added in shard_lookup as well to permit resolving split-brains by specifying "/.shard/shard-file.xx" as the file name (which previously used to fail with EPERM). Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9 Fixes: bz#1756938 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr: replace afr_frame_return() when possible with direct call	Yaniv Kaul	2019-10-07	5	-15/+9
\| \| \| \| \| \| \| \| \| \| \| \|	If you are already under the lock, just decrement the call count directly instead of releasing the lock, re-taking it and decrementing. Implements https://github.com/gluster/glusterfs/issues/728 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I3fa20b4651fbdb826655c5a03baeed46e99b5487
*	cluster/dht: Correct fd processing loop	N Balachandran	2019-10-02	1	-22/+62
\| \| \| \| \| \| \| \| \| \| \| \|	The fd processing loops in the dht_migration_complete_check_task and the dht_rebalance_inprogress_task functions were unsafe and could cause an open to be sent on an already freed fd. This has been fixed. Change-Id: I0a3c7d2fba314089e03dfd704f9dceb134749540 Fixes: bz#1757399 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	cluster/ec: Implement read-mask feature	Pranith Kumar K	2019-09-27	3	-0/+82
\| \| \| \| \| \|	fixes: #725 Change-Id: Iaaefe6f49c8193c476b987b92df6bab3e2f62601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: prevent filling shd log with "table not found" messages	Xavi Hernandez	2019-09-26	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When self-heal daemon receives an inodelk contention notification, it tries to locate the related inode using inode_find() and the inode table owned by top-most xlator, which in this case doesn't have any inode table. This causes many messages to be logged by inode_find() function because the inode table passed is NULL. This patch prevents this by making sure the inode table is not NULL before calling inode_find(). Change-Id: I8d001bd180aaaf1521ba40a536b097fcf70c991f Fixes: bz#1755344 Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
*	afr-common.c, afr-self-heal.h: calloc/alloca0 -> malloc/alloca	Yaniv Kaul	2019-09-20	2	-5/+4
\| \| \| \| \| \| \| \| \| \|	In 3 cases, there was a memory allocation and zeroing, followed directly by populating it with content. Replaced with memory allocation that did not zero the memory. Change-Id: I4fbb5c924fb3a144e415d2368126b784dde760ea updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	cluster/dht: Handle file truncates during migration	N Balachandran	2019-09-17	1	-26/+34
\| \| \| \| \| \| \| \| \|	File truncate operations during a migration were not handled properly. This has been fixed. Change-Id: Ic642d257e893641236a4a21ab69fcc7a569dd70a Fixes: bz#1745967 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	ctime/rebalance: Heal ctime xattr on directory during rebalance	Kotresh HR	2019-09-16	3	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After add-brick and rebalance, the ctime xattr is not present on rebalanced directories on new brick. This patch fixes the same. Note that ctime still doesn't support consistent time across distribute sub-volume. This patch also fixes the in-memory inconsistency of time attributes when metadata is self healed. Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df fixes: bz#1734026 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	cluster/ec: Mark release only when it is acquired	Pranith Kumar K	2019-09-12	2	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Mount-1: 1) Tries to acquire a lock on 'dir1'. 2) The lock is granted on brick-0. 3) Gets a lock-contention notification and marks lock->release as true. 4) A new fop comes on 'dir1' and is put in the frozen list because lock->release is set to true. 5) The lock acquisition from step 2 fails because 3 bricks went down in a 4+2 setup. Mount-2: 1) Tries to acquire a lock on 'dir1'. 2) The lock gets EAGAIN on brick-0, which leads to a blocking lock on brick-0. 3) What happens on mount-2 from here on doesn't matter. The fop on mount-1 that was put in the frozen list will hang, because no code path moves it from the frozen list to any other list and the lock will not be retried. Fix: Don't set lock->release to true if the lock is not acquired at the time of the lock-contention notification. fixes: bz#1743573 Change-Id: Ie6630db8735ccf372cc54b873a3a3aed7a6082b7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
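The guard in the fix can be sketched with a reduced, hypothetical view of an EC lock; the real `ec_lock_t` carries far more state, and `ec_on_lock_contention` is an illustrative name.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of an EC lock. */
struct ec_lock_sketch {
    bool acquired; /* lock fully granted */
    bool release;  /* new fops are frozen while this is set */
};

/* Contention notification handler. Setting `release` unconditionally could
 * freeze fops behind a lock whose acquisition later fails, leaving them
 * stuck in the frozen list; only an acquired lock is marked for release. */
void ec_on_lock_contention(struct ec_lock_sketch *lock)
{
    if (lock->acquired)
        lock->release = true;
}
```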
*	cluster/ec: quorum-count implementation	Pranith Kumar K	2019-09-08	6	-59/+110
\| \| \| \| \| \|	fixes: #721 Change-Id: I5333540e3c635ccf441cf1f4696e4c8986e38ea8 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: Fix coverity issues	Pranith Kumar K	2019-09-07	1	-12/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixed the following coverity issue in both flush/fsync >>> CID 1404964: Null pointer dereferences (REVERSE_INULL) >>> Null-checking "fd" suggests that it may be null, but it has already been dereferenced on all paths leading to the check. >>> if (fd != NULL) { >>> fop->fd = fd_ref(fd); >>> if (fop->fd == NULL) { >>> gf_msg(this->name, GF_LOG_ERROR, 0, >>> "Failed to reference a " >>> "file descriptor."); fixes bz#1748836 Change-Id: I19c05d585e23f8fbfbc195d1f3775ec528eed671 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: Fail fsync/flush for files on update size/version failure	Pranith Kumar K	2019-09-06	5	-1/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If update size/version is not successful on the file, updates on the same stripe could lead to data corruption if the earlier unaligned write did not succeed on all the bricks. The application won't have any knowledge of this because update size/version happens in the background. Fix: Fail fsync/flush on fds that were opened before update-size-version went bad. fixes: bz#1748836 Change-Id: I9d323eddcda703bd27d55f340c4079d76e06e492 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
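One way to express "fail fsync/flush on fds opened before the failure" is an epoch counter, sketched below. This bookkeeping is an assumption for illustration, not EC's actual mechanism; all names are hypothetical, and returning `-EIO` is also an assumption, since the commit does not name the errno.

```c
#include <errno.h>

/* Every failed size/version update bumps an epoch on the inode, and each
 * fd remembers the epoch that was current when it was opened. */
struct inode_sketch {
    unsigned int bad_epoch;
};

struct fd_sketch {
    const struct inode_sketch *inode;
    unsigned int opened_epoch;
};

void fd_open_sketch(struct fd_sketch *fd, const struct inode_sketch *inode)
{
    fd->inode = inode;
    fd->opened_epoch = inode->bad_epoch;
}

void update_size_version_failed(struct inode_sketch *inode)
{
    inode->bad_epoch++;
}

/* fsync/flush: an fd opened before the most recent failure is reported as
 * failed so the application learns about the lost update. */
int fsync_or_flush_sketch(const struct fd_sketch *fd)
{
    if (fd->opened_epoch != fd->inode->bad_epoch)
        return -EIO;
    return 0;
}
```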
*	afr/lookup: Pass xattr_req in while doing a selfheal in lookup	Mohammed Rafi KC	2019-09-05	3	-5/+16
\| \| \| \| \| \| \| \| \| \|	We were not passing xattr_req when doing a name self-heal or a metadata heal. Because of this, some xdata was missing, which caused I/O errors. Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f Fixes: bz#1728770 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	afr: wake up index healer threads	Ravishankar N	2019-08-30	5	-11/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	...whenever shd is re-enabled after disabling or there is a change in `cluster.heal-timeout`, without needing to restart shd or waiting for the current `cluster.heal-timeout` seconds to expire. See BZ 1743988 for more details. Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe fixes: bz#1744548 Reported-by: Glen Kiessling <glenk1973@hotmail.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	cluster/afr - Unused variables	Barak Sason	2019-08-24	2	-6/+9
\| \| \| \| \| \| \| \| \| \| \|	-Minor change to if-else structure to avoid code duplication. -Added logging in case method calls fail. CID: 1394654 Updates: bz#789278 Change-Id: Ibef4450dc89ddd3bf951303d5b87f503924fd250 Signed-off-by: Barak Sason <bsasonro@redhat.com>
*	afr: restore timestamp of parent dir during entry-heal	Ravishankar N	2019-08-14	1	-0/+2
\| \| \| \| \| \|	Fixes: bz#1734370 Change-Id: I29e338bac62104233a6f80212df8d0fb016affda Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	cluster/ec: Fix coverity issue.	Ashish Pandey	2019-08-13	1	-1/+1
\| \| \| \| \| \|	Change-Id: I727287784a15d89441865de7f438002e4a370250 fixes: bz#1738763 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	cluster/ec: Update lock->good_mask on parent fop failure	Pranith Kumar K	2019-08-07	2	-0/+4
\| \| \| \| \| \| \| \| \| \|	When discard/truncate performs write fop, it should do so after updating lock->good_mask to make sure readv happens on the correct mask fixes bz#1727081 Change-Id: Idfef0bbcca8860d53707094722e6ba3f81c583b7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/dht: Log hashes in hex	N Balachandran	2019-08-06	4	-15/+13
\| \| \| \| \| \| \| \| \|	Log layout hash ranges in hex to make it easier to compare them to the on disk xattrs. Change-Id: Ib75c2508bf8e0ab7f5ae26d0443ef02b792b7307 Fixes: bz#1697293 Signed-off-by: N Balachandran <nbalacha@redhat.com>
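A sketch of hex logging for a layout range, assuming a helper named `format_layout_range()` (hypothetical). Fixed-width hex lines up with how one would typically inspect the on-disk xattrs, for example with `getfattr -e hex`.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a layout hash range as fixed-width hex, e.g. for a log message.
 * Returns the number of characters snprintf() would have written. */
int format_layout_range(char *buf, size_t len, uint32_t start, uint32_t stop)
{
    return snprintf(buf, len, "0x%08" PRIx32 " - 0x%08" PRIx32, start, stop);
}
```

For example, a range of 0 to 0x7fffffff is rendered as `0x00000000 - 0x7fffffff`.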
*	Multiple files: get trivial stuff done before lock	Yaniv Kaul	2019-08-01	2	-7/+5
\| \| \| \| \| \| \| \| \|	Initializing a dictionary, for example, seems perfectly fine to do before taking a lock. Change-Id: Ib29516c4efa8f0e2b526d512beab488fcd16d2e7 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	cluster/ec: Create heal task with heal process id	Ashish Pandey	2019-07-30	1	-1/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: ec_data_undo_pending calls syncop_fxattrop->SYNCOP without a frame. In this case SYNCOP gets the frame of the task. However, when we create a synctask for heal we provide frame as NULL. Now, if the read-only feature is ON, it will receive the process ID of the shd as 0 and will consider it not an internal process. This will prevent healing of a file, with a "Read-only file system" error message logged. Solution: While launching heal, create a synctask using a frame and set the process id of the SHD, which is -6. Change-Id: I37195399c85de322cbcac75633888922c4e3db4a Fixes: bz#1734252
*	cluster/ec: Fix reopen flags to avoid misbehavior	Pranith Kumar K	2019-07-30	2	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: when a file needs to be re-opened, the O_APPEND and O_EXCL flags are not filtered in EC. - O_APPEND should be filtered because EC doesn't send O_APPEND below EC for open, to make sure writes happen on the individual fragments instead of at the end of the file. - O_EXCL should be filtered because shd could have created the file, so even when the file exists, open should succeed. - O_CREAT should be filtered because open happens with gfid as parameter, so the open fop will create just the gfid, which will lead to problems. Fix: Filter out these flags in reopen. Change-Id: Ia280470fcb5188a09caa07bf665a2a94bce23bc4 Fixes: bz#1733935 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
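The flag filtering listed above amounts to masking bits out of the saved open flags before reopening. A sketch with a hypothetical helper name (`ec_reopen_flags_sketch`); the actual EC function may differ.

```c
#include <fcntl.h>

/* Strip flags that must not be replayed when EC re-opens a file:
 * O_APPEND (writes target individual fragments, not EOF), O_EXCL (shd may
 * already have created the file) and O_CREAT (reopen is done by gfid). */
int ec_reopen_flags_sketch(int open_flags)
{
    return open_flags & ~(O_APPEND | O_EXCL | O_CREAT);
}
```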
*	cluster/ec: Always read from good-mask	Pranith Kumar K	2019-07-26	2	-5/+25
\| \| \| \| \| \| \| \| \| \|	There are cases where fop->mask may have fop->healing added and readv shouldn't be wound on fop->healing. To avoid this always wind readv to lock->good_mask fixes bz#1727081 Change-Id: I2226ef0229daf5ff315d51e868b980ee48060b87 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: fix EIO error for concurrent writes on sparse files	Xavi Hernandez	2019-07-24	1	-9/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again. The problem appears on sparse files because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's). So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with an EIO error. This error is propagated to the parent write, which is aborted, and EIO is returned to the application. The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifies the ec-stripe.t file for Test #13 within it. 
In this patch, if a file size is less than the offset we are writing, we fill zeros in the head and tail and do not consider it a stripe cache miss. That actually makes sense, as we know what data that part holds and there is no need to read it from the bricks. Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6 Fixes: bz#1730715 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
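A sketch of the decision described above, with a hypothetical helper `prepare_chunk_sketch()`: if the chunk starts at or beyond the tracked file size, its contents are known to be zeros, so the racy read from the bricks is skipped. Treating a chunk that starts exactly at EOF the same way is this sketch's choice; it covers the commit's example of an empty file receiving a write at offset 10.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* For an unaligned write, EC normally reads the surrounding chunk so it can
 * merge the new bytes. If the chunk begins at or past the tracked file size,
 * it holds only zeros, so the buffer is zero-filled instead of read (a read
 * that could race with a concurrent extending write elsewhere in the file).
 * Returns true when the chunk still has to be read from the bricks. */
bool prepare_chunk_sketch(unsigned char *chunk, size_t chunk_size,
                          uint64_t chunk_offset, uint64_t file_size)
{
    if (chunk_offset >= file_size) {
        memset(chunk, 0, chunk_size);
        return false;
    }
    return true;
}
```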
*	(multiple files) use dict_allocate_and_serialize() where applicable.	Yaniv Kaul	2019-07-22	2	-41/+9
\| \| \| \| \| \| \| \|	This function does length, allocation and serialization for you. Change-Id: I142a259952a2fe83dd719442afaefe4a43a8e55e updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	dht: log getxattr failure for node-uuid at "DEBUG"	Susant Palai	2019-07-18	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two ways to fetch node-uuid information from dht. 1 - #define GF_XATTR_LIST_NODE_UUIDS_KEY "trusted.glusterfs.list-node-uuids" This key is used by AFR. 2 - #define GF_REBAL_FIND_LOCAL_SUBVOL "glusterfs.find-local-subvol" This key is used for non-afr volume type. We do two getxattr operations. First on the #1 key followed by on #2 if getxattr on #1 key fails. Since the parent function "dht_init_local_subvols_and_nodeuuids" logs failure, moving the log-level to DEBUG in dht_find_local_subvol_cbk. fixes: bz#1730175 Change-Id: I4d88244dc26587b111ca5b00d4c00118efdaac14 Signed-off-by: Susant Palai <spalai@redhat.com>
*	cluster/ec: skip updating ctx->loc again when ec_fix_open/opendir	Kinglong Mee	2019-07-17	2	-10/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	ec_manager_open/opendir memsets ctx->loc, which causes a memory/inode leak, and ec_fheal uses ctx->loc outside fd->lock, so loc_copy may copy bad data while it is being memset. This patch skips updating ctx->loc when it is already initialized. With it, ctx->loc is filled once and never updated. Change-Id: I3bf5ffce4caf4c1c667f7acaa14b451d37a3550a fixes: bz#1729772 Signed-off-by: Kinglong Mee <mijinlong@horiscale.com>
*	cluster/ec: inherit healing from lock when it has info	Kinglong Mee	2019-07-16	1	-2/+3
\| \| \| \| \| \| \| \| \|	If lock has info, fop should inherit healing mask from it. Otherwise, fop cannot inherit right healing when changed_flags is zero. Change-Id: Ife80c9169d2c555024347a20300b0583f7e8a87f fixes: bz#1727081 Signed-off-by: Kinglong Mee <mijinlong@horiscale.com>
*	dht-common.h: reorder variables to reduce padding.	Yaniv Kaul	2019-07-15	1	-73/+81
\| \| \| \| \| \| \| \| \|	Manually added '-Wpadded' to get warnings on padding, and reordered structs to reduce most of them. Change-Id: I0c505fcb3dfef76399ac9d5d33bfb235354532de updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	Fix spelling errors	Aravinda VK	2019-07-14	1	-1/+1
\| \| \| \| \| \| \|	Fixes: bz#1728554 Change-Id: I88357aed7c14988a12616035c3738c32c09a8f9a Signed-off-by: Patrick Matthäi <pmatthaei@debian.org> Signed-off-by: Aravinda VK <avishwan@redhat.com>
*	cluster/ta: Notify the clients only if there are pending heals	karthik-us	2019-07-12	4	-22/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In case of thin arbiter, before the index healer starts crawling the indices at every heal-timeout interval, even if there is nothing to be healed it will send an upcall notification to all the clients to release any AFR_TA_DOM_NOTIFY locks that they hold. SHD will wait for the upcall to return before proceeding with the heal even though there is nothing to be healed. This will also invalidate the cached information about the bricks' states on the clients, which leads to extra calls on TA from clients for the next reads & writes if needed. This will impact the IO performance. Fix: - Before sending the upcall to the clients, check for any pending heals on TA without taking any locks. - If there is nothing marked bad on TA, then continue with the index crawl to heal any dirty markings present on the files due to any post-op failure. - If there is a brick marked as bad on TA, then take the AFR_TA_DOM_NOTIFY lock on TA from SHD, get the state on TA and continue with the current healing process. Change-Id: Ieb477bc6cb18bbdfd4e7a0453c5ed79b574ec9d6 fixes: bz#1724184 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	cluster/afr: Fix incorrect reporting of gfid & type mismatch	karthik-us	2019-07-12	2	-2/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problems: 1. When checking for type and gfid mismatch, if the type or gfid is unknown because of missing gfid handle and the gfid xattr it will be reported as type or gfid mismatch and the heal will not complete. 2. If the source selected during entry heal has null gfid the same will be sent to afr_lookup_and_heal_gfid(). In this function when we try to assign the gfid on the bricks where it does not exist, we are considering the same gfid and try to assign that on those bricks. This will fail in posix_gfid_set() since the gfid sent is null. Fix: If the gfid sent to afr_lookup_and_heal_gfid() is null choose a valid gfid before proceeding to assign the gfid on the bricks where it is missing. In afr_selfheal_detect_gfid_and_type_mismatch(), do not report type/gfid mismatch if the type/gfid is unknown or not set. Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da fixes: bz#1722507 Signed-off-by: karthik-us <ksubrahm@redhat.com>
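The first fix can be sketched as follows: if the gfid proposed by the entry-heal source is null, pick a valid one before assigning it on the bricks where it is missing. Choosing the first non-null gfid reported by any brick is an assumption of this sketch, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define GFID_SIZE 16

static bool gfid_is_null_sketch(const unsigned char *gfid)
{
    static const unsigned char null_gfid[GFID_SIZE] = {0};
    return memcmp(gfid, null_gfid, GFID_SIZE) == 0;
}

/* Copy a usable gfid into `out`: the proposed one if it is non-null,
 * otherwise the first non-null gfid among the bricks' replies.
 * Returns false if no brick has a valid gfid (nothing can be assigned). */
bool choose_heal_gfid_sketch(const unsigned char proposed[GFID_SIZE],
                             const unsigned char gfids[][GFID_SIZE],
                             int brick_count, unsigned char out[GFID_SIZE])
{
    if (!gfid_is_null_sketch(proposed)) {
        memcpy(out, proposed, GFID_SIZE);
        return true;
    }
    for (int i = 0; i < brick_count; i++) {
        if (!gfid_is_null_sketch(gfids[i])) {
            memcpy(out, gfids[i], GFID_SIZE);
            return true;
        }
    }
    return false;
}
```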
*	cluster/dht: Fixed a memleak in dht_rename_cbk	N Balachandran	2019-07-02	1	-11/+33
\| \| \| \| \| \| \| \| \|	Fixed a memleak in dht_rename_cbk when creating a linkto file. Change-Id: I705adef3cb79e33806520fc2b15558e90e2c211c fixes: bz#1722698 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	[RFC] change get_real_filename implementation to use ENOATTR instead of ENOENT	Michael Adam	2019-06-26	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	get_real_filename is implemented as a virtual extended attribute to help Samba implement the case-insensitive but case-preserving SMB protocol more efficiently. It is implemented as a getxattr call on the parent directory with the virtual key of "get_real_filename:<entryname>" by looking for a spelling with different case for the provided file/dir name (<entryname>) and returning this correct spelling as a result if the entry is found. Originally (05aaec645a6262d431486eb5ac7cd702646cfcfb), the implementation used the ENOENT errno to return the authoritative answer that <entryname> does not exist in any case folding. Now this implementation is actually a violation or misuse of the defined API for the getxattr call, which returns ENOENT for the case that the dir that the call is made against does not exist and ENOATTR (or the synonym ENODATA) for the case that the xattr does not exist. This was not a problem until the gluster fuse-bridge was changed to map ENOENT to ESTALE in 59629f1da9dca670d5dcc6425f7f89b3e96b46bf, after which the getxattr call for get_real_filename returned ESTALE instead of ENOENT, breaking the expectation in Samba. It is an independent problem that ESTALE should not leak out to user space but is intended to trigger retries between fuse and gluster. But nevertheless, the semantics seem to be incorrect here and should be changed. This patch changes the implementation of the get_real_filename virtual xattr to correctly return ENOATTR instead of ENOENT if the file/directory being looked up is not found. The Samba glusterfs_fuse vfs module, which takes advantage of get_real_filename over a fuse mount, will receive a corresponding change to map ENOATTR to ENOENT. Without this change, it will still work correctly, but the performance optimization for nonexisting files is lost. 
On the other hand, this change removes the distinction between the old not-implemented case and the implemented case. So a Samba changed to treat ENOATTR like ENOENT will not work correctly any more against old servers that don't implement get_real_filename, i.e. existing files will be reported as non-existing. Change-Id: I971b427ab8410636d5d201157d9af70e0d075b67 fixes: bz#1722977 Signed-off-by: Michael Adam <obnox@samba.org>
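The client-side mapping described above can be sketched as a one-line errno translation. The function name `map_real_filename_errno` is hypothetical and not Samba's actual code. Linux has no distinct `ENOATTR`; it is conventionally defined as an alias of `ENODATA` (libattr's `<attr/xattr.h>` does this), and the sketch does the same when the macro is missing.

```c
#include <errno.h>

#ifndef ENOATTR
#define ENOATTR ENODATA /* Linux convention: ENOATTR is ENODATA */
#endif

/* A get_real_filename lookup that finds no matching entry now fails with
 * ENOATTR; translate that back to ENOENT and pass other errors through. */
int map_real_filename_errno(int err)
{
    return err == ENOATTR ? ENOENT : err;
}
```

As the commit notes, an old server that does not implement the virtual xattr also yields ENOATTR, which this mapping cannot distinguish from "entry not found".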
*	cluster/ec: Prevent double pre-op xattrops	Pranith Kumar K	2019-06-22	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Race (Thread-1 / Thread-2): 1) Does ec_get_size_version() to perform the pre-op fxattrop as part of write-1. 2) Calls ec_set_dirty_flag() in ec_get_size_version() for write-2. This sets dirty[] to 1. 3) Completes executing ec_prepare_update_cbk, leading to ctx->dirty[] = '1'. 4) Takes LOCK(inode->lock) to check if there are any flags and sets the dirty flag because lock->waiting_flag is 0 now. This leads fxattrop to increment the on-disk dirty[] to '2'. At the end of the writes the file will be marked for heal even when it doesn't need heal. Fix: Perform ec_set_dirty_flag() and other checks inside LOCK() to prevent dirty[] from being marked as '1' in step 2) above. Updates bz#1593224 Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	ec-heal: check file's gfid when deleting stale name	Kinglong Mee	2019-06-20	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \|	A nameless lookup does not contain the parent's stat, so it is hard to check whether the looked-up file is at the right path. This patch changes it to a named lookup and checks the file's gfid against the expected gfid. If the gfid is different, it is marked ESTALE. fixes: bz#1702131 Change-Id: I2de20b10d680eed1e2fb1d3830b3b3dec4520dbf Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
*	afr/read: Implement latency based read child selection	Mohammed Rafi KC	2019-06-20	3	-27/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Network latency is an important factor in selecting a read subvolume, so this patch adds two new policies. 1) We measure the latency of a child during a GF_DUMP RPC call, then use this latency to pick the read subvol with the least latency. 2) The second is a hybrid mode, which calculates the effective latency by multiplying the number of outstanding pending read requests by the latency, and chooses the least one. Change-Id: Ia49c8a08ab61f7dcdad8b8950aa4d338e7accf97 fixes: #520 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
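Both policies can be sketched as a minimum search over readable children. Function names are hypothetical. In the hybrid policy the sketch multiplies latency by `(pending + 1)` so an idle child still ranks by its raw latency rather than collapsing to zero; that `+ 1` is an assumption, not something the commit states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Policy 1: the readable child with the lowest measured latency.
 * Returns -1 if no child is readable. */
int pick_least_latency(const int64_t *latency, const bool *readable, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (readable[i] && (best < 0 || latency[i] < latency[best]))
            best = i;
    }
    return best;
}

/* Policy 2 (hybrid): latency weighted by outstanding reads.
 * Returns -1 if no child is readable. */
int pick_hybrid(const int64_t *latency, const int64_t *pending,
                const bool *readable, int n)
{
    int best = -1;
    int64_t best_cost = 0;
    for (int i = 0; i < n; i++) {
        if (!readable[i])
            continue;
        int64_t cost = latency[i] * (pending[i] + 1);
        if (best < 0 || cost < best_cost) {
            best = i;
            best_cost = cost;
        }
    }
    return best;
}
```

With latencies {30, 10, 20} and pending reads {0, 5, 0}, policy 1 picks child 1, while the hybrid policy picks child 2 because child 1 is busy.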
*	cluster/dht: Strip out dht xattrs	N Balachandran	2019-06-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	Some internal DHT xattrs were not being removed when calling getxattr in pass-through mode. This has been fixed. Change-Id: If7e3dbc7b495db88a566bd560888e3e9c167defa fixes: bz#1721435 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	afr/fini: Free local_pool data during an afr fini	Mohammed Rafi KC	2019-06-17	1	-0/+6
\| \| \| \| \| \| \| \| \|	We should free the mem_pool local_pool during an afr_fini. Otherwise this will lead to a memory leak in shd. Change-Id: I805a34a88077bf7b886c28b403798bf9eeeb1c0b Updates: bz#1716695 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	clang-scan: resolve warning	Amar Tumballi	2019-06-15	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dht-common.c: because there was a 'goto err' before assigning the 'local' variable, there is possibility of NULL dereference. As the check which was done wouldn't ever be true, removed the check. glusterd-geo-rep.c: a possible path where 'slave_host' could be NULL when it gets passed to strcmp() is found. strcmp() expects a valid string. Add a NULL check. Updates: bz#1622665 Change-Id: I64c280bc1beac9a2b109e8fa88f2a5ce8b823c3a Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	multiple files: another attempt to remove includes	Yaniv Kaul	2019-06-14	31	-95/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are many include statements that are not needed. A previous, more ambitious attempt failed because of the *BSD platform (see https://review.gluster.org/#/c/glusterfs/+/21929/ ). Now trying a more conservative reduction. It does not solve all circular deps that we have, but it does reduce some of them. There is just too much to handle reasonably (dht-common.h includes dht-lock.h which includes dht-common.h ...), but it does reduce the overall number of lines of include we need to look at in the future to understand and fix the mess later on. Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	Cluster/afr: Don't treat all bricks having metadata pending as split-brain	karthik-us	2019-06-10	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in metadata split-brain purely from the afr xattrs point of view, i.e. each brick is blamed by at least one of the others for metadata. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks. Fix: Pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain. This patch also adds a restriction that all the bricks must be up to perform metadata heal, to avoid any metadata loss. Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t as it was doing metadata heal even when only 2 of 3 bricks were up. Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09 fixes: bz#1717819 Signed-off-by: karthik-us <ksubrahm@redhat.com>
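The source-selection rule for replica 3 can be sketched with a blame matrix, where `blames[i][j]` is true when brick i's afr xattrs blame brick j for metadata. The matrix representation and function name are hypothetical; AFR derives this information from its pending xattrs.

```c
#include <stdbool.h>

#define REPLICA 3

/* A brick blamed by two others is a sink. If no brick is blamed by a
 * majority (e.g. cyclic blame), every brick is treated as a source. */
void pick_metadata_sources(const bool blames[REPLICA][REPLICA],
                           bool sources[REPLICA])
{
    for (int j = 0; j < REPLICA; j++) {
        int blamed_by = 0;
        for (int i = 0; i < REPLICA; i++) {
            if (i != j && blames[i][j])
                blamed_by++;
        }
        sources[j] = blamed_by < 2;
    }
}
```

In the cyclic case (B1 blames B2, B2 blames B3, B3 blames B1) each brick is blamed only once, so all three become sources and heal can proceed instead of reporting split-brain.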