summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* cluster/dht: Handle file migrations when brick downN Balachandran2018-04-181-5/+51
| | | | | | | | | | | | | | | | | | | The decision as to which node would migrate a file was based on the gfid of the file. Files were divided among the nodes for the replica/disperse set. However, if a brick was down when rebalance started, the nodeuuids would be saved as NULL and a set of files would not be migrated. Now, if the nodeuuid is NULL, the first non-null entry in the set is the node responsible for migrating the file. Change-Id: I72554c107792c7d534e0f25640654b6f8417d373 fixes: bz#1566822 Signed-off-by: N Balachandran <nbalacha@redhat.com> (cherry picked from commit 1f0765242a689980265c472646c64473a92d94c0) Change-Id: I3072ca1f2975eb7ad3c38798e65d60d2312fd057
* cluster/dht: Wind open to all subvolsN Balachandran2018-04-131-10/+5
| | | | | | | | | | | | dht_opendir should wind the open to all subvols whether or not local->subvols is set. This is because dht_readdirp winds the calls to all subvols. (cherry picked from commit c4251edec654b4e0127577e004923d9729bc323d) Change-Id: I67a96b06dad14a08967c3721301e88555aa01017 updates: bz#1566822 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: Prevent ping-event handling on shdPranith Kumar K2018-04-091-0/+2
| | | | | | | | | | On shd, we shouldn't treat any brick down based on latency, otherwise self-heal will never happen fixes: bz#1562728 Change-Id: Ica07fcc4fae91a6bfd9c9a670e2be464704d94b7 BUG: 1562728 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: send list-node-uuids request to all subvolumesXavi Hernandez2018-03-281-1/+1
| | | | | | | | | | | | | | | The xattr trusted.glusterfs.list-node-uuids was only sent to a single subvolume. This was returning null uuids from the other subvolumes as if they were down. This fix forces that xattr to be requested from all subvolumes. Backport of: > BUG: 1561406 Change-Id: If62eb39a6857258923ba625e153d4ad79018ea2f BUG: 1561721 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/ec: fix SHD crash for null gfid'sXavi Hernandez2018-03-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | When the self-heal daemon is doing a full sweep it uses readdirp to get extra stat information from each file. This information is obtained in two steps by the posix xlator: first the directory is read to get the entries and then each entry is stated to get additional info. Between these two steps, it's possible that the file is removed by the user, so we'll get an error, leaving stat info empty. EC's heal daemon was using the gfid blindly, causing an assert failure when protocol/client was trying to encode the gfid. To fix the problem a check has been added. If we detect a null gfid, we simply ignore it and continue healing. Backport of: > BUG: 1558016 Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228 BUG: 1559079 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/ec: Change default read policy to gfid-hashAshish Pandey2018-03-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Whenever we read data from file over NFS, NFS reads more data then requested and caches it. Based on the stat information it makes sure that the cached/pre-read data is valid or not. Consider 4 + 2 EC volume and all the bricks are on differnt nodes. In EC, with round-robin read policy, reads are sent on different set of data bricks. This way, it balances the read fops to go on all the bricks and avoid heating UP (overloading) same set of bricks. Due to small difference in clock speed, it is possible that we get minor difference for atime, mtime or ctime for different bricks. That might cause a different stat returned to NFS based on which NFS will discard cached/pre-read data which is actually not changed and could be used. Solution: Change read policy for EC as gfid-hash. That will force all the read to go to same set of bricks. >Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84 >BUG: 1554743 >Signed-off-by: Ashish Pandey <aspandey@redhat.com> Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84 BUG: 1557906 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: avoid delays in self-healXavi Hernandez2018-03-154-48/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Self-heal creates a thread per brick to sweep the index looking for files that need to be healed. These threads are started before the volume comes online, so nothing is done but waiting for the next sweep. This happens once per minute. When a replace brick command is executed, the new graph is loaded and all index sweeper threads started. When all bricks have reported, a getxattr request is sent to the root directory of the volume. This causes a heal on it (because the new brick doesn't have good data), and marks its contents as pending to be healed. This is done by the index sweeper thread on the next round, one minute later. This patch solves this problem by waking all index sweeper threads after a successful check on the root directory. Additionally, the index sweep thread scans the index directory sequentially, but it might happen that after healing a directory entry more index entries are created but skipped by the current directory scan. This causes the remaining entries to be processed on the next round, one minute later. The same can happen in the next round, so the heal is running in bursts and taking a lot to finish, specially on volumes with many directory levels. This patch solves this problem by immediately restarting the index sweep if a directory has been healed. Backport of: > BUG: 1547662 Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e BUG: 1555198 Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
* cluster/afr: Fix dict-leak in pre-opPranith Kumar K2018-03-033-20/+20
| | | | | | | | | | | | | | | | At the time of pre-op, pre_op_xdata is populted with the xattrs we get from the disk and at the time of post-op it gets over-written without unreffing the previous value stored leading to a leak. This is a regression we missed in https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e >BUG: 1550078 >Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >(cherry picked from commit e7b79c59590c203c65f7ac8548b30d068c232d33) BUG: 1550808 Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
* cluster/dht: Handle single dht child in dht_lookupN Balachandran2018-02-231-0/+13
| | | | | | | | | | | | | | | | | | | This patch limits itself to only handling the case where no file (data or linkto) exists on the subvol. Additional cases to be handled: 1. A linkto file was found on the only child subvol. This currently calls dht_lookup_everywhere which eventually deletes it. It can be deleted directly as it will not be pointing to a valid subvol. 2. Directory lookups - locking might be unnecessary in some cases. > Change-Id: I940ba34531f2aaee1d36fd9ca45ecfd46be662a4 > BUG: 1546620 > Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: I940ba34531f2aaee1d36fd9ca45ecfd46be662a4 BUG: 1548271 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Ignore ENODATA from getxattr for posix aclsN Balachandran2018-02-231-6/+8
| | | | | | | | | | | | | dht_migrate_file no longer prints an error if getxattr for posix acls fails with ENODATA/ENOATTR. > Change-Id: Id9ecf6852cb5294c1c154b28d609889ea3420e1c > BUG: 1546954 > Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: Id9ecf6852cb5294c1c154b28d609889ea3420e1c BUG: 1548264 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Fixed a typoN Balachandran2018-02-231-2/+2
| | | | | | | | | | | | Replaced "then" with "than" > Change-Id: I73090e8c1a639befd7c5458e8d63bd173248bc7d > BUG: 1547128 > Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: I73090e8c1a639befd7c5458e8d63bd173248bc7d BUG: 1547842 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* posix/afr: handle backward compatibility for rchecksum fopRavishankar N2018-02-203-9/+29
| | | | | | | | | | Added a volume option 'fips-mode-rchecksum' tied to op version 4. If not set, rchecksum fop will use MD5 instead of SHA256. updates: #230 Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 6daa6535692b2c68b493636a9bbfdcbc475b3d80)
* cluster/dht: Unlink linkto files as rootN Balachandran2018-02-091-3/+7
| | | | | | | | | | | | | | | Non-privileged users cannot delete linkto files. However the failure to unlink a stale linkto causes DHT to fail the lookup with EIO and hence prevent access to the file. > Change-Id: Id295362d41e52263790694602f36f1219f0646a2 > BUG: 1542318 > Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: Id295362d41e52263790694602f36f1219f0646a2 BUG: 1543487 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: avoid overwriting client writes during migrationSusant Palai2018-02-064-12/+138
| | | | | | | | | | | | | | | | | | | | | | | | | For more details on this issue see https://github.com/gluster/glusterfs/issues/308 Solution: This is a restrictive solution where a file will not be migrated if a client writes to it during the migration. This does not check if the writes from the rebalance and the client actually do overlap. If dht_writev_cbk finds that the file is being migrated (PHASE1) it will set an xattr on the destination file indicating the file was updated by a non-rebalance client. Rebalance checks if any other client has written to the dst file and aborts the file migration if it finds the xattr. updates gluster/glusterfs#308 Change-Id: I73aec28bc9dbb8da57c7425ec88c6b6af0fbc9dd Signed-off-by: Susant Palai <spalai@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com> (cherry picked from commit 545a7ce6762a1b3a7b989b43a9d18b5b1b299df0)
* afr: capture the correct errno in post-op quorum checkRavishankar N2018-02-061-8/+8
| | | | | | | | | | If the post-op phase of txn did not meet quorm checks, use that errno to unwind the FOP rather than blindly setting ENOTCONN. Change-Id: I0cb0c8771ec75a45f9a25ad4cd8601103deddf0c BUG: 1542382 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 440a048f24b006c80af3d7bcd0a1f13fe3459d87)
* afr: don't treat all cases all bricks being blamed as split-brainRavishankar N2018-02-062-9/+48
| | | | | | | | | | | | | | | | | | | | | | | | Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in split-brain purely from the afr xattrs point of view i.e each brick is blamed by atleast one of the others. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks while I/O or heal are in progress. Fix: Instead of undoing the post-op, pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain. Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae BUG: 1542380 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 0e6e8216823c2d9dafb81aae0f6ee3497c23d140)
* cluster/afr: remove unnecessary child_up initializationXavier Hernandez2018-02-061-7/+0
| | | | | | | | | | | | | | | The child_up array was initialized with all elements being -1 to allow afr_notify() to differentiate down bricks from bricks that haven't reported yet. With current implementation this is not needed anymore and it was causing unexpected results when other parts of the code considered that if child_up[i] != 0, it meant that it was up. Backport of: > BUG: 1541038 Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472 BUG: 1541928 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/dht: Fixed leak in dht_populate_inode_for_dentryN Balachandran2018-02-062-6/+10
| | | | | | | | | | | | | | | | | Fixed an issue in dht_populate_inode_for_dentry where a layout is set in the inode without checking if it is already set. This overwrites the value each time without freeing the already existing layout. Also includes the changes in https://review.gluster.org/19471 which fix a leak introduced in the original patch. > Change-Id: I651bf539a0b82b4ddc4c355890c16a8e91f5f1fd > BUG: 1541264 > Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: I651bf539a0b82b4ddc4c355890c16a8e91f5f1fd BUG: 1541277 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/ec: Do lock conflict check correctly for wait-listPranith Kumar K2018-02-021-8/+15
| | | | | | | | | | | | | | Problem: ec_link_has_lock_conflict() is traversing over only owner_list but the function is also getting called with wait_list. Fix: Modify ec_link_has_lock_conflict() to traverse lists correctly. Updated the callers to reflect the changes. BUG: 1540882 Change-Id: Ibd7ea10f4498e7c2761f9a6faac6d5cb7d750c91 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec : EC options for GD2Sunil Kumar Acharya2018-01-221-1/+34
| | | | | | | Updates #302 Change-Id: I31b4648f7b1a394fceece5cba8120c579c66edd9 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* afr: add quorum checks in post-opRavishankar N2018-01-191-0/+29
| | | | | | | | | | afr relies on pending changelog xattrs to identify source and sinks and the setting of these xattrs happen in post-op. So if post-op fails, we need to unwind the write txn with a failure. Change-Id: I0f019ac03890108324ee7672883d774918b20be1 BUG: 1506140 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Adding option to take full file lockkarthik-us2018-01-193-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In replica 3 volumes there is a possibilities of ending up in split brain scenario, when multiple clients writing data on the same file at non overlapping regions in parallel. Scenario: - Initially all the copies are good and all the clients gets the value of data readables as all good. - Client C0 performs write W1 which fails on brick B0 and succeeds on other two bricks. - C1 performs write W2 which fails on B1 and succeeds on other two bricks. - C2 performs write W3 which fails on B2 and succeeds on other two bricks. - All the 3 writes above happen in parallel and fall on different ranges so afr takes granular locks and all the writes are performed in parallel. Since each client had data-readables as good, it does not see file going into split-brain in the in_flight_split_brain check, hence performs the post-op marking the pending xattrs. Now all the bricks are being blamed by each other, ending up in split-brain. Fix: Have an option to take either full lock or range lock on files while doing data transactions, to prevent the possibility of ending up in split brains. With this change, by default the files will take full lock while doing IO. If you want to make use of the old range lock change the value of "cluster.full-lock" to "no". Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962 BUG: 1535438 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* core: fix some of the dict_{get,set} with proper APIsAmar Tumballi2018-01-178-13/+12
| | | | | | | updates #220 Change-Id: I6e25dbb69b2c7021e00073e8f025d212db7de0be Signed-off-by: Amar Tumballi <amarts@redhat.com>
* locks: added inodelk/entrylk contention upcall notificationsXavier Hernandez2018-01-164-94/+239
| | | | | | | | | | | | | | The locks xlator now is able to send a contention notification to the current owner of the lock. This is only a notification that can be used to improve performance of some client side operations that might benefit from extended duration of lock ownership. Nothing is done if the lock owner decides to ignore the message and to not release the lock. For forced release of acquired resources, leases must be used. Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/afr: Fixing the flaws in arbiter becoming source patchkarthik-us2018-01-137-180/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Setting the write_subvol value to read_subvol in case of metadata transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03) might lead to the original problem of arbiter becoming source. Scenario: 1) All bricks are up and good 2) 2 writes w1 and w2 are in progress in parallel 3) ctx->read_subvol is good for all the subvolumes 4) w1 succeeds on brick0 and fails on brick1, yet to do post-op on the disk 5) read/lookup comes on the same file and refreshes read_subvols back to all good 6) metadata transaction happens which makes ctx->write_subvol to be assigned with ctx->read_subvol which is all good 7) w2 succeeds on brick1 and fails on brick0 and this will update the brick in reverse order leading to arbiter becoming source Fix: Instead of setting the ctx->write_subvol to ctx->read_subvol in the pre-op statge, if there is a metadata transaction, check in the function __afr_set_in_flight_sb_status() if it is a data/metadata transaction. Use the value of ctx->write_subvol if it is a data transactions and ctx->read_subvol value for other transactions. With this patch we assign the value of ctx->write_subvol in the afr_transaction_perform_fop() with the on disk value, instead of assigning it in the afr_changelog_pre_op() with the in memory value. Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4 BUG: 1482064 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/dht: Update options for gd2N Balachandran2018-01-121-15/+40
| | | | | | | | | Update DHT options for GD2 Updates gluster/glusterfs#302 Change-Id: Ia597fe364e97edd7bcf72d89f4ccdd50713a8837 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Change datatype of search_unhashed variableVarsha Rao2018-01-101-1/+1
| | | | | | | | | | Variable search_unhashed is of type boolean, change it to integer type. This fixes the warning increment of a boolean expression. BUG: 1531987 Change-Id: Ibf153f6a9ad704da38bff346b6a21a71323ed9bb Signed-off-by: Varsha Rao <varao@redhat.com>
* cluster/ec: OpenFD heal implementation for ECSunil Kumar Acharya2018-01-057-91/+243
| | | | | | | | | | | | | Existing EC code doesn't try to heal the OpenFD to avoid unnecessary healing of the data later. Fix implements the healing of open FDs before carrying out file operations on them by making an attempt to open the FDs on required up nodes. BUG: 1431955 Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/dht: Use percentages for space checkN Balachandran2018-01-021-5/+20
| | | | | | | | | | | | | | | | | With heterogenous bricks now being supported in DHT we could run into issues where files are not migrated even though there is sufficient space in newly added bricks which just happen to be considerably smaller than older bricks. Using percentages instead of absolute available space for space checks can mitigate that to some extent. Marking bug-1247563.t as that used to depend on the easier code to prevent a file from migrating. This will be removed once we find a way to force a file migration failure. Change-Id: I3452520511f304dbf5af86f0632f654a92fcb647 BUG: 1529440 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* posix: Introduce flags for validity of iatt membersRavishankar N2017-12-292-5/+5
| | | | | | | | | | | | | | | | | | v1 of the patch started off as adding new fields to iatt that can be filled up using statx but the discussions were more around introducing masks to check the validity of different fields from a RIO perspective. To that extent, I have dropped the statx call in this version and introduced a 64 bit mask for existing fields. The masks I have defined are similar with the statx() flags' masks. I have *not* changed iatt_to_stat() to use the macros IATT_TYPE_VALID, IATT_GFID_VALID etc before blindly copying from struct iatt to struct. Also fixed warnings in xlators because of atime/mtime/ctime seconds field change from uint32_t to int64_t. Change-Id: I4ac614f1e8d5c8246fc99d5bc2d2a23e7941512b Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* mgmt/glusterd: Adding validation for setting quorum-countkarthik-us2017-12-291-1/+2
| | | | | | | | | | | In a replicated volume it was allowing to set the quorum-count value between the range [1 - 2147483647]. This patch adds validation for allowing only maximum of replica_count number of quorum-count value to be set on a volume. Change-Id: I13952f3c6cf498c9f2b91161503fc0fba9d94898 BUG: 1529515 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/dht: Add migration checks to dht_(f)xattropN Balachandran2017-12-265-47/+326
| | | | | | | | | | | | The dht_(f)xattrop implementation did not implement migration phase1/phase2 checks which could cause issues with rebalance on sharded volumes. This does not solve the issue where fops may reach the target out of order. Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c BUG: 1471031 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/ec: Fix possible shift overflowXavier Hernandez2017-12-223-10/+12
| | | | | | | | | | A coverity scan has revelaed a potential shift overflow while scanning the bitmap of available subvolumes. The actual overflow cannot happen, but I've changed to test used to control the limit to make it explicit. Change-Id: Ieb55f010bbca68a1d86a93e47822f7c709a26e83 BUG: 789278 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: Change [f]getxattr to parallel-dispatch-onePranith Kumar K2017-12-224-5/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment in EC, [f]getxattr operations wait to acquire a lock while other operations are in progress even when it is in the same mount with a lock on the file/directory. This happens because [f]getxattr operations follow the model where the operation is wound on 'k' of the bricks and are matched to make sure the data returned is same on all of them. This consistency check requires that no other operations are on-going while [f]getxattr operations are wound to the bricks. We can perform [f]getxattr in another way as well, where we find the good_mask from the lock that is already granted and wind the operation on any one of the good bricks and unwind the answer after adjusting size/blocks to the parent xlator. Since we are taking into account good_mask, the reply we get will either be before or after a possible on-going operation. Using this method, the operation doesn't need to depend on completion of on-going operations which could be taking long time (In case of some slow disks and writes are in progress etc). Thus we reduce the time to serve [f]getxattr requests. I changed [f]getxattr to dispatch-one and added extra logic in ec_link_has_lock_conflict() to not have any conflicts for fops with EC_MINIMUM_ONE as fop->minimum to achieve the effect described above. Modified scripts to make sure READ fop is received in EC to trigger heals. Updates gluster/glusterfs#368 Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Add default value for the redundancy optionKamal Mohanan2017-12-211-0/+1
| | | | | | | Updates #302 Change-Id: Ifccc6a64c2b2a3b3a0133e6c8042cdf61a0838ce Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
* rchecksum/fips: Replace MD5 usage to enable fips supportKotresh HR2017-12-213-4/+4
| | | | | | | | | rchecksum uses MD5 which is not fips compliant. Hence using sha256 for the same. Updates: #230 Change-Id: I7fad016fcc2a9900395d0da919cf5ba996ec5278 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* stripe, quiesce: volume option fixesAmar Tumballi2017-12-201-4/+13
| | | | | | | Updates: #302 Change-Id: I8f84260ba007f2afc7fe3b7c15c06069942ad6d1 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* dht: Fill first_up_subvol before use in dht_opendirPoornima G2017-12-151-0/+5
| | | | | | | | Reported by: Sam McLeod Change-Id: Ic8f9b46b173796afd70aff1042834b03ac3e80b2 BUG: 1512437 Signed-off-by: Poornima G <pgurusid@redhat.com>
* all: Simplify component message id's definitionXavier Hernandez2017-12-143-2079/+281
| | | | | | | | | This patch creates a new way of defining message id's that is easier and less error prone because it doesn't require so many manual changes each time a new component is defined or a new message created. Change-Id: I71ba8af9ac068f5add7e74f316a2478bc991c67b Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* glusterfs: Use gcc builtin ATOMIC operator to increase/decreate refcount.Mohit Agrawal2017-12-123-37/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In glusterfs code base we call mutex_lock/unlock to take reference/dereference for a object.Sometime it could be reason for lock contention also. Solution: There is no need to use mutex to increase/decrease ref counter, instead of using mutex use gcc builtin ATOMIC operation. Test: I have not observed yet how much performance gain after apply this patch specific to glusterfs but i have tested same with below small program(mutex and atomic both) and get good difference. static int numOuterLoops; static void * threadFunc(void *arg) { int j; for (j = 0; j < numOuterLoops; j++) { __atomic_add_fetch (&glob, 1,__ATOMIC_ACQ_REL); } return NULL; } int main(int argc, char *argv[]) { int opt, s, j; int numThreads; pthread_t *thread; int verbose; int64_t n = 0; if (argc < 2 ) { printf(" Please provide 2 args Num of threads && Outer Loop\n"); exit (-1); } numThreads = atoi(argv[1]); numOuterLoops = atoi (argv[2]); if (1) { printf("\tthreads: %d; outer loops: %d;\n", numThreads, numOuterLoops); } thread = calloc(numThreads, sizeof(pthread_t)); if (thread == NULL) { printf ("calloc error so exit\n"); exit (-1); } __atomic_store (&glob, &n, __ATOMIC_RELEASE); for (j = 0; j < numThreads; j++) { s = pthread_create(&thread[j], NULL, threadFunc, NULL); if (s != 0) { printf ("pthread_create failed so exit\n"); exit (-1); } } for (j = 0; j < numThreads; j++) { s = pthread_join(thread[j], NULL); if (s != 0) { printf ("pthread_join failed so exit\n"); exit (-1); } } printf("glob value is %ld\n",__atomic_load_n (&glob,__ATOMIC_RELAXED)); exit(0); } time ./thr_count 800 800000 threads: 800; outer loops: 800000; glob value is 640000000 real 1m10.288s user 0m57.269s sys 3m31.565s time ./thr_count_atomic 800 800000 threads: 800; outer loops: 800000; glob value is 640000000 real 0m20.313s user 1m20.558s sys 0m0.028 Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster/dht: fix memory leaks in rebalanceSusant Palai2017-12-111-12/+19
| | | | | | | | | | | | From code reading it was found that in gf_defrag_process_dir, GF_FREE was called directly on dir_dfmeta->equeue leading to leaks of memory for list of entries read from all the local subvols in case of a failure. This patch frees the entries read from all the local subvols. Change-Id: If5e8f557372a8fc2af86628b401e8de1b54986a1 BUG: 1430305 Signed-off-by: Susant Palai <spalai@redhat.com>
* dht: Send an event when disks get fullAnkit raj2017-12-091-4/+22
| | | | | | | | | | Send an event if DHT determines that a subvol is getting full or running out of inodes. Change-Id: Ie026f4ee1832b5df1e80b16cb949b2cc31a25d6f Bug: 1440659 Signed-off-by: Ankit raj <anraj@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht/crypt/tier: Fix use of booleans as integersShyamsundarR2017-12-062-16/+6
| | | | | | BUG: 1520974 Change-Id: I19ea40c888e88a7a4ac271168ed1820c2075be93 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* cluster/dht: Fix build error due to switch statement on a booleanShreyas Siravara2017-12-051-16/+5
| | | | | Change-Id: Idf672b435e389baada732f609398404479306909 BUG: 1520974
* cluster/ec: Fix bugs in stripe-cache featureAshish Pandey2017-12-052-1/+4
| | | | | | | | | | | | | | 1 - This patch fixes a bug in ec_update_stripe() that prevented some stripes to be updated after a write. 2 - This patch also include code modification for the case in which a file does not exist and we write on unaligned offset and user size, the last stripe on which "end" will fall should also be cached. Change-Id: I069cb4be1c8d59c206e3b35a6991e1fbdbc9b474 BUG: 1520758 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/dht: populate inode in dentry for single subvolume dhtRaghavendra G2017-12-022-1/+69
| | | | | | | | | | ... in readdirp response if dentry points to a directory inode. This is a special case where the entire layout is stored in one single subvolume and hence no need for lookup to construct the layout Change-Id: I44fd951e2393ec9dac2af120469be47081a32185 BUG: 1492625 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: don't overfill the buffer in readdir(p)Raghavendra G2017-12-021-3/+18
| | | | | | | | | | | | | | | | | | | | Superflous dentries that cannot be fit in the buffer size provided by kernel are thrown away by fuse-bridge. This means, * the next readdir(p) seen by readdir-ahead would have an offset of a dentry returned in a previous readdir(p) response. When readdir-ahead detects non-monotonic offset it turns itself off which can result in poor readdir performance. * readdirp can be cpu-intensive on brick and there is no point to read all those dentries just to be thrown away by fuse-bridge. So, the best strategy would be to fill the buffer optimally - neither overfill nor underfill. Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84 BUG: 1492625 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* Tier: Stop tierd for detach starthari gowtham2017-12-011-3/+10
| | | | | | | | | | | | | | | | | | | Problem: tierd was stopped only after detach commit This makes the detach take a longer time. The detach demotes the files to the cold brick and if the promotion frequency is hit, then the tierd starts to promote files to hot tier again. Fix: stop tierd after detach start so the files get demoted faster. Note: the is_tier_enabled was not maintained properly. That has been fixed too. some code clean up has been done. Signed-off-by: hari gowtham <hgowtham@redhat.com> Change-Id: I532f7410cea04fbb960105483810ea3560ca149b BUG: 1446381
* dht: coverity fix in dht-rebalance.ckarthik-us2017-11-301-1/+0
| | | | | | | | | | | Fixed UNUSED_VALUE warning in dht_migrate_file. Issue ID: 526 From: http://download.gluster.org/pub/gluster/glusterfs/ static-analysis/master/glusterfs-coverity/2017-11-30-eb013e4c Change-Id: I37395e8ce7088742501424fcce918f0ee8ab4f3d BUG: 789278 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/ec: EC DISCARD doesn't punch hole properlySunil Kumar Acharya2017-11-281-2/+4
| | | | | | | | | | | | | | Problem: DISCARD operation on EC volume was punching hole of lesser size than the specified size in some cases. Solution: EC was not handling punch hole for tail part in some cases. Updated the code to handle it appropriately. BUG: 1516206 Change-Id: If3e69e417c3e5034afee04e78f5f78855e65f932 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>