Commit message | Author | Age | Files | Lines
* features/locks: fcntl(3) on F_GETLK must return first conflicting lock [v3.3.0.5rhs-40] [v3.3.0.5rhs-39] | Krishnan Parthasarathi | 2012-12-17 | 2 | -1/+84
  - Added a test program, getlk_owner.c, to catch the bug if it regresses.
  Change-Id: Id6055a1e64609b9701560e50a9767f387ddadce7
  BUG: 869724
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1993
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
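A minimal Python model of the rule in the subject line, for illustration only. It assumes the held locks are kept in a list and that "first" means first in that list order; the `Lock` class and the `conflicts()` and `getlk()` helpers are hypothetical and are not the locks translator's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lock:
    start: int    # first byte covered
    end: int      # last byte covered (inclusive)
    write: bool   # True models F_WRLCK, False models F_RDLCK
    owner: int    # hypothetical lock-owner identifier


def conflicts(held: Lock, req: Lock) -> bool:
    """Two locks conflict if they overlap, belong to different owners,
    and at least one of them is a write lock."""
    overlap = held.start <= req.end and req.start <= held.end
    return overlap and held.owner != req.owner and (held.write or req.write)


def getlk(held_locks: List[Lock], req: Lock) -> Optional[Lock]:
    """Model of F_GETLK: return the first conflicting held lock, or None
    (real fcntl() reports "no conflict" by setting l_type to F_UNLCK)."""
    for lk in held_locks:
        if conflicts(lk, req):
            return lk
    return None
```

The scan stops at the first conflict, so a read request skips over a non-conflicting read lock and reports the write lock behind it.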
* cluster/dht: Add "afr.readdir-failover=off" option to the rebalance process | shishir gowda | 2012-12-17 | 2 | -7/+28
  With readdir failover enabled (the default behaviour), rebalance could see duplicate files, because after a failover readdir re-reads from offset 0. Rebalance should not attempt to migrate these files again. Additionally, these cases need to be handled as failures in the rebalance crawl.
  BUG: 859387
  Change-Id: I77c5c14176bb4d9e593efd6d4739fbc8233bd0c5
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1991
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Provide option to disable readdir failover | Pranith Kumar K | 2012-12-17 | 5 | -25/+42
  In a replica pair, unlike files, directories may not have their contents in the same order, so readdir for the same (offset, size) may not return the same entries on both subvolumes of the pair. Switching over from one subvolume to the other in the middle of a listing can therefore lead to duplicate entries, missing entries, or both. This patch provides a way to disable readdir failover so that applications like rebalance can retry if they want to.
  Change-Id: I02e5762e7f8a5847eaf54356e5d6b5f49fe6c609
  BUG: 859387
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1989
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep / gsyncd: play nicely with peer multiplexing when setting a checkpoint | Niels de Vos | 2012-12-17 | 1 | -5/+16
  From upstream commit 01217e4e16677b13c7febc66e4e4ca3f0025739b:
  > The gsyncd invocation that instruments the "geo-rep config" command is
  > multiplexed over peers to ensure the uniformity of configuration.
  > In general, that works well, but checkpoint setting is a special case,
  > because (unlike other instances of config-set) it is logged (as recording
  > of checkpoint events is part of the feature).
  >
  > Problem is that the path components leading to the log file are
  > created only on the original node, where gsyncd was started.
  > Therefore the logging attempt will fail on the other nodes.
  >
  > Fix: ignore if opening the logfile on behalf of checkpoint setting
  > fails with ENOENT.
  >
  > Change-Id: I677f3f081bf4b9e3ba4d25d58979d86931e6beb4
  > BUG: 881997
  > Signed-off-by: Csaba Henk <csaba@redhat.com>
  > Reviewed-on: http://review.gluster.org/4248
  > Reviewed-by: Niels de Vos <ndevos@redhat.com>
  > Tested-by: Christos Triantafyllidis <ctrianta@redhat.com>
  > Reviewed-by: Christos Triantafyllidis <ctrianta@redhat.com>
  > Reviewed-by: Anand Avati <avati@redhat.com>
  Change-Id: I83b2cb7f78cf8613b78d3c8ff8e7b3828050cfc3
  BUG: 881736
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1929
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
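A minimal Python sketch of the "ignore ENOENT" fix described in the upstream message. `open_checkpoint_log()` is a hypothetical helper, not gsyncd's real function; it assumes the log is opened in append mode and that any error other than ENOENT should still propagate.

```python
import errno
from typing import Optional, TextIO


def open_checkpoint_log(path: str) -> Optional[TextIO]:
    """Open the log used to record a checkpoint event.

    Hypothetical helper: returns None when the log's parent directories do
    not exist (ENOENT), as happens on peers other than the node where
    gsyncd was started; any other error is re-raised.
    """
    try:
        return open(path, "a")
    except OSError as e:
        if e.errno == errno.ENOENT:
            return None
        raise
```

Catching only ENOENT keeps real failures (permissions, a directory in place of the file) visible instead of silently dropping checkpoint records.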
* geo-replication: catch select.error on select() | Niels de Vos | 2012-12-17 | 1 | -2/+2
  From upstream commit 15bf92d53c72774e2fd7aba146644a2e460e543f:
  > tailer() in resource.py does not correctly catch exceptions from
  > select(). select() can raise an instance of the select.error class and
  > the current expression only catches ValueError (and the instance will
  > have reference called selecterror).
  >
  > The geo-rep log contains a call trace like this:
  > > E [syncdutils:190:log_raise_exception] <top>: FAIL:
  > > Traceback (most recent call last):
  > >   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 216, in twrap
  > >     tf(*aa)
  > >   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 123, in tailer
  > >     poe, _ ,_ = select([po.stderr for po in errstore], [], [], 1)
  > >   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 276, in select
  > >     return eintr_wrap(oselect.select, oselect.error, *a)
  > >   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 269, in eintr_wrap
  > >     return func(*a)
  > > error: (9, 'Bad file descriptor')
  >
  > BUG: 880308
  > Change-Id: I2babe42918950d0e9ddb3d08fa21aa3548ccf7c5
  > Signed-off-by: Niels de Vos <ndevos@redhat.com>
  > Reviewed-on: http://review.gluster.org/4233
  > Reviewed-by: Peter Portante <pportant@redhat.com>
  > Reviewed-by: Csaba Henk <csaba@redhat.com>
  > Tested-by: Gluster Build System <jenkins@build.gluster.com>
  BUG: 880308
  Change-Id: Iece1f50c0064853669d1dd4a777f77f10e2fd0dc
  Upstream-bug: 886808 (changed after upstream merge)
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1927
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
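A Python 3 sketch of the corrected pattern. The original daemon is Python 2, where an expression presumably of the form `except ValueError, selecterror:` binds the exception to a name instead of catching a second type; listing both types in a tuple is what actually catches them. `readable_stderr()` and the decision to return an empty list on error are assumptions for illustration, not the upstream fix.

```python
import errno
import select
from typing import List


def eintr_wrap(func, exc, *args):
    """Call func(*args), retrying while it fails with EINTR.

    Python 3.5+ already retries select() on EINTR (PEP 475); the loop is
    kept only to mirror the wrapper named in the traceback.
    """
    while True:
        try:
            return func(*args)
        except exc as e:
            if e.args and e.args[0] == errno.EINTR:
                continue
            raise


def readable_stderr(fds: List[int], timeout: float = 1.0) -> List[int]:
    """Hypothetical tailer() step: return the readable fds, treating a bad
    descriptor as "nothing to read" instead of killing the thread."""
    try:
        readable, _, _ = eintr_wrap(select.select, select.error,
                                    fds, [], [], timeout)
        return readable
    except (ValueError, select.error):
        # ValueError: e.g. a negative fd. select.error (an alias of
        # OSError in Python 3): e.g. EBADF for an already-closed fd.
        return []
```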
* fuse: have setxattr on geo-rep related xattrs take effect | Niels de Vos | 2012-12-17 | 1 | -3/+3
  From upstream commit 6e3244a131b6d25141bef0cbc59968d3271f8ea3:
  > In http://review.gluster.com/3687 setxattr was made to a noop for
  > geo-rep special clients, with the exception of some special ones,
  > relevant to geo-rep. These exceptions were all in trusted namespace.
  >
  > That's no good, because with a mountbroker (unprivileged) setup,
  > the relevant attributes are in system namespace. So here we
  > just let setxattr through for any geo-rep related xattr, regardless
  > of namespace.
  >
  > Change-Id: I261141293b7db955a2e8b2405b4510cb10a42694
  > BUG: 848447
  > Signed-off-by: Csaba Henk <csaba@redhat.com>
  > Reviewed-on: http://review.gluster.com/3821
  > Tested-by: Gluster Build System <jenkins@build.gluster.com>
  > Reviewed-by: Venky Shankar <vshankar@redhat.com>
  > Reviewed-by: Anand Avati <avati@redhat.com>
  BUG: 883827
  Change-Id: I86a044d52ad3e679b21ff3832ee6536c5c6809fb
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1925
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: fail fix-layout if any of the subvols is down | shishir gowda | 2012-12-17 | 5 | -35/+47
  If any subvolume is down and a layout is rewritten so that hash values change, entry names on the downed subvolume can be reused on another subvolume that received the same hash range. When the downed subvolume is brought back up, duplicate entries might appear.
  Also separated the handling of ENOSPC and ENOTCONN errors.
  Change-Id: I1a49a689f6891a32128adcfb92dc46f39eaddec7
  BUG: 860599
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1898
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: if create returns EEXIST, do not set gfid/xattrs | shishir gowda | 2012-12-17 | 1 | -0/+4
  Change-Id: I9f2b75b10bde428d36d6516aa09c18e590d17ed9
  BUG: 864801
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1896
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* protocol/client: Conditional logging in client3_1_unlink_cbk | Venkatesh Somyajulu | 2012-12-17 | 1 | -1/+4
  Change-Id: Ic6f4e276a5ab6906e4b3ad28e9b8c7eed52b3080
  BUG: 861925
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1985
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Edited log message in afr_sh_entry_expunge_entry_cbk | Venkatesh Somyajulu | 2012-12-17 | 1 | -1/+2
  Change-Id: Ic5256650652416e3a043b9e4640748ce1fa50e83
  BUG: 860246
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1986
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Changed the message's log level from Error to Debug | Venkatesh Somyajulu | 2012-12-17 | 1 | -3/+3
  Change-Id: I64ca577839fc25952025651873ab60a2fcc3702c
  BUG: 859411
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1984
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* Cluster/afr: Fix output for gluster volume heal vn info healed | Venkatesh Somyajula | 2012-12-17 | 8 | -17/+129
  Problem: Whenever the 'gluster volume heal vol full' command is executed, the entries stored in the circular buffer for sh->healed are added to the dictionary in the _crawl_post_sh_action function, irrespective of whether an actual self-heal (due to non-zero values in the changelog) takes place or not.
  Fix: The value of the key actual-sh-done is set to 1 whenever self-heal takes place due to non-zero changelog values. If, for some FOP, the self-heal daemon finds after examining the pending matrix that no self-heal is required, the value is 0.
  Change-Id: I11fd0b9ee76759af17c5bca6bfafbaf66bcaacbc
  BUG: 863068
  Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1902
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* afr: make flush non-transactional | Brian Foster | 2012-12-17 | 3 | -136/+38
  Flush has historically been a transaction, to ensure all previous writes were complete. This is no longer required, as write-behind has learned to make flush a barrier operation (re: conversation w/ Avati). Flush taking a full file lock causes VMs running on afr volumes to stall when a migration occurs while self-heal is in progress.
  Make afr_flush() a non-transactional operation.
  BUG: 874045
  Change-Id: Ie287b79e7f300df88aca6030e2d80311772746bf
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1912
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* afr: use data trylock mode in read/write self-heal trigger paths | Brian Foster | 2012-12-17 | 1 | -1/+8
  Self-heal data lock contention between clients and glustershd instances can lead to long waits and poor user response times if the client ends up pending its lock behind a glustershd self-heal of a large file. We have reports of guest VM instances going completely unresponsive during self-heal of virtual disk images.
  Optimize the read/write self-heal trigger codepath (i.e., afr_open_fd_fix()) to trylock for self-heal and otherwise skip the self-heal, to minimize the likelihood of a running/active guest competing with glustershd when a brick comes back. Note that lock contention is still possible from the client (e.g., via lookup).
  BUG: 874045
  Change-Id: I077e2c0aaa424b80734a471284173bda8871cdc3
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1911
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
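A minimal sketch of the trylock-or-skip idea, using `threading.Lock` as a local stand-in for the self-heal data lock. `acquire_data_lock()` and `open_fd_fix()` are hypothetical names (the latter only echoes `afr_open_fd_fix()`); the `block` parameter mirrors the blocking/non-blocking flag from the "support self-heal data trylock mechanism" commit, and real AFR takes inodelks on the bricks rather than a process-local mutex.

```python
import threading
from typing import Callable


def acquire_data_lock(lock: threading.Lock, block: bool) -> bool:
    """Take the self-heal data lock in blocking or non-blocking mode."""
    return lock.acquire(blocking=block)


def open_fd_fix(lock: threading.Lock, heal: Callable[[], None]) -> bool:
    """Hypothetical read/write trigger path: try the lock once and skip
    self-heal if someone else (e.g. glustershd) already holds it."""
    if not acquire_data_lock(lock, block=False):
        return False          # contended: skip instead of stalling the client
    try:
        heal()
        return True
    finally:
        lock.release()
```

The client path never waits; a skipped heal is left for the self-heal daemon, which would keep calling with `block=True`.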
* perf/io-threads: least-rate-limit least priority throttling | Brian Foster | 2012-12-17 | 3 | -2/+91
  The 'least-rate-limit' io-threads translator option enables throttling of least-priority operations. This is initially intended as a debug/diagnostic tool for users who might experience servers overloaded by background activity (i.e., self-heal).
  least-rate-limit defines the maximum number of least-priority operations the io-threads translator will dequeue in one second. If the specified rate limit is met, the worker threads sleep for the minimal amount of time until the next least-priority operation becomes available (or until a new request arrives).
  The requests/second metric is generic and relative to a variety of factors involved with a background operation (server, storage, etc.). The most recent measured rate ("cached least rate") is added to the io-threads state dump content (kill -USR1) to serve as a reference point for throttling background activity under particular conditions.
  [This backport drops the iot_priv_dump() bits as they do not exist downstream.]
  BUG: 853680
  Change-Id: If7d28439372a2ea1a64e92e4a4b13826840a5248
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1909
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
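A single-threaded Python model of the one-second-window limit described above. `LeastRateLimiter`, the injectable clock, and the `cached_rate` bookkeeping are assumptions for illustration; the io-threads translator's actual implementation may differ.

```python
import time
from typing import Callable


class LeastRateLimiter:
    """Allow at most `limit` least-priority dequeues per one-second window.

    Hypothetical model of the least-rate-limit option (limit <= 0 means no
    throttling). `cached_rate` keeps the count from the last finished
    window, loosely mirroring the "cached least rate" in the state dump.
    """

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock
        self.window_start = clock()
        self.count = 0
        self.cached_rate = 0

    def delay_before_dequeue(self) -> float:
        """Return 0.0 if a least-priority op may be dequeued now (and count
        it); otherwise return the seconds left in the current window."""
        if self.limit <= 0:
            return 0.0
        now = self.clock()
        if now - self.window_start >= 1.0:
            self.cached_rate = self.count   # rate measured over last window
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return 0.0
        return 1.0 - (now - self.window_start)
```

A worker would sleep for the returned delay (or until a higher-priority request arrives) and then ask again.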
* afr: support self-heal data trylock mechanism | Brian Foster | 2012-12-17 | 4 | -7/+14
  Introduce a block flag to support an optional blocking or non-blocking mode in the self-heal data locking mechanism. All callers are modified to use blocking mode, which is the current default behavior (no change in behavior is introduced by this commit).
  BUG: 874045
  Change-Id: I89bd2e698bd3db898c3ad57b55cf5c38e822e136
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1910
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* Rebase commit #2 | Vijay Bellur | 2012-12-12 | 5 | -1/+280
  Change-Id: Ie983d0b9862cc1401187532ed896e57bd3488e2b
  BUG: 871323
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1893
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* protocol/client: add an option to filter O_DIRECT flag in open | Amar Tumballi | 2012-12-12 | 3 | -2/+28
  With this option, all client-side caching is disabled, whereas in the server-side process the fd is treated as a regular fd, which improves performance.
  "gluster volume set <VOLNAME> remote-dio enable" sets this option in the client protocol volumes.
  Change-Id: I08f3d1f6fed6da58501b5b94e5572216593c2847
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 856156
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1685
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1885
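A minimal sketch of the flag filtering, assuming a Linux host where `os.O_DIRECT` is defined. `wire_open_flags()` is a hypothetical helper showing only the bit manipulation: the flags sent to the brick lose O_DIRECT, while the caller's own flags are left untouched for client-side decisions.

```python
import os


def wire_open_flags(flags: int, remote_dio: bool) -> int:
    """Flags sent to the brick for an open().

    Hypothetical helper: with remote-dio enabled, O_DIRECT is stripped so
    the server-side fd is a regular (cached) fd; the caller's original
    flags are not modified. os.O_DIRECT exists only on platforms that
    define it (e.g. Linux).
    """
    if remote_dio:
        return flags & ~os.O_DIRECT
    return flags
```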
* storage/posix: Handle undefined symbol when aio is not available | Pranith Kumar K | 2012-12-12 | 1 | -0/+10
  Change-Id: I47b93a5e72f06bda016b5b9ab820cbc8f99fab28
  BUG: 871323
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/182
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1884
* fuse: make the default background queue length 512 | Amar Tumballi | 2012-12-12 | 1 | -2/+2
  This should help VM hosting performance when many VMs are hosted from a single data store.
  Change-Id: I0f2df352e410e10845cfade5f27fe1b0b5b06250
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 859589
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1504
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1883
* glusterd: Made volume reset recognize options in <domain>.<specifier> format | Krutika Dhananjay | 2012-12-12 | 1 | -0/+11
  Change-Id: Id057606c2882584310119a1e7dd8674943857841
  BUG: 866565
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/178
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1882
* protocols: Suppress getxattr log when errno is ENOENT | Pranith Kumar K | 2012-12-12 | 2 | -2/+6
  Change-Id: I4c170464cb9aa013588d615c2916bf87c370e9dc
  BUG: 861015
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/162
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1881
* storage/posix: Make rchecksum O_DIRECT friendly | Pranith Kumar K | 2012-12-12 | 2 | -24/+53
  Problem: When posix-aio is enabled, the fd is set with O_DIRECT whenever possible in the read and writev fops. Rchecksum does not take this into account: if the offset, size, or memory buffer passed to pread in the rchecksum fop is not aligned, pread fails with EINVAL.
  Fix: Before doing pread, the necessary O_DIRECT manipulation is done when aio is enabled. The memory buffer passed to pread is now page-aligned.
  Test:
  1) Create a replica volume with aio enabled.
  2) dd if=/dev/urandom of=a bs=1M count=1
  3) Kill one of the bricks in the replica pair.
  4) dd if=/dev/urandom of=a bs=1M count=1
  5) Bring back the brick.
  Self-heal succeeds after the change. The test above exercises both the rchecksum and writev fops changed in this patch.
  Change-Id: I5126e20ca1d6aeb71d4d66d14de277729fc8e89f
  BUG: 866459
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/156
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1880
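A Python sketch of the "page-aligned buffer" half of the fix, assuming a Linux host (for `os.preadv`). `aligned_pread()` is a hypothetical helper, not part of storage/posix; it does not open the file with O_DIRECT itself and leaves offset/size alignment to the caller.

```python
import mmap
import os


def aligned_pread(fd: int, offset: int, size: int) -> bytes:
    """Read up to `size` bytes at `offset` into a page-aligned buffer.

    Anonymous mmap memory always starts on a page boundary, which meets
    O_DIRECT's buffer-alignment rule; offset and size must still be
    aligned by the caller. Requires size > 0 and os.preadv (Linux/BSD).
    """
    buf = mmap.mmap(-1, size)
    try:
        n = os.preadv(fd, [buf], offset)
        return buf[:n]
    finally:
        buf.close()
```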
* cluster/afr: check transaction type for eager-lock after it is set | Pranith Kumar K | 2012-12-12 | 1 | -6/+6
  Problem: The eager-locking lk-owner decision is taken before the transaction type is set. The default transaction type is DATA, so all transactions are treated as DATA transactions at the time of the eager-locking decision.
  Fix: Move the code that makes the lk-owner decision to after the transaction type is set.
  Test: Checked in gdb that the transaction type is set properly at the time of the lk-owner decision.
  Change-Id: Icb1464bc572cf0be73bdd4d5803a2326b5d22655
  BUG: 865321
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/85
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1879
* init.d: stop only 'glusterd' process on '/etc/init.d/glusterd stop' | Amar Tumballi | 2012-12-12 | 1 | -13/+0
  Earlier it also stopped the brick processes and the gluster NFS server process, which is not the expected behavior for the command '/etc/init.d/glusterd stop'.
  Change-Id: Ibc092cdf2693b3b2ae491d32ce3f0113854149c8
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 865382
  Reviewed-on: http://review.gluster.com/2919
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/133
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1878
* cluster/distribute: cli support for setting directory-layout-spread | shishir gowda | 2012-12-12 | 1 | -0/+1
  'gluster volume set <volname> subvols-per-directory <value>' controls how many subvolumes (<value>) a directory's layout is spread across.
  Change-Id: I0aed937f6bbc66629e36b6a856432e51b180747c
  BUG: 865669
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/122
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1877
* cluster/afr: Wake up post-op on non-co-operative transaction | Pranith Kumar K | 2012-12-12 | 1 | -0/+22
  Problem: The problem is observed when a kernel untar is done: only one file is untarred per second. The reason is that the setattr lock is blocked on the previous fd data transaction's full lock (because of eager-lock). Because of post-op-delay, the post-op (xattrop + unlock) of the previous data transaction happens only after 1 sec. Until then the setattr is blocked, resulting in poor untar performance.
  Fix: Whenever a loc-based data or metadata transaction comes in, it should wake up the previous post-op on the same process's fd.
  Tests: The performance problem in untar went away. I put a breakpoint in client_finodelk for a 2G file dd, and the inodelk is hit only 4 times. This confirms that the change does not negatively affect post-op-delay.
  Change-Id: I32e272727f8ea03ae8768509695bbae183aff17d
  BUG: 853679
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/83
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1876
* features/marker: use buf->ia_gfid in all the lookup callbacks | Raghavendra Bhat | 2012-12-12 | 2 | -10/+23
  * In general, use buf->ia_gfid for the gfid instead of the inode's gfid in the callbacks of fops where a new inode is created (such as create, mkdir, mknod, symlink). In the callback path, the inode does not yet carry the gfid if it has not been linked to the inode table, which happens in protocol/server.
  Change-Id: Ie2e5ce6d25181e13d32c1ab99ee488a55fe64117
  BUG: 848318
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/64
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1875
* linux-aio: fixes while setting O_DIRECT flag | Amar Tumballi | 2012-12-12 | 3 | -40/+60
  Linux AIO needs O_DIRECT to be set for effective operation. O_DIRECT in turn has constraints on when it can work (offset and size alignment). So use O_DIRECT (unless instructed by the application) only when the offset and size alignments match. Otherwise, io_submit() will happen over a non-O_DIRECT fd, effectively blocking until the IO completes.
  Also fix a multithreading bug where the detection/setting of O_DIRECT for a request was not atomic with the io_submit() of that request.
  Change-Id: I190017e8bc78217429aff0714dca224cbe6f251d
  BUG: 859406
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-on: http://review.gluster.org/4006
  Tested-by: Amar Tumballi <amarts@redhat.com>
  Original-Author: Anand Avati <avati@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/61
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1874
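A hedged Python model of both points: choose O_DIRECT per request from the alignment check, and make that choice atomic with the submission. `AioFd`, the 4 KiB `ALIGN` value, and reading "unless instructed by the application" as "an fd the application opened with O_DIRECT always keeps it" are all assumptions; nothing is actually submitted to the kernel.

```python
import threading

ALIGN = 4096  # assumed alignment requirement; the real one depends on the device


class AioFd:
    """Hypothetical model of per-request O_DIRECT selection on an fd."""

    def __init__(self, app_wants_direct: bool = False):
        self.app_wants_direct = app_wants_direct  # app opened with O_DIRECT
        self.direct = False                       # current O_DIRECT state
        self.lock = threading.Lock()
        self.submitted = []                       # (offset, size, direct)

    def submit(self, offset: int, size: int) -> bool:
        """Pick O_DIRECT for this request and record the "submission".

        Returns True if the request went out with O_DIRECT set."""
        aligned = offset % ALIGN == 0 and size % ALIGN == 0
        with self.lock:
            # Setting the flag and submitting under one lock stops another
            # thread from flipping O_DIRECT between the two steps.
            self.direct = self.app_wants_direct or aligned
            self.submitted.append((offset, size, self.direct))
            return self.direct
```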
* Rebase commit #1 | Vijay Bellur | 2012-12-12 | 1 | -0/+519
  Change-Id: Iad1acb3fb744d0a30498bddbee32e64fa0413f66
  BUG: 858469
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1873
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/marker: if parent inode is NULL, then get it by inode_parent | Raghavendra Bhat | 2012-12-12 | 2 | -4/+13
  * If the parent inode is NULL (nameless lookups, which use the gfid to look up the inode), try to get it via inode_parent instead of returning, which resulted in the inode's contribution not being added to the list.
  * Prevent excessive logging when adding the inode's contribution to the list fails. (Check whether the inode's gfid is null, which indicates that the inode is not yet linked to the inode table and hence adding its contribution to the list can fail.)
  BUG: 851953
  Change-Id: I4539b0534894e9d9cf5036c12fbf591ecad586bb
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/35
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/138
* cluster/dht: set conf->defrag to NULL after freeing the defrag structure | Raghavendra Bhat | 2012-12-12 | 1 | -2/+3
  Also, there is no need to free the xlator object after rebalance is over, as the process is about to be killed.
  Change-Id: Id13cc74edf367660eef96ce215878e4dac7b4ba1
  BUG: 862981
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/53
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1872
* logging: log ENOENT errors in DEBUG mode instead of ERROR or INFO | Raghavendra Bhat | 2012-12-12 | 2 | -2/+4
  Change-Id: I08a34e58892a8b2a2fdecc606bed8db292d36332
  BUG: 851953
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/36
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/137
* build: Install virt group file as a config file. | Vijay Bellur | 2012-12-12 | 2 | -2/+3
  Additionally, copy the virt group file to /var/lib/glusterd/groups/ during RPM install.
  Change-Id: Ie0bedafc4354ac278adfb5cd8a1c1db61512d6a8
  BUG: 861369
  Reviewed-on: https://code.engineering.redhat.com/gerrit/42
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1871
* build: Install glusterfs-logrotate as config file | Vijay Bellur | 2012-12-12 | 2 | -2/+5
  Change-Id: I8255eb4249503eac0add87444da934256faffc01
  BUG: 860037
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1870
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* protocol/client: Remember the gfid of opened fd | Pranith Kumar K | 2012-12-12 | 3 | -112/+107
  This is needed when a fresh lookup triggers self-heal, because the gfid won't be present in the inode yet. A similar situation happens with rebalance, as it does not perform inode_link.
  Added a similar fix for re-opendir. Removed the inode from fdctx and removed some duplicated code.
  Change-Id: I87679df7171bc6a25c4396af3a3fc04534a65c9c
  BUG: 859387
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1581
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* write-behind: fix off-by-one bug in wb_requests_overlap() | Anand Avati | 2012-12-12 | 1 | -7/+6
  Also backport an upstream review comment.
  Change-Id: If683ee051cc3bd969417d69705bd63343650b541
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1869
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
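The commit does not describe the bug itself. As a hedged illustration of the usual failure mode, here is an overlap test on half-open byte ranges; `requests_overlap()` is hypothetical and is not a transcription of `wb_requests_overlap()`.

```python
def requests_overlap(off1: int, size1: int, off2: int, size2: int) -> bool:
    """True if [off1, off1 + size1) and [off2, off2 + size2) share a byte.

    The last byte of a request is offset + size - 1; treating offset + size
    as an inclusive end is a classic off-by-one that makes adjacent,
    non-overlapping writes look overlapping.
    """
    if size1 <= 0 or size2 <= 0:
        return False   # an empty range covers no bytes
    return off1 < off2 + size2 and off2 < off1 + size1
```

With this test, a 10-byte write at offset 0 and a write starting at offset 10 are correctly reported as independent.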
* Add libaio-devel in BuildRequires for server package | Vijay Bellur | 2012-12-12 | 1 | -0/+1
  Change-Id: I4230815f44c8367942036f255eecb8d59722c10b
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1868
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* performance/write-behind: Add missing memory accounting type | Vijay Bellur | 2012-12-12 | 1 | -0/+1
  Change-Id: I578b41b721d1a4aca679e637082737dfcf6a3194
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1867
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* write-behind: implement causal ordering and other cleanup | Anand Avati | 2012-12-12 | 1 | -2321/+1212
  Rules of causal ordering implemented:
  - If request A arrives after the acknowledgement (to the app, i.e., STACK_UNWIND) of another request B, then request B is said to have 'caused' request A.
  - (corollary) Two requests that are, at any point in time, simultaneously unacknowledged in the system can never 'cause' each other (wb_inode->gen is based on this).
  - If request A is caused by request B, AND request A's region overlaps request B's region, then the fulfillment of request A is guaranteed to happen after the fulfillment of B.
  - The FD of origin is not considered when determining causal ordering.
  - An append operation's region is considered to be the whole file.
  Other cleanup:
  - wb_file_t is not required any more.
  - wb_local_t is not required any more.
  - Operations on O_RDONLY fds now go through the queue to make sure writes in the requested region are fulfilled before they are processed.
  - Operations on O_SYNC fds now go through the queue to make sure previously acknowledged writes on the file (via other fds) are fulfilled before they are processed.
  - The option to not honor O_SYNC is now removed.
  - An option to ignore O_DIRECT is added (useful when running a VM and the drive appears with NCQ/TCQ or WCE=1 for the guest).
  - The option to disable_first_nbytes is removed (the cause of the bug that required it was diagnosed as a missing TCP_NODELAY).
  - General cleanup and better conformance to coding style and convention.
  Change-Id: Ib44fb72da3727246b4a85174cb568c2f0231f6de
  BUG: 857673
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1866
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
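A small Python model of the causal-ordering rules listed above, including the append rule. `WbInode`, `Req`, and the helper functions are hypothetical; only the idea of a per-inode generation counter comes from the commit message (wb_inode->gen), and the way write-behind actually uses it may differ.

```python
from typing import Optional


class WbInode:
    """Per-file state: `gen` is bumped on every acknowledgement."""

    def __init__(self) -> None:
        self.gen = 0


class Req:
    def __init__(self, inode: WbInode, offset: int, size: int,
                 append: bool = False) -> None:
        self.offset, self.size, self.append = offset, size, append
        self.arrival_gen = inode.gen           # acknowledgements seen on arrival
        self.ack_gen: Optional[int] = None     # set once unwound to the app


def acknowledge(inode: WbInode, req: Req) -> None:
    inode.gen += 1
    req.ack_gen = inode.gen


def caused(b: Req, a: Req) -> bool:
    """B caused A iff B was acknowledged before A arrived. Two requests
    that are unacknowledged at the same time can never satisfy this."""
    return b.ack_gen is not None and b.ack_gen <= a.arrival_gen


def regions_overlap(a: Req, b: Req) -> bool:
    if a.append or b.append:                   # append covers the whole file
        return True
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size


def must_follow(a: Req, b: Req) -> bool:
    """A's fulfillment must wait for B's if B caused A and they overlap."""
    return caused(b, a) and regions_overlap(a, b)
```

Note that the fd a request came from plays no part in the check, matching the "FD of origin is not considered" rule.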
* group-virt: virt template file added | Amar Tumballi | 2012-12-12 | 3 | -1/+12
  Change-Id: Ic124e44c3d3068d78e075c9a66ead9ab66ecb241
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1865
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* fuse: make the 'gid-timeout' default value 2sec instead of 0 | Amar Tumballi | 2012-12-12 | 1 | -1/+1
  Done for the performance benefits.
  Change-Id: I4788800fb911ac571c4ff636db5d09e95b335a6e
  BUG: 858469
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1864
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: Option to set brick (of a volume)'s root dir's uid/gid | Krishnan Parthasarathi | 2012-12-12 | 2 | -5/+45
  Change-Id: I529d4cd949477a436a5b571b69da9f1c8b33ee8f
  BUG: 858469
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1863
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* cli: Fix double free in cli_add_key_group | Krishnan Parthasarathi | 2012-12-12 | 1 | -0/+2
  Change-Id: I3c2f030ac7c53913612a3fbac5e582c47b005621
  BUG: 851237
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1862
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* core/statedump: statedump enhancements | Raghavendra Bhat | 2012-12-12 | 1 | -0/+1
  * Append a timestamp to the statedump filename to prevent old files from being overwritten.
  * Add start and end markers to the statedump to indicate the beginning and end of the statedump information.
  * Make glusterfs take options through the /tmp/glusterdump.options file and treat those options with higher priority.
  * Do not dump the entire inode table in the statedump; instead, just dump the ltable and the fdtable.
  Change-Id: I9a56a5be9970b58d08de509916f88aa2be56d864
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1861
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* fuse: make background queue length configurable
  Amar Tumballi, 2012-12-12, 6 files changed, -56/+164

  * Also make 'congestion_threshold' an option.
  * Default 'congestion_threshold' to 75% of the background queue
    length if it is not explicitly specified.
  * In glusterfsd.c, move all the code that sets the FUSE options in
    the option dictionary into a separate function.

  Change-Id: Ie1680eefaed9377720770a09222282321bd4132e
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 845214
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1860
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
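The defaulting rule in the second item can be sketched as below. `fuse_congestion_threshold()` is a hypothetical helper, and a configured value of 0 is assumed to mean "not explicitly specified". In the Linux FUSE protocol these two limits are carried in the `max_background` and `congestion_threshold` fields of the FUSE_INIT reply, and the kernel's own built-in defaults use the same 3/4 ratio.

```c
#include <assert.h>

/* Hypothetical helper: effective congestion threshold for a FUSE mount.
 * `configured` == 0 is assumed to mean "not explicitly specified". */
unsigned int fuse_congestion_threshold(unsigned int background_qlen,
                                       unsigned int configured)
{
    assert(background_qlen > 0);
    if (configured != 0)
        return configured;             /* explicit option wins */
    return background_qlen * 3 / 4;    /* default: 75%, integer math */
}
```

A background queue length of 64 yields a threshold of 48, and 12 yields 9.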
* cli: Added special key "group" for bulk volume set.
  Krishnan Parthasarathi, 2012-12-12, 2 files changed, -33/+146

  gluster volume set VOLNAME group group_name

  - group_name is a file under /var/lib/glusterd/groups containing one
    key=value pair per line, for example:

      key1=value1
      key2=value2
      [...]

  - The command sets key1 to value1, key2 to value2, and so on.

  Change-Id: Ic4c8dedb98d013b29a74e57f8ee7c1d3573137d2
  BUG: 851237
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1859
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
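A minimal sketch of how such a group file could be split into key/value pairs before one volume-set is issued per pair. `parse_group_line()` and `parse_group_buffer()` are hypothetical helpers; splitting at the first `=` and skipping blank lines are assumptions, not documented behaviour of the gluster CLI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits one "key=value" line in place at the first '=' and strips a
 * trailing line terminator. Returns 0 on success, -1 for a line with
 * no '=' or an empty key. */
int parse_group_line(char *line, char **key, char **value)
{
    assert(line != NULL && key != NULL && value != NULL);

    line[strcspn(line, "\r\n")] = '\0';   /* drop line terminator */
    char *eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return -1;

    *eq = '\0';
    *key = line;
    *value = eq + 1;
    return 0;
}

/* Splits a NUL-terminated buffer holding the whole group file into at
 * most `max` pairs, in place. Blank lines are skipped. Returns the
 * number of pairs, or -1 on a malformed line or too many pairs. */
int parse_group_buffer(char *buf, char **keys, char **values, int max)
{
    assert(buf != NULL && keys != NULL && values != NULL);
    int count = 0;
    char *line = buf;

    while (line != NULL && *line != '\0') {
        char *next = strchr(line, '\n');
        if (next != NULL)
            *next++ = '\0';               /* terminate this line */
        if (*line != '\0') {
            if (count == max ||
                parse_group_line(line, &keys[count], &values[count]) != 0)
                return -1;
            count++;
        }
        line = next;
    }
    return count;
}
```

Each resulting pair would then be applied as if `gluster volume set VOLNAME key value` had been run for it; that step is not shown.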
* cluster/afr: post-op-delay support
  Anand Avati, 2012-12-12, 6 files changed, -1/+173

  post-op-delay introduces an artificial delay between the OP and
  POST-OP-CHANGELOG phases of a write transaction. This makes it more
  likely that changelog piggybacking and eager locking can be applied,
  so both optimizations work more efficiently.

  Change-Id: I865ca4b68512c44818719c7e388952f15d53e6c2
  BUG: 836033
  Signed-off-by: Anand Avati <avati@redhat.com>
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1858
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
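The benefit can be seen in a deliberately simplified, hypothetical model (not AFR code): writes on one fd are serialized, the OP phase is treated as instantaneous, and a write that starts while the previous write's POST-OP is still being held back piggybacks on the already-dirty changelog. Each uninterrupted batch of writes then costs one PRE-OP and one POST-OP changelog update instead of two per write. `changelog_updates()` is an illustrative name only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: start_ms[i] (sorted ascending) is when write i
 * starts; its OP phase is treated as instantaneous. After the OP, the
 * POST-OP changelog update is held back for delay_ms. If the next write
 * starts inside that window, it piggybacks: the held POST-OP is dropped
 * and no new PRE-OP is needed. Returns the total changelog updates. */
size_t changelog_updates(const long *start_ms, size_t n, long delay_ms)
{
    assert(start_ms != NULL || n == 0);
    assert(delay_ms >= 0);

    if (n == 0)
        return 0;

    size_t batches = 1;
    for (size_t i = 1; i < n; i++) {
        /* The held POST-OP of write i-1 fires at start_ms[i-1] + delay_ms;
         * write i joins the batch only if it starts strictly earlier. */
        if (start_ms[i] >= start_ms[i - 1] + delay_ms)
            batches++;
    }
    return 2 * batches;   /* one PRE-OP + one POST-OP per batch */
}
```

With writes starting at 0, 10, 20 and 1000 ms, no delay gives 8 changelog updates, while a 100 ms delay merges the first three writes into one batch and gives 4.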
* cluster/afr: cleanup lk_owner and PID mess
  Anand Avati, 2012-12-12, 3 files changed, -41/+28

  Historically, the PID (frame->root->pid) was used by the locks
  translator to identify a locker (and to decide which locks contend
  and which cooperate/merge). Since the introduction of the lock_owner
  parameter, the use of the PID for locking has been deprecated and is
  now unused. This patch nukes the usage of the PID in AFR.

  The usage of lk_owner has also ended up being a mess, both because
  ->lk() and ->inodelk() must be distinguished (->lk() locks are
  identified, roughly, by the process, whereas ->inodelk() locks are
  identified by the transaction) and because of optimizations like
  eager locking (locks are no longer identified by the transaction,
  since they are inherited by the next transaction).

  The scheme (and technique) now is:

  - All FOPs (the third phase of the transaction) are performed with
    the lk_owner set by the topmost layer (FUSE, NFS, etc.).

  - All entrylks are issued with lk_owner set to the frame->root
    address.

  - Inodelks which will not be subject to eager locking are issued with
    lk_owner set to frame->root.

  - Inodelks which are subject to eager locking are issued with
    lk_owner set to the address of the fd_t (fd-based transactions are
    the only ones subject to the eager-locking optimization).

  - At the start of the transaction, the transaction frame's lk_owner
    is set to either frame->root or the fd_t (and never left
    unmodified), depending on the type of transaction.

  - Just before the third phase (FOP phase), the set lk_owner is
    "saved" away and overwritten by the lk_owner submitted by the top
    layer (FUSE or NFS).

  - Right after the third phase, the saved lk_owner is "restored", so
    that the transaction resumes into the POST-OP and eventually UNLOCK
    phases using the same lk_owner that was used during the LOCK phase.

  Change-Id: I6ab8e4d6b65ae4185fa85ad3fded8e9188b2f929
  BUG: 836033
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1857
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* hooks: Modified samba hook scripts to handle user.cifs
  Krishnan Parthasarathi, 2012-12-12, 2 files changed, -4/+147

  Change-Id: I079636e2be4bc097df33355b6a60c0e04d69ef57
  BUG: 877992
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: https://code.engineering.redhat.com/gerrit/1856
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>