| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bd-xlator can not be built successfully on certain Debian
distributions due to a missing declaration of lvm_lv_from_name(). This
function is available for linking, but it does not exist in the header
file.
This change adds a detection for lvm_lv_from_name() in both the library
for linking, and the declaration in the header file. If the 1st is
missing, the bd-xlator can not be built, and if only the 2nd one is
missing, we'll declare lvm_lv_from_name() ourselves. This makes it
possible to build the bd-xlator on the affected Debian distributions
too.
Change-Id: If1845f6b6d676793677ebbcc6daf9ff12f7c3fd6
BUG: 976946
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/5260
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/5000
Problem:
rpc_transport object, which is part of rpc_clnt, is destroyed
prematurely. This is because, rpc_transport object is ref'd by socket
layer and rpc layer. These ref's, until the synctask'izing of
operations, were unref'd sequentially in the epoll thread.
With more threads at play, the sequential unref guarantee is off.
Fix:
Shutting down the transport before proceeding with cleaning up of
rpc_clnt object would serialize the unref's on the rpc_transport object
and thus eliminating the race.
Also, we don't store the address of brickinfo in brick's rpc notify
function, to avoid the possibility of referring a freed brickinfo.
Instead we use a string based id to 'reach' the corresponding brickinfo.
Change-Id: If2739e2eeaee1e8b071ab2b6754b7ea0f81cfceb
BUG: 962619
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5214
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/5175 (upstream)
Usage: gluster system:: uuid get
This is needed since we generate uuid of a node in a lazy manner. ie, we
generate a uuid for the node only on the first volume or peer operation,
when the node needs an external identity. With this command, we can
force[1] the uuid generation, without a volume or peer operation performed.
[1]: Querying for uuid (or uuid get), forces uuid to come into
existence.
Change-Id: I62c8b6754117756aa4d773dd48af4ddeb1a1d878
BUG: 971661
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5204
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/5177
store being glusterd's persistent store under /var/lib/glusterd/
Change-Id: I1c01a09a8ce4a73ea612f05e7f14d4ab39ad1628
BUG: 971796
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5212
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Nfs xlator never does open on a file for performing writes,
afr does not perform changelog wakeup for this fd so operations
which do metadata operations as soon as the data operations are
completed perceive a delay od 'post-op-delay-secs'.
Fix:
Perform changelog wakeup on anon-fd if the fd with same pid is
not present in inode-list.
Note:
This approach is a short-term fix. A proper fix needs a new domain
for taking metadata locks so that data/metadata locks don't compete
with each other.
BUG: 966018
Change-Id: Ia9188a253e7943801b665e1b9205e2f551952d87
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5067
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a backport of Ia5a5d40bcea7bfb320ef7096af1e035b8847d4ff
BUG: 960055
Change-Id: Ibf3547a775d7ca2f3a097c880cdf38ffafb322da
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/5139
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is support for discovering a filename in a given directory
which has a case insensitive match of a given name. It is implemented
as a virtual extended attribute on the directory where the required
filename is specified in the key.
E.g:
sh# getfattr -e "text" -n user.glusterfs.get_real_filename:FiLe-B /mnt/samba/patchy
getfattr: Removing leading '/' from absolute path names
# file: mnt/samba/patchy
user.glusterfs.get_real_filename:FiLe-B="file-b"
In reality, there can be multiple "answers" as the backend filesystem is
case sensitive and there can be multiple files which can strcasecamp()
successfully. In this case we pick the first matched file from the first
responding server.
If a matching file does not exist, we return ENOENT (and NOT ENODATA).
This way the caller can differentiate between "unsupported" glusterfs
API and file not existing.
This API is used by Samba VFS to perform efficient discovery of the real
filename without doing a full scan at the Samba level.
Change-Id: I53054c4067cba69e585fd0bbce004495bc6e39e8
BUG: 953694
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5163
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not let inode linking to happen only in lookup(). While
that works, it is inefficient.
Change-Id: I51bbfb6255ec4324ab17ff00566375f49d120c06
BUG: 953694
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5162
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for negative xattr caching. For this, we need
to fetch xattrs in every opportunity (including readdirplus)
in order to treat missing key in cached dict as negative entry.
This is crucial to detect missing ACL xattrs in Samba workload.
Change-Id: I918a2ef4ab804724256f7546b15e808332ed518d
BUG: 953694
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5160
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cache needs to be pruned on write and [f]truncate. The lack of this
is causing Samba ping-pong test to return wierd 'data increment' values
during startup.
Change-Id: I9cd6a839bcd02de738d78638211b78f382f58e0a
BUG: 953694
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5158
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not fetching ACLs in readdirplus can potentially result in spurious
wrong ACL decisions (which magically go away on a lookup() which
populates the ACLs)
Change-Id: Ided38b4d868fab482b477ce51b4878289ef9eed0
BUG: 953694
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5156
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When taking blocking entrylks, afr orders the entrylks based on
uuid_compare of gfids of parent dirs, if they are equal then it orders
them based on the basenames. While this approach works fine, the
implementation assumes loc->gfids to be populated at the time of
the comparison, but loc may have gfid in loc->inode->gfid instead
of loc->gfid which was leading to order mismatches and dead-locks.
Fix:
Implemented loc_gfid which gives gfid by checking both loc->gfid,
loc->inode->gfid. Used this for ordering the blocking entrylks.
Change-Id: I2743fcaff3d670fbeb6b8e0a496f106a3585dde1
BUG: 965987
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5063
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of patch on master branch, under review at
http://review.gluster.org/4952
Volume op-versions calculations now take into account if an option,
a. enables/disables an xlator, or
b. is a boolean option.
This prevents op-versions from being updated when a feature is disabled.
BUG: 954256
Change-Id: Ic68032b9e55a3f0191f8fc3ecd6b5ced385ad943
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5094
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of patch on master branch, under review at
http://review.gluster.org/4866
This patch enables the open-behind by default only when the op-version
allows it. Also the volume op-version calculations take account of this
enablement.
BUG: 954256
Change-Id: Ie739bc23ba90ec2f009feecef28187912a37487c
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5095
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a backport of change
fa227c0 glusterd: Set op-version on startup based on install status
from master.
If the current installation of glusterfs doesn't have a stored
op-version and is,
a. a fresh install, then set op-version to maximum
b. an upgrade from release which didn't have op-version support, set it
to minimum.
The install status is detected using the peer-uuid.
If both peer-uuid and op-version are not present in the store, the
installation is fresh.
If peer-uuid is present, but op-version is absent in the store, the
installation has been upgraded from a version which didn't support
op-versions.
By setting the initial op-version as above, we can ensure that
a. features are not enabled accidentally during upgrades
b. a fresh install starts with all possible features enabled.
BUG: 954256
Change-Id: I5cdd0c63fd16ecfa2fede99684da6fd6167823a8
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5001
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a backport of change
9153855 glusterd: Introduce volume op-versions
from master.
Each volume is now associated with two op-versions,
* op_version - the op-version of the highest op-versioned feature enabled
* client_op_version - the op-version of the highest op-versioned feature
enabled which affects the clients only.
These two op-versions are generated dynamically and kept updated during
runtime. Glusterd now uses the respective volumes' client-op-version during
getspec requests.
To achieve the above a new field in the vme table is introduced,
client_option, this boolean field tells if the option is a client side
option.
BUG: 907311
Change-Id: I59af02644a714e1c54fc89f1ead5aa551bba7ee7
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/4957
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch backports the following changes from the master branch
99fe09f glusterd: Moved the volume entry table to a separate file.
e306d08 glusterd: Changing the volume entry table's representation.
eac54f6 glusterd: Added option description, and validation function fields.
bcb4235 glusterd: Added validation function for performance cache max and min size.
8897d08 glusterd: Added validation function for quota-timeout.
4579609 glusterd: Added validation function for stripe-block-size.
6788bad glusterd: Fix some options in vme table
549231d glusterd: Added the validation function for subvols-per-directory
9636e63 glusterd: Added description for nfs.transport-type option in volume set help.
Change-Id: I4a64ad94f17df4b45a3a32262a83e2c35fb5f7da
BUG: 907311
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/4956
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When open() with O_DIRECT happens, write-behind was being disabled for the
fd irrespective of strict_O_DIRECT option. This commit disables write-behind
only when strict_O_DIRECT is enabled.
Change-Id: Ieef180e52910c3bf64d46b26b0e5dc3b8542f6d2
BUG: 923556
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/4697
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/5108
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterd could deadlock after a peer-detach command as follows,
1) glusterd_friend_cleanup function 'flushes' out messages in the rpc
layer's queue, that haven't received a response. At this point, glusterd
has already acquired the big lock.
2) The side-effect of flushing out the messages is that the
corresponding call backs are called.
Call backs themselves are executed after acquiring the big lock. This
results in the big lock being acquired in a nested manner (in the same
thread), which causes
a deadlock.
This can also happen during brick/NFS/SHD disconnect in volume-stop.
Change-Id: Iab3aad143cd8ebbab53ea0b69687f0e7627dc8a9
BUG: 965533
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5084
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I475af7f8ffd5e5d8adbd2a74af20e56ad7751f69
BUG: 958108
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4916
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/5077
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We observed that the number of write requests thus inodelks
are increasing very rapidly to thousands without write-behind
in the graph.
Change-Id: I901a6a820eb7b21b413d33e1a0a3420c7f4746a8
BUG: 928341
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4736
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I8aa4f90ba7e1eecf3f978be04f8550049275464f
BUG: 765785
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5028
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The restarting of bricks has been deffered until the cluster 'stabilizes'
itself volumes' view. Since glusterd_spawn_daemons is executed everytime
a peer 'joins' the cluster, it may inadvertently restart bricks that
were taken offline for say, maintenance purposes. This fix avoids that.
Change-Id: Ic2a0a9657eb95c82d03cf5eb893322cf55c44eba
BUG: 960190
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5022
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Following commits were cherry-picked from master,
044f8ce syncop: Remove task from synclock's waitq before 'wake'
cb6aeed glusterd: Give up big lock before performing any RPC
46572fe Revert "glusterd: Fix spurious wakeups in glusterd syncops"
5021e04 synctask: implement barriers around yield, not the other way
4843937 glusterd: Syncop callbks should take big lock too
Change-Id: I5ae71ab98f9a336dc9bbf0e7b2ec50a6ed42b0f5
BUG: 948686
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4938
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5021
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I2ae30f08965b26a21db541f87de78772cb17135a
BUG: 962362
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/4996
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
BUG: 927648
Change-Id: Ic016e9d1f090372329a8a2e530dac5fc6ed6c5ae
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/4874
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before Anonymous fds are available, afr had to queue up
transactions if the file is not opened on one of its
subvolumes. This happens until the attempt to open the
file either succeeds or fails. These attempts happen
until the file is successfully opened on the subvolume.
Now client xlator uses anonymous fds to perform the fops
if the fd used for the fop is not 'opened'.
Fops will be successful even when the file is not opened
so there is no need to queue up the transactions anymore in afr.
Open is attempted on the subvolume where it is not
opened independent of the fop.
Change-Id: I6d59293023e2de41c606395028c8980b83faca3f
BUG: 953887
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4868
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
posix_fill_readdir() is a multi-step function which performs many
readdir() calls, and expects the directory cursor to have not
"seeked away" elsewhere between two successive iterations. Usually
this is not a problem as each opendir() from an application has its
own backend fd, and there is nobody else to "seek away" the directory
cursor. However in case of NFS's use of anonymous fd, the same fd_t
is shared between all NFS readdir requests, and two readdir loops can
be executing in parallel on the same dir dragging away the cursor in
a chaotic manner.
The fix in this patch is to lock on the fd around the loop. Another
approach could be to reimplement posix_fill_readdir() with a single
getdents() call, but that's for another day.
Change-Id: Ia42e9c7fbcde43af4c0d08c20cc0f7419b98bd3f
BUG: 948086
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4774
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-on: http://review.gluster.org/4963
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Historic bug - posix_writev() has been inspecting pfd->flushwrites for
performing fsync() after write, instead of @flags for O_SYNC|O_DSYNC.
pfd->flushwrites was never set anywhere and is unused completely. This
is behavior from the time before anonymous FD where open() had @wbflags
param. This is a leftover from that cleanup.
Change-Id: Id9bfe562a60db4eb3bd0a7705bdba91f2df2f3ec
BUG: 916372
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4738
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4962
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
With the present implementation, eager-lock is issued for
any fd fop. eager-lock is being transferred to metadata
transactions. But the lk-owner is set to local->fd address
only for DATA transactions, but for METADATA transactions
it is frame->root. Because of this unlock on the eager-lock fails
and rebalance hangs.
Fix:
Enable eager-lock for fd DATA transactions
This is a backport of change If30df7486a0b2f5e4150d3259d1261f81473ce8a
http://review.gluster.org/#/c/4588/
BUG: 916226
Change-Id: Id41ac17f467c37e7fd8863e0c19932d7b16344f8
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/4899
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On readv error io-cache frame->local is not set to NULL
so the local is mem_put in STACK_DESTROY as well. This
patch sets frame->local to NULL in all cases.
BUG: 955751
Change-Id: I4a7340189efe02473452986b5870b02fcfa9038e
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4886
Reviewed-by: Raghavendra G <raghavendra@gluster.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are primarily three lists that are part of glusterd process,
that are concurrently accessed. Namely, priv->volumes, priv->peers
and volinfo->bricks_list.
Big-lock approach
-----------------
WHAT IS IT?
Big lock is a coarse-grained lock which protects all three
lists, mentioned above, from racy access.
HOW DOES IT WORK?
At any given point in time, glusterd's thread(s) are in execution
_iff_ there is a preceding, inbound network event. Of course, the
sigwaiter thread and timer thread are exceptions.
A network event is an external trigger to glusterd, via the epoll
thread, in the form of POLLIN and POLLERR.
As long as we take the big-lock at all such entry points and yield
it when we are done, we are guaranteed that all the network events,
accessing the global lists, are serialised.
This amounts to holding the big lock at
- all the handlers of all the actors in glusterd. (POLLIN)
- all the cbks in glusterd. (POLLIN)
- rpc_notify (DISCONNECT event), if we access/modify
one of the three lists. (POLLERR)
In the case of synctask'ized volume operations, we must remember that,
if we held the big lock for the entire duration of the handler,
we may block other non-synctask rpc actors from executing.
For eg, volume-start would block in PMAP SIGNIN, if done incorrectly.
To prevent this, we need to yield the big lock, when we yield the
synctask, and reacquire on waking up of the synctask.
BUG: 948686
Change-Id: I429832f1fed67bcac0813403d58346558a403ce9
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4835
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterd syncops perform a barrier_wake whenever rpc_clnt_submit returned -1.
This is based on the wrong assumption that the cbkfn wasn't called.
This would result in one more wakeup than there ought to be.
BUG: 948686
Change-Id: I839fd218a81255fe50c2047d67461d45360e894d
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4834
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If for some reason glusterd_get_brick_root() fails,
it frees the gf_strdup'ed *mount_point in its own error path,
and returns -1.
Unfortunately it already had assigned that pointer value
to the output argument, the caller function
glusterd_add_brick_detail() sees a non-NULL pointer,
and free() again: segfault.
Could be fixed with a one-liner (*mount_point = NULL)
in the error path, but I think glusterd_get_brick_root()
should only assign to the output argument once all checks passed,
so I use a local temporary pointer, which increases the patch a bit.
Change-Id: I3f3035f01e80a5e9bdf2da895e4cf7baa3dfbd2f
BUG: 919352
Signed-off-by: Lars Ellenberg <lars@linbit.com>
Reviewed-on: http://review.gluster.org/4646
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-on: http://review.gluster.org/4841
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is needed to support automated testing of cluster-communication
features such as probing and quorum. In order to use this, you need to
do the following preparatory steps.
* Copy /var/lib/glusterd to another directory for each virtual host
* Ensure that each virtual host has a different UUID in its glusterd.info
Now you can start each copy of glusterd with the following xlator-options.
* management.transport.socket.bind-address=$ip_address
* management.working-directory=$unique_working_directory
You can use 127.x.y.z addresses for binding without needing to assign
them to interfaces explicitly. Note that you must use addresses, not
names, because of some stuff in the socket code that's not worth fixing
just for this usage, but after that you can use names in /etc/hosts
instead.
At this point you can issue CLI commands to a specific glusterd using
the --remote-host option. So far probe, volume create/start/stop,
mount, and basic I/O all seem to work as expected with multiple
instances.
Change-Id: I1beabb44cff8763d2774bc208b2ffcda27c1a550
BUG: 913555
Original-author: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4838
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
cherry-pick from:
refs/changes/16/4816/1; http://review.gluster.org/#/c/4816/
BUG: 951551
Change-Id: I3de5bd86d4238a60a0a85ba2e15d9c131969b210
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/4817
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I2911d3ac80825310f84c5ba6bd7890e65e1ee219
BUG: 950048
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4643
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Data self-heal may choose sink iatt to set mtimes.
This happens because after syncing of data is done
self-heal does one more xattrops/fstat to determine
sources sinks to set the inode-ctx. Since this is done
after data syncing and erase of xattrs, old source and
old sink are now sources, but the mtimes of them differ.
Old code just takes the first source from the list and
update mtimes, which could be sink before the self-heal
started.
Fix:
Set mtime from 'sources before syncing'.
Change-Id: Id769e1b99aa4f041eaee775f64cbf2c57b799723
BUG: 918437
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4658
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-on: http://review.gluster.org/4663
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backporting fix http://review.gluster.org/#/c/4668/
When subvols-per-directory is < available subvols, then there are layouts
which are not populated. This leads to incorrect identification of holes or
overlaps. We need to ignore layouts, which have err == 0, and start == stop.
In the current scenario (start == stop == 0).
Additionally, in layout-merge, treat missing xattrs as err = 0. In case of
missing layouts, anomalies will reset them.
For any other valid subvoles, err != 0 in case of layouts being zeroed out.
Also reverted back dht_selfheal_dir_xattr, which does layout calculation only
on subvols which have errors.
BUG: 921408
Change-Id: I75a8edcb92af5b53b3253c9addd7a812e9242836
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4800
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backporting Avati's fix http://review.gluster.org/4711
The scheme to encode brick d_off and brick id into global d_off has
two approaches. Since both brick d_off and global d_off are both 64-bit
wide, we need to be careful about how the brick id is encoded.
Filesystems like XFS always give a d_off which fits within 32bits. So
we have another 32bits (actually 31, in this scheme, as seen ahead) to
encode the brick id - which is typically plenty.
Filesystems like the recent EXT4 utilize the upto 63 low bits in d_off,
as the d_off is calculated based on a hash function value. This leaves
us no "unused" bits to encode the brick id.
However both these filesystmes (EXT4 more importantly) are "tolerant" in
terms of the accuracy of the value presented back in seekdir(). i.e, a
seekdir(val) actually seeks to the entry which has the "closest" true
offset.
This "two-prong" scheme exploits this behavior - which seems to be the
best middle ground amongst various approaches and has all the advantages
of the old approach:
- Works against XFS and EXT4, the two most common filesystems out there.
(which wasn't an "advantage" of the old approach as it is borken against
EXT4)
- Probably works against most of the others as well. The ones which would
NOT work are those which return HUGE d_offs _and_ NOT tolerant to
seekdir() to "closest" true offset.
- Nothing to "remember in memory" or evict "old entries".
- Works fine across NFS server reboots and also NFS head failover.
- Tolerant to seekdir() to arbitrary locations.
Algorithm:
Each d_off can be encoded in either of the two schemes. There is no
requirement to encode all d_offs of a directory or a reply-set in
the same scheme.
The topmost bit of the 64 bits is used to specify the "type" of encoding
of this particular d_off. If the topmost bit (bit-63) is 1, it indicates
that the encoding scheme holds a HUGE d_off. If the topmost bit is is 0,
it indicates that the "small" d_off encoding scheme is used.
The goal of the "small" d_off encoding is to stay as dense as possible
towards the lower bits even in the global d_off.
The goal of the HUGE d_off encoding is to stay as accurate (close) as
possible to the "true" d_off after a round of encoding and decoding.
If DHT has N subvolumes, we need ROOF(Log2(N)) "bits" to encode the brick
ID (call it "n").
SMALL d_off
===========
Encoding
--------
If the top n + 1 bits are free in a brick offset, then we leave the
top bit as 0 and set the remaining bits based on the old formula:
hi_mask = 0xffffffffffffffff
hi_mask = ~(hi_mask >> (n + 1))
if ((hi_mask & d_off_brick) != 0)
do_large_d_off_encoding ()
d_off_global = (d_off_brick * N) + brick_id
Decoding
--------
If the top bit in the global offset is 0, it indicates that this
is the encoding formula used. So decoding such a global offset will
be like the old formula:
if ((d_off_global & 0x8000000000000000) != 0)
do_large_d_off_decoding()
d_off_brick = (d_off_global % N)
brick_id = d_off_global / N
HUGE d_off
==========
Encoding
--------
If the top n + 1 bits are NOT free in a given brick offset, then we
set the top bit as 1 in the global offset. The low n bits are replaced
by brick_id.
low_mask = 0xffffffffffffffff << n // where n is ROOF(Log2(N))
d_off_global = (0x8000000000000000 | d_off_brick & low_mask) + brick_id
if (d_off_global == 0xffffffffffffffff)
discard_entry();
Decoding
--------
If the top bit in the global offset is set 1, it indicates that
the encoding formula used is above. So decoding would look like:
hi_mask = (0xffffffffffffffff << n)
low_mask = ~(hi_mask)
d_off_brick = (global_d_off & hi_mask & 0x7fffffffffffffff)
brick_id = global_d_off & low_mask
If "losing" the low n bits in this decoding of d_off_brick looks
"scary", we need to realize that till recently EXT4 used to only
return what can now be expressed as (d_off_global >> 32). The extra
31 bits of hash added by EXT recently, only decreases the probability
of a collision, and not eliminate it completely, anyways. In a way,
the "lost" n bits are made up by decreasing the probability of
collision by sharding the files into N bricks / EXT directories
-- call it "hash hedging", if you will :-)
Change-Id: I9551c581c3f3d4c9e719764881036d554f60c557
Thanks-to: Zach Brown <zab@redhat.com>
BUG: 838784
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4799
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tests/basic/quota.t covers test case for this.
Patch is only for 3.4 branch, http://review.gluster.org/4495 fixes the issue
in master.
Change-Id: I92674f5413441cc896245d5b3d0925f44ce8b2d3
BUG: 919998
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/4680
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We needed to zero out the layout range, before we re-calculate the range.
When spread-count is issued, we would end up with stale ranges in the layout.
Replaced dht_selfheal_dir_xattr with dht_fix_dir_xattr, which correctly resets
the un-used (after re-cal) layouts.
Change-Id: I1a900d15df07335f59356bd23182ccec34381ab2
BUG: 884455
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4648
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Maintain a list of writes (either written behind or SYNC) which are
currently "in progress" (i.e, STACK_WIND'ed towards server) and hold
off any new STACK_WIND of write (either written behind or SYNC) which
overlaps with any of the "in progress" writes.
This is a guarantee which AFR's eager-lock depends upon (though not
strictly a write-behind requirement)
Change-Id: Icedd0b51b440366a906dc9223d62b7fd6ef2ca03
BUG: 857673
Original-author: Anand Avati <avati@redhat.com>
Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4642
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ibd963f78707b157fc4c9729aa87206cfd5ecfe81
BUG: 913662
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4638
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Brick processes listen on all the interfaces on a given port.
When multiple glusterds run on one machine, glusterd assumes
that it 'owns' the ports on that machine. This can lead to the
different glusterd instances to step on each other's ports.
This fix ensures that brick processes listen only on the its
host IP when glusterd has bind-address option set.
Change-Id: I4c1b05643c64d3098bf56e977e768e611ffce0f5
BUG: 913662
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4637
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
| |
BUG: 765473
Change-Id: I1ddd6ef9f5361aed96f97aa1344823836c6ddecb
Signed-off-by: Raghavendra G <raghavendra@gluster.com>
Reviewed-on: http://review.gluster.org/4630
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is illegal to call xdr_string() with a NULL string. Linux
just retruns false, NetBSD gets a SIGSEGV when xdr_string()
calls strlen(NULL)
BUG: 916439
Change-Id: Ia958470ada6e8e55a86d439922ec942d038f5f13
Original-author: Emmanuel Dreyfus <manu@netbsd.org>
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4629
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fd based operations such as readv checked only for data split brain
instead of complete split-brain (i.e both data + metadata) assuming that
open would have done the complete split-brain check. However open-behind
would have unwound open, without winding to afr thus preventing the complete
split-brain check and some appliations will be able to read the contents
of the file even though the file has metadata split-brain. So let all
the fd based fops do a defensive check of complete split-brain.
Change-Id: I0ea52f782b371ce73e8e1c61f9def438fce1bd28
BUG: 846240
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4620
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the issue of task-id tests failing randomly. The condition used to
check rebalance/remove-brick was running was wrong, which could lead to the
task-id for these tasks to not be displayed even when the actual commit hadn't
occured.
BUG: 857330
Change-Id: I0f86c6bbe7acec586ee0ea6e663369ea26171904
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/4617
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* requests coming in as root are converted to nfsnobody
* with open-behind some acl checks wont happen and nfsnobody
can read the file "whose owner is root and other users do not
have permission to read the file". This is becasue open-behind
does not send the open to the brick and sends success to the
application, thus the acl related tests on the file wont happen
which would have prevented the file from being opened.
Change-Id: I12a3e6b2a12884d00bb81f2779074fed09b1b2e4
BUG: 887145
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4619
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|