| Commit message | Author | Age | Files | Lines |
Implement reconfigure() for the NFS xlator so that volume set/reset won't
restart the NFS server process. A few options, however, cannot be
reconfigured dynamically, e.g. nfs.mem-factor and nfs.port, which still
require NFS to be restarted.
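For illustration, a hypothetical session (the volume name and the
dynamically reconfigurable option are assumptions; only nfs.mem-factor
and nfs.port are stated above to need a restart):
    # gluster volume set myvol nfs.enable-ino32 on   <- picked up by reconfigure(), NFS keeps running
    # gluster volume set myvol nfs.port 2049         <- still requires the NFS server to be restarted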
Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c
BUG: 1027409
Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
Reviewed-on: http://review.gluster.org/6236
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
.. in the systems with non-trusted server
This new functionality can be useful in various cloud technologies.
It is implemented via a special encryption/crypt translator, which
works on the client side and performs encryption and authentication.
1. Class of supported algorithms
The crypt translator can support any atomic symmetric block cipher
algorithm, i.e. one which requires plain/cipher text to be padded before
the encryption/decryption transform is performed (see the glossary in
atom.c for definitions). In particular, it can support algorithms with
the EOF issue (which require the end of file to be padded with extra
data).
The crypt translator performs the translations
user -> (offset, size) -> (aligned-offset, padded-size) -> server
(and backward), and resolves individual FOPs (write(), truncate(),
etc.) to read-modify-write sequences.
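As a rough illustration (assuming the default 4096-byte block-size
described below), a small unaligned user write is expanded to whole atoms:
    user write:     offset = 5000, size = 100
    sent to server: aligned-offset = 4096, padded-size = 4096  (one full atom, via read-modify-write)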
A volume can contain files encrypted by different algorithms of the
mentioned class. To change an option value, simply reconfigure the
volume.
Currently only one algorithm is supported: AES_XTS.
Examples of algorithms which cannot be supported by the crypt
translator:
1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA;
2. Symmetric block cipher algorithms with inline MACs for data
authentication.
2. Implementation notes.
a) Atomic algorithms
Since any process in a stackable file system manipulates local data
(which can be made stale by the local data of another process), any
atomic cipher algorithm without proper support can lead to non-POSIX
behavior. To resolve such "collisions" we introduce locks: before
performing FOP->read(), FOP->write(), etc., the process should first
lock the file.
b) Algorithms with EOF issue
Such algorithms require the end of file to be padded with some extra
data. Without proper support this would result in losing information
about the real file size. Keeping track of the real file size is the
responsibility of the crypt translator. A special extended attribute
named "trusted.glusterfs.crypt.att.size" is used for this purpose. All
files contained in the bricks of an encrypted volume have "padded" sizes.
3. Non-trusted servers and metadata authentication
We assume that the server where the user's data is stored is
non-trusted. This means the server can be subjected to various attacks
aimed at revealing the user's encrypted personal data. We provide
protection against such attacks.
Every encrypted file has specific private attributes (cipher algorithm
id, atom size, etc.), which are packed into a string (the so-called
"format string") and stored as a special extended attribute named
"trusted.glusterfs.crypt.att.cfmt". We protect the string from
tampering. This protection is mandatory, hardcoded and is always on.
Without such protection various attacks (based on extending the scope
of per-file secret keys) are possible.
Our authentication method has been developed in tight collaboration
with the Red Hat security team and is implemented as "metadata loader of
version 1" (see file metadata.c). This method is NIST-compliant and is
based on checking 8-byte per-hardlink MACs, created (updated) by
FOP->create(), FOP->link(), FOP->unlink() and FOP->rename() over the
following unique entities:
. file (hardlink) name;
. verified file's object id (gfid).
Every time, before manipulating a file, we check its MACs at
FOP->open() time. Some FOPs don't require a file to be opened (e.g.
FOP->truncate()). In such cases the crypt translator opens the file
itself before proceeding.
4. Generating keys
Unique per-file keys are derived by NIST-compliant methods from:
a) the parent key;
b) the unique verified object-id of the file (gfid).
The per-volume master key, provided by the user at mount time, is at the
root of this "tree of keys".
Those keys are used to:
1) encrypt/decrypt file data;
2) encrypt/decrypt file metadata;
3) create per-file and per-link MACs for metadata authentication.
5. Instructions
Getting started with crypt translator
Example:
1) Create a volume "myvol" and enable encryption:
# gluster volume create myvol pepelac:/vols/xvol
# gluster volume set myvol encryption on
2) Set location (absolute pathname) of your master key:
# gluster volume set myvol encryption.master-key /home/me/mykey
3) Set other options to override default options, if needed.
Start the volume.
4) On the client side make sure that the file /home/me/mykey exists
   and contains the proper per-volume master key (that is, a 256-bit
   AES key). This key has to be in hex form, i.e. it should be
   represented by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}.
   The key should start at the beginning of the file; all symbols at
   offsets >= 64 are ignored. (See the key-generation example after
   these steps.)
5) Mount the volume "myvol" on the client side:
# glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt
After a successful mount, the file which contains the master key may
be removed. NOTE: keeping the master key between mount sessions is the
user's responsibility.
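One possible way to generate such a key file for step 4 (a sketch,
assuming the openssl command-line tool is available; 32 random bytes
yield the required 64 hex symbols):
    # openssl rand -hex 32 > /home/me/mykey
    # chmod 600 /home/me/mykey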
**********************************************************************
WARNING! Losing the master key will make the content of all regular
files inaccessible. Mounting with an improper master key still allows
access to the content of directories: file names are not encrypted.
**********************************************************************
6. Options of crypt translator
1) "master-key": specifies location (absolute pathname) of the file
which contains per-volume master key. There is no default location
for master key.
2) "data-key-size": specifies size of per-file key for data encryption
Possible values:
. "256" default value
. "512"
3) "block-size": specifies atom size. Possible values:
. "512"
. "1024"
. "2048"
. "4096" default value;
7. Test cases
Any workload, which involves the following file operations:
->create();
->open();
->readv();
->writev();
->truncate();
->ftruncate();
->link();
->unlink();
->rename();
->readdirp().
8. TODOs:
1) Currently the size of IOs issued by the crypt translator is
restricted to block_size (4K by default). We could use larger IOs to
improve performance.
Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee
BUG: 1030058
Signed-off-by: Edward Shishkin <edward@redhat.com>
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4667
Tested-by: Gluster Build System <jenkins@build.gluster.com>
* migrate all the fds on an inode to the new subvol after rebalance
* use the migration-in-progress flag in the inode, so all the operations
on the inode can make use of it
Change-Id: Ib807a46e927a1062688fc15119c916797c52a350
BUG: 1013456
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.org/5891
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* inode_ctx_reset{0,1,2}() for resetting value1, value2, and both respectively
* inode_ctx_get0() - to get the first value only
* inode_ctx_set0() - to set the first value only
* inode_ctx_get1() - to get the second value only
* inode_ctx_set1() - to set the second value only
Change-Id: I4dfbdac81d6a3f4e5784e060c76edabb1692ce03
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.org/5890
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
Special xattr names "clone" & "snapshot" can be used to create full and
linked clones of LV images. The GFID of the destination posix file (to
be mapped) is passed as the value of the xattr. The destination posix
file must exist before running this operation.
These operations form a basis for offloading storage related operations
from QEMU to GlusterFS.
Syntax for full clone: xattr name: "clone" value: "gfid-of-dest-file"
Syntax for linked clone: xattr name: "snapshot" value: "gfid-of-dest-file"
Syntax for merging: xattr name: "merge" value: "path-to-snapshot-file"
Example:
setfattr -n clone -v <gfid-of-dest-file> /media/source
setfattr -n snapshot -v <gfid-of-dest-file> /media/source
setfattr -n merge -v "/media/sn" /media/sn
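A sketch of a full-clone sequence on the mount point (hypothetical paths;
the glusterfs.gfid.string virtual xattr used here to read the destination
GFID is an assumption):
    # touch /media/dest
    # getfattr --only-values -n glusterfs.gfid.string /media/dest
    # setfattr -n clone -v <gfid-of-dest-file> /media/source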
Change-Id: Id9f984a709d4c2e52a64ae75bb12a8ecb01f8776
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Volume option bd-aio controls AIO feature for BD xlator. Code taken from
posix-aio.c
Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5748
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Make changes to the distribute xlator to work with the BD xlator. Unlike
files, a block device can't be removed while it is opened, so some parts
of the code were moved down to avoid this situation. Also, before
truncating a BD file its BD_XATTR should be set, otherwise truncate will
end up truncating the posix file; so the file is created with the needed
BD_XATTR first and then truncate is invoked. This also enables the BD
xlator in the stripe volume type.
Change-Id: If127516e261fac5fc5b137e7fe33e100bc92acc0
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5235
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Current BD xlator (block backend) has a few limitations such as
* Creation of directories not supported
* Supports only single brick
* Does not use extended attributes (and client gfid) like posix xlator
* Creation of special files (symbolic links, device nodes etc) not
supported
The basic limitation of not allowing directory creation blocks
oVirt/VDSM from consuming the BD xlator as part of a Gluster domain,
since VDSM creates multi-level directories when GlusterFS is used as the
storage backend for storing VM images.
To overcome these limitations a new BD xlator with the following
improvements is suggested.
* New hybrid BD xlator that handles both regular files and block device
files
* The volume will have both POSIX and BD bricks. Regular files are
created on POSIX bricks, block devices are created on the BD brick (VG)
* BD xlator leverages the existing POSIX xlator for most POSIX calls and
hence sits above the POSIX xlator
* Block device file is differentiated from regular file by an extended
attribute
* The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a
posix file to Logical Volume (LV).
* When a client sends a request to set BD_XATTR on a posix file, a new
LV is created and mapped to the posix file. So every block device will
have a representative file in the POSIX brick with 'user.glusterfs.bd'
(BD_XATTR) set.
* Hereafter, all operations on this file result in LV-related
operations.
For example opening a file that has BD_XATTR set results in opening
the LV block device, reading results in reading the corresponding LV
block device.
When the BD xlator gets a request to set BD_XATTR via a setxattr call,
it creates an LV, and information about this LV is placed in the xattr
of the posix file. The xattr "user.glusterfs.bd" is used to identify
that a posix file is mapped to a BD.
Usage:
Server side:
[root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2
It creates a distributed gluster volume 'bdvol' with Volume Group vg1
using posix brick /storage/vg1_info in host1 and Volume Group vg2 using
/storage/vg2_info in host2.
[root@host1 ~]# gluster volume start bdvol
Client side:
[root@node ~]# mount -t glusterfs host1:/bdvol /media
[root@node ~]# touch /media/posix
It creates regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick
[root@node ~]# mkdir /media/image
[root@node ~]# touch /media/image/lv1
It also creates regular posix file 'lv1' in either host1:/vg1 or
host2:/vg2 brick
[root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1
[root@node ~]#
The above setxattr results in a new LV being created in the
corresponding brick's VG, and it sets 'user.glusterfs.bd' with the value
'lv:<default-extent-size>'
[root@node ~]# truncate -s5G /media/image/lv1
This results in resizing LV 'lv1' to 5G
New BD xlator code is placed in xlators/storage/bd directory.
Also adds the volume-uuid as a tag on the VG so that the same VG can't
be used for other bricks/volumes. After deleting a gluster volume, one
has to manually remove the associated tag using vgchange <vg-name> --deltag
<trusted.glusterfs.volume-id:<volume-id>>
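A sketch of inspecting and clearing that tag with the standard LVM tools
(the VG name vg1 and the tag value are placeholders):
    # vgs -o vg_name,vg_tags vg1
    # vgchange vg1 --deltag trusted.glusterfs.volume-id:<volume-id>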
Changes from previous version V5:
* Removed support for delayed deleting of LVs
Changes from previous version V4:
* Consolidated the patches
* Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR.
Changes from previous version V3:
* Added support in FUSE to support full/linked clone
* Added support to merge snapshots and provide information about origin
* bd_map xlator removed
* iatt structure used in inode_ctx. iatt is cached and updated during
fsync/flush
* aio support
* Type and capabilities of volume are exported through getxattr
Changes from version 2:
* Used inode_context for caching BD size and to check if loc/fd is BD or
not.
* Added GlusterFS server offloaded copy and snapshot through setfattr
FOP. As part of this libgfapi is modified.
* BD xlator supports stripe
* During unlinking, if an LV file is already opened, it is added to a
delete list, and bd_del_thread tries to delete from this list when the
last reference to that file is closed.
Changes from previous version:
* gfid is used as name of LV
* ? is used to specify VG name for creating BD volume in volume
create, add-brick. gluster volume create volname host:/path?vg
* open-behind issue is fixed
* A replicate brick can be added dynamically and LVs from source brick
are replicated to destination brick
* A distribute brick can be added dynamically and rebalance operation
distributes existing LVs/files to the new brick
* Thin provisioning support added.
* bd_map xlator support retained
* setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and
setfattr -n user.glusterfs.bd -v "thin" creates thin LV
* Capability and backend information added to gluster volume info (and
--xml) so that management tools can exploit the BD xlator.
* tracing support for bd xlator added
TODO:
* Add support to display snapshots for a given LV
* Display posix filename for list-origin instead of gfid
Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/4809
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Remove bd_map xlator and CLI related changes.
Change-Id: If7086205df1907127c1a1fa4ba603f1c48421d09
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5747
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
file or directory" log messages
Problem:
Messages were getting logged very frequently at log level INFO.
[2013-03-01 11:34:28.029222] I [server3_1-fops.c:1541:server_open_cbk] vol-server: 993888: OPEN (null) (--) ==> -1 (No such file or directory)
[2013-03-01 11:34:28.031579] I [server3_1-fops.c:252:server_inodelk_cbk] vol-server: 993896: INODELK (null) (--) ==> -1 (No such file or directory)
[2013-03-01 11:34:28.034041] I [server3_1-fops.c:252:server_inodelk_cbk] vol-server: 993914: INODELK (null) (--) ==> -1 (No such file or directory)
[2013-03-01 11:34:28.040435] I [server3_1-fops.c:1338:server_flush_cbk] vol-server: 993938: FLUSH -2 (--) ==> -1 (No such file or directory)
Solution:
Moved them to the DEBUG log level if the error number equals ENOENT, else to the ERROR log level.
This will help in decreasing the size of log files at the INFO log level.
For server_open_cbk and some other functions we already have a patch; its URL is below.
URL: http://review.gluster.org/6241
This patch solves the logging problem for the functions server_inodelk_cbk and server_flush_cbk.
Change-Id: I57372e851371e466f1674726015e28378b826f5f
BUG: 1029372
Signed-off-by: Vikhyat Umrao<vumrao@redhat.com>
Reviewed-on: http://review.gluster.org/6252
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
* When a writev call occurs, the client compresses the data before
sending it to server. On the server, compressed data is decompressed.
Similarly, when a readv call occurs, the server compresses the data
before sending it to client. On the client, the compressed data is
decompressed. Thus the amount of data sent over the wire is minimized.
* Compression/Decompression is done using Zlib library.
* During normal operation, this is the format of data sent over wire :
<compressed-data> + trailer(8)
The trailer contains the CRC32 checksum and length of original
uncompressed data. This is used for validation.
HOW TO USE
----------
Turning on compression xlator:
gluster volume set <vol_name> compress on
Configurable options:
gluster volume set <vol_name> compress.compression-level 8
gluster volume set <vol_name> compress.min-size 50
Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805
BUG: 923540
Original-author : Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Prashanth Pai <nullpai@gmail.com>
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: http://review.gluster.org/3251
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
bricks log
Problem:
Messages were getting logged very frequently at log level INFO.
One of the log file snippet -
[2013-10-27 00:05:01.501355] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846575: STAT (null) (--) ==> -1 (No such file or directory)
[2013-10-27 00:05:01.505101] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846577: STAT (null) (--) ==> -1 (No such file or directory)
[2013-10-27 00:05:01.507299] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846578: STAT (null) (--) ==> -1 (No such file or directory)
[2013-10-20 19:50:35.554563] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714687: OPEN <gfid:01c70ca0-1952-4e82-abee-a07205757d8e>
(01c70ca0-1952-4e82-abee-a07205757d8e) ==> -1 (No such file or directory)
[2013-10-20 19:50:35.555520] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714697: OPEN <gfid:01c70ca0-1952-4e82-abee-a07205757d8e>
(01c70ca0-1952-4e82-abee-a07205757d8e) ==> -1 (No such file or directory)
[2013-10-20 19:50:35.558292] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714712: OPEN (null) (--) ==> -1 (No such file or directory)
Solution:
Moved them to the DEBUG log level if the error number equals ENOENT, else to the ERROR log level.
This will help in decreasing the size of log files at the INFO log level.
Change-Id: I23d74320c9c21bbce4805c20295556cc2cc0a8d8
BUG: 808073
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Reviewed-on: http://review.gluster.org/6241
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
The current coroutine model, mapping synctasks 1-1 with qemu internal
Coroutines, has some unresolved raciness issues. This problem usually
manifests as lifecycle mismatches between top-level (gluster created)
synctasks and the subsequently created internal coroutines from that
context. Qemu's internal queueing (and locking) can cause situations
where the top-level synctask is destroyed before the internal scheduler
has released references to memory, leading to use after free crashes
and asserts.
Simplify the coroutine model to use a single synctask as a coroutine
processor and rely on the existing native ucontext coroutine
implementation. The syncenv thread is donated to qemu and ensures a
single top-level coroutine is processed at a time. Qemu now has
complete control over coroutine scheduling.
BUG: 986775
Change-Id: I38223479a608d80353128e390f243933fc946fd6
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/6110
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Add basic backing image support to the block-format mechanism. This
is a functionality checkpoint that enables the raw mechanism
required to support client driven "snapshot" and "clone" requests.
This change enhances the block-format setxattr command to support
an additional and optional backing image reference. For example:
setfattr -n trusted.glusterfs.block-format -v "qcow2:10GB:<bimg>" ./newimage
... where <bimg> refers to the backing image for unallocated blocks
in newimage. <bimg> can be provided in one of two formats:
- a gfid string in the following format (assuming a valid gfid):
<gfid:00000000-0000-0000-0000-000000000000>
- or a filename that must be resident in the same directory as the
new clone file being formatted. E.g.,
setfattr -n trusted.glusterfs.block-format -v "qcow2:10GB:baseimg" ./newimage
This latter format is more restrictive, simply provided for
convenience or until something more refined is available.
This change makes no assumptions about the backing image file and
affords no additional protection. It is up to the user/client to
recognize the relationship between the files and manage them
appropriately (i.e., no writes to the backing image, etc.).
BUG: 986775
Change-Id: I7aff7bdc59b85a6459001a6bfeae4db6bf74f703
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/5967
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
the specified range. This fop will be useful when a whole file needs to
be initialized with zero (could be useful for zero filled VM disk image
provisioning or during scrubbing of VM disk images).
A client/application can issue this FOP for zeroing out. The Gluster
server will zero out the required range of bytes, i.e. server-offloaded
zeroing. In the absence of this fop, the client/application has to
repeatedly issue write (zero) fops to the server, which is a very
inefficient method because of the overheads involved in RPC calls and
acknowledgements.
WRITESAME is a SCSI T10 command that takes a block of data as input and
writes the same data to other blocks; this write is handled completely
within the storage and hence is known as an offload. Linux now has
support for the SCSI WRITESAME command, which is exposed to the user in
the form of the BLKZEROOUT ioctl. The BD xlator can exploit the
BLKZEROOUT ioctl to implement this fop. Thus zeroing-out operations can
be completely offloaded to the storage device, making it highly efficient.
The fop takes two arguments, offset and size. It zeroes out 'size' bytes
in an opened file starting from the 'offset' position.
This patch adds zerofill support to the following areas:
- libglusterfs
- io-stats
- performance/md-cache,open-behind
- quota
- cluster/afr,dht,stripe
- rpc/xdr
- protocol/client,server
- io-threads
- marker
- storage/posix
- libgfapi
Client applications can exploit this fop by using glfs_zerofill, introduced in
libgfapi. FUSE support for this fop has not been added as there is no system
call for it.
Changes from previous version 3:
* Removed redundant memory failure log messages
Changes from previous version 2:
* Rebased and fixed build error
Changes from previous version 1:
* Rebased for latest master
TODO :
* Add zerofill support to trace xlator
* Expose zerofill capability as part of gluster volume info
Here is a performance comparison of server-offloaded zerofill vs. zeroing
out using repeated writes.
[root@llmvm02 remote]# time ./offloaded aakash-test log 20
real 3m34.155s
user 0m0.018s
sys 0m0.040s
[root@llmvm02 remote]# time ./manually aakash-test log 20
real 4m23.043s
user 0m2.197s
sys 0m14.457s
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;
real 4m28.363s
user 0m0.021s
sys 0m0.025s
[root@llmvm02 remote]# time ./manually aakash-test log 25
real 5m34.278s
user 0m2.957s
sys 0m18.808s
The argument 'log' is a file which we want to use for logging purposes,
and the third argument is the size in GB.
As we can see, there is a performance improvement of around 20% with this
fop.
Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
BUG: 1028673
Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com>
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5327
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
In the current context "replica_cnt" is used just to know whether the
specific key exists or not by calling "dict_get_int32", which we can
replace with "dict_get()". Also change the log message, as it is more
appropriate to say "migration of data" rather than "rebalance".
This patch refactors commit 51c6fa7a354826744de98 against BZ 961669,
reviewed on: http://review.gluster.org/5566
Change-Id: I48eae206a28d4083975e64407ed8fe4539f9c24b
BUG: 1027270
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Original patch: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/6001
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: susant palai <spalai@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I641d028165da7b8501bd372c62d2df89a9d4db1f
BUG: 1027174
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/6237
Reviewed-by: poornima g <pgurusid@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Problem:
nfs3_stat_to_fattr3() missed a NULL check.
FIX:
(1) Added a NULL check.
(2) In all fop cbk path, if the op_ret is -1 and op_errno is 0,
then handle it as a special case. Set the NFS3 status as
NFS3ERR_SERVERFAULT instead of NFS3_OK.
(3) The other component of FIX would be in DHT module and
is on the way.
Change-Id: I6f03c9a02d794f8b807574f2755094dab1b90c92
BUG: 1010241
Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
Reviewed-on: http://review.gluster.org/6026
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Remove server_ctx and locks_ctx from client_ctx and store them as
discrete entities in the scratch_ctx.
Hooking up dump will be done in phase 3.
BUG: 849630
Change-Id: I94cea328326db236cdfdf306cb381e4d58f58d4c
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/5678
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
This is (arguably) a hack to work around a bug in libvirt, which is not
well behaved with respect to using TCP ports in the unreserved space
between 49152-65535 (see RFC 6335).
Normally glusterd starts and binds to the first available port in range,
usually 49152. libvirt's live migration also tries to use ports in this
range, but has no fallback to use (an)other port(s) when the one it wants
is already in use.
Change-Id: Id8fe35c08b6ce4f268d46804bbb6dddab7a6b7bb
BUG: 1018178
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/6210
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Change-Id: Ic3f9d30d75df0bbbdf8fe28446fabe62d90fa854
BUG: 1024666
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6194
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
... so that subsequent volume commands don't block forever, waiting
for the lock to be released.
Change-Id: I24b5ec47f6982900ab74ff1b492d523f31ecfb7f
BUG: 1022055
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/6122
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
The quota volume reset command without the "force"
option is fixed and doesn't fail anymore. It resets
unprotected fields and not the protected ones.
Also, an appropriate message is provided to the user
for the following cases :
1. only unprotected fields are reset, "force" option
should be used to reset protected fields.
2. Both protected and unprotected fields are reset.
3. No field was reset, "force" option required.
A test case for the same is also added.
Change-Id: I24e8f1be87b79ccd81bf6f933e00608b861c7a16
BUG: 1022905
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/6135
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Implement a limit on the total number of outstanding RPC requests
from a given client. Once the limit is reached, the client socket
is removed from POLL-IN event polling.
Change-Id: I8071b8c89b78d02e830e6af5a540308199d6bdcd
BUG: 1008301
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6114
Reviewed-by: Santosh Pradhan <spradhan@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
For better NFS performance, make the default I/O size 1MB, the same as
kernel NFS. Also refactor the description for read-size, write-size
and readdir-size (i.e. it must be a multiple of 1KB, but the minimum
value is 4KB and the maximum supported value is 1MB). On slower
networks, rsize/wsize can be adjusted to 16/32/64 KB through
nfs.read-size or nfs.write-size respectively.
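For instance, a smaller read/write size could be set like this (a sketch;
the volume name is a placeholder and the value is assumed to be given in
bytes, per the 1KB-multiple rule above):
    # gluster volume set myvol nfs.read-size 65536
    # gluster volume set myvol nfs.write-size 65536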
Change-Id: I142cff1c3644bb9f93188e4e890478177c9465e3
BUG: 1009223
Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
Reviewed-on: http://review.gluster.org/6103
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Sayan Saha has previously approved changing everything to dual license,
but somehow we have missed changing these files.
I am explicitly not updating the copyright dates, as nothing else that's
copyrightable has changed in these files with the license change.
Change-Id: Ia965eeb7168447d69e28e939ad95ee388873b6e4
BUG: 951549
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/6128
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
brick force.
The expectation with force is that the user is aware of the consequences
of sanity checks not being triggered.
Change-Id: I79dfeed16a23829a7217cef33ab83f9f0ffae336
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
BUG: 1007509
Reviewed-on: http://review.gluster.org/5746
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Glusterd changes:
With this patch, glusterd creates a socket file at
DATADIR/run/glusterd.socket and listens on it for cli requests. It
listens for 2 rpc programs on the socket file:
- The glusterd cli rpc program, for all cli commands
- A reduced glusterd handshake program, just for the 'system:: getspec'
command
The location of the socket file can be changed with the glusterd option
'glusterd-sockfile'.
To retain compatibility with the '--remote-host' cli option, glusterd
also listens for the cli requests on port 24007. But, for the sake of
security, it listens using a reduced cli rpc program on the port. The
reduced rpc program only contains read-only procs used for 'volume
(info|list|status)', 'peer status' and 'system:: getwd' cli commands.
CLI changes:
The gluster cli now uses the glusterd socket file for communicating with
glusterd by default. A new option '--gluster-sock' has been added to
allow specifying the sockfile used to connect. Using the '--remote-host'
option will make cli connect to the given host & port.
Tests changes:
cluster.rc has been modified to make use of socket files and use
different log files for each glusterd.
Some of the tests using cluster.rc have been fixed.
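A hedged example of the two access paths described above (the host name
is a placeholder; --remote-host is an existing cli option, while the
default local connection goes over the glusterd socket file):
    # gluster volume status                      <- local, via DATADIR/run/glusterd.socket
    # gluster --remote-host=peer1 volume info    <- remote, read-only rpc program on port 24007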
Change-Id: Iaf24bc22f42f8014a5fa300ce37c7fc9b1b92b53
BUG: 980754
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5280
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Suggested by Anand Avati in BD xlator code review.
Change-Id: I31c353a26dfdeb3d0023c3f7e03ed25461d13c16
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
BUG: 837495
Reviewed-on: http://review.gluster.org/6077
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
This patch avoids giving more info to the user about the
internal heuristic employed in afr, for quota sizes.
Change-Id: Ice3a164399f09b6967500ec0c17dc340e7ae9aba
BUG: 1016683
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6098
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
Bail out of fops if split brain has been detected during lookup
Change-Id: Id387dbb1a25eec4a121dedceadc6069bdea24b5d
BUG: 1010834
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/5988
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Change-Id: I101e329cce4c1305615c63ebcb42355f6c3e85e0
BUG: 918052
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/6084
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Two glusterfs clients return inconsistent errnos when the bricks of the volume
were down. Consider two gluster mounts. Mount 1 was done when the bricks were
online. Mount 2 was done after the bricks were killed, (using the 'glusterfs'
command instead of the mount script).
For any request, mount 1 will return ENOTCONN, where as mount 2 will return
ENOENT.
This happens because for the 2nd mount, fuse would send a lookup on '/' for
any request, as it hadn't been done yet. The client xlator returns ENOTCONN,
but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when
aggregating. So, fuse returned ENOENT, even though the errno should have been
ENOTCONN.
Change-Id: I4b7a6d84ce5153045a807fccc01485afe0377117
BUG: 1019095
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/6072
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
gettimeofday() returns the current wall clock time and timezone.
Using these functions in order to measure the passage of time
(how long an operation took) therefore seems like a no-brainer.
This approach suffers from some limitations:
a. They have a low resolution: “High-performance” timing, by
definition, requires clock resolutions in the microseconds
or better.
b. They can jump forwards and backwards in time: Computer
clocks all tick at slightly different rates, which causes
the time to drift. Most systems have NTP enabled which
periodically adjusts the system clock to keep them in sync
with “actual” time. The adjustment can cause the clock to
suddenly jump forward (artificially inflating your timing
numbers) or jump backwards (causing your timing calculations
to go negative or hugely positive). In such cases the timer
thread could go into an infinite loop.
From 'man gettimeofday':
----------
..
..
The time returned by gettimeofday() is affected by discontinuous
jumps in the system time (e.g., if the system administrator manually
changes the system time). If you need a monotonically increasing
clock, see clock_gettime(2).
..
..
----------
Rationale:
For calculating interval timing for the timer thread, all that’s
needed is a clock that acts as a simple counter incrementing
at a stable rate.
To avoid the jumps which are caused by using "wall time", this
counter must be monotonic and can never “tick” backwards, ever.
Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb
BUG: 1017993
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-on: http://review.gluster.org/6070
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Currently, to know the number of files to be healed, the user either has
to go to the backend and check the number of entries present in the
indices/xattrop directory, or run the 'gluster volume heal vol-name info'
command. If a volume consists of a large number of bricks, going to each
backend and counting the entries is a time-consuming task; and if the
number of entries in the indices/xattrop directory is very large, the
info command will also take a long time.
So, as a feature, a new command is implemented.
Command 1: gluster volume heal vn statistics heal-count
This command will get the number of entries present in
every brick of a volume. The output displays only entries
count.
Command 2: gluster volume heal vn statistics heal-count
replica 192.168.122.1:/home/user/brickname
Here, if we are concerned with just one replica,
providing any one brick of that replica will get
the number of entries to be healed for that replica only.
Example:
Replicate volume with replica count 2.
Backend status:
--------------
[root@dhcp-0-17 xattrop]# ls -lia | wc -l
1918
NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy
entries, so the actual number of entries to be healed is
1916.
[root@dhcp-0-17 xattrop]# pwd
/home/user/2ty/.glusterfs/indices/xattrop
Command output:
--------------
Gathering count of entries to be healed on volume volume3 has been successful
Brick 192.168.122.1:/home/user/22iu
Status: Brick is Not connected
Entries count is not available
Brick 192.168.122.1:/home/user/2ty
Number of entries: 1916
Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290
BUG: 1015990
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/6044
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
"gluster volume heal volumename statistics" command gives the summary
of the afr crawl done based on the entries present in the xattrop
directory. Whenever afr crawls are attempted, the beginning time of
crawl, end time of crawl, no of files healed, heal-failed count and
number of files in split brain are shown along with the type of the
crawl. If crawl is already in progress then it will give the number
of files healed, heal failed count and number of files in split-brain
from the beginning of the crawl and instead of telling the end time of
the crawl, "CRAWL IN PROGRESS" message will be shown.
Output format:
command: "gluster volume heal volume-name statistics"
Output:
Gathering afr crawl statistics crawl statistics on volume volume-name
has been successful
------------------------------------------------
Crawl statistics for brick no 0
Hostname of brick 192.168.122.248
Starting time of crawl: Wed Jul 10 15:52:38 2013
Ending time of crawl: Wed Jul 10 15:52:38 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
Starting time of crawl: Wed Jul 10 15:52:38 2013
Ending time of crawl: Wed Jul 10 15:52:38 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
------------------------------------------------
Crawl statistics for brick no 1
Hostname of brick 192.168.122.1
Starting time of crawl: Wed Jul 10 15:52:42 2013
Ending time of crawl: Wed Jul 10 15:52:42 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
Starting time of crawl: Wed Jul 10 15:52:42 2013
Ending time of crawl: Wed Jul 10 15:52:42 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
--------------------------------------------------
Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9
BUG: 949400
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4790
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Quota size xattrs are not maintained by afr. There is a
possibility that they differ even when both the directory
changelog xattrs suggest everything is fine. So if there is at
least one 'source', check among the sources for the one which has the
maximum quota size. Otherwise, check among all the available ones for
the maximum quota size. This way, if there is a source and stale copies,
it always votes for the 'source'.
Change-Id: Ia222379cbafa7043dd03f533c105860f2c7b8b0d
BUG: 1016683
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6052
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Varun Shastry <vshastry@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
bdrv_init() is intended to be invoked only once. If invoked again
after initialization (i.e., due to graph changes), the block driver
registration code can reinsert entries into bdrv_drivers,
effectively corrupting the NULL-terminated linked list. This
manifests as an infinite loop for callers that attempt to traverse the
list (to locate a driver).
BUG: 986775
Change-Id: I2f95bda3c753dece4648cdad009d0e2a2d16f644
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/6021
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
oVirt's Gluster Integration needs an inexpensive command that can be
executed every 10 seconds to monitor async tasks and their parameters,
for all volumes.
The solution involves adding a 'tasks' sub-command to 'volume status'
to fetch only the async task IDs, type and other relevant parameters.
Only the originator glusterd participates in this command as all the
information needed is available on all the nodes. This is to make the
command suitable for being executed every 10 seconds.
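A hedged example of the intended polling call (the volume name is a
placeholder; the tasks sub-command is the one added by this patch):
    # gluster volume status myvol tasks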
Change-Id: I1edc607baf29b001a5585079dec681d7c641b3d1
BUG: 1012346
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/6006
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Bug: 1015184
Issue: showmount times out on fetching the export list. The socket
      writev function is failing.
Cause: XDR encoding of the export list is failing. The calling function,
      without checking the error returned by the xdr_serialize_exports
      function, goes ahead and writes into the socket, causing
      the NFS process to hang. The xdr_serialize_exports function returns
      -1 on error and the message length on success.
Fix: The caller should check whether the function returned -1 (error)
     before proceeding.
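The symptom can be reproduced from any NFS client host (the server name
is a placeholder):
    # showmount -e gluster-nfs-server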
Change-Id: Ic3a5a9356e47b2ac938dd3e429cf2b71c0a0c715
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/6030
Reviewed-by: Santosh Pradhan <spradhan@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
BUG: 764655
Change-Id: I2aaec9de617b0616525ad30f82ac6f75a6446d33
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/6036
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
This patch adds node uuid in rebalance/remove-brick status xml output.
Output XML will look like
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
<opRet>0</opRet>
<opErrno>0</opErrno>
<opErrstr/>
<volRebalance>
<op>3</op>
<nodeCount>1</nodeCount>
<node>
<nodeName>localhost</nodeName>
==>> <id>883626f8-4d29-4d02-8c5d-c9f48c5b2445</id>
<files>0</files>
<size>0</size>
<lookups>0</lookups>
<failures>0</failures>
<status>3</status>
<statusStr>completed</statusStr>
</node>
<aggregate>
<files>0</files>
<size>0</size>
<lookups>0</lookups>
<failures>0</failures>
<status>3</status>
<statusStr>completed</statusStr>
</aggregate>
</volRebalance>
</cliOutput>
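Such output would typically be produced with the --xml cli switch, e.g.
(the volume name is a placeholder):
    # gluster volume rebalance myvol status --xml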
Change-Id: I5a1d4f9043b33b9e88150647a243ddb16154e843
BUG: 1012296
Signed-off-by: Bala.FA <barumuga@redhat.com>
Reviewed-on: http://review.gluster.org/6005
Reviewed-by: Kaushal M <kaushal@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
'-' can be present in a volume name. This may lead to domain
collisions in the future.
Tests:
Checked in gdb that domain comes with ':' separator:
Breakpoint 1, pl_common_inodelk (frame=0x7fdabcce51a4,
this=0x8bde20, volume=0x8b50d0 "r2-replicate-0:self-heal",
inode=0x7fdab822f0e8, cmd=6, flock=0x7fdabc76eee4,
loc=0x7fdabc76ede4, fd=0x0, xdata=0x7fdabc6e0ab0) at inodelk.c:597
Change-Id: I4456ae35ac8bf21e6361c34e9ad437f744a2e84b
BUG: 993981
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6025
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Change-Id: I8ff6b48ef41fd6e9ea68c42dfb9878f8a08ed627
BUG: 1010874
Signed-off-by: Ajeet Jha <ajha@redhat.com>
Reviewed-on: http://review.gluster.org/5989
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
connection_cleanup creates a frame solely so that it can be copied.
But the process of copying creates a new frame, and there's nothing
apparently magic about the one being copied; we can eliminate an
unnecessary create and destroy.
Which in the grand scheme of things probably isn't the worst thing
we do, but it's low hanging fruit.
Change-Id: I4a23b84a53e086137b7d4167ad8c20b673d1ffe5
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/6019
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
"unlock failed on 1 unlock" seems meaningless in the log message. Improved it.
Change-Id: If67d3f9d4aa5310d0b6728a6c89fa58a5cc93d12
BUG: 1012947
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/6012
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
The volume op-versions are calculated during a volume set/reset, reading a
volume from disk and importing a volume during probe or volume sync. The
calculation of the volume op-version depends on the clusters op-version as some
features are enabled automatically depending on the clusters op-version. We
also don't store the volume op-versions persistently and don't export the
volume op-versions during sync. Due to this, there can occur cases which will
lead to inconsistencies in volumes in different peers. One such case is below,
Consider, a cluster made up 3 peers P1, P2 and P3, operating at op-version N.
The cluster has two volumes V1 and V2, which have volume op-versions N (since
volume op-version cannot be greater than cluster op-version). We have,
Cluster-op-version = N
V1 op-version = N
V2 op-version = N
A set operation on V1 causes the clusters op-version to be bumped up to N+1.
Assume that there exist some features that are automatically enabled on
op-version N+1. The op-version of V2 remains at N as no operation has been
performed on it. So,
Cluster op-version = N+1
V1 op-version = N+1
V2 op-version = N
Now, we probe a new peer P4. On the new peer we will have the following
op-versions,
Cluster op-version = N+1
V1 op-version = N+1
V2 op-version = N+1
This happens because we don't send volume op-versions during the sync after
probe. P4 will freshly calculate the op-version of V2 (assuming features have
been auto enabled due to the cluster op-version being N+1) as N+1.
Another case is when glusterd on a peer restarts. Assume P3 was restarted,
glusterd will recalculate the volume op-versions during the restore state.
Again, op-version of V2 will be calculated as N+1 assuming auto enabled
features. This will lead to inconsistency in the volume representation in
memory and on disk, as glusterd will assume the volume contains auto enabled
features, but the volfiles don't contain them as they were not regenerated.
These kinds of issues can be solved by calculating the volume op-version only
when features are enabled and disabled (i.e. during volume set/reset),
persisting the volume op-versions and exporting/importing them.
Change-Id: I52de0668c92628622e85f4588fb28829a7231132
BUG: 1005043
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5568
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Glusterd would not store all the volumes when a global option was set.
When setting a global option, like the 'nfs.*' options, glusterd used to
modify the volinfo for all the volumes, but would store only the volinfo
for the named volume. This led to a mismatch between the persisted and the
in-memory representation of those volumes, which led to problems like
peers being rejected because of volume mismatches.
Change-Id: I8bca10585e34b7135cb32af0055dbd462b3fb9b5
BUG: 1012400
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/6007
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Change-Id: I27f5f7cd54115d7b236b42f6beaaa05a8b379dd7
BUG: 1010153
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/5978
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Changing the encoding method of the changelog was a critical section for
the "fop dispatch thread", "roll-over thread" and "reconfigure dispatch thread".
In this patch the "encoding-method" is changed lazily by the reconfigure
dispatch thread during handle_change, which resolves the contention among the
racing threads.
BUG: 1002940
Change-Id: I78c3e8887efa46d0fcc60755cdf4243031cfa3eb
Signed-off-by: Ajeet Jha <ajha@redhat.com>
Reviewed-on: http://review.gluster.org/5844
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>