| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prevent mistaking the "compress" options for storage (at rest)
compression. The cdc-xlator is implemented to support compression of
network traffic (READ and WRITE FOPs).
URL: http://www.gluster.org/community/documentation/index.php/Features/On-Wire_Compression_+_Decompression
Master-Change-Id: I9fedf4106dcb226d135ab92e4b533aff284881d7
Master-Reviewed-on: http://review.gluster.org/6765
Change-Id: Ib882af855b36df93fac46236c349c33dd4c3ced4
BUG: 1053670
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/6773
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A dictionary was added to store additional information of a rebalance
process, like the bricks being removed in case of a rebalance started
by remove-brick. This dictionary wasn't being stored/restored or synced
during volume sync, leading to errors like a volume status command
failing. These issues have been fixed in this patch. The rebalance dict
is now stored/restored and also exported/imported during volume sync.
Also, this makes sure that the rebalance dict is only created on
remove-brick start. This adds a brick's decommissioned status to the
information imported/exported during volume sync.
BUG: 1040809
Change-Id: I46cee3e4e34a1f3266a20aac43368854594f01bc
Signed-off-by: Kaushal M <kaushal@redhat.com>
Backport-of: http://review.gluster.org/6492
Reviewed-on: http://review.gluster.org/6524
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/6522
Change-Id: Ib316897dcbd0748bfb3bfcda186b9fe30c07f80f
BUG: 1038051
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/6570
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/6521
Add glusterd_volinfo_remove(..) which removes @volinfo from the list
of volumes in the cluster and performs an unref on @volinfo
Change-Id: I5f546ca58f61bc334ab1bab4c51c4a21e1f66161
BUG: 1038051
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/6569
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/6525
kill(2) returns -1 with errno set to ESRCH when the pid of the process
being killed doesn't exist. Failing glusterd_brick_stop on a stopped
brick could result in volume-stop failing, in commit phase.
This fix prevents that from happening.
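A minimal sketch of the idea, with an illustrative helper name rather than the
actual glusterd function:
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Treat ESRCH from kill(2) as "brick already stopped" so that volume-stop
 * does not fail in the commit phase for a brick whose process has exited. */
static int
stop_brick_sketch (pid_t pid)
{
        if (kill (pid, SIGTERM) == -1) {
                if (errno == ESRCH)
                        return 0;   /* no such process: nothing to stop */
                return -1;          /* genuine failure */
        }
        return 0;
}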
Change-Id: I00f46fa06e489a671efbb8e4119f545f8ccea329
BUG: 1038051
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/6568
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/6531
Trying to stop rebalance process via RPC using the GD_SYNCOP macro,
could lead to glusterd crashing. In case of an implicit volume update,
which happens when a peer comes back up, the stop function would be
called in the epoll thread. This would lead to glusterd crashing as the
epoll thread doesn't have synctasks for the GD_SYNCOP macro to make use
of.
Instead of using the RPC method, we now terminate the rebalance process
by kill(). The rebalance process has been designed to be resistant to
interruption, so this will not lead to any data corruption.
Also, when checking for stale rebalance task, make sure that the old
task-id is not null.
Change-Id: I54dd93803954ee55316cc58b5877f38d1ebc40b9
BUG: 1044327
Signed-off-by: Kaushal M <kaushal@redhat.com>
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/6567
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/5512
rpc:
- On a RPC_TRANSPORT_CLEANUP event, rpc_clnt_notify calls the registered
notifyfn with a RPC_CLNT_DESTROY event. The notifyfn should properly
cleanup the saved mydata on this event.
- Break the reconnect chain when an rpc client is disabled. This will
prevent new disconnect events which can lead to crashes.
glusterd:
- Added support for RPC_CLNT_DESTROY in glusterd_brick_rpc_notify
- Use a common glusterd_rpc_clnt_unref() function throughout glusterd in
place of rpc_clnt_unref(). This function correctly gives up the
big-lock before performing the unref.
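A schematic sketch of the DESTROY handling described above; the enum and
callback below are stand-ins, not the real rpc-clnt types:
#include <stdlib.h>

/* The enum mirrors the rpc client events named above (CONNECT, DISCONNECT,
 * DESTROY); the real code uses glusterfs' rpc-clnt types instead. */
enum clnt_event { CLNT_CONNECT, CLNT_DISCONNECT, CLNT_DESTROY };

static int
brick_rpc_notify_sketch (enum clnt_event event, void *mydata)
{
        switch (event) {
        case CLNT_CONNECT:
        case CLNT_DISCONNECT:
                /* existing connect/disconnect handling ... */
                break;
        case CLNT_DESTROY:
                /* last unref of the rpc client: the saved mydata (e.g. a
                 * brick context) must be released here, not earlier */
                free (mydata);
                break;
        }
        return 0;
}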
Change-Id: I93230441c5089039643fc9f5632477ef1b695348
BUG: 962619
Signed-off-by: Kaushal M <kaushal@redhat.com>
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/6566
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/6492
A dictionary was added to store additional information of a rebalance
process, like the bricks being removed in case of a rebalance started
by remove-brick. This dictionary wasn't being stored/restored or synced
during volume sync, leading to errors like a volume status command
failing. These issues have been fixed in this patch. The rebalance dict
is now stored/restored and also exported/imported during volume sync.
Also, this makes sure that the rebalance dict is only created on
remove-brick start. This adds a brick's decommissioned status to the
information imported/exported during volume sync.
Change-Id: I56fed23dc2de80a96648055fe705e9c3ffd55227
BUG: 1040809
Signed-off-by: Kaushal M <kaushal@redhat.com>
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/6565
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/6334
Glusterd will now correctly copy existing rebalance information when a
volinfo is updated during volume sync. If the existing rebalance
information was stale, then any existing rebalance process will be
terminated. A new rebalance process will be started only if there is no
existing rebalance process. The rebalance process will not be started if
the existing rebalance session had completed, failed or been stopped.
Change-Id: I68c5984267c188734da76770ba557662d4ea3ee0
BUG: 1036464
Signed-off-by: Kaushal M <kaushal@redhat.com>
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/6564
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/6230
Previously, glusterd used to just send back the local status of a task
in a 'volume status [tasks]' command. As the rebalance operation is
distributed and asynchronous, this meant that different peers could give
different status values for a rebalance or remove-brick task.
With this patch, all the peers will send back the task's status as a part
of the 'volume status' commit op, and the origin peer will aggregate
these to arrive at a final status for the task.
The aggregation is only done for rebalance or remove-brick tasks. The
replace-brick task will have the same status on all the peers (see
comment in glusterd_volume_status_aggregate_tasks_status() for more
information) and need not be aggregated.
The rebalance process has 5 states,
NOT_STARTED - rebalance process has not been started on this node
STARTED - rebalance process has been started and is still running
STOPPED - rebalance process was stopped by a 'rebalance/remove-brick
stop' command
COMPLETED - rebalance process completed successfully
FAILED - rebalance process failed to complete successfully
The aggregation is done using the following precedence,
STARTED > FAILED > STOPPED > COMPLETED > NOT_STARTED
The new changes make the 'volume status tasks' command a distributed
command as we need to get the task status from all peers.
The following tests were performed,
- Start a remove-brick task and do a status command on a peer which
doesn't have the brick being removed. The remove-brick status was
given correctly as 'in progress' and 'completed', instead of 'not
started'
- Start a rebalance task, run the status command. The status moved to
'completed' only after rebalance completed on all nodes.
Also, change the CLI xml output code for rebalance status to use the
same algorithm for status aggregation.
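A hedged sketch of the precedence-based aggregation described above (enum and
helper names are illustrative, not the glusterd code):
enum task_status { NOT_STARTED, STARTED, STOPPED, COMPLETED, FAILED };

static int
status_rank (enum task_status s)
{
        switch (s) {
        case STARTED:   return 4;
        case FAILED:    return 3;
        case STOPPED:   return 2;
        case COMPLETED: return 1;
        default:        return 0;   /* NOT_STARTED */
        }
}

/* Fold the per-peer statuses into one, following the precedence
 * STARTED > FAILED > STOPPED > COMPLETED > NOT_STARTED. */
static enum task_status
aggregate_status (const enum task_status *peer_status, int npeers)
{
        enum task_status agg = NOT_STARTED;
        int              i;

        for (i = 0; i < npeers; i++)
                if (status_rank (peer_status[i]) > status_rank (agg))
                        agg = peer_status[i];
        return agg;
}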
Change-Id: Ifd4aff705aa51609a612d5a9194acc73e10a82c0
BUG: 1027094
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
http://review.gluster.org/6230
Reviewed-on: http://review.gluster.org/6562
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iff305023577ff92a8f43f24dafcf201f86805769
BUG: 1038051
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/6424
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the endpoint of an RPC is not connected, the callback is called
synchronously within rpc_clnt_submit(). Since callbacks typically
hold the big lock, give up the big lock before calling rpc_clnt_submit
and acquire it freshly after the call.
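A minimal sketch of the pattern, using a pthread mutex as a stand-in for the
big lock and a function pointer in place of rpc_clnt_submit():
#include <pthread.h>

/* The caller holds big_lock on entry; submit_fn's callback may run
 * synchronously inside the call when the endpoint is not connected. */
static int
submit_outside_biglock (pthread_mutex_t *big_lock,
                        int (*submit_fn) (void *req), void *req)
{
        int ret;

        pthread_mutex_unlock (big_lock);   /* callback may fire during submit */
        ret = submit_fn (req);
        pthread_mutex_lock (big_lock);     /* re-acquire before returning     */

        return ret;
}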
Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b
BUG: 1037849
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6415
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... which may be grouped under the following categories:
1. Fix incorrect cli exit status for 'quota list' cmd
2. Print appropriate error message on quota parse errors in cli
Authored by: Anuradha Talur <atalur@redhat.com>
3. glusterd: Improve quota validation during stage-op
4. Fix peer probe issues resulting from quota conf checksum mismatches
5. Enhancements to CLI output in the event of quota command failures
Authored by: Kaushal Madappa <kmadappa@redhat.com>
7. Move aux mount location from /tmp to /var/run/gluster
Authored by: Krishnan Parthasarathi <kparthas@redhat.com>
8. Fix performance issues in quota limit-usage
Authored by: Krutika Dhananjay <kdhananj@redhat.com>
Note: Some functions that were used in an earlier version of quota
and aren't called anymore have been removed.
Change-Id: I963d4145f3ecdfe30c61bfa8920baccb33d2d4bd
BUG: 969461
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/6386
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue:
Quota directory limit configuration is stored in the xattrs. When a new brick
is added, these 'limit-set' xattrs have to be created on the directory in the
new brick. This is done by the dht directory healing when the directory is
created in the new brick. Since the 'root' directory is already created, DHT doesn't
heal the limit-set xattr on root.
Solution:
When the add-brick command is issued, run the below hook script to heal the
'limit-set' xattr. The hook script does the following only if a limit is
configured on root.
1. Create an auxiliary mount.
2. getxattr 'limit-set' on the root
3. setxattr the same value on the root
But this script needs the volume to be started to make the auxiliary mount.
To handle the case where add-brick is issued while the volume is stopped,
a symlink is created by the 'master' script at the corresponding location, and
these two scripts are disabled by default.
So, a 'master' script is added in add-brick/pre. When the add-brick command is
issued, it enables one of the scripts mentioned above based on the condition:
if the volume is started - enable the add-brick/post script
else - enable the start/post script
After the actual script completes its job, it disables itself.
Note:
The enabling and disabling of the script is based on glusterd's logic that
it only runs the scripts whose names start with 'S'. So,
Enable - symlink the file to 'S'*
Disable - unlink the symlink.
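In C terms, the heal step amounts to reading the root's limit xattr through
the auxiliary mount and writing the same value back; a minimal sketch (the
mount path argument and xattr key shown here are illustrative):
#include <sys/xattr.h>

static int
reheal_root_limit (const char *aux_mount)
{
        char    value[4096];
        ssize_t len;

        /* no limit configured on root: nothing to do */
        len = getxattr (aux_mount, "trusted.glusterfs.quota.limit-set",
                        value, sizeof (value));
        if (len < 0)
                return -1;

        /* writing the same value back lets DHT propagate it to new bricks */
        return setxattr (aux_mount, "trusted.glusterfs.quota.limit-set",
                         value, len, 0);
}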
Change-Id: I2d3947a4d686c54417ec95f530af3bdd3444f4e2
BUG: 969461
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/6104
Reviewed-by: Brian Foster <bfoster@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, node-name was set to point to node-uuid, which could cause a
memory leak. This is fixed by keeping an in-memory copy of node-uuid.
BUG: 1012296
Change-Id: I3b638ec289d5b167c6e752ef1ba41f41efacb9da
Signed-off-by: Bala.FA <barumuga@redhat.com>
Reviewed-on: http://review.gluster.org/6330
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
re-work.
Following are the cli commands that are new/re-worked:
======================================================
volume quota <VOLNAME> {enable|disable|list [<path> ...]|remove <path>| default-soft-limit <percent>} |
volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
volume status [all | <VOLNAME> [nfs|shd|<BRICK>|quotad]] [detail|clients|mem|inode|fd|callpool]
volume statedump <VOLNAME> [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history]
glusterd changes:
=================
* Quota limits are now set as extended attributes by glusterd from
the aux mount created by the cli.
* The gfids of the directories on which quota limits are set
for a given volume are stored in
/var/lib/glusterd/vols/<volname>/quota.conf file in binary format,
and whose cksum and version is stored in
/var/lib/glusterd/vols/<volname>/quota.cksum.
Original-author: Krutika Dhananjay <kdhananj@redhat.com>
Original-author: Krishnan Parthasarathi <kparthas@redhat.com>
BUG: 969461
Change-Id: If32bba36c67f9c2a30417af9c6389045b2b7c13b
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/6003
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
what?
=====
The following is an attempt to generate the paths of a file when
only its gfid is known.
To find the path of a directory, the symlink handle to the
directory maintained in the ".glusterfs" backend directory is
read. The symlink handle is generated using the gfid of the
directory. It (handle) contains the directory's name and parent
gfid, which are used to recursively construct the absolute path as
seen by the user from the mount point.
A similar approach cannot be used for a regular file or a symbolic
link since its hardlink handle, generated using its gfid, doesn't
contain its parent gfid and basename. So xattrs are set to store
the parent gfids and the number of hardlinks to a file or a
symlink having the same parent gfid. When a user/application
requests the paths of a regular file or a symlink with
multiple hardlinks, using the parent gfids stored in the xattrs,
the paths of the parent directories are generated as mentioned
earlier. The base names of the hardlinks (with the same parent
gfid) are determined by matching the actual backend inode numbers
of each entry in the parent directory with that of the hardlink
handle.
Xattr is set on a regular file, link, and symbolic link as
follows, Xattr name : trusted.pgfid.<pargfidstr> Xattr value :
<number of hardlinks to a regular file/symlink with the same
parentgfid>
If a regular file, hard link, or symbolic link is created, then an
xattr in the above format is set in the backend.
how to use?
===========
This functionality can be used through getxattr interface. Two
keys - glusterfs.ancestry.dentry and glusterfs.ancestry.path - enable
usage of this functionality. A successful getxattr will have the
result stored under same keys. Values will be,
glusterfs.ancestry.dentry:
--------------------------
A linked list of gf-dirent structures for all possible paths from
root to this gfid. If there are multiple paths, the linked-list
will be a series of paths one after another. Each path will be a
series of dentries representing all components of the path. This
key is primarily for internal usage within glusterfs.
glusterfs.ancestry.path:
------------------------
A string containing all possible paths from root to this gfid.
Multiple hardlinks of a file or a symlink are displayed as a colon
separated list (this could interfere with path components
containing ':').
e.g. If there is a file "file1" in root directory with two hardlinks,
"/dir2/link2tofile1" and "/dir1/link1tofile1", then
[root@alpha gfsmntpt]# getfattr -n glusterfs.ancestry.path -e text
file1
glusterfs.ancestry.path="/file1:/dir2/link2tofile1:/dir1/link1tofile1"
Thanks Amar, Avati and Venky for the inputs.
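The same lookup can be done programmatically; a small self-contained sketch
using the virtual key described above (error handling kept minimal):
#include <stdio.h>
#include <sys/xattr.h>

int
main (int argc, char *argv[])
{
        char    paths[8192];
        ssize_t len;

        if (argc != 2) {
                fprintf (stderr, "usage: %s <file-on-gluster-mount>\n", argv[0]);
                return 1;
        }

        /* ask the client stack for every path leading to this inode */
        len = getxattr (argv[1], "glusterfs.ancestry.path",
                        paths, sizeof (paths) - 1);
        if (len < 0) {
                perror ("getxattr");
                return 1;
        }
        paths[len] = '\0';
        printf ("paths: %s\n", paths);   /* colon-separated if hardlinked */
        return 0;
}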
Original Author: Ramana Raja <rraja@redhat.com>
BUG: 990028
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Change-Id: I0eaa9101e333e0c1f66ccefd9e95944dd4a27497
Reviewed-on: http://review.gluster.org/5951
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
BUG: 955548
Change-Id: Iae410712e7e6d7a76cd537c77f1919e3b4cdf6bb
Signed-off-by: Bala.FA <barumuga@redhat.com>
Reviewed-on: http://review.gluster.org/6328
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
| |
BUG: 1028673
Change-Id: I9ba8e3e6cf2f888640b4d2a2eb934a27ff903c42
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/6290
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix the Coverity issue introduced in RFE: NFS volume set/reset
commit i.e. http://review.gluster.org/6236
Change-Id: I817b9da03a3ce7f5511303faea0c50dfdad60ff4
BUG: 1027409
Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
Reviewed-on: http://review.gluster.org/6307
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gluster was starting rebalance processes on peers where it wasn't
required in two cases.
- For a normal rebalance command on a volume, rebalance processes were
started on all peers instead of just the peers which contain bricks of
the volume
- For rebalance process being restarted by a volume sync, caused by a
new peer being probed or a peer restarting, rebalance processes were
started on all peers, for both a normal rebalance and for remove-brick
needing rebalance.
This patch adds a new check before starting rebalance process in the
above two cases.
- For a rebalance process required by a rebalance command, each peer will
check if it contains at least one brick of the volume
- For a rebalance process required by a remove-brick command, each peer
will check if it contains at least one of the bricks being removed
Change-Id: I512da16994f0d5482889c3a009c46dc20a8a15bb
BUG: 1031887
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/6301
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I2210f1ac7de04c6025c0ec02d998b626d41466ae
BUG: 1028672
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6303
Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds <peerid> tag to bricks and nfs/shd like services to
volume status xml output.
BUG: 955548
Change-Id: I9aaa9266e4d56f632235eaeef565e92d757c0694
Signed-off-by: Bala.FA <barumuga@redhat.com>
Reviewed-on: http://review.gluster.org/6162
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement reconfigure() for the NFS xlator so that volume set/reset won't
restart the NFS server process. But a few options cannot be reconfigured
dynamically, e.g. nfs.mem-factor, nfs.port etc., which need NFS to be
restarted.
Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c
BUG: 1027409
Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
Reviewed-on: http://review.gluster.org/6236
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
.. in the systems with non-trusted server
This new functionality can be useful in various cloud technologies.
It is implemented via a special encryption/crypt translator,which
works on the client side and performs encryption and authentication;
1. Class of supported algorithms
The crypt translator can support any atomic symmetric block cipher
algorithms, which require padding the plain/cipher text before performing
the encryption/decryption transform (see the glossary in atom.c for
definitions). In particular, it can support algorithms with the EOF
issue (which require padding the end of the file with extra data).
Crypt translator performs translations
user -> (offset, size) -> (aligned-offset, padded-size) ->server
(and backward), and resolves individual FOPs (write(), truncate(),
etc) to read-modify-write sequences.
A volume can contain files encrypted by different algorithms of the
mentioned class. To change some option value just reconfigure the
volume.
Currently only one algorithm is supported: AES_XTS.
Example of algorithms, which can not be supported by the crypt
translator:
1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA;
2. Symmetric block cipher algorithms with inline MACs for data
authentication.
2. Implementation notes.
a) Atomic algorithms
Since any process in a stackable file system manipulates local
data (which can be obsoleted by local data of another process), any
atomic cipher algorithm without proper support can lead to non-POSIX
behavior. To resolve the "collisions" we introduce locks: before
performing FOP->read(), FOP->write(), etc. the process should first
lock the file.
b) Algorithms with EOF issue
Such algorithms require padding the end of the file with some extra data.
Without proper support this will result in losing information about
real file size. Keeping a track of real file size is a responsibility
of the crypt translator. A special extended attribute with the name
"trusted.glusterfs.crypt.att.size" is used for this purpose. All files
contained in bricks of encrypted volume do have "padded" sizes.
3. Non-trusted servers and
Metadata authentication
We assume that the server where the user's data is stored is non-trusted.
It means that the server can be subjected to various attacks directed
to reveal user's encrypted personal data. We provide protection
against such attacks.
Every encrypted file has specific private attributes (cipher algorithm
id, atom size, etc), which are packed to a string (so-called "format
string") and stored as a special extended attribute with the name
"trusted.glusterfs.crypt.att.cfmt". We protect the string from
tampering. This protection is mandatory, hardcoded and is always on.
Without such protection various attacks (based on extending the scope
of per-file secret keys) are possible.
Our authentication method has been developed in tight collaboration
with Red Hat security team and is implemented as "metadata loader of
version 1" (see file metadata.c). This method is NIST-compliant and is
based on checking 8-byte per-hardlink MACs created(updated) by
FOP->create(), FOP->link(), FOP->unlink(), FOP->rename() by the
following unique entities:
. file (hardlink) name;
. verified file's object id (gfid).
Every time, before manipulating a file, we check its MACs at
FOP->open() time. Some FOPs don't require a file to be opened (e.g.
FOP->truncate()). In such cases the crypt translator opens the file
anyway.
4. Generating keys
Unique per-file keys are derived by NIST-compliant methods from the
a) parent key;
b) unique verified object-id of the file (gfid);
Per-volume master key, provided by user at mount time is in the root
of this "tree of keys".
Those keys are used to:
1) encrypt/decrypt file data;
2) encrypt/decrypt file metadata;
3) create per-file and per-link MACs for metadata authentication.
5. Instructions
Getting started with crypt translator
Example:
1) Create a volume "myvol" and enable encryption:
# gluster volume create myvol pepelac:/vols/xvol
# gluster volume set myvol encryption on
2) Set location (absolute pathname) of your master key:
# gluster volume set myvol encryption.master-key /home/me/mykey
3) Set other options to override default options, if needed.
Start the volume.
4) On the client side make sure that the file /home/me/mykey exists
and contains proper per-volume master key (that is 256-bit AES
key). This key has to be in hex form, i.e. should be represented
by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}.
The key should start at the beginning of the file. All symbols at
offsets >= 64 are ignored.
5) Mount the volume "myvol" on the client side:
# glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt
After successful mount the file which contains master key may be
removed. NOTE: Keeping the master key between mount sessions is the
user's responsibility.
**********************************************************************
WARNING! Losing the master key will make content of all regular files
inaccessible. Mounting with an improper master key still allows access to the
content of directories: file names are not encrypted.
**********************************************************************
6. Options of crypt translator
1) "master-key": specifies location (absolute pathname) of the file
which contains per-volume master key. There is no default location
for master key.
2) "data-key-size": specifies size of per-file key for data encryption
Possible values:
. "256" default value
. "512"
3) "block-size": specifies atom size. Possible values:
. "512"
. "1024"
. "2048"
. "4096" default value;
7. Test cases
Any workload, which involves the following file operations:
->create();
->open();
->readv();
->writev();
->truncate();
->ftruncate();
->link();
->unlink();
->rename();
->readdirp().
8. TODOs:
1) Currently size of IOs issued by crypt translator is restricted
by block_size (4K by default). We can use larger IOs to improve
performance.
Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee
BUG: 1030058
Signed-off-by: Edward Shishkin <edward@redhat.com>
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4667
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Special xattr names "clone" & "snapshot" can be used to create full and
linked clone of the LV images. GFID of destination posix file (to be
mapped) is passed as a value to the xattr. Destination posix file must
exist before running this operation.
These operations form a basis for offloading storage related operations
from QEMU to GlusterFS.
Syntax for full clone: xattr name: "clone" value: "gfid-of-dest-file"
Syntax for linked clone: xattr name: "snapshot" value: "gfid-of-dest-file"
Syntax for merging: xattr name: "merge" value: "path-to-snapshot-file"
Example:
setfattr -n clone -v <gfid-of-dest-file> /media/source
setfattr -n snapshot -v <gfid-of-dest-file> /media/source
setfattr -n merge -v "/media/sn" /media/sn
Change-Id: Id9f984a709d4c2e52a64ae75bb12a8ecb01f8776
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Volume option bd-aio controls AIO feature for BD xlator. Code taken from
posix-aio.c
Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5748
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current BD xlator (block backend) has a few limitations such as
* Creation of directories not supported
* Supports only single brick
* Does not use extended attributes (and client gfid) like posix xlator
* Creation of special files (symbolic links, device nodes etc) not
supported
Basic limitation of not allowing directory creation is blocking
oVirt/VDSM to consume BD xlator as part of Gluster domain since VDSM
creates multi-level directories when GlusterFS is used as storage
backend for storing VM images.
To overcome these limitations a new BD xlator with following
improvements is suggested.
* New hybrid BD xlator that handles both regular files and block device
files
* The volume will have both POSIX and BD bricks. Regular files are
created on POSIX bricks, block devices are created on the BD brick (VG)
* BD xlator leverages the existing POSIX xlator for most POSIX calls and
hence sits above the POSIX xlator
* Block device file is differentiated from regular file by an extended
attribute
* The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a
posix file to Logical Volume (LV).
* When a client sends a request to set BD_XATTR on a posix file, a new
LV is created and mapped to posix file. So every block device will
have a representative file in POSIX brick with 'user.glusterfs.bd'
(BD_XATTR) set.
* Hereafter, all operations on this file result in LV-related
operations.
For example opening a file that has BD_XATTR set results in opening
the LV block device, reading results in reading the corresponding LV
block device.
When BD xlator gets request to set BD_XATTR via setxattr call, it
creates a LV and information about this LV is placed in the xattr of the
posix file. xattr "user.glusterfs.bd" used to identify that posix file
is mapped to BD.
Usage:
Server side:
[root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2
It creates a distributed gluster volume 'bdvol' with Volume Group vg1
using posix brick /storage/vg1_info in host1 and Volume Group vg2 using
/storage/vg2_info in host2.
[root@host1 ~]# gluster volume start bdvol
Client side:
[root@node ~]# mount -t glusterfs host1:/bdvol /media
[root@node ~]# touch /media/posix
It creates regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick
[root@node ~]# mkdir /media/image
[root@node ~]# touch /media/image/lv1
It also creates regular posix file 'lv1' in either host1:/vg1 or
host2:/vg2 brick
[root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1
[root@node ~]#
Above setxattr results in creating a new LV in corresponding brick's VG
and it sets 'user.glusterfs.bd' with value 'lv:<default-extent-size'
[root@node ~]# truncate -s5G /media/image/lv1
It results in resizing LV 'lv1' to 5G
New BD xlator code is placed in xlators/storage/bd directory.
Also add volume-uuid to the VG so that same VG can't be used for other
bricks/volumes. After deleting a gluster volume, one has to manually
remove the associated tag using vgchange <vg-name> --deltag
<trusted.glusterfs.volume-id:<volume-id>>
Changes from previous version V5:
* Removed support for delayed deleting of LVs
Changes from previous version V4:
* Consolidated the patches
* Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR.
Changes from previous version V3:
* Added support in FUSE to support full/linked clone
* Added support to merge snapshots and provide information about origin
* bd_map xlator removed
* iatt structure used in inode_ctx. iatt is cached and updated during
fsync/flush
* aio support
* Type and capabilities of volume are exported through getxattr
Changes from version 2:
* Used inode_context for caching BD size and to check if loc/fd is BD or
not.
* Added GlusterFS server offloaded copy and snapshot through setfattr
FOP. As part of this libgfapi is modified.
* BD xlator supports stripe
* During unlinking if a LV file is already opened, its added to delete
list and bd_del_thread tries to delete from this list when a last
reference to that file is closed.
Changes from previous version:
* gfid is used as name of LV
* ? is used to specify VG name for creating BD volume in volume
create, add-brick. gluster volume create volname host:/path?vg
* open-behind issue is fixed
* A replicate brick can be added dynamically and LVs from source brick
are replicated to destination brick
* A distribute brick can be added dynamically and rebalance operation
distributes existing LVs/files to the new brick
* Thin provisioning support added.
* bd_map xlator support retained
* setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and
setfattr -n user.glusterfs.bd -v "thin" creates thin LV
* Capability and backend information added to gluster volume info (and
--xml) so
that management tools can exploit BD xlator.
* tracing support for bd xlator added
TODO:
* Add support to display snapshots for a given LV
* Display posix filename for list-origin instead of gfid
Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/4809
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Remove bd_map xlator and CLI related changes.
Change-Id: If7086205df1907127c1a1fa4ba603f1c48421d09
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5747
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* When a writev call occurs, the client compresses the data before
sending it to server. On the server, compressed data is decompressed.
Similarly, when a readv call occurs, the server compresses the data
before sending it to client. On the client, the compressed data is
decompressed. Thus the amount of data sent over the wire is minimized.
* Compression/Decompression is done using Zlib library.
* During normal operation, this is the format of data sent over wire :
<compressed-data> + trailer(8)
The trailer contains the CRC32 checksum and length of original
uncompressed data. This is used for validation.
HOW TO USE
----------
Turning on compression xlator:
gluster volume set <vol_name> compress on
Configurable options:
gluster volume set <vol_name> compress.compression-level 8
gluster volume set <vol_name> compress.min-size 50
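For illustration only (this is not the xlator code), the sender-side framing
described above could look like the following zlib-based sketch, with the
8-byte trailer carrying the CRC32 and the original length (byte-order
handling omitted):
#include <stdint.h>
#include <string.h>
#include <zlib.h>

/* Returns compressed size + 8 on success, -1 on failure.  'out' must be at
 * least compressBound(in_len) + 8 bytes. */
static long
pack_with_trailer (const unsigned char *in, uLong in_len,
                   unsigned char *out, uLong out_cap)
{
        uLongf   clen = out_cap - 8;
        uint32_t crc, olen;

        if (compress2 (out, &clen, in, in_len, Z_DEFAULT_COMPRESSION) != Z_OK)
                return -1;

        crc  = (uint32_t) crc32 (0L, in, in_len);
        olen = (uint32_t) in_len;
        memcpy (out + clen, &crc, 4);       /* trailer: CRC32 of plain data */
        memcpy (out + clen + 4, &olen, 4);  /* trailer: original length     */

        return (long) (clen + 8);
}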
Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805
BUG: 923540
Original-author : Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Prashanth Pai <nullpai@gmail.com>
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: http://review.gluster.org/3251
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the current context, "replica_cnt" is used just to know whether the
specific key exists or not by calling "dict_get_int32", which we can
replace with "dict_get ()". Also change the log message, as it is more
appropriate to say "migration of data" rather than "rebalance".
This patch refactors commit 51c6fa7a354826744de98 against BZ 961669
reviewed on : http://review.gluster.org/5566
Change-Id: I48eae206a28d4083975e64407ed8fe4539f9c24b
BUG: 1027270
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Original patch: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/6001
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: susant palai <spalai@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is (arguably) a hack to work around a bug in libvirt which is not
well behaved with respect to using TCP ports in the unreserved space between
49152-65535. (See RFC 6335)
Normally glusterd starts and binds to the first available port in range,
usually 49152. libvirt's live migration also tries to use ports in this
range, but has no fallback to use (an)other port(s) when the one it wants
is already in use.
Change-Id: Id8fe35c08b6ce4f268d46804bbb6dddab7a6b7bb
BUG: 1018178
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/6210
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
... so that subsequent volume commands don't block forever, waiting
for the lock to be released.
Change-Id: I24b5ec47f6982900ab74ff1b492d523f31ecfb7f
BUG: 1022055
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/6122
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The quota volume reset command without the "force"
option is fixed and doesn't fail anymore. It resets
unprotected fields and not the protected ones.
Also, an appropriate message is provided to the user
for the following cases:
1. Only unprotected fields are reset; the "force" option
should be used to reset protected fields.
2. Both protected and unprotected fields are reset.
3. No field was reset; the "force" option is required.
A test case for the same is also added.
Change-Id: I24e8f1be87b79ccd81bf6f933e00608b861c7a16
BUG: 1022905
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/6135
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement a limit on the total number of outstanding RPC requests
from a given client. Once the limit is reached, the client socket
is removed from POLL-IN event polling.
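A hypothetical sketch of how such a throttle can be tracked per connection;
all names are illustrative and the helpers only report whether POLLIN should
be toggled, rather than calling the real event layer:
struct client_conn {
        int outstanding;   /* requests received but not yet replied to */
        int limit;         /* configured ceiling                       */
        int throttled;     /* 1 while POLLIN is disabled               */
};

/* Call on every incoming request; returns 1 if POLLIN must be disabled. */
static int
on_request_received (struct client_conn *c)
{
        if (++c->outstanding >= c->limit && !c->throttled) {
                c->throttled = 1;
                return 1;
        }
        return 0;
}

/* Call on every reply sent; returns 1 if POLLIN can be re-enabled. */
static int
on_reply_submitted (struct client_conn *c)
{
        if (--c->outstanding < c->limit && c->throttled) {
                c->throttled = 0;
                return 1;
        }
        return 0;
}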
Change-Id: I8071b8c89b78d02e830e6af5a540308199d6bdcd
BUG: 1008301
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6114
Reviewed-by: Santosh Pradhan <spradhan@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brick force.
The expectation with force is that the user is aware of the consequences of
sanity checks not being triggered.
Change-Id: I79dfeed16a23829a7217cef33ab83f9f0ffae336
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
BUG: 1007509
Reviewed-on: http://review.gluster.org/5746
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Glusterd changes:
With this patch, glusterd creates a socket file in
DATADIR/run/glusterd.socket , and listen on it for cli requests. It
listens for 2 rpc programs on the socket file,
- The glusterd cli rpc program, for all cli commands
- A reduced glusterd handshake program, just for the 'system:: getspec'
command
The location of the socket file can be changed with the glusterd option
'glusterd-sockfile'.
To retain compatibility with the '--remote-host' cli option, glusterd
also listens for the cli requests on port 24007. But, for the sake of
security, it listens using a reduced cli rpc program on the port. The
reduced rpc program only contains read-only procs used for 'volume
(info|list|status)', 'peer status' and 'system:: getwd' cli commands.
CLI changes:
The gluster cli now uses the glusterd socket file for communicating with
glusterd by default. A new option '--gluster-sock' has been added to
allow specifying the sockfile used to connect. Using the '--remote-host'
option will make cli connect to the given host & port.
Tests changes:
cluster.rc has been modified to make use of socket files and use
different log files for each glusterd.
Some of the tests using cluster.rc have been fixed.
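For reference, a minimal sketch of listening on a unix-domain socket file of
this kind; the path handling is simplified and this is not glusterd's actual
listener code:
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int
listen_on_sockfile (const char *path)
{
        struct sockaddr_un addr;
        int                fd;

        fd = socket (AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
                return -1;

        memset (&addr, 0, sizeof (addr));
        addr.sun_family = AF_UNIX;
        strncpy (addr.sun_path, path, sizeof (addr.sun_path) - 1);
        unlink (path);                            /* remove stale socket file */

        if (bind (fd, (struct sockaddr *) &addr, sizeof (addr)) < 0 ||
            listen (fd, 128) < 0) {
                close (fd);
                return -1;
        }
        return fd;
}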
Change-Id: Iaf24bc22f42f8014a5fa300ce37c7fc9b1b92b53
BUG: 980754
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5280
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gettimeofday() returns the current wall clock time and timezone.
Using these functions in order to measure the passage of time
(how long an operation took) therefore seems like a no-brainer.
This time suffer's from some limitations:
a. They have a low resolution: “High-performance” timing by
definition, requires clock resolutions into the microseconds
or better.
b. They can jump forwards and backwards in time: Computer
clocks all tick at slightly different rates, which causes
the time to drift. Most systems have NTP enabled which
periodically adjusts the system clock to keep them in sync
with “actual” time. The adjustment can cause the clock to
suddenly jump forward (artificially inflating your timing
numbers) or jump backwards (causing your timing calculations
to go negative or hugely positive). In such cases timer
thread could go into an infinite loop.
From 'man gettimeofday':
----------
..
..
The time returned by gettimeofday() is affected by discontinuous
jumps in the system time (e.g., if the system administrator manually
changes the system time). If you need a monotonically increasing
clock, see clock_gettime(2).
..
..
----------
Rationale:
For calculating interval timing for the Timer thread, all that's
needed is a clock that acts as a simple counter incrementing
at a stable rate.
This is necessary to avoid the jumps caused by using
"wall time"; this counter must be monotonic and can never
"tick" backwards, ever.
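A minimal sketch of the monotonic interval measurement the rationale calls
for:
#include <stdio.h>
#include <time.h>

int
main (void)
{
        struct timespec start, end;
        double          elapsed;

        clock_gettime (CLOCK_MONOTONIC, &start);
        /* ... the operation being timed ... */
        clock_gettime (CLOCK_MONOTONIC, &end);

        /* immune to NTP adjustments: cannot jump or go negative */
        elapsed = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
        printf ("took %.6f s\n", elapsed);
        return 0;
}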
Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb
BUG: 1017993
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-on: http://review.gluster.org/6070
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, to know the number of files to be healed, the user either
has to go to the backend and check the number of entries present in
the indices/xattrop directory, which is a time-taking task if a volume
consists of a large number of bricks, or can give the
gluster volume heal vol-name info command, but with this
approach, if the number of entries in the indices/xattrop directory is
very huge, it will consume time.
So, as a feature, a new command is implemented.
Command 1: gluster volume heal vn statistics heal-count
This command will get the number of entries present in
every brick of a volume. The output displays only entries
count.
Command 2: gluster volume heal vn statistics heal-count
replica 192.168.122.1:/home/user/brickname
Here, if we are concerned with just one replica,
providing any one brick of that replica will get
the number of entries to be healed for that replica only.
Example:
Replicate volume with replica count 2.
Backend status:
--------------
[root@dhcp-0-17 xattrop]# ls -lia | wc -l
1918
NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy
entries so actual no. of entries to be healed are
1916.
[root@dhcp-0-17 xattrop]# pwd
/home/user/2ty/.glusterfs/indices/xattrop
Command output:
--------------
Gathering count of entries to be healed on volume volume3 has been successful
Brick 192.168.122.1:/home/user/22iu
Status: Brick is Not connected
Entries count is not available
Brick 192.168.122.1:/home/user/2ty
Number of entries: 1916
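Conceptually, the per-brick count is just the number of entries under
<brick>/.glusterfs/indices/xattrop minus the dummy base-index entries; a
simplified sketch (the exclusion rule here is illustrative only):
#include <dirent.h>
#include <string.h>

static long
count_pending_heals (const char *xattrop_dir)
{
        DIR           *dir = opendir (xattrop_dir);
        struct dirent *entry;
        long           count = 0;

        if (!dir)
                return -1;

        while ((entry = readdir (dir)) != NULL) {
                if (!strcmp (entry->d_name, ".") || !strcmp (entry->d_name, ".."))
                        continue;
                if (!strncmp (entry->d_name, "xattrop-", 8))
                        continue;       /* dummy base-index entry, not a heal */
                count++;
        }
        closedir (dir);
        return count;
}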
Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290
BUG: 1015990
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/6044
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"gluster volume heal volumename statistics" command gives the summary
of the afr crawl done based on the entries present in the xattrop
directory. Whenever afr crawls are attempted, the beginning time of
crawl, end time of crawl, no of files healed, heal-failed count and
number of files in split brain are shown along with the type of the
crawl. If a crawl is already in progress, then it will give the number
of files healed, the heal-failed count and the number of files in split-brain
from the beginning of the crawl; instead of the end time of
the crawl, a "CRAWL IN PROGRESS" message will be shown.
Output format:
command: "gluster volume heal volume-name statistics"
Output:
Gathering afr crawl statistics crawl statistics on volume volume-name
has been successful
------------------------------------------------
Crawl statistics for brick no 0
Hostname of brick 192.168.122.248
Starting time of crawl: Wed Jul 10 15:52:38 2013
Ending time of crawl: Wed Jul 10 15:52:38 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
Starting time of crawl: Wed Jul 10 15:52:38 2013
Ending time of crawl: Wed Jul 10 15:52:38 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
------------------------------------------------
Crawl statistics for brick no 1
Hostname of brick 192.168.122.1
Starting time of crawl: Wed Jul 10 15:52:42 2013
Ending time of crawl: Wed Jul 10 15:52:42 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
Starting time of crawl: Wed Jul 10 15:52:42 2013
Ending time of crawl: Wed Jul 10 15:52:42 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
--------------------------------------------------
Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9
BUG: 949400
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4790
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
oVirt's Gluster Integration needs an inexpensive command that can be
executed every 10 seconds to monitor async tasks and their parameters,
for all volumes.
The solution involves adding a 'tasks' sub-command to 'volume status'
to fetch only the async task IDs, type and other relevant parameters.
Only the originator glusterd participates in this command as all the
information needed is available on all the nodes. This is to make the
command suitable for being executed every 10 seconds.
Change-Id: I1edc607baf29b001a5585079dec681d7c641b3d1
BUG: 1012346
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/6006
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds node uuid in rebalance/remove-brick status xml output.
Output XML will look like
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
<opRet>0</opRet>
<opErrno>0</opErrno>
<opErrstr/>
<volRebalance>
<op>3</op>
<nodeCount>1</nodeCount>
<node>
<nodeName>localhost</nodeName>
==>> <id>883626f8-4d29-4d02-8c5d-c9f48c5b2445</id>
<files>0</files>
<size>0</size>
<lookups>0</lookups>
<failures>0</failures>
<status>3</status>
<statusStr>completed</statusStr>
</node>
<aggregate>
<files>0</files>
<size>0</size>
<lookups>0</lookups>
<failures>0</failures>
<status>3</status>
<statusStr>completed</statusStr>
</aggregate>
</volRebalance>
</cliOutput>
Change-Id: I5a1d4f9043b33b9e88150647a243ddb16154e843
BUG: 1012296
Signed-off-by: Bala.FA <barumuga@redhat.com>
Reviewed-on: http://review.gluster.org/6005
Reviewed-by: Kaushal M <kaushal@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I8ff6b48ef41fd6e9ea68c42dfb9878f8a08ed627
BUG: 1010874
Signed-off-by: Ajeet Jha <ajha@redhat.com>
Reviewed-on: http://review.gluster.org/5989
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The volume op-versions are calculated during a volume set/reset, reading a
volume from disk and importing a volume during probe or volume sync. The
calculation of the volume op-version depends on the cluster's op-version, as some
features are enabled automatically depending on the cluster's op-version. We
also don't store the volume op-versions persistently and don't export the
volume op-versions during sync. Due to this, cases can occur which will
lead to inconsistencies in volumes across different peers. One such case is below.
Consider a cluster made up of 3 peers P1, P2 and P3, operating at op-version N.
The cluster has two volumes V1 and V2, which have volume op-versions N (since
volume op-version cannot be greater than cluster op-version). We have,
Cluster-op-version = N
V1 op-version = N
V2 op-version = N
A set operation on V1 causes the clusters op-version to be bumped up to N+1.
Assume that there exist some features that are automatically enabled on
op-version N+1. The op-version of V2 remains at N as no operation has been
performed on it. So,
Cluster op-version = N+1
V1 op-version = N+1
V2 op-version = N
Now, we probe a new peer P4. On the new peer we will have the following
op-versions,
Cluster op-version = N+1
V1 op-version = N+1
V2 op-version = N+1
This happens because we don't send volume op-versions during the sync after
probe. P4 will freshly calculate the op-version of V2 (assuming features have
been auto enabled due to the cluster op-version being N+1) as N+1.
Another case is when glusterd on a peer restarts. Assume P3 was restarted,
glusterd will recalculate the volume op-versions during the restore state.
Again, op-version of V2 will be calculated as N+1 assuming auto enabled
features. This will lead to inconsistency in the volume representation in
memory and on disk, as glusterd will assume the volume contains auto enabled
features, but the volfiles don't contain them as they were not regenrated.
These kind of issues can be solved by calculating the volume op-version only
when features are enabled and disabled (ie. during volume set/reset),
persisting the volume-op-versions and exporting/importing them.
Change-Id: I52de0668c92628622e85f4588fb28829a7231132
BUG: 1005043
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5568
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Glusterd would not store all the volumes when a global option was set.
When setting a global option, like the 'nfs.*' options, glusterd used to
modify the volinfo for all the volumes, but would store only the volinfo
for the named volume. This led to a mismatch between the persisted and the
in-memory representation of those volumes, which led to problems like
peers being rejected because of volume mismatches.
Change-Id: I8bca10585e34b7135cb32af0055dbd462b3fb9b5
BUG: 1012400
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/6007
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I27f5f7cd54115d7b236b42f6beaaa05a8b379dd7
BUG: 1010153
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/5978
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Incorrect NFS ACL encoding causes "system.posix_acl_default"
setxattr failure on bricks on XFS file system. XFS (potentially
others?) doesn't understand when the 0x10 prefix is added to the
ACL type field for default ACLs (which the Linux NFS client adds)
which causes setfacl()->setxattr() to fail silently. NFS client
adds NFS_ACL_DEFAULT(0x1000) for default ACL.
FIX:
Mask the prefix (added by NFS client) OFF, so the setfacl is not
rejected when it hits the FS.
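A hedged sketch of that masking step; the constant name and value follow the
commit text rather than the exact source:
#define NFS_ACL_DEFAULT 0x1000   /* prefix added by the Linux NFS client */

/* Strip the NFS default-ACL prefix from the ACL type before the setxattr
 * reaches the brick filesystem, so XFS (and others) accept it. */
static int
normalize_acl_type (int type)
{
        return type & ~NFS_ACL_DEFAULT;
}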
Original patch by: "Richard Wareing"
Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396
BUG: 1009210
Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
Reviewed-on: http://review.gluster.org/5980
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I8d670a228d3c1282aa7d70b151f166d04abc40e5
BUG: 764890
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/5909
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I1841864273fc4242de15fbfcf76fd5de40269f28
BUG: 1006249
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5889
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While a gluster command holding the lock is in execution,
any other gluster command which tries to run will fail to
acquire the lock. As a result, command#2 will follow the
cleanup code flow, which also includes unlocking the held
locks. As both the commands are run from the same node,
command#2 will end up releasing the locks held by command#1
even before command#1 reaches completion.
Now we call the unlock routine in the cleanup code path only if the cluster
has been locked during the same transaction.
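A minimal sketch of that guard, with illustrative names; the caller is
expected to invoke the real unlock routine only when told to:
struct txn_ctx {
        int cluster_locked;   /* set only when this command acquired the lock */
};

/* Returns 1 if the cleanup path should call the actual unlock routine. */
static int
txn_cleanup_should_unlock (struct txn_ctx *txn)
{
        if (!txn->cluster_locked)
                return 0;             /* lock belongs to another command */
        txn->cluster_locked = 0;
        return 1;
}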
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Change-Id: I7b7aa4d4c7e565e982b75b8ed1e550fca528c834
BUG: 1008172
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5937
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|