| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Variable search_unhashed is of type boolean, change it to integer type.
This fixes the warning increment of a boolean expression.
BUG: 1531987
Change-Id: Ibf153f6a9ad704da38bff346b6a21a71323ed9bb
Signed-off-by: Varsha Rao <varao@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dht_(f)xattrop implementation did not implement
migration phase1/phase2 checks which could cause issues
with rebalance on sharded volumes.
This does not solve the issue where fops may reach the target
out of order.
Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
BUG: 1471031
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In glusterfs code base we call mutex_lock/unlock to take
reference/dereference for a object.Sometime it could be
reason for lock contention also.
Solution: There is no need to use mutex to increase/decrease ref
counter, instead of using mutex use gcc builtin ATOMIC
operation.
Test: I have not observed yet how much performance gain after apply
this patch specific to glusterfs but i have tested same
with below small program(mutex and atomic both) and
get good difference.
static int numOuterLoops;
static void *
threadFunc(void *arg)
{
int j;
for (j = 0; j < numOuterLoops; j++) {
__atomic_add_fetch (&glob, 1,__ATOMIC_ACQ_REL);
}
return NULL;
}
int
main(int argc, char *argv[])
{
int opt, s, j;
int numThreads;
pthread_t *thread;
int verbose;
int64_t n = 0;
if (argc < 2 ) {
printf(" Please provide 2 args Num of threads && Outer Loop\n");
exit (-1);
}
numThreads = atoi(argv[1]);
numOuterLoops = atoi (argv[2]);
if (1) {
printf("\tthreads: %d; outer loops: %d;\n",
numThreads, numOuterLoops);
}
thread = calloc(numThreads, sizeof(pthread_t));
if (thread == NULL) {
printf ("calloc error so exit\n");
exit (-1);
}
__atomic_store (&glob, &n, __ATOMIC_RELEASE);
for (j = 0; j < numThreads; j++) {
s = pthread_create(&thread[j], NULL, threadFunc, NULL);
if (s != 0) {
printf ("pthread_create failed so exit\n");
exit (-1);
}
}
for (j = 0; j < numThreads; j++) {
s = pthread_join(thread[j], NULL);
if (s != 0) {
printf ("pthread_join failed so exit\n");
exit (-1);
}
}
printf("glob value is %ld\n",__atomic_load_n (&glob,__ATOMIC_RELAXED));
exit(0);
}
time ./thr_count 800 800000
threads: 800; outer loops: 800000;
glob value is 640000000
real 1m10.288s
user 0m57.269s
sys 3m31.565s
time ./thr_count_atomic 800 800000
threads: 800; outer loops: 800000;
glob value is 640000000
real 0m20.313s
user 1m20.558s
sys 0m0.028
Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
| |
BUG: 1520974
Change-Id: I19ea40c888e88a7a4ac271168ed1820c2075be93
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: tierd was stopped only after detach commit
This makes the detach take a longer time. The detach
demotes the files to the cold brick and if the promotion
frequency is hit, then the tierd starts to promote files to
hot tier again.
Fix: stop tierd after detach start so the files get
demoted faster.
Note: the is_tier_enabled was not maintained properly.
That has been fixed too. some code clean up has been done.
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Change-Id: I532f7410cea04fbb960105483810ea3560ca149b
BUG: 1446381
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Sometime test case ./tests/bugs/bug-1371806_1.t is failing on
centos due to race condition between fresh lookup and setxattr fop.
Solution: In selfheal code path we do save mds on inode_ctx, it was not
serialize with lookup unwind. Due to this behavior after lookup
unwind if mds is not saved on inode_ctx and if any subsequent
setxattr fop call it has failed with ENOENT because
no mds has found on inode ctx.To resolve it save mds on
inode ctx has been serialize with lookup unwind.
BUG: 1498966
Change-Id: I8d4bb40a6cbf0cec35d181ec0095cc7142b02e29
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
| |
..
the brick file system does not support fallocate.
Change-Id: Id76cda2d8bb3b223b779e5e7a34f17c8bfa6283c
BUG: 1488103
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces the space used from four bytes to one, and allows
new code to use familiar C99 types/values interoperably with our
old cruft. It does *not* change current declarations or code;
that will be left for a separate - much larger - patch.
Updates: #80
Change-Id: I5baedd17d3fb05b38f0d8b8bb9dd62824475842e
Signed-off-by: Jeff Darcy <jdarcy@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Comparing the uuid string of the local node against that stored in the
local_subvol information is inefficient, especially as it is
done for every file to be migrated. The code has now been changed
to set the value of info to 1 if the nodeuuid is that of the node
making the comparison so this becomes an integer comparison.
Change-Id: I7491d59caad3b71dbf5facc94dcde0cd53962775
BUG: 1451434
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In a distributed volume custom extended attribute value for a directory
does not display correct value after stop/start or added newly brick.
If any extended(acl) attribute value is set for a directory after stop/added
the brick the attribute(user|acl|quota) value is not updated on brick
after start the brick.
Solution: First store hashed subvol or subvol(has internal xattr) on inode ctx and
consider it as a MDS subvol.At the time of update custom xattr
(user,quota,acl, selinux) on directory first check the mds from
inode ctx, if mds is not present on inode ctx then throw EINVAL error
to application otherwise set xattr on MDS subvol with internal xattr
value of -1 and then try to update the attribute on other non MDS
volumes also.If mds subvol is down in that case throw an
error "Transport endpoint is not connected". In dht_dir_lookup_cbk|
dht_revalidate_cbk|dht_discover_complete call dht_call_dir_xattr_heal
to heal custom extended attribute.
In case of gnfs server if hashed subvol has not found based on
loc then wind a call on all subvol to update xattr.
Fix: 1) Save MDS subvol on inode ctx
2) Check if mds subvol is present on inode ctx
3) If mds subvol is down then call unwind with error ENOTCONN and if it is up
then set new xattr "GF_DHT_XATTR_MDS" to -1 and wind a call on other
subvol.
4) If setxattr fop is successful on non-mds subvol then increment the value of
internal xattr to +1
5) At the time of directory_lookup check the value of new xattr GF_DHT_XATTR_MDS
6) If value is not 0 in dht_lookup_dir_cbk(other cbk) functions then call heal
function to heal user xattr
7) syncop_setxattr on hashed_subvol to reset the value of xattr to 0
if heal is successful on all subvol.
Test : To reproduce the issue followed below steps
1) Create a distributed volume and create mount point
2) Create some directory from mount point mkdir tmp{1..5}
3) Kill any one brick from the volume
4) Set extended attribute from mount point on directory
setfattr -n user.foo -v "abc" ./tmp{1..5}
It will throw error " Transport End point is not connected "
for those hashed subvol is down
5) Start volume with force option to start brick process
6) Execute getfattr command on mount point for directory
7) Check extended attribute on brick
getfattr -n user.foo <volume-location>/tmp{1..5}
It shows correct value for directories for those
xattr fop were executed successfully.
Note: The patch will resolve xattr healing problem only for fuse mount
not for nfs mount.
BUG: 1371806
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I4eb137eace24a8cb796712b742f1d177a65343d5
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add EBADF handling for dht_fremovexattr and dht_fsetxattr.
Change-Id: Ide0d5812dae79655d2565157e5baabcd753b4309
BUG: 1476665
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17999
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DHT fd based fops used to check if the fd was open
on the cached subvol before winding the call. However,
this introduced a performance regression of about
30% for reads.
This check was introduced to handle cases where files
were migrated while IOs were happening. As this is not
the common case, dht will now check if the fd is
open on the cached subvol only if the call fails
with EBADF.
This will prevent a performance hit where a rebalance
is not running.
Change-Id: I2035a858d63c3fcd22bb634055bbb0ad01686808
BUG: 1476665
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17976
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A brief about how hardlink migration works:
- Different hardlinks (to the same file) may hash to different bricks,
but their cached subvol will be same. Rebalance picks up the first hardlink,
calculates it's hash(call it TARGET) and set the hashed subvolume as an xattr
on the data file.
- Now all the hardlinks those come after this will fetch that xattr and will
create linkto files on TARGET (all linkto files for the hardlinks will be hardlink
to each other on TARGET).
- When number of hardlinks on source is equal to the number of hardlinks on
TARGET, the data migration will happen.
RACE:1
Since rebalance is multi-threaded, the first lookup (which decides where the TARGET
subvol should be), can be called by two hardlink migration parallely and they may end
up creating linkto files on two different TARGET subvols. Hence, hardlinks won't be
migrated.
Fix: Rely on the xattr response of lookup inside gf_defrag_handle_hardlink since it
is executed under synclock.
RACE:2
The linkto files on TARGET can be created by other clients also if they are doing
lookup on the hardlinks. Consider a scenario where you have 100 hardlinks. When
rebalance is migrating 99th hardlink, as a result of continuous lookups from other
client, linkcount on TARGET is equal to source linkcount. Rebalance will migrate data
on the 99th hardlink itself. On 100th hardlink migration, hardlink will have TARGET as
cached subvolume. If it's hash is also the same, then a migration will be triggered from
TARGET to TARGET leading to data loss.
Fix: Make sure before the final data migration, source is not same as destination.
RACE:3
Since a hardlink can be migrating to a non-hashed subvolume, a lookup from other
client or even the rebalance it self, might delete the linkto file on TARGET leading
to hardlinks never getting migrated.
This will be addressed in a different patch in future.
Change-Id: If0f6852f0e662384ee3875a2ac9d19ac4a6cea98
BUG: 1469964
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17755
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The earlier approach of using the number of files
to determine when the rebalance would complete did
not work well when file sizes differed widely.
The new approach now gets the total data size and
uses that information to determine how long
the rebalance is expected to take.
Change-Id: I84e80a0893efab72ff06130e4596fa71c9c8c868
BUG: 1467209
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17668
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an fd is opened on a file, the file is migrated
and the cached subvol is updated in the inode_ctx
before an fd based fop is sent, the fop is sent to
the dst subvol on which the fd is not opened.
This causes the FOP to fail with EBADF.
Now, every fd based fop will check to see that the fd
has been opened on the dst subvol before winding it down.
Change-Id: Id92ef5eb7a5b5226688e2d2868b15e383f5f240e
BUG: 1465075
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17630
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The rebalance used to get the file count in the beginning
and not update it. This caused estimates to fail
if the number changed during the rebalance.
The rebalance now updates the file count periodically.
Change-Id: I1667ee69e8a1d7d6bc6bc2f060fad7f989d19ed4
BUG: 1464110
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17607
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a similar issue fixed for rename in https://review.gluster.org/#/c/16016/
For hardlinks, if cached and hashed subvolumes are different, then it will first
create linkto file in hashed using root permission, but actually hardlink creation
fails with EACESS and stale linkto file is never removed.All the followup hardlink
calls with file name will result ESTALE because linktofile creation fails with EEXIST
and follow up lookup on linkto file returns gfid-mismatching(old linkto file) and
finally fails with ESTALE
Steps to produce :
(From link/00.t test from posix-testsuite)
Steps executed in script
* create a file "abc" using root
* change the ownership of file to a non root user
* create hardlink "link" for "abc" using a non root user, it fails with EACESS
* delete "abc"
* create directory "abc" using root
* again try to create hadrlink "link" for "abc" using non root user, fails with ESTALE
Also tried to fix other bugs in dht_linkfile_create_cbk() and posix_lookup.
Thanks Susant for the help in debugging the issue and suggestion for this patch.
Change-Id: I7a5a1899d3fd1fdb13578b37f9d52a084492e35d
BUG: 1452084
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: https://review.gluster.org/17331
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Empty directories were not being considered while
calculating rebalance estimates leading to negative
time-left values being displayed as part of the
rebalance status.
Change-Id: I48d41d702e72db30af10e6b87b628baa605afa98
BUG: 1457985
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17448
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_readdirp must unwind with list of entries only after
the entire buffer requested by kernel is filled to avoid
extra syscalls occuring when returning partially filled
buffer. Also wind readdir call to next subvol on reaching
EOD for directory on that subvol to avoid extra network call.
Change-Id: If2e1a2722f813d95457c7542bff25fef56c7a041
BUG: 1356453
Signed-off-by: Sakshi <sabansal@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://review.gluster.org/12271
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On demand migration of files i.e. migration done by clients
triggered by a setfattr was broken.
Dependency on defrag led to crash when migration was triggered from
client.
Note: This functionality is not available for tiered volumes. Migration
from tier served client will fail with ENOTSUP.
usage (But refer to the steps mentioned below to avoid any issues) :
setfattr -n "trusted.distribute.migrate-data" -v "1" <filename>
The purpose of fixing the on-demand client migration was to give a
workaround where the user has lots of empty directories compared to
files and want to do a remove-brick process.
Here are the steps to trigger file migration for remove-brick process from
client. (This is highly recommended to follow below steps as is)
Let's say it is a replica volume and user want to remove a replica pair
named brick1 and brick2. (Make sure healing is completed before you run
these steps)
Step-1: Start remove-brick process
- gluster v remove-brick <volname> brick1 brick2 start
Step-2: Kill the rebalance daemon
- ps aux | grep glusterfs | grep rebalance\/ | awk '{print $2}' | xargs kill
Step-3: Do a fresh mount as mentioned here
- glusterfs -s ${localhostname} --volfile-id rebalance/$volume-name /tmp/mount/point
Step-4: Go to one of the bricks (among brick1 and brick2)
- cd <brick1 path>
Step-5: Run the following command.
- find . -not \( -path ./.glusterfs -prune \) -type f -not -perm 01000 -exec bash -c 'setfattr -n "distribute.fix.layout" -v "1" ${mountpoint}/$(dirname '{}')' \; -exec setfattr -n "trusted.distribute.migrate-data" -v "1" ${mountpoint}/'{}' \;
This command will ignore the linkto files and empty directories. Do a fix-layout of
the parent directory. And trigger a migration operation on the files.
Step-6: Once this process is completed do "remove-brick force"
- gluster v remove-brick <volname> brick1 brick2 force
Note: Use the above script only when there are large number of empty directories.
Since the script does a crawl on the brick side directly and avoids directories those
are empty, the time spent on fixing layout on those directories are eliminated(even if the script
does not do fix-layout on empty directories, post remove-brick a fresh layout will be built
for the directory, hence not affecting application continuity).
Detailing the expectation for hardlink migartion with this patch:
Hardlink is migrated only for remove-brick process. It is highly essential
to have a new mount(step-3) for the hardlink migration to happen. Why?:
setfattr operation is an inode based operation. Since, we are doing setfattr from
fuse mount here, inode_path will try to build path from the linked dentries to the inode.
For a file without hardlinks the path construction will be correct. But for hardlinks,
the inode will have multiple dentries linked.
Without fresh mount, inode_path will always get the most recently linked dentry.
e.g. if there are three hardlinks named dir1/link1, dir2/link2, dir3/link3, on a client
where these hardlinks are looked up, inode_path will always return the path dir3/link3
if dir3/link3 was looked up most recently. Hence, we won't be able to create linkto
files for all other hardlinks on destination (read gf_defrag_handle_hardlink for more details
on hardlink migration).
With a fresh mount, the lookup and setfattr become serialized. e.g. link2 won't be
looked up until link1 is looked up and migrated. Hence, inode_path will always have the correct
path, in this case link1 dentry is picked up(as this is the most recently looked up inode) and
the path is built right.
Note: If you run the above script on an existing mount(all entries looked up), hard links may
not be migrated, but there should not be any other issue. Please raise a bug, if you find any
issue.
Tests: Manual
Change-Id: I9854cdd4955d9e24494f348fb29ba856ea7ac50a
BUG: 1450975
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17115
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Rebalance compares the node-uuid of a file against its own
to and migrates a file only if they match. However, the
current behaviour in both AFR and EC is to return
the node-uuid of the first brick in a replica set for all
files. This means a single node ends up migrating all
the files if the first brick of every replica set is on the
same node.
Fix:
AFR and EC will return all node-uuids for the replica set.
The rebalance process will divide the files to be migrated
among all the nodes by hashing the gfid of the file and
using that value to select a node to perform the migration.
This patch makes the required DHT and tiering changes.
Some tests in rebal-all-nodes-migrate.t will need to be
uncommented once the AFR and EC changes are merged.
Change-Id: I5ce41600f5ba0e244ddfd986e2ba8fa23329ff0c
BUG: 1366817
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17239
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current rebalance throttle options: lazy/normal/aggressive may not always be
sufficient for the purpose of throttling. In our recent test, we observed for
certain setups, normal and aggressive modes behaved similarly consuming full
disk bandwidth. So in cases like this admin should be able to tune it
down(or vice versa) depending on the need.
Along with old throttle configurations, thread counts are tuned based on number.
e.g. gluster v set vol-name cluster-rebal.throttle 5.
Admin can tune up/down between 0 and the number of cores available.
Note: For heterogenous servers, validation will fail on the old server if "number"
is given for throttle configuration.
The message looks something like this:
"volume set: failed: Staging failed on vm2. Error: cluster.rebal-throttle should be {lazy|normal|aggressive}"
Test: Manual test by logging active thread number after reconfiguring throttle option.
testcase: tests/basic/distribute/throttle-rebal.t
Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3
BUG: 1438370
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16980
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Throttle settings "normal" and "aggressive" for rebalance
did not have performance difference.
normal mode spawns $(no. of cores - 4)/2 threads and aggressive
spawns $(no. of cores - 4) threads. Though aggressive mode has twice
the number of threads compared to that of normal mode, there was no
performance gain when switched to aggressive mode from normal mode.
RCA:
During the course of debugging the above problem, we tried assigning
migration job to migration threads spawned by rebalance, rather than
synctasks(as there is more overhead associated to manage the task
queue and threads). This gave us a significant improvement over rebalance
under synctasks. This patch does not really gurantee that there will be a
clear performance difference between normal and aggressive mode, but this
patch certainly maximized the disk utilization for 1GBfiles run.
Results:
Test enviroment:
Gluster Config:
Number of Bricks: 2 (one brick per disk(RAID-6 12 disk))
Bricks:
Brick1: server1:/brick/test1/1
Brick2: server2:/brick/test1/1
Options Reconfigured:
performance.readdir-ahead: on
server.event-threads: 4
client.event-threads: 4
1000 files with 1GB each were created/renamed such that all files will have
server1 as cached and server2 as hashed, so that all files will be migrated.
Test machines had 24 cores each.
Results with/without synctask based migration:
-----------------------------------------------
mode normal(10threads) aggressive(20threads)
timetaken 0:55:30 (h:m:s) 0:56:3 (h:m:s)
withsynctask
timetaken
with migrator 0:38:3 (h:m:s) 0:23:41 (h:m:s)
threads
From above table it can be seen that, there is a clear 2x perf gain between
rebalance with synctask vs rebalance with migrator threads.
Additionally this patch modifies the code so that caller will have the exact error
number returned by dht_migrate_file(earlier the errno meaning was overloaded). This
will help avoiding scenarios where migration failure due to ENOENT, can result in
rebalance abort/failure.
Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e
BUG: 1420166
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16427
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Design doc: https://review.gluster.org/16876
Directory creation is now synchronized with blocking inodelk of the
parent on the hashed subvolume followed by the entrylk on the hashed
subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup
selfheal mkdir.
To maintain internal consistency of directories across all subvols of
dht, we need locks. Specifically we are interested in:
1. Consistency of layout of a directory. Only one writer should modify
the layout at a time. A writer (layout setting during directory heal
as part of lookup) shouldn't modify the layout while there are
readers (all other fops like create, mkdir etc., which consume
layout) and readers shouldn't read the layout while a writer is in
progress. Readers can read the layout simultaneously. Writer takes
a WRITE inodelk on the directory (whose layout is being modified)
across ALL subvols. Reader takes a READ inodelk on the directory
(whose layout is being read) on ANY subvol.
2. Consistency of directory namespace across subvols. The path and
associated gfid should be same on all subvols. A gfid should not be
associated with more than one path on any subvol. All fops that can
change directory names (mkdir, rmdir, renamedir, directory creation
phase in lookup-heal) takes an entrylk on hashed subvol of the
directory.
NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a
directory, the transaction itself is a consumer of layout on
parent directory. So, the transaction is a reader of parent
layout and does an inodelk on parent directory just like any
other layout reader. So a mkdir (dir/subdir) would:
> Acquire a READ inodelk on "dir" on any subvol.
> Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
> creates directory on hashed subvol and possibly on non-hashed subvols.
> UNLOCK (entrylk)
> UNLOCK (inodelk)
NOTE2: mkdir fop while setting the layout of the directory being created
is considered as a reader, but NOT a writer. The reason is for
a fop which can consume the layout of a directory to come either
of the following conditions has to be true:
> mkdir syscall from application has to complete. In this case no
need of synchronization.
> A lookup issued on the directory racing with mkdir has to complete.
Since layout setting by a lookup is considered as a writer, only
one of either mkdir or lookup will set the layout.
Code re-organization:
All the lock related routines are moved to "dht-lock.c" file.
New wrapper function is introduced to take blocking inodelk
followed by entrylk 'dht_protect_namespace'
Updates #191
Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
Signed-off-by: Kotresh HR <khiremat@redhat.com>
BUG: 1443373
Reviewed-on: https://review.gluster.org/15472
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
criteria happens to be the same subvol containing data-file
Rebalance need to figure out a new subvol in case the hashed subvol
does not have enough space. In the process of figuring out the new subvol,
we need to ignore the source subvol, otherwise it will lead to data loss.
Test: Manual
Ran the following
sizeof /tmp/1: 1.5GB
sizeof /brick/1: 16GB
sizeof /tmp/2: 1.5GB
<start>
glusterd; gluster v create test1 vm1:/brick/1 vm1:/tmp/1;
gluster v start test1;
mount -t glusterfs vm1:test1 /mnt;
for i in {1..2000}
do
dd if=/dev/zero of=/mnt/file$i bs=1KB count=1 &> /dev/null;
done
gluster v add-brick test1 vm1:/tmp/2
gluster v set test1 min-free-disk 12GB
gluster v remove-brick test1 vm1:/tmp/1 star
<end>
file count and data were intact.
Change-Id: Ib8fc8467a3d48a7c12958824c4f0b88e160b86c1
BUG: 1441508
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17064
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
test: Manual
created files of size 1K on 2 brick(of size 1GB) setup .
added a brick of size 16GB.
set min-free-disk to 12GB(so that first two bricks won't receive any files).
removed one of the 1st brick of size 1GB.
Logs from test:
[2017-04-12 08:52:08.196484] W [MSGID: 0] [dht-rebalance.c:895:__dht_check_free_space]
0-test1-dht: Write will cross min-free-disk for file - /tile32 on subvol - test1-client-1.
Looking for new subvol.
[2017-04-12 08:52:08.196904] I [MSGID: 0] [dht-rebalance.c:925:__dht_check_free_space]
0-test1-dht: new target found - test1-client-2 for file - /tile32
- Post migration we have two files. The new destination (/brick/1) has the data file
[root@vm1 ~]# ll /brick/1/tile32
-rw-r--r--. 2 root root 0 Apr 12 14:22 /brick/1/tile32
- On the old target the linkto file is there with linkto xattr pointing to /brick/1
[root@vm1 ~]# ll /tmp/2/tile32
---------T. 2 root root 1000 Apr 12 14:22 /tmp/2/tile32
[root@vm1 ~]# getfattr -m . -de text /tmp/2/tile32
getfattr: Removing leading '/' from absolute path names
security.selinux="unconfined_u:object_r:user_tmp_t:s0"
trusted.gfid="����:Aс�#�/'b2"
trusted.glusterfs.dht.linkto="test1-client-2"
Marking ./tests/features/worm_sh.t as bad test.
Reason being, this patch failed on master branch as well and it has nothing
to do with rebalance/remove-brick.
BUG: 1441508
Change-Id: I90bae251cda3d957a49cdceda90cd08311a392fb
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17034
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity complain about enum mismatch since we assign GF_OP_CMD_NONE
to a variable holding gf_defrag_type.
Change-Id: I63e71f552b3cc752c26c1b8705420f38908e17e6
BUG: 789278
Signed-off-by: Michael Scherer <misc@redhat.com>
Reviewed-on: https://review.gluster.org/16756
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Michael Scherer <misc@fedoraproject.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tierd is implemented by separating from rebalance process.
The commands affected:
1) Attach tier will trigger this process instead of old one
2) tier start and tier start force will also trigger this process.
3) volume status [tier] will show tier daemon as a process instead
of task and normal tier status and tier detach status works.
4) tier stop implemented.
5) detach tier implemented separately along with new detach tier
status
6) volume tier volname status will work using the changes.
7) volume set works
This patch has separated the tier translator from the legacy
DHT rebalance code. It now sends the RPCs from the CLI
to glusterd separate to the DHT rebalance code.
The daemon is now a service, similar to the snapshot daemon,
and can be viewed using the volume status command.
The code for the validation and commit phase are the same
as the earlier tier validation code in DHT rebalance.
The “brickop” phase has been changed so that the status
command can use this framework.
The service management framework is now used.
DHT rebalance does not use this framework.
This service framework takes care of :
*) spawning the daemon, killing it and other such processes.
*) volume set options , which are written on the volfile.
*) restart and reconfigure functions. Restart is to restart
the daemon at two points
1)after gluster goes down and comes up.
2) to stop detach tier.
*) reconfigure is used to make immediate volfile changes.
By doing this, we don’t restart the daemon.
it has the code to rewrite the volfile for topological
changes too (which comes into place during add and remove brick).
With this patch the log, pid, and volfile are separated
and put into respective directories.
Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
BUG: 1313838
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/13365
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: hari gowtham <hari.gowtham005@gmail.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As with dht, dirs are present on all subvolumes,
renaming them is a compound operation and thus a
partial success + partial failure scenario is
possible, resulting in an inconsistent state.
For purposes of reproduction, such a scenario can
easily be produced by stopping the volume, edit the
volfile of a certain subvolume to get at an
"option read-only on" setting, and then restart
the volume. Thus those operations that are to make change
on the affected subvolume will fail with EROFS.
To handle such scenarios, we introduce an in-memory cache
where we record the return values obtained from the
subvolumes. At the final stage of the dir rename operation
we check if it's a partial success/fail situation. If yes,
then we perform a reverse rename op on those subvolumes
where the operation succeeded.
Change-Id: I3d05f74f53932cb984a918d252a7309c1009a51d
BUG: 1412069
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/15739
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
private
If reconfigure is executed parallely (or concurrently with dht_init),
there are races that can corrupt memory. One such race is modification
of regexes stored in conf (conf->rsync_regex_valid and
conf->extra_regex_valid) through dht_init_regex. With change [1],
reconfigure codepath can get executed parallely (with itself or with
dht_init) and this fix is needed.
Also, a reconfigure can race with any thread doing dht_layout_search,
resulting in dht_layout_search accessing regex freed up by reconfigure
(like in bz 1399134).
[1] http://review.gluster.org/15046
Change-Id: I039422a65374cf0ccbe0073441f0e8c442ebf830
BUG: 1399134
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/15945
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Demote files on priority if hi-watermark has been breached and continue
to demote until the watermark drops below hi-watermark.
Monitor watermark more frequently.
Trigger demotion as soon as hi-watermark is breached.
Add cluster.tier-emergency-demote-query-limit option to limit number
of files returned from the database query for every iteration of
tier_migrate_using_query_file(). If watermark hasn't dropped below
hi-watermark during the first iteration, the next iteration will be
triggered approximately 1 second after tier_demote() returns to the
main tiering loop.
Update changetimerecorder xlator to handle query for emergency demote
mode.
Add tier-ctr-interface.h:
Move tier and ctr interface specific macros and struct definition from
libglusterfs/src/gfdb/gfdb_data_store.h to new header
libglusterfs/src/tier-ctr-interface.h
Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811
BUG: 1366648
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/15158
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removed a structure member that was not used
from dht_local_t. Skipped some unnecessary checks
in dht_filter_loc_subvol_key.
Change-Id: I81740b6528e063fb9cf5817e05865ff4d77aa748
BUG: 1378305
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/15542
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In a distributed-replicate volume attribute
"replica.split-brain-status" value does not display split-brain
condition though directory is in split-brain.
If directory is in split brain on mutiple replica-pairs
it does not show full list of replica pairs.
Solution: Update the dht_aggregate code to aggregate the xattr
value in this specific condition.
Fix: 1) function getChoices returns the choices from split-brain
status string.
2) function add_opt adding the choices to local buffer to
store in dictionary
3) For the key "replica.split-brain-status" function dht_aggregate
call dht_aggregate_split_brain_xattr to prepare the list.
Test: To verify the patch followed below steps
1) Create a distributed replica volume and create mount point
2) Stop heal daemon
3) Touch file and directories on mount point
mkdir test{1..5};touch tmp{1..5}
4) Down brick process on one of the replica set
pkill -9 glusterfsd
5) Change permission of dir on mount point
chmod 755 test{1..5}
6) Restart brick process on node with force option
7) kill brick process on other node in same replica set
8) Change permission of dir again on mount point
chmod 766 test{1..5}
9) Reexecute same step from 4-9 on other replica set also
10) After check heal status on server it will show dir's are
in split brain on all replica sets
11) After check the replica.split-brain-status attr on mount
point it will show wrong status of split brain.
12) After apply the patch the attribute shows correct value.
BUG: 1368312
Change-Id: Icdfd72005a4aa82337c342762775a3d1761bbe4a
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: http://review.gluster.org/15201
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add events for:
* tier attach and detach
* tier pause and resume
* tier rising and dropping hi and lo watermarks
Update eventskeygen.py with tiering events.
Update cli help with:
* attach: add optional force argument
* detach: make force available as non-optional argument on its own
Change-Id: I43990d3a8742151a4a7889bafa19cb572fe661bd
BUG: 1368336
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/15232
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: As metadata in the database fills up, querying the database
take a long time. As a result, tier migration slows down. To
counteract this, we added a way to enable the compaction methods of
the underlying database. The goal is to reduce the size of the
underlying file by eliminating database fragmentation.
NOTE: There is currently a bug where sometimes a brick will
attempt to activate compaction. This happens even compaction is already
turned on.
The cause is narrowed down to the compact_mode_switch flipping its value.
Changes: libglusterfs/src/gfdb - Added a gfdb function to compact the
underlying database, compact_db() This is a no-op if the database has
no such option.
- Added a compaction function for SQLite3 that does the following
1) Changes the auto_vacuum pragma of the database
2) Compacts the database according to the type of compaction requested
- Compaction type can be changed by changing the macro
GF_SQL_COMPACT_DEF to one of the 4 compaction types in
gfdb_sqlite3.h
It is currently set to GF_SQL_COMPACT_INCR, or incremental
vacuuming.
xlators/cluster/dht/src - Added the following command-line option to
enable SQLite3 compaction.
gluster volume set <vol-name> tier-compact <off|on>
- Added the following command-line option to change the frequency the
hot and cold tier are ordered to compact.
gluster volume set <vol-name> tier-hot-compact-frequency <int>
gluster volume set <vol-name> tier-cold-compact-frequency <int>
- tier daemon periodically sends the (new)
GFDB_IPC_CTR_SET_COMPACT_PRAGMA IPC to the CTR xlator. The IPC
triggers compaction of the database.
The inputs are both gf_boolean_t.
IPC Input:
compact_active: Is compaction currently on for the db.
compact_mode_switched: Did we flip the compaction switch recently?
IPC Output:
0 if the compaction succeeds.
Non-zero otherwise.
xlators/features/changetimerecorder/src/ - When the CTR gets the
compaction IPC, it launches a thread that will perform the
compaction. The IPC ends after the thread is launched. To avoid extra
allocations, the parameters are passed using static variables.
Change-Id: I5e1433becb9eeff2afe8dcb4a5798977bf5ba0dd
Signed-off-by: Diogenes Nunez <dnunez@redhat.com>
Reviewed-on: http://review.gluster.org/15031
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ipc is used by md-cache to communicate the list of xattrs that
it is caching, to the upcall xlator. Hence implement this in
dht, such that it winds to all the bricks if the ipc op is
GF_IPC_MDC_TARGET_UPCALL. The ips should not fail if any of
the bricks is down, as md-cache will replay the ipc late when
the brick comes back up.
Change-Id: Ica551a550c04cbb1240c0d211fe831c2e5eb6017
BUG: 1211863
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/15225
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add test to fail promotion if estimated block consumption grows
beyond hi watermark.
Skip file migrations until next cycle if tier_get_fs_stat() fails
in tier_migrate_using_query_file()
Change-Id: Ice04572fa739c09109c4433e65965197482a7beb
BUG: 1349284
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/14780
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Return the correct size of the tiered volume in statfs. It should
be the size of the cold tier, not the sum of the hot and cold tier,
because the hot tier is a cache and not an extension of the volume's
capacity. The number of free blocks, etc is the cold tier's capacity
subtracted by the sum of utilization on the hot and cold tiers. Note
if both tiers are part of the same file system this must be accounted
for as well.
The patch also fixes a pre-existing bug in the DHT/tier
translators. If statfs was taken on a file, the code only calculated
free space on the cached subvolume, not all subvolumes in the replica
group. With the fix, this is corrected, except in the case
where quota is used with the deem-statfs option set to "on".
Change-Id: I2b8bcb4511edf83f12130960aad0a609fcf8f513
BUG: 1339689
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/14536
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During locking we send lock request to cached subvol,
and normally we unlock to the cached subvol
But with parallel fresh lookup on a directory, there
is a race window where the cached subvol can change
and the unlock can go into a different subvol from
which we took lock.
This will result in a stale lock held on one of the
subvol.
So we will store the details of subvol which we took the lock
and will unlock from the same subvol
Change-Id: I47df99491671b10624eb37d1d17e40bacf0b15eb
BUG: 1311002
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/13492
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With lock-migration, we need to send requests to destination
brick post migration. Once, the source brick marks the lock
structure to be already migrated, the requests will be redirected
to destination brick by dht_lk2/flush2.
Change-Id: I50b14011c5ab68c34826fb7ba7f8c8d42a68ad97
BUG: 1326085
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/13493
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I48c6f9cdda47503615ba65882acd5eedf0a70c89
BUG: 1326085
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/14024
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: When we spawn promote and demote thread, query files
are build. And only query file with index 0 is picked for migration
as the first query file. This may not be suitable for scenarios,
where the file in the query are too big to move in the first cycle,
as a result file in the other query files always get missed. We need to
shuffle so that other query files also get a chance.
Fix: Remember the previous first query file and shift it by one index,
before the migration starts.
Change-Id: I704947bcf4bab6b20b1179a6d9ae4a15a3d51bd9
BUG: 1330353
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/14068
Tested-by: Joseph Fernandes
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an ongoing rebalance completion check task been
triggered by dht, there is a possibility of a race
between afr setting subvol as non-readable and dht updates
the cached subvol. In this window a write can fail with EIO.
Change-Id: I42638e6d4104c0dbe893d1bc73e1366188458c5d
BUG: 1329503
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/14049
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I0bbc2c2ef115c78393f6570815a5b80316e7e4be
BUG: 1319992
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/11720
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_mkdir ()
{
first-hashed-subvol = hashed-subvol for "bname" in in-memory
layout of "parent";
inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any
subvol, but we choose first-hashed-subvol randomly");
{
begin:
hashed-subvol = hashed-subvol for "bname" in in-memory
layout of "parent";
hash-range = extract hashe-range from layout of "parent";
ret = mkdir (parent/bname, hashed-subvol, hash-range);
if (ret == "hash-value doesn't fall into layout stored on
the brick (this error is returned by posix-mkdir)")
{
refresh_parent_layout ();
goto begin;
}
}
inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN",
"first-hashed-subvol");
proceed with other parts of dht_mkdir;
}
posix_mkdir (parent/bname, client-hash-range)
{
disk-hash-range = getxattr (parent, "dht-layout-key");
if (disk-hash-range != client-hash-range) {
fail-with-error ("hash-value doesn't fall into layout
stored on the brick");
return 0;
}
continue-with-posix-mkdir;
}
Similar changes need to be done for dentry operations like create,
symlink, link, unlink, rmdir, rename. These will be addressed in
subsequent patches. This patch addresses only mkdir codepath.
This change breaks stripe tests, as on some striped subvols dht layout
xattrs are not set for some reason. This results in failure of
mkdir. Since striped volumes are always created with dht, some tests
associated with stripe also fail. So, I am making following tests
changes (since stripe is out of maintainance):
* modify ./tests/basic/rpc-coverage.t to not to use striped volumes
* mark all (2) tests in tests/bugs/stripe/ as bad tests
Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
BUG: 1323040
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/13885
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a possibility that while an rmdir is completed on
some non-hashed subvol and proceeding to others, a lookup
selfheal can recreate the same directory on those subvols
for which the rmdir had succeeded. Now the deletion of the
parent directory will fail with an ENOTEMPTY.
To fix this take blocking inodelk on the subvols before
starting rmdir. Selfheal must also take blocking inodelk
before creating the entry.
Change-Id: I168a195c35ac1230ba7124d3b0ca157755b3df96
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/13528
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Spawn a thread for background fix-layout for tier process.
2. Once the fix-layout is completed a marker xttr is set on the root of
volume to mark the completion of the background fixlayout, so that
even if the tier process is spawned again, fixlayout will not be
issued, if it was completed last time.
3. Please note that promotion of legacy files will happen eventually as
the ctr lookup heal in the fixlayout slowly heals the ctr db for legacy
files OR the ctr lookup heal happend due to a name lookup.
4. When a detach tier is successful in evacuation data from hot tier, we remove
the marker xattr is removed. So that next attach tier runs the background
tier fixlayout.
what is remaining ?
1. Instead of clearing the marker xattr of tiering fix layout at the end of detach start
clear it during detach commit. But the issue is detach commit is a glusterd operation
and the volume is not mounted in glusterd.
The reason we want to do it in detach commit is that if the admin wants to attach the
same tier again, then a background fixlayout will be triggered, which would not be needed.
2. Clearing the CTR DB of the cold bricks when there is a detach commit, as it will be having
entries which will be stale when the volume is used, with ctr off (ctr is switched off only when
we have detach commit.)
Change-Id: Ibe343572e95865325cd0eef4d0b976b626a3c0c5
BUG: 1313228
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/13491
Smoke: Gluster Build System <jenkins@build.gluster.com>
Tested-by: Joseph Fernandes
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Directory size is meaningless. Every filesystem has its own
unpredictable way of increasing or decreasing it, based on internal data
structures and even transient conditions. Some filesystems (e.g. ext4)
never decrease it at all. Others (e.g. btrfs) don't even report it.
Very few programs look at it, and those that do are broken.
Unfortunately, one such program is GNU tar, which will complain when it
sees different values because at different times we got the value from
different DHT subvolumes. To avoid such problems, just report a
constant value.
Change-Id: Id64ce917c75b5f7ff50cb55b6e997f3b3556e7e3
BUG: 1302948
Original-author: Shyam <srangana@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13770
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix adds a paramater "tier-max_promote_size" to control wether
a file is migrated or not based on its size. By default the value
is 0, meaning all files are migrated. If set to a non-zero
value, files larger than the parameter won't be moved
in tiered volumes.
Change-Id: Ia6b88e9b2508935bef500d956f9192e59670fe00
BUG: 1313495
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13570
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cli/src/cli-cmd-parser.c (chenk)
cli/src/cli-xml-output.c (spandit)
cli/src/cli.c (chenk)
libglusterfs/src/common-utils.c (vmallika)
libglusterfs/src/gfdb/gfdb_sqlite3.c (jfernand +1)
rpc/rpc-transport/socket/src/socket.c (?)
xlators/cluster/afr/src/afr-transaction.c (?)
xlators/cluster/dht/src/dht-common.h (srangana +2)
xlators/cluster/dht/src/dht-selfheal.c (srangana +2)
xlators/debug/io-stats/src/io-stats.c (R. Wareing)
xlators/features/barrier/src/barrier.c (vshastry)
xlators/features/bit-rot/src/bitd/bit-rot-scrub.h (vshankar +1)
xlators/features/shard/src/shard.c (kdhananj +1)
xlators/mgmt/glusterd/src/glusterd-ganesha.c (skoduri)
xlators/mgmt/glusterd/src/glusterd-handler.c (atinmu)
xlators/mgmt/glusterd/src/glusterd-op-sm.h (atinmu)
xlators/mgmt/glusterd/src/glusterd-snapshot.c (spandit)
xlators/mgmt/glusterd/src/glusterd-syncop.c (atinmu)
xlators/mgmt/glusterd/src/glusterd-volgen.c (atinmu)
xlators/protocol/client/src/client-messages.h (mselvaga +1)
xlators/storage/bd/src/bd-helper.c (M. Mohan Kumar)
xlators/storage/bd/src/bd.c (M. Mohan Kumar)
xlators/storage/posix/src/posix.c (nbalacha +1)
Change-Id: I85934fbcaf485932136ef3acd206f6ebecde61dd
BUG: 1293133
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/13031
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|