Instead of including config.h in each file, have config.h included
from the compiler command line (-include option).
When a .c file tested for a certain #define and config.h was not
included, incorrect assumptions were made. With this change, that can
no longer happen.
BUG: 1222319
Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10808
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
When the promotion/demotion daemon starts, it uses the same pidfile
as rebalance. This patch introduces a separate pidfile
for it.
Change-Id: Ic484c53f51e00ae6b2d697748a9600b14829e23b
BUG: 1221970
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/10792
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Change-Id: Ic92d25db68e40ef4a4388ef42affd1b3ee5a7ec6
BUG: 1221270
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/10773
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Coverity CID 1291727.
Guenther
Change-Id: I95f01b638f74370f0ef04383f0f9d5799abe31f5
BUG: 789278
Signed-off-by: Guenther Deschner <gd@samba.org>
Reviewed-on: http://review.gluster.org/10300
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
If a file is under migration, any xattrs created on it
are lost post migration of the file. This is because
the xattrs are set only on the cached subvol of the source
and as the source is under migration, it becomes a linkto file
post migration.
Change-Id: Ib8e233b519cf954e7723c6e26b38fa8f9b8c85c0
BUG: 1193636
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10212
Tested-by: NetBSD Build System
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
phase 2 of migration.
The linkto xattr on the source file cannot be relied on to find where the data
file currently resides. This can happen if there are multiple
migrations before phase 2 is detected by a client. For example,
* migration (M1, node1, node2) starts.
* the application writes some data. DHT correctly stores the state in
the inode context that phase 1 of migration is in progress.
* migration M1 completes.
* migration (M2, node2, node3) is triggered and completes.
* the application resumes writes to the file. DHT identifies it as phase 2
of migration. However, the linkto xattr on node1 points to node2, but
the file is on node3. A lookup correctly identifies node3 as the cached
subvol.
TBD:
When we identify phase 2 of a previous migration (say M1), there
might be a migration in progress, say (M3, node3, node4). In this
case we need to send writes to both node3 and node4, not just
node3. Also, the inode state needs to correctly indicate that it's in
phase 1 of migration. I'll send this as a different patch.
Change-Id: I1a861f766258170af2f6c0935468edb6be687b95
BUG: 1142423
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/10805
Tested-by: NetBSD Build System
Problem:
afr_read_txn() bails out if read_subvol == -1. This meant that for
directories in entry split-brain, FOPs like readdir, access,
stat etc. were not allowed.
Fix:
Except for getxattr, all other FOPs are wound on the first up child
of afr.
Change-Id: Iacec8fbb1e75c4d2094baa304f62331c81a6f670
BUG: 1221481
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/10776
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Tested-by: NetBSD Build System
Change-Id: I7ec29428b7f7ef249014f948a5d616bfb8aaf80d
BUG: 1225491
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10946
Tested-by: NetBSD Build System
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
EC uses an eager lock mechanism to optimize multiple read/write
requests on the same entry or inode. This increases performance
but can have adverse effects when other clients try to access the
same entry/inode.
To solve this, this patch adds the ability to detect when this
happens and force an earlier release so other clients are not blocked.
The method consists of requesting GF_GLUSTERFS_INODELK_COUNT and
GF_GLUSTERFS_ENTRYLK_COUNT for all fops that take a lock. When this
count is greater than one, the lock is marked to be released. All
fops already waiting for this lock will be executed normally before
releasing the lock, but new requests that also require it will be
blocked and restarted after the lock has been released and
reacquired.
Another problem was that some operations correctly locked the
parent of an entry when needed, but got the size and version xattrs
from the entry instead of the parent.
This patch solves this problem by binding all queries of size and
version to each lock and replacing all entrylk calls with inodelk ones
to prevent concurrent updates of directory metadata. This also allows
rename to correctly update the source and destination directories.
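The release-on-contention rule can be sketched with a small state record. This is a simplified model, assuming a single lock and plain counters; `struct eager_lock` and the three functions are hypothetical, not EC's real data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an eager lock shared by fops. */
struct eager_lock {
    bool acquired;   /* lock currently held by this client */
    bool release;    /* marked for early release because of contention */
    int  active;     /* fops already using the lock */
    int  blocked;    /* new fops parked until the lock is reacquired */
};

/* Called with the lock count a brick returned for the request (the
 * value asked for via GF_GLUSTERFS_INODELK_COUNT in the real code).
 * A count above one means another client is contending for the inode. */
void eager_lock_check_contention(struct eager_lock *l, int lock_count)
{
    if (lock_count > 1)
        l->release = true;
}

/* Returns true if the fop may use the current lock, false if it must
 * wait until the lock has been released and reacquired. */
bool eager_lock_attach(struct eager_lock *l)
{
    assert(l->acquired);
    if (l->release) {
        l->blocked++;
        return false;
    }
    l->active++;
    return true;
}

/* Called when a fop finishes. Returns true when the lock should now be
 * unlocked on the bricks: release was requested and no fop uses it. */
bool eager_lock_detach(struct eager_lock *l)
{
    assert(l->active > 0);
    l->active--;
    return l->release && l->active == 0;
}
```

Fops attached before contention was detected finish normally; only newcomers are parked, matching the behaviour described in the message.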
Change-Id: I2df0b22bc6f407d49f3cbf0733b0720015bacfbd
BUG: 1165041
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/10852
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
ec_heal creates an ec_fop_data but doesn't run ec_manager. ec_fop_data_allocate
adds this fop to ec->pending_fops; because ec_manager is not run on this heal
fop, it is never removed from ec->pending_fops. Accessing it after it is freed
leads to a crash. It is better not to add HEAL fops to ec->pending_fops,
because we don't want a graph switch to hang the mount because of a BIG
file/directory heal.
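A minimal sketch of the registration rule, assuming a singly linked pending list. `struct fop`, `struct ec_state` and `fop_register()` are hypothetical simplifications of ec_fop_data and ec->pending_fops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified fop and per-volume state. */
struct fop {
    bool is_heal;
    struct fop *next;
};

struct ec_state {
    struct fop *pending;   /* fops a graph switch waits for */
    int pending_count;
};

/* Register a newly allocated fop. Heal fops are deliberately not
 * tracked: nothing on their path removes them from the list, and a
 * long heal must not hold up a graph switch. */
void fop_register(struct ec_state *ec, struct fop *fop)
{
    assert(ec != NULL && fop != NULL);
    fop->next = NULL;
    if (fop->is_heal)
        return;
    fop->next = ec->pending;
    ec->pending = fop;
    ec->pending_count++;
}
```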
BUG: 1188145
Change-Id: I8abdc92f06e0563192300ca4abca3909efcca9c3
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10868
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
When a delayed lock is pending, a graph switch doesn't correctly
terminate it. This means that the update of version and size xattrs
is lost, causing EIO errors.
This patch handles GF_EVENT_PARENT_DOWN event to correctly finish
pending updates before completing the graph switch.
Change-Id: I394f3b8d41df8d83cdd36636aeb62330f30a66d5
BUG: 1188145
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/10787
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Change-Id: Ia1834ec23d5de615526d4d4e4d2e32aff155b7f7
BUG: 1211962
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10806
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
We should load libgfdb.so.0, not libgfdb.so
Change-Id: I7a0d64018ccd9893b1685de391e99b5392bd1879
BUG: 1222092
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10796
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Joseph Fernandes
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
When a blocking lock is requested, the lock request succeeds as soon as
ec->fragment number of locks are acquired successfully in the non-blocking
locking phase. This leads to the fop succeeding only on the bricks where the
locks were acquired, making self-heals necessary. To prevent these unnecessary
self-heals, if the remaining locks fail with EAGAIN in the non-blocking lock
phase, try the blocking locking phase instead.
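The decision taken after the non-blocking phase can be sketched as a pure function of the per-brick outcomes. This assumes the outcome is summarised as two counts; `after_nonblocking_phase()` and the enum are hypothetical, and the feasibility check `locked + eagain >= fragments` is an assumption of the sketch.

```c
#include <assert.h>

enum lock_next_step {
    LOCK_DONE,            /* no brick was busy: proceed */
    LOCK_RETRY_BLOCKING,  /* some bricks busy: redo the lock as blocking */
    LOCK_FAIL             /* not enough bricks can be locked */
};

/* Hypothetical decision after the non-blocking phase.
 *   locked    - bricks where the non-blocking lock succeeded
 *   eagain    - bricks that answered EAGAIN (held by another client)
 *   fragments - minimum number of bricks needed (ec->fragments)
 */
enum lock_next_step after_nonblocking_phase(int locked, int eagain,
                                            int fragments)
{
    assert(locked >= 0 && eagain >= 0 && fragments > 0);

    /* Don't settle for a partial set of locks: wait for the busy bricks. */
    if (eagain > 0 && locked + eagain >= fragments)
        return LOCK_RETRY_BLOCKING;
    if (locked >= fragments)
        return LOCK_DONE;
    return LOCK_FAIL;
}
```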
Change-Id: I940969e39acc620ccde2a876546cea77f7e130b6
BUG: 1221145
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10770
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
The rebalance process determines the local subvols for the
node it is running on and only acts on files in those subvols.
If a dist-rep or dist-disperse volume is created on 2 nodes by
dividing the bricks equally across the nodes, one process might
determine it has no local_subvols.
When trying to update the commit hash, the function attempts to
lock all local subvols. On the node with no local_subvols the dht
inode lock operation fails, in turn causing the rebalance to fail.
In a dist-rep volume with 2 nodes, if brick 0 of each replica
set is on node1 and brick 1 is on node2, node2 will find that it has
no local subvols.
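The message does not spell out the fix itself. One plausible shape, assumed here, is to treat "no local subvols" as "nothing to lock" rather than as an error. `lock_local_subvols()`, the callback type and the two demo callbacks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-subvolume lock callback: returns 0 on success. */
typedef int (*subvol_lock_fn)(int subvol);

/* Demo callbacks for trying the function out (hypothetical). */
int demo_lock_ok(int subvol)   { (void)subvol; return 0; }
int demo_lock_fail(int subvol) { (void)subvol; return -1; }

/* Lock every local subvolume before updating the commit hash. */
int lock_local_subvols(const int *local_subvols, int local_count,
                       subvol_lock_fn lock_one)
{
    assert(local_count >= 0 && lock_one != NULL);

    if (local_count == 0)
        return 0;   /* no local subvols: nothing to lock, not an error */

    for (int i = 0; i < local_count; i++)
        if (lock_one(local_subvols[i]) != 0)
            return -1;
    return 0;
}
```

A real implementation would also unlock the subvols it already locked when a later lock fails; the sketch omits that for brevity.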
Change-Id: I7d73b5b4bf1c822eae6df2e6f79bd6a1606f4d1c
BUG: 1221696
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10786
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
CID : 1124521
Change-Id: Ie524935d636195cb6894074095b9b98fe28dbc2c
BUG: 789278
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-on: http://review.gluster.org/10348
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Sakshi Bansal
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
In afr_lookup_xattr_req_prepare(), dict_copy was
done even though source dict was NULL.
Change-Id: I85a5d2823ba021e7f78c1ce13402a0f16b08cb51
BUG: 1220332
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/10755
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Change-Id: I646367581d8ee8a9e5966ee302b19273a0c780ff
BUG: 1220329
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10756
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
The key concept here is to determine whether a directory is "clean" by
comparing its last-known-good topology to the current one for the
volume. These are stored as "commit hashes" on the directory and the
volume root respectively. The volume's commit hash changes whenever a
brick is added or removed, and a fix-layout is done. A directory's
commit hash changes only when a full rebalance (not just fix-layout)
is done on it. If all bricks are present and have a directory
commit hash that matches the volume commit hash, then we can assume
that every file is in its "proper" place. Therefore, if we look for
a file in that proper place and don't find it, we can assume it's not
on any other subvolume and *safely* skip the global (broadcast to all)
lookup.
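The "clean directory" test can be sketched as a check over every brick's copy of the directory. This assumes 32-bit commit hashes and a per-brick presence flag; `struct dir_on_brick` and `can_skip_global_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-brick view of a directory's commit hash. */
struct dir_on_brick {
    bool present;          /* brick answered and has the directory */
    uint32_t commit_hash;  /* directory commit hash stored on this brick */
};

/* A miss on the hashed subvolume is authoritative only when every brick
 * is present and every copy of the directory carries the volume's
 * current commit hash, i.e. the directory was fully rebalanced under
 * the current topology. */
bool can_skip_global_lookup(const struct dir_on_brick *dirs, int n,
                            uint32_t volume_commit_hash)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        if (!dirs[i].present)
            return false;
        if (dirs[i].commit_hash != volume_commit_hash)
            return false;
    }
    return true;
}
```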
Change-Id: Id6ce4593ba1f7daffa74cfab591cb45960629ae3
BUG: 1219637
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/7702
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
When a file does not exist on a brick but does on others, there
could be problems trying to access it because some loc_t
structures had a null 'pargfid' but 'name' was set. This forced
inode resolution based on <pargfid>/name instead of <gfid>, which
would be the correct one. To solve this problem, 'name' is always
set to NULL when 'pargfid' is not present.
Another problem was caused by incorrect error handling
during incremental locking. The only error allowed during
incremental locking was ENOTCONN, but missing files on a brick can
be reported as ESTALE. This caused an EIO on the operation.
With this patch, errors during incremental locking are ignored. At
the end of the operation it checks whether enough bricks were
successfully locked to continue.
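The end-of-operation check can be sketched as counting successes and comparing against the number of fragments. This assumes per-brick results are recorded as errno values (0 meaning locked); `enough_bricks_locked()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>    /* error values such as ESTALE and ENOTCONN */
#include <stdbool.h>

/* Hypothetical summary of an incremental lock across the bricks of a
 * disperse set. Individual errors are not judged one by one; only the
 * final number of locked bricks matters. */
bool enough_bricks_locked(const int *lock_errno, int bricks, int fragments)
{
    int locked = 0;

    assert(fragments > 0 && bricks >= fragments);
    for (int i = 0; i < bricks; i++)
        if (lock_errno[i] == 0)
            locked++;
    return locked >= fragments;
}
```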
Change-Id: I9360ebf8d819d219cea2d173c09bd37679a6f15a
BUG: 1176062
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9407
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
When freeing memory, our memory-accounting code expects to be able to
dereference from the (previously) allocated block to its owning
translator. However, as we have already found once in option
validation and twice in logging, that translator might itself have
been freed and the dereference attempt causes one of our daemons to
crash with SIGSEGV. This patch attempts to fix that as follows:
* We no longer embed a struct mem_acct directly in a struct xlator,
but instead allocate it separately.
* Allocated memory blocks now contain a pointer to the mem_acct
instead of the xlator.
* The mem_acct structure contains a reference count, manipulated in
both the normal and translator allocate/free code using atomic
increments and decrements.
* Because it's now a separate structure, we can defer freeing the
mem_acct until its reference count reaches zero (either way).
* Some unit tests were disabled, because they embedded their own
copies of the implementation for what they were supposedly testing.
Life's too short to spend time fixing tests that seem designed to
impede progress by requiring a certain implementation as well as
behavior.
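The reference-counting scheme in the list above can be sketched with C11 atomics. This is a simplified model, not GlusterFS's actual allocator: `struct mem_acct`, `union block_header` and the functions below are hypothetical, and real accounting also tracks per-type statistics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified separately allocated accounting structure. */
struct mem_acct {
    atomic_int refcnt;   /* 1 for the owning xlator + 1 per live block */
};

/* Header placed in front of every allocated block. It points at the
 * mem_acct, not at the xlator; the union keeps user data aligned. */
union block_header {
    struct mem_acct *acct;
    max_align_t      align_;
};

static void mem_acct_unref(struct mem_acct *acct)
{
    /* atomic_fetch_sub() returns the value before the decrement */
    if (atomic_fetch_sub(&acct->refcnt, 1) == 1)
        free(acct);
}

struct mem_acct *mem_acct_new(void)
{
    struct mem_acct *acct = malloc(sizeof(*acct));
    if (acct)
        atomic_init(&acct->refcnt, 1);   /* reference held by the xlator */
    return acct;
}

void *acct_alloc(struct mem_acct *acct, size_t size)
{
    union block_header *hdr;

    assert(acct != NULL);
    hdr = malloc(sizeof(*hdr) + size);
    if (!hdr)
        return NULL;
    hdr->acct = acct;
    atomic_fetch_add(&acct->refcnt, 1);
    return hdr + 1;                      /* user data follows the header */
}

void acct_free(void *ptr)
{
    union block_header *hdr;

    if (!ptr)
        return;
    hdr = (union block_header *)ptr - 1;
    mem_acct_unref(hdr->acct);   /* valid even if the xlator is gone */
    free(hdr);
}

/* Called when the xlator is destroyed: drops only its own reference. */
void xlator_drop_acct(struct mem_acct *acct)
{
    mem_acct_unref(acct);
}
```

Whichever side drops the last reference (the xlator or the last outstanding block) frees the mem_acct, which is the "either way" in the message.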
Change-Id: Id929b11387927136f78626901729296b6c0d0fd7
BUG: 1211749
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/10417
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
A few log messages in dht directory self heal at log level INFO are useful
only for developers, and these messages tend to cause excessive logging in
our log files. Hence move the log level of such messages to DEBUG.
Change-Id: I8a543f4ddeb5c20b2978a0f7b18d8baccc935a54
BUG: 1217949
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/10281
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
This fix adds support to view the number of promoted or demoted
files from the cli. The mechanism is isomorphic to checking
the status of volumes being rebalanced:
gluster volume rebalance <vol> tier status
Change-Id: I1b11ca27355ceec36c488967c23531202030e205
BUG: 1213063
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10292
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
- With this change, the xattr represents whether the file needs to be healed
or not. It has different values for data/entry and metadata changes.
- inode ref leaks and dict_set_dynstr related leaks fixed
- Added support for trylock/lock based on heal-cmd execution or not
in data heal.
- Made fixes to pass regression runs
Change-Id: I9d8def4c2badde18a76b7898816fecfac113737a
BUG: 1215265
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10385
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Data self-heal:
1) Take inode lock in domain 'this->name:self-heal' on 0-0 range (full file),
so that no other processes try to do self-heal at the same time.
2) Take inode lock in domain 'this->name' on 0-0 range (full file).
3) Perform fxattrop+fstat and get the xattrs on all the bricks.
4) Choose as sources ec->fragment number of bricks with the same version.
5) Truncate sinks.
6) Unlock the lock taken in 2).
7) For each block: take a full file lock, read from the sources, write to the sinks, unlock.
8) Take a full file lock and check that the file is still a sane copy, i.e. it didn't become unusable while the bricks were offline.
Restore the mtime to its value from before healing.
9) xattrop with -ve values of 'dirty' and difference of highest and its own
version values for version xattr.
10) Unlock the lock acquired in 8).
11) Unlock the lock acquired in 1).
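Source selection in the data heal can be sketched as follows. The sketch assumes that when more than one version has enough copies, the highest one wins, and that versions stay below 2^63; `choose_data_sources()` is a hypothetical helper, not ec's real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: pick the highest data version that at least
 * `fragments` bricks agree on, and mark those bricks as sources.
 * Returns the chosen version, or -1 if no version has enough copies
 * (the file cannot be healed from the available bricks). */
int64_t choose_data_sources(const uint64_t *version, int bricks,
                            int fragments, bool *is_source)
{
    int64_t best = -1;

    assert(bricks > 0 && fragments > 0 && fragments <= bricks);
    for (int i = 0; i < bricks; i++) {
        int same = 0;
        for (int j = 0; j < bricks; j++)
            if (version[j] == version[i])
                same++;
        if (same >= fragments && (int64_t)version[i] > best)
            best = (int64_t)version[i];
    }
    for (int i = 0; i < bricks; i++)
        is_source[i] = (best >= 0 && version[i] == (uint64_t)best);
    return best;
}
```

With 2+1 disperse (`fragments` = 2), versions {5, 5, 3} give two sources at version 5 and one sink.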
Change-Id: I6f4d42cd5423c767262c9d7bb5ca7767adb3e5fd
BUG: 1215265
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10384
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Metadata self-heal:
1) Take inode lock in domain 'this->name' on 0-0 range (full file)
2) perform lookup and get the xattrs on all the bricks
3) Choose the brick with highest version as source
4) Setattr uid/gid/permissions
5) removexattr stale xattrs
6) Setxattr existing/new xattrs
7) xattrop with -ve values of 'dirty' and difference of highest and its own
version values for version xattr
8) unlock lock acquired in 1)
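Steps 3) and 7) of the metadata heal can be sketched together. This assumes each brick's 'version' and 'dirty' values have already been read into plain integers; `struct brick_meta` and both functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-brick state read during the lookup step. */
struct brick_meta {
    uint64_t version;
    uint64_t dirty;
};

/* Step 3: the source is the brick with the highest version. */
int choose_meta_source(const struct brick_meta *b, int n)
{
    int src = 0;

    assert(n > 0);
    for (int i = 1; i < n; i++)
        if (b[i].version > b[src].version)
            src = i;
    return src;
}

/* Step 7: the deltas each brick's xattrop would add. Adding them clears
 * 'dirty' and brings every brick's version up to the highest one. */
void heal_xattrop_deltas(const struct brick_meta *b, int n,
                         int64_t *dirty_delta, int64_t *version_delta)
{
    uint64_t highest = b[choose_meta_source(b, n)].version;

    for (int i = 0; i < n; i++) {
        dirty_delta[i] = -(int64_t)b[i].dirty;
        version_delta[i] = (int64_t)(highest - b[i].version);
    }
}
```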
Entry self-heal:
1) take directory lock in domain 'this->name:self-heal' on 'NULL' to prevent
more than one self-heal
2) we take directory lock in domain 'this->name' on 'NULL'
3) Perform lookup on version, dirty and remember the values
4) unlock lock acquired in 2)
5) readdir on all the bricks and trigger name heals
6) xattrop with -ve values of 'dirty' and difference of highest and its own
version values for version xattr
7) unlock lock acquired in 1)
Name heal:
1) Take 'name' lock in 'this->name' on 'NULL'
2) Perform lookup on 'name' and get stat and xattr structures
3) Build gfid_db where for each gfid we know what subvolumes/bricks have
a file with 'name'
4) Delete all the stale files i.e. the file does not exist on more than
ec->redundancy number of bricks
5) On all the subvolumes/bricks with missing entry create 'name' with same
type,gfid,permissions etc.
6) Unlock lock acquired in 1)
Known limitation: with the present design, it conservatively
preserves the 'name' when it cannot decide whether to delete it. This can
happen in the following scenario:
1) We have a 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (let's say A)
2) rename d1/f1 -> d2/f2 is performed but the rename succeeds only on one
of the bricks (let's say B)
3) Now name self-heal on d1 and d2 would re-create the file on both d1 and d2,
resulting in d1/f1 and d2/f2.
Because we wanted to prevent data loss in the case above, the following
scenario is not healable, i.e. it needs manual intervention:
1) We have a 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (let's say A)
2) We have two hard links, d1/a and d2/b, and another file d3/c, even before the
brick went down
3) rename d3/c -> d2/b is performed
4) Now name self-heal on d2/b doesn't heal because d2/b with the older gfid will
not be deleted. One might ask why not delete the link if there is
more than one hard link, but that leads to a data loss issue similar to the
one described earlier:
Scenario:
1) We have a 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (let's say A)
2) We have two hard links: d1/a, d2/b
3) rename d1/a -> d3/c and d2/b -> d4/d are performed, and both operations
succeed only on one of the bricks (let's say B)
4) Now name self-heals on the 'names' above, which can happen in parallel, can
decide to delete the file thinking it has 2 links, but after all the
self-heals do their unlinks we are left with data loss.
Change-Id: I3a68218a47bb726bd684604efea63cf11cfd11be
BUG: 1213358
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10298
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
CID: 1223229
The 'loc' pointer was not checked before being
dereferenced; this is now handled.
Change-Id: Icf668150bde190e6f1b9f58a038099338516efe8
BUG: 789278
Signed-off-by: arao <arao@redhat.com>
Reviewed-on: http://review.gluster.org/9666
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
To-Do:
* Make ftruncate work even in the absence of path
* Aggregate and update ia_blocks appropriately when a file is
truncated to a lower size.
Change-Id: Ifd24c2f5e80d2c3bc921261f5481251df8948126
BUG: 1207615
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10631
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
This change broke the build on NetBSD, FreeBSD, and MacOS X:
http://review.gluster.org/10526/
We restore the build with two fixes:
- Use POSIX-compliant sysconf(_SC_NPROCESSORS_ONLN) to get the
number of processors, instead of the Linux-specific get_nprocs().
This lets us remove the Linux-specific #include <sys/sysinfo.h>.
- Only define MAX() if it is not already defined. NetBSD defines
it in <sys/param.h>, which is already included.
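Both fixes can be sketched in a few lines. `online_cpus()` is a hypothetical helper name, and falling back to 1 when sysconf() fails is an assumption of this sketch; the guarded MAX() definition is the pattern described above.

```c
#include <assert.h>
#include <sys/param.h>   /* may already define MAX(), e.g. on NetBSD */
#include <unistd.h>

/* Only define MAX() when the platform headers have not already done so. */
#ifndef MAX
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
#endif

/* Number of online processors via sysconf() instead of the Linux-only
 * get_nprocs(). sysconf() returns -1 on error; fall back to 1 then. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return MAX(n, 1L);
}
```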
BUG: 1129939
Change-Id: I62341c670598670e47ea2f69ab94864f96588b18
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10652
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
The NFS-server sets EOF in the READ reply only when op_errno is set to
ENOENT. Xlators are expected to set op_errno to ENOENT when EOF is
reached; op_ret will contain the number of bytes returned by the READ.
When an NFS-client (like VMware ESXi) does a READ that exceeds the size of
the file, EOF should be flagged (op_errno set to ENOENT) and the return
value contains the number of bytes that were read (from the requested
offset until the end of the file). Not setting EOF on a correct short READ
can result in errors on the NFS-client.
This is not an issue with the Linux NFS-client (or VFS). Linux is smart
enough not to try to read more bytes than the file contains.
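The op_ret/op_errno convention for a short READ at end of file can be sketched as below. This assumes the file size is known to the xlator; `read_reply()` is a hypothetical helper, not a GlusterFS function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical sketch: the return value plays the role of op_ret (bytes
 * returned), and *op_errno is set to ENOENT when the read reaches the
 * end of the file, so the NFS server can flag EOF in its READ reply. */
ssize_t read_reply(uint64_t file_size, uint64_t offset, size_t count,
                   int *op_errno)
{
    uint64_t avail = offset < file_size ? file_size - offset : 0;
    size_t n = count < avail ? count : (size_t)avail;

    assert(op_errno != NULL);
    *op_errno = (offset + n >= file_size) ? ENOENT : 0;
    return (ssize_t)n;
}
```

For a 100-byte file, a 32-byte READ at offset 90 returns 10 bytes with EOF flagged; the same READ at offset 0 returns 32 bytes without EOF.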
BUG: 1209298
Change-Id: Ib15538744908a6001d729288d3e18a432d19050b
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10142
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
The throttle value is "normal" by default. When throttling down,
a thread is put to sleep; when throttling up,
gf_defrag_process_dir wakes up the sleeping threads.
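The sleep/wake mechanism can be sketched with a mutex and condition variable. The level-to-thread-count mapping is hypothetical and purely illustrative, and in this sketch the setter wakes the sleepers for brevity, whereas the patch does the waking from gf_defrag_process_dir. `struct throttle_ctl` and its functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical levels mapped to a number of active migrator threads. */
enum throttle { THROTTLE_LAZY = 1, THROTTLE_NORMAL = 2,
                THROTTLE_AGGRESSIVE = 4 };

struct throttle_ctl {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             active_limit;   /* threads allowed to run */
};

void throttle_init(struct throttle_ctl *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->active_limit = THROTTLE_NORMAL;   /* "normal" by default */
}

/* Worker `idx` sleeps while it is beyond the current limit. */
void throttle_wait(struct throttle_ctl *t, int idx)
{
    assert(idx >= 0);
    pthread_mutex_lock(&t->lock);
    while (idx >= t->active_limit)
        pthread_cond_wait(&t->cond, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

/* Change the level; broadcasting lets sleeping workers re-check the
 * limit, so throttling up wakes them. */
void throttle_set(struct throttle_ctl *t, enum throttle level)
{
    pthread_mutex_lock(&t->lock);
    t->active_limit = (int)level;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}
```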
Change-Id: I74d530e3effd6e60e6eec81ccc8ff65789fa9c13
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10526
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
1) Provide a setfattr command to set the timeout for the split-brain
choice.
2) If split-brain inspection/resolution is being done
from the mount for a file, ref the inode when
split-brain-choice is set.
This inode will be unconditionally unref-ed after the timeout
(in seconds) set by the user, or after the default timeout otherwise.
3) Updated the doc and testcase to reflect the changes.
Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1
BUG: 1209104
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/10134
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Add 64 more bits to the "version" key of the extended attributes. The first
64 bits (left) represent the data version; the last 64 bits (right) represent
the metadata version.
Note: 3.7 and 3.6 versions of ec can't co-exist with this change, because
xattrop in 3.6 will fail with ERANGE: the buffer passed to it will be 8 bytes,
whereas the value will be 16 bytes in 3.7. However, 3.7 clients can work with
files in the old format. For upgrades, we need to tell users to complete heals
and then upgrade.
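The 16-byte layout can be sketched as two big-endian 64-bit fields. Big-endian storage and mapping an old 8-byte value to "data version only, metadata version 0" are assumptions of this sketch, not statements about ec's actual code. `version_encode()` and `version_decode()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: data version in the first (left) 8 bytes,
 * metadata version in the last (right) 8 bytes, both big-endian. */
#define VERSION_XATTR_SIZE 16

static void put_be64(unsigned char *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

static uint64_t get_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

void version_encode(unsigned char buf[VERSION_XATTR_SIZE],
                    uint64_t data_ver, uint64_t meta_ver)
{
    put_be64(buf, data_ver);
    put_be64(buf + 8, meta_ver);
}

/* Also accepts an old 8-byte value, treated here (an assumption) as a
 * data version with metadata version 0. Returns -1 on any other length. */
int version_decode(const unsigned char *buf, size_t len,
                   uint64_t *data_ver, uint64_t *meta_ver)
{
    assert(data_ver != NULL && meta_ver != NULL);
    if (len != 8 && len != VERSION_XATTR_SIZE)
        return -1;
    *data_ver = get_be64(buf);
    *meta_ver = (len == VERSION_XATTR_SIZE) ? get_be64(buf + 8) : 0;
    return 0;
}
```

The 8-byte versus 16-byte difference is exactly what makes a 3.6 client's 8-byte xattrop buffer too small for a 3.7 value.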
BUG: 1215265
Change-Id: Ib85114680cb7e75b8371c984d9f7b6401c1ffb93
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/10312
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Problem: If defrag is NULL, jumping to the 'out' section crashes rebalance.
Change-Id: I8b3ee1ad85dc23ef0e2f2dd6f912d07216bd619f
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10582
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
If a read IO occurs against a file that has reached rebalance
phase 2, we redirect the IO to the destination. For tiered
volumes, when we try to reopen the file (on the destination),
the lower level DHT receives the open call and fails; it does
not have a "cached subvol". Fix is to "teach" the lower level
DHT of the new location by sending a locate before the open.
Change-Id: Ia4acb0035ff1da15f6a8f9ed54f43c76e8b98f5f
BUG: 1214048
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: root <root@gprfs018.sbu.lab.eng.bos.redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10324
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Background:
GlusterFS changelogs are stored on each brick and record the changes
that happened on that brick. Geo-rep runs on all the nodes of the master and
processes changelogs "independently".
Changelogs are processed at the brick level, but all the fops are replayed
on the "slave mount" point.
Problem:
With a DHT volume, "internal fops" are NOT recorded in the changelog.
For a rename, the RENAME is recorded in the changelog of the "hashed" brick
(DHT's internal fops, such as creating a linkto file or unlink, are NOT recorded).
This leads to inconsistent rename operations.
For example,
Distribute volume created with Two bricks B1, B2.
//Consider master volume mounted @ /mnt/master
and following operations executed:
cd /mnt/master
touch f1 // f1 falls on B1 Hash
mv f1 f2 // f2 falls on B2 Hash
// Here, Changelogs are recorded as below:
@B1
CREATE f1
@B2
RENAME f1 f2
Here, a race exists between bricks B1 and B2; say B2's changelog is processed
first. The source file f1 is "NOT PRESENT", so (in the current implementation)
it goes ahead and creates f2.
This problem occurs when the rename is recorded on another brick and the
file is unlinked on the master.
A similar issue exists in the following case too (multiple renames):
CREATE f1
RENAME f1 f2
RENAME f2 f1
Solution:
Instead of recording the changelog on the "HASHED volume",
record it on the "CACHED volume".
This way, rename operations are recorded where the actual files are present.
The changelog is then recorded as:
@B1
CREATE f1
RENAME f1 f2
credit: sarumuga@redhat.com
PS: Some races, such as the one below, are _NOT_ fixed by this patch:
* f1 and f2 exist. B1 and B2 are their respective cached subvols. For
both files, hashed-subvol == cached-subvol.
* mv f1 f2 on master.
* B1 has the changelog entry rename f1 f2.
* rebalance migrates f2 from B1 to B2.
* mv f2 f1 on master.
* B2 has the changelog entry rename f2 f1.
Since the changelog entries (rename f1 f2) and (rename f2 f1) are processed
independently by gsyncds, which of f1 and f2 survives on the slave
is subject to a race. Note that on the master it is file f1 with name f1 that
survives. On the slave it can be either file f1 with name f1 or file f2
with name f2, depending on who wins the race of processing the changelog.
Change-Id: Iebc222f582613924c3a7cba37fb6d3e2d8332eda
BUG: 1141379
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10410
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
In the patch http://review.gluster.org/#/c/9657
the client pid set by tiering migration was getting
overwritten in dht_start_rebalance_task(). This patch sets it
in dht_setxattr() before calling dht_start_rebalance_task()
and removes it from dht_start_rebalance_task().
Change-Id: I37cfa111f83a4e5d498042575c93799f60b49870
BUG: 1217937
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10502
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
When we attach a tier, the hot tier becomes the hashed
subvolume. But directories may not yet have been replicated by
the fix layout process. Hence lookups to those directories
will fail on the hot subvolume. We should only go to the hashed
subvolume once the layout has been fixed. This is known if the
layout for the parent directory does not have an error. If
there is an error, the cold tier is considered the hashed
subvolume. The exception to this rule is ENOTCONN, in which
case we do not know where the file is and must abort.
Note we may revalidate a lookup for a directory even if the
inode has not yet been populated by FUSE. This case can
happen in tiering (where one tier has completed a lookup
but the other has not, in which case we revalidate one tier
when we call lookup the second time). Such inodes are
still invalid and should not be consulted for validation.
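The hashed-tier rule in the first paragraph can be sketched as a function of the error recorded in the parent directory's layout. `choose_hashed_tier()` and its enum are hypothetical; the real code works on DHT layout structures rather than a bare errno value.

```c
#include <assert.h>
#include <errno.h>

enum tier_choice { USE_HOT, USE_COLD, ABORT_LOOKUP };

/* Hypothetical: parent_layout_err is the error recorded in the parent
 * directory's layout (0 once fix-layout has reached the hot tier). */
enum tier_choice choose_hashed_tier(int parent_layout_err)
{
    assert(parent_layout_err >= 0);
    if (parent_layout_err == 0)
        return USE_HOT;         /* layout fixed: hot tier is hashed */
    if (parent_layout_err == ENOTCONN)
        return ABORT_LOOKUP;    /* file location unknown: abort */
    return USE_COLD;            /* layout not fixed yet: use cold tier */
}
```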
Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523
BUG: 1214289
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10435
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
Add logic in afr to work in conjunction with the arbiter xlator when a
replica 3 arbiter volume is created. More specifically, this patch:
* Enables full locks for afr data transactions for such volumes.
* Removes the upfront marking of pending xattrs at the time of pre-op
and defers it to post-op. (This change is arbiter independent and is
made for all afr transactions.)
* After the pre-op stage, checks whether we can proceed with the fop stage
without ending up in split-brain by examining the changelog xattrs.
* Unwinds the fop with failure if only one source was available at the
time of pre-op and the fop happened to fail on that source brick.
* Skips data self-heal if the arbiter brick is the only source available.
* Adds the arbiter-count option to the shd graph.
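Two of the rules in the list above (skipping data self-heal when the arbiter is the only source, and failing a fop when its single pre-op source fails) reduce to simple predicates. This is a hedged sketch: the per-brick `sources` array and the function names are hypothetical, not the afr data structures.

```c
#include <stdbool.h>

/* True if the arbiter brick is the one and only data source. */
static bool
arbiter_is_only_source(const bool sources[], int child_count, int arbiter_idx)
{
    int n = 0;
    for (int i = 0; i < child_count; i++)
        if (sources[i])
            n++;
    return n == 1 && sources[arbiter_idx];
}

/* An arbiter stores only metadata, so it cannot be the sole source
 * for a data self-heal. */
static bool
can_do_data_self_heal(const bool sources[], int child_count, int arbiter_idx)
{
    return !arbiter_is_only_source(sources, child_count, arbiter_idx);
}

/* With a single source at pre-op, a failure on that source means the
 * fop is unwound with failure instead of risking split-brain. */
static bool
fop_must_fail(int sources_at_preop, bool failed_on_source)
{
    return sources_at_preop == 1 && failed_on_source;
}
```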
This patch is a part of the arbiter logic implementation for 3-way AFR,
details of which can be found at http://review.gluster.org/#/c/9656/
Change-Id: I9603db9d04de5626eb2f4d8d959ef5b46113561d
BUG: 1199985
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/10258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
If both dicts are NULL, they are equal. If one of the dicts is NULL but the
other has only ignorable keys, they are also equal. If both dicts are non-NULL,
check for each non-ignorable key whether the values are the same. The
value_ignore function is used to skip comparing values for keys which must be
present in both dictionaries but whose values may differ.
geo-rep's stime xattr doesn't need to be present in the listxattr output, but
when a getxattr comes for the stime xattr, even if there aren't enough responses
with the xattr, we should still give out an answer, which is the maximum of the
stimes available.
Change-Id: I8de2ceaa2db785b797f302f585d88e73b154167d
BUG: 1207712
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10078
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
Change-Id: Ia7d43cb3b222db34ecb0e35424f1766715ed8e6a
BUG: 1188242
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/10176
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
Change-Id: Ifbba6f340adfe2b4e3ad07260fbf4a25698ad8df
BUG: 1217949
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10459
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
When calling dlopen() for libgfdb, do not specify the library version
number "libgfdb.so.0.0.1", since libtool will not always create libraries
or link with that name with the full 3-digit version. For instance on
NetBSD only up to the 2-digit version is available and "libgfdb.so.0.0.1"
does not exist.
Instead, just specify "libgfdb.so" and rely on the symlinks installed by
libtool to find the relevant library.
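Loading by the unversioned name looks like the sketch below. It uses only the POSIX `dlopen()`/`dlerror()` calls; the helper name is hypothetical, and on older glibc systems the program must be linked with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Open a shared library by its unversioned name (e.g. "libgfdb.so") and
 * let the dynamic linker follow the symlink libtool installed, instead of
 * hard-coding a full version suffix that may not exist on every platform.
 */
static void *
open_unversioned(const char *soname)
{
    void *handle = dlopen(soname, RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", soname, dlerror());
    return handle;
}
```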
BUG: 1129939
Change-Id: I074b1009d3622a122fdaeb4b99658bca3277e211
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10407
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
This is a follow-up patch for http://review.gluster.org/#/c/10080
In the above, the change suggested in
http://review.gluster.org/#/c/10080/7/xlators/cluster/dht/src/dht-rebalance.c
does not work, because promotion and demotion are done by
multiple threads. Whenever a promotion or demotion thread is called, the frame
of the old sync_op thread is not carried with it. As a result, frame->root->pid
is not set.
Solution:
When the file is getting migrated, we set a tiering.migration key-value in the
xattr dict that is passed to syncop_setxattr() to do the data migration,
and set frame->root->pid to GF_CLIENT_PID_TIER_DEFRAG
in dht_setxattr() just before calling dht_start_rebalance_task().
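A minimal sketch of the solution above, assuming the xattr dict can be modeled as a plain list of key names. `struct fake_frame` and the `FAKE_PID_*` values are hypothetical placeholders for `call_frame_t` and `GF_CLIENT_PID_TIER_DEFRAG` (whose real value is not reproduced here); the key string is taken from the commit text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholders, not GlusterFS values. */
#define FAKE_PID_DEFAULT      0
#define FAKE_PID_TIER_DEFRAG  1

struct fake_frame { int root_pid; };

/* True if the setxattr request carries the tiering migration marker. */
static bool
has_tier_migration_key(const char *const keys[], size_t nkeys)
{
    for (size_t i = 0; i < nkeys; i++)
        if (strcmp(keys[i], "tiering.migration") == 0)
            return true;
    return false;
}

/* Tag the frame in the setxattr path itself, because the migration thread
 * does not inherit the frame that the tiering code set up. */
static void
setxattr_prepare_migration(struct fake_frame *frame,
                           const char *const keys[], size_t nkeys)
{
    if (has_tier_migration_key(keys, nkeys))
        frame->root_pid = FAKE_PID_TIER_DEFRAG;
    /* ...the real code would call dht_start_rebalance_task() next */
}
```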
Change-Id: I86fef2d961b32fdd2c0c69d8512cbe846b393404
BUG: 1194753
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10266
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
Problem:
1. When two threads execute in parallel in dht_getxattr_cbk,
both may find local->xattr to be NULL. As
a result, dht_aggregate_xattr may not get executed.
2. In dht_getxattr_cbk,
thread1 thread2
T1 this_call_cnt = 2 -1
T2 this_call_cnt = 1 - 1
T3 fills local_xattr
T4 DHT_STACK_UNWIND -> local_wipe
T5 tries to dereference local
which is already freed,
leading to crash.
Solution:
For problem 1: execute the critical section inside the frame lock
to resolve the race.
For problem 2: calculate this_call_cnt just before the out section.
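Both fixes follow a common pattern: do the check-and-update of shared state and the call-count decrement under one lock, and let only the thread that sees the count reach zero unwind. The sketch below uses a pthread mutex in place of the frame lock and a summed integer in place of the aggregated xattrs; `struct fanout` and `fanout_reply` are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-call state shared by the callbacks. */
struct fanout {
    pthread_mutex_t lock; /* plays the role of the frame lock */
    int call_cnt;         /* replies still outstanding */
    long aggregate;       /* plays the role of local->xattr */
    bool have_aggregate;
};

/*
 * Called once per subvolume reply, possibly concurrently. Returns true for
 * exactly one caller: the one whose reply was last, which may unwind and
 * free the shared state.
 */
static bool
fanout_reply(struct fanout *f, long value)
{
    bool last;

    pthread_mutex_lock(&f->lock);
    /* Problem 1: the "is it empty?" test and the fill happen under one
     * lock, so two threads cannot both see the aggregate as empty. */
    if (!f->have_aggregate) {
        f->aggregate = value;
        f->have_aggregate = true;
    } else {
        f->aggregate += value;
    }
    /* Problem 2: decrement only after this thread is done with the shared
     * state, so no thread touches it after another one unwinds. */
    last = (--f->call_cnt == 0);
    pthread_mutex_unlock(&f->lock);

    return last;
}
```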
Change-Id: I9827ac8fafebb0c733a4e4f3c710b752f1cd45fa
BUG: 1215592
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10389
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
The current patch addresses two parts of the proposed design:
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node
Brief design explanation for the above two points:
1. Rebalance multiple files in parallel:
-------------------------------------
The existing rebalance engine is single-threaded. Hence, this patch introduces
multiple threads which run in parallel to the crawler. The
current rebalance migration is converted to a "Producer-Consumer"
framework, where
Producer: Crawler
Consumer: Migrating threads
Crawler: The crawler is the main thread. Its job is now
limited to fix-layout of each directory and adding the files which are
eligible for migration to a global queue in a round-robin manner,
so that all the disk resources are used efficiently. Hence, the
crawler will not be "blocked" by the migration process.
Consumer: The migration threads monitor the global queue. If any file is
added to this queue, a thread dequeues that entry and migrates the file.
Currently 20 migration threads are spawned at the beginning of the
rebalance process. Hence, multiple file migrations happen in parallel.
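The producer-consumer arrangement above can be sketched with an unbounded queue guarded by a mutex and condition variable, so the crawler never waits on migration. This is a simplified model under stated assumptions: integers stand in for file entries, the round-robin distribution across subvolumes is omitted, and `migq`, `migrator` and `run_rebalance` are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct entry { int file_id; struct entry *next; }; /* a file to migrate */

struct migq {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct entry *head, *tail;
    bool crawl_done;  /* crawler finished: no more entries will arrive */
    int migrated;     /* files "migrated" so far (for the demo) */
};

/* Producer side (crawler): enqueue and return immediately. */
static void
migq_push(struct migq *q, int file_id)
{
    struct entry *e = malloc(sizeof(*e));
    if (e == NULL)
        abort();
    e->file_id = file_id;
    e->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

static void
migq_finish(struct migq *q)
{
    pthread_mutex_lock(&q->lock);
    q->crawl_done = true;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: each migration thread loops until the queue is drained
 * and the crawl is over. */
static void *
migrator(void *arg)
{
    struct migq *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == NULL && !q->crawl_done)
            pthread_cond_wait(&q->cond, &q->lock);
        struct entry *e = q->head;
        if (e == NULL) {
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->head = e->next;
        if (q->head == NULL)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        /* the actual file migration would run here, outside the lock */
        free(e);

        pthread_mutex_lock(&q->lock);
        q->migrated++;
        pthread_mutex_unlock(&q->lock);
    }
}

/* Crawl 'nfiles' entries with up to 32 migration threads; returns the
 * number of entries migrated. */
static int
run_rebalance(int nfiles, int nthreads)
{
    struct migq q = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                      NULL, NULL, false, 0 };
    pthread_t tids[32];
    if (nthreads > 32)
        nthreads = 32;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, migrator, &q) != 0)
            abort();
    for (int f = 0; f < nfiles; f++)   /* the "crawler" */
        migq_push(&q, f);
    migq_finish(&q);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return q.migrated;
}
```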
2. Crawl only bricks that belong to the current node:
--------------------------------------------------
As a rebalance process is spawned per node, it migrates only the files
that belong to its own node, for the sake of load balancing. But it
also reads entries from the whole cluster, which is not necessary, as
readdir then hits other nodes.
New Design:
As part of the new design, the rebalancer decides which subvols
are local to the rebalancer node by checking the node-uuid of the
root directory before the crawler starts. Hence, readdir won't hit
the whole cluster, as the rebalancer already has the context of the local
subvols, and the node-uuid request for each file can also be avoided. This
makes the rebalance process "more scalable".
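The one-time local-subvolume selection described above can be sketched as a filter over subvolumes by the node-uuid reported for their root directory. This simplification assumes a single node-uuid per subvolume (a replicated subvolume can report several); `struct subvol_info` and `find_local_subvols` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: a subvolume and the node-uuid reported for its root dir. */
struct subvol_info {
    const char *name;
    const char *root_node_uuid;
};

/*
 * Done once before crawling: store in out_idx the indices of subvolumes
 * whose root node-uuid matches this node, and return how many there are.
 * The crawler can then readdir only these subvols.
 */
static size_t
find_local_subvols(const struct subvol_info *subvols, size_t n,
                   const char *my_uuid, size_t out_idx[])
{
    size_t nlocal = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(subvols[i].root_node_uuid, my_uuid) == 0)
            out_idx[nlocal++] = i;
    return nlocal;
}
```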
Change-Id: I73ed6ff807adea15086eabbb8d9883e88571ebc1
BUG: 1171954
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/9657
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
This was leading to hangs when get_size_and_version fails
Change-Id: Iad9408c2dacc9a74594b8d2f94c95f402533b0f1
BUG: 1215265
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10390
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
Logic for adding the 'glusterd_brickinfo->group' member and using it to
find the brick position has been taken from http://review.gluster.org/#/c/9919.
Thanks to Jeff Darcy for that.
This patch is a part of the arbiter logic implementation for 3-way AFR,
details of which can be found at http://review.gluster.org/#/c/9656/
Change-Id: Idbfe4f29ee8e098e0102def8f38b32314316b188
BUG: 1199985
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/10257
Tested-by: NetBSD Build System
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
|
UUID strings are UUID_CANONICAL_FORM_LEN (36) bytes long,
plus the trailing NUL character that various functions (e.g.
uuid_unparse) add. As a consequence, UUID strings must
be declared as UUID_CANONICAL_FORM_LEN+1 bytes long, otherwise
we get an off-by-one overrun that corrupts the next variable
on the stack.
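The sizing rule can be seen with `snprintf()`, which, like uuid_unparse, always writes a terminating NUL. In this sketch `UUID_CANONICAL_FORM_LEN` is defined locally so the block is self-contained, and `uuid_to_str` is a hypothetical helper, not a GlusterFS function.

```c
#include <stdio.h>
#include <string.h>

/* 32 hex digits plus 4 hyphens in the 8-4-4-4-12 layout. */
#define UUID_CANONICAL_FORM_LEN 36

/* 'out' needs UUID_CANONICAL_FORM_LEN + 1 bytes: 36 characters + NUL. */
static void
uuid_to_str(const unsigned char u[16], char out[UUID_CANONICAL_FORM_LEN + 1])
{
    snprintf(out, UUID_CANONICAL_FORM_LEN + 1,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

With a 36-byte buffer, the NUL would land one byte past the end of the array, which is the off-by-one overrun described above.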
BUG: 1129939
Change-Id: I5837ad6ca06fa17cc7ab143eedd02d8099ecca2a
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10394
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
If the version numbers do not match, writes are performed only on the bricks
that share the same version, and there must be at least N-R of them. But if we
want to heal files which are constantly modified, we need to allow writes on
subvols that are undergoing heal. Data healing sets the 62nd bit while the heal
is going on. When the data transaction sees that this bit is set, it needs to
perform the fop on that subvol irrespective of whether the versions match.
The fop is considered successful only if N-R non-healing bricks succeed.
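A hedged sketch of the two rules above: which bricks receive a write, and when the write counts as successful. It assumes the healing flag lives in bit index 62 (counting from 0) of a per-brick 64-bit version value; `HEALING_BIT` and both functions are hypothetical, not the EC xlator's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "62nd bit" read as bit index 62 of the version value. */
#define HEALING_BIT (UINT64_C(1) << 62)

/* A brick under heal gets the write regardless of its version; otherwise
 * only bricks holding the good version do. */
static bool
brick_gets_write(uint64_t version, uint64_t good_version)
{
    if (version & HEALING_BIT)
        return true;
    return version == good_version;
}

/* Success requires at least min_good (N - R) successes on non-healing
 * bricks; successes on healing bricks do not count. */
static bool
write_succeeded(const uint64_t version[], const bool ok[], int n, int min_good)
{
    int good_ok = 0;
    for (int i = 0; i < n; i++)
        if (!(version[i] & HEALING_BIT) && ok[i])
            good_ok++;
    return good_ok >= min_good;
}
```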
Change-Id: I69a17582df397aaf6e8ca4b5e746c7ca802cbbde
BUG: 1215265
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10372
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|