| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
These functions keep changing as new functionality is added, so copying
and pasting the code is not a good solution. This way ensures that all
fields get initialized properly no matter how much new stuff we throw in.
Change-Id: I9e9b043d2d305d31e80cf5689465555b70312756
BUG: 924488
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4710
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scheme to encode brick d_off and brick id into global d_off has
two approaches. Since both brick d_off and global d_off are both 64-bit
wide, we need to be careful about how the brick id is encoded.
Filesystems like XFS always give a d_off which fits within 32bits. So
we have another 32bits (actually 31, in this scheme, as seen ahead) to
encode the brick id - which is typically plenty.
Filesystems like the recent EXT4 utilize the upto 63 low bits in d_off,
as the d_off is calculated based on a hash function value. This leaves
us no "unused" bits to encode the brick id.
However both these filesystmes (EXT4 more importantly) are "tolerant" in
terms of the accuracy of the value presented back in seekdir(). i.e, a
seekdir(val) actually seeks to the entry which has the "closest" true
offset.
This "two-prong" scheme exploits this behavior - which seems to be the
best middle ground amongst various approaches and has all the advantages
of the old approach:
- Works against XFS and EXT4, the two most common filesystems out there.
(which wasn't an "advantage" of the old approach as it is borken against
EXT4)
- Probably works against most of the others as well. The ones which would
NOT work are those which return HUGE d_offs _and_ NOT tolerant to
seekdir() to "closest" true offset.
- Nothing to "remember in memory" or evict "old entries".
- Works fine across NFS server reboots and also NFS head failover.
- Tolerant to seekdir() to arbitrary locations.
Algorithm:
Each d_off can be encoded in either of the two schemes. There is no
requirement to encode all d_offs of a directory or a reply-set in
the same scheme.
The topmost bit of the 64 bits is used to specify the "type" of encoding
of this particular d_off. If the topmost bit (bit-63) is 1, it indicates
that the encoding scheme holds a HUGE d_off. If the topmost bit is is 0,
it indicates that the "small" d_off encoding scheme is used.
The goal of the "small" d_off encoding is to stay as dense as possible
towards the lower bits even in the global d_off.
The goal of the HUGE d_off encoding is to stay as accurate (close) as
possible to the "true" d_off after a round of encoding and decoding.
If DHT has N subvolumes, we need ROOF(Log2(N)) "bits" to encode the brick
ID (call it "n").
SMALL d_off
===========
Encoding
--------
If the top n + 1 bits are free in a brick offset, then we leave the
top bit as 0 and set the remaining bits based on the old formula:
hi_mask = 0xffffffffffffffff
hi_mask = ~(hi_mask >> (n + 1))
if ((hi_mask & d_off_brick) != 0)
do_large_d_off_encoding ()
d_off_global = (d_off_brick * N) + brick_id
Decoding
--------
If the top bit in the global offset is 0, it indicates that this
is the encoding formula used. So decoding such a global offset will
be like the old formula:
if ((d_off_global & 0x8000000000000000) != 0)
do_large_d_off_decoding()
d_off_brick = (d_off_global % N)
brick_id = d_off_global / N
HUGE d_off
==========
Encoding
--------
If the top n + 1 bits are NOT free in a given brick offset, then we
set the top bit as 1 in the global offset. The low n bits are replaced
by brick_id.
low_mask = 0xffffffffffffffff << n // where n is ROOF(Log2(N))
d_off_global = (0x8000000000000000 | d_off_brick & low_mask) + brick_id
if (d_off_global == 0xffffffffffffffff)
discard_entry();
Decoding
--------
If the top bit in the global offset is set 1, it indicates that
the encoding formula used is above. So decoding would look like:
hi_mask = (0xffffffffffffffff << n)
low_mask = ~(hi_mask)
d_off_brick = (global_d_off & hi_mask & 0x7fffffffffffffff)
brick_id = global_d_off & low_mask
If "losing" the low n bits in this decoding of d_off_brick looks
"scary", we need to realize that till recently EXT4 used to only
return what can now be expressed as (d_off_global >> 32). The extra
31 bits of hash added by EXT recently, only decreases the probability
of a collision, and not eliminate it completely, anyways. In a way,
the "lost" n bits are made up by decreasing the probability of
collision by sharding the files into N bricks / EXT directories
-- call it "hash hedging", if you will :-)
Thanks-to: Zach Brown <zab@redhat.com>
Change-Id: Ieba9a7071829d51860b7c131982f12e0136b9855
BUG: 838784
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4711
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is necessary to support "DHT over DHT" configurations, so that the
upper and lower instances of DHT don't step all over each other. Why
would we even consider such a thing? Because it gives us the ability to
do data tiering and rack-aware placement, either by themselves or as
complements to other functionality such as erasure codes or
deduplication which save space but cost performance. By setting up the
top-level DHT to place data into one of several lower-level DHT pools
based on policy instead of pure elastic hashing, we get better
performance for 90% of accesses and better storage efficiency for 90% of
data, all for relatively low effort.
Change-Id: I72e65c29edfc80babf39f7a2a00090f4588c4070
BUG: 924265
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4694
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Id6f156957e58aad06bf2602f880c7e4102b80fd1
BUG: 764890
Signed-off-by: JulesWang <w.jq0722@gmail.com>
Reviewed-on: http://review.gluster.org/4679
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We needed to zero out the layout range, before we re-calculate the range.
When spread-count is issued, we would end up with stale ranges in the layout.
Replaced dht_selfheal_dir_xattr with dht_fix_dir_xattr, which correctly resets
the un-used (after re-cal) layouts.
Change-Id: I1a900d15df07335f59356bd23182ccec34381ab2
BUG: 884455
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4647
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I03be3cb23684de4ab36cf2953002708466edd580
BUG: 765433
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: http://review.gluster.org/4601
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ie2259023b9001311a2032792639c3093054f6750
BUG: 896431
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4552
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In dht_notify, we used to create a thread to start defrag
crawls after we had heard from all child subvols.
This was in-correct, as a later event, could also trigger the
crawl again(due to the fact that all subvols had responded).
The fix is to make sure, the thread is started only once after
all subvols have responded the first time
Change-Id: Ifc2978b9dc866af2395b79911eca50ab38ff9457
BUG: 916449
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4587
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia2e6bce80710d103da9d78afdb389ea162b00686
BUG: 912564
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4590
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'gluster volume rebalance' command will be enhanced to support passing of these
options/pattern.
<pattern> is comma separated list as show below. The Precedence is from right
to left.
e.g- "*avi,*pdf:10MB,*:1KB"
The precedence is as follows:
migrate all files with size equal or greater than 1KB "*:1KB"
migrate all pdf files with size equal or greater than 10MB "*pdf:10MB"
migrate all avi files "*avi"
With this option, it is possible to choose which files to migrate.
Change-Id: I6d6d6a015bcbacf1debae2f278a2d92306fb055d
BUG: 896456
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4366
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If holes are encountered, then we do not write these to the dst,
which sometimes causes file size to be lesser than src. Data is not
corrupted, as when non-zero reads are received, we do write that data.
Calling a truncrate to give file size to prevent it from being
truncated to less than src in case the file end has holes.
Thanks to Brian Foster for providing the test case
Change-Id: I3cdd143b63ec8d797273d76189dff8b05eb9e551
BUG: 915554
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4574
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I0cf8a59c81e30603aadc45393bbd49d80ae1621f
BUG: 912657
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4542
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to support the common "write to temp file then rename" idiom. In our
case this causes us to create a linkfile during the rename in (N-1)/N cases,
with a significant impact on performance e.g. for UFO which uses this idiom.
This patch allows the user to specify up to two regular expressions that
separate the permanent and transient parts of a temp-file name:
rsync-hash-regex reimplements the existing RSYNC_FRIENDLY_NAME
pattern where the temporary name for EXAMPLE is .EXAMPLE.suffix
extra-hash-regex can be used for a second pattern, active concurrently,
and can include alternatives via the POSIX extended-regex syntax
Change-Id: Ic1a6ed19324bc31fefe91ee34b8478877a9c5d2c
BUG: 912564
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4116
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
provide space between path and next string, so that we can grep
for the correct path in error logs.
Change-Id: Ide53e341864dce432406a58e8cbb2ff1480cfbde
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 815194
Reviewed-on: http://review.gluster.org/4531
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Though linkfile_create and rebalance dst file create sent a setattr
with correct ownership, there is still a race window where the linkfile
open (client open due to migration) will fail, as its ownership will be
root:root.
Change-Id: I056092da6102319efa3bb9f1795f8c97db2a3fed
BUG: 884597
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4513
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After fix http://review.gluster.org/4282 (libglusterfsterfs/syncop: do not
hold ref on the fd in cbk) was pushed, syncop_open does not take a ref anymore.
Change-Id: I5543986a4f2d965eef42ca979b39ac75439cec49
BUG: 910661
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4509
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, linkfile creation happens as root.
use uid/gid returned from _cbk (link/rename) to set the correct ownership of
the link files.
Also added test/dht.rc to implement common dht functions
Change-Id: Iecdcf52d89fed8286670ce430b9d19deebd27b8c
BUG: 884597
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4304
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since directories have presence on all subvolumes there is
no definite meaning of ->hashed_subvol or ->cached_subvol.
getxattr() code path chooses ->cached_subvol for pathinfo
extended attribute. While this makes sense of files, it makes
less sense for directories. Further if a hashed or a cached
subvolume is down, and there's a getxattr request for a
directory, we return with an errno.
This patch changes pathinfo extended attribute contents by
aggregating information from all subvolumes that are up.
Change-Id: I58adb741d63ccfd1d0239af75eb65f26f0fb384d
Signed-off-by: Venky Shankar <vshankar@redhat.com>
BUG: 856455
Reviewed-on: http://review.gluster.org/4047
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I1c9541058c7d07786539a3266ca125a6a15287d8
BUG: 859835
Signed-off-by: Anand Avati <avati@redhat.com>
Original-author: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com>
Signed-off-by: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com>
Reviewed-on: http://review.gluster.org/3967
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This method deals with the case where swapping might gain a bigger overlap
for the xlator currently under consideration, but sacrifices even more from
the xlator we're swapping with. For example:
A = 0x00000000 - 0x44444443 (new 0x00000000 - 0x55555554)
B = 0x44444444 - 0x77777776 (new 0x55555555 - 0xaaaaaaa9)
C = 0x77777777 - 0xffffffff (new 0xaaaaaaaa - 0xffffffff)
Here, the new range for B has a bigger overlap with the old C than with the
old B (0x33333333 vs. 0x22222222 to be precise) so looking only at that
might lead us to swap. However, such a swap turns the new C's overlap from
0x55555556 (vs. old C) to *zero* (vs. old B). In other words, we've gained
0x11111111 for B but lost 0x55555556 for C, so it's a bad idea.
The new algorithm accounts for all effects of the swap, so it not only avoids
bad swaps but can make some good ones that would have been missed previously.
For example, if swapping a range X with a later range Y would not increase the
overlap for X we would previously have skipped it even if the swap would
increase Y's overlap without affecting X's. This is the normal case when we're
adding a new brick (which initially has zero overlap with any old range) so
finding more good swaps is probably even more important than avoiding bad ones.
Also, the logic in dht_overlap_calc was completely broken before, causing
integer overflows instead of providing correct values, so no matter what
higher-level algorithm was in place the GIGO effect would have resulted in
bad decisions.
Change-Id: If61ed513cfcb931916c6b51da293e3efbaaf385f
BUG: 853258
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/3908
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Files were being created in subvol which had less than
min_free_disk available even in the cases where other
subvols with more space were available.
Solution:
Changed the logic to look for subvol which has more
space available.
In cases where all the subvols have lesser than
Min_free_disk available , the one with max space and
atleast one inode is available.
Known Issue: Cannot ensure that first file that is
created right after min-free-value is crossed on a
brick will get created in other brick because disk
usage stat takes some time to update in glusterprocess.
Will fix that as part of another bug.
Change-Id: If3ae0bf5a44f8739ce35b3ee3f191009ddd44455
BUG: 858488
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/4420
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In dht_mkdir_cbk, EEXIST error is treated like a true error. Because
of this the following sequence of events can happen, eventually
resulting in GFID mismatch and (and possibly leaked locks and hang,
in the presence of replicate.)
The issue exists when many clients concurrently attempt creation of
directory and subdirectory (e.g mkdir -p /mnt/gluster/dir1/subdir)
0. First mkdir happens by one client on the hashed subvolume. Only
one client wins the race. Others racing mkdirs get EEXIST. Yet
other "laggers" in the race encounter the just-created directory
in lookup() on the hash dir.
1. At least one "lagger" lookup() notices that there are missing
directories on other subvolumes (which the "winner" mkdir is yet
to create), and starts off self-heal of the directory.
2. At least on some subvolumes, self-heal's mkdir wins the race
against the "winner" mkdir and creates the directory first. This
causes the "winner" mkdir to experience EEXIST error on those
subvolumes.
3. On other subvolumes where "winner" mkdir won the race, self-heal
experiences EEXIST error, but self-heal is properly translating
that into a success (but mkdir code path is not -- which is the
bug.)
4. Both mkdir and self-heal assign hash layouts to the just created
directory. But self-heal distributes hash range across N (total)
subvolumes, whereas mkdir distributes hash range across N - M
(where M is the number of subvolumes where mkdir lost the race).
Both the clients "cache" their respective layouts in the near
future for all future creates inside them (evidence in logs)
5. During the creation of the subdirectory, two clients race again.
Ideally winner performs mkdir() on the hashed subvolume and proceeds
to create other dirs, loser experiences EEXIST error on the hashed
subvolume and backs off. But in this case, because the two clients
have different layout views of the parent directory (because of
different hash splits and assignements), the hashed subvolumes for
the new directory can end up being different. Therefore, both clients
now win the race (they were never fighting against each other on a
common server), assigning different GFIDs to the directory on their
respective (different) subvolumes. Some of the remaining subvolumes
get GFID1, others GFID2.
Conclusion/Fix:
Making mkdir translate EEXIST error as success (just the way self-heal
is already rightly doing) will bring back truth to the design claim
that concurrent mkdir/self-heals perform deterministic + idempotent
operations. This will prevent the differing "hash views" by different
clients and thereby also avoid GFID mismatch by forcing all clients
to have a "fair race", because the hashed subvolume for all will be
the same (and thereby avoiding leaked locks and hangs.)
Change-Id: I84592fb9b8a3f739a07e2afb23b33758a0a9a157
BUG: 907072
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4459
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Default_fops uses stack_wind_tail. It winds without creating the frame leading
into wrong subvol return in the cookie. To avoid the problem caused by the
same, we're getting the subvol by passing the cookie.
Change-Id: I51ee79b22c89e4fb0b89e9a0bc3ac96c5b469f8f
BUG: 893338
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/4388
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Do not do fd_ref in cbks of the fops which return a fd (such as
open, opendir, create).
Change-Id: Ic2f5b234c5c09c258494f4fb5d600a64813823ad
BUG: 885008
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4282
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The earlier logic used to check if (layout-spread-count <= subvol_cnt -
decommissioned bricks). With this if a subvol was down, and layout-spread was >
upsubvols, a mkdir ended up creating holes in the layout.
The fix is to consider only the combination of subvols which are usable (not
down or not decommissioned).
Change-Id: I61ad3bcaf4589f5a75f7887cfa595c98311ae3bb
BUG: 902610
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4412
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I0db00b7334bb9707ab48bd661ac03a3ad818d6e4
BUG: 893458
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4393
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* warnings on 'void *' arguments
* warnings on empty initializations
* warnings on empty array (array[0])
Change-Id: Iae440f54cbd59580eb69f3ecaed5a9926c0edf95
BUG: 875913
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4219
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we follow a linkfile, and the lookup returns a ENOTCONN error, return
the error, as the cached subvol is down, and lookup_everywhere wont succeed,
but actually ends up clearing the linkfile, and clearing the namespace.
Change-Id: I772bf71531bc646e8fb62d3e8549a5fe0f3896da
BUG: 893378
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4383
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Used local->postparent(contains merged iatt of all succesful calls) instead
of postparent for dht ctx time update.
2. dht_inode_ctx_time_update avoided in case of opret -1.
Change-Id: Ie04a7842a41c241f911b6a3f76267b996d27fb43
BUG: 881013
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/4338
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By failing over readdir (default behaviour), rebalance could get duplicate
files, as readdir would re-read from offset 0. Rebalance should not attempt
to migrate these files again.
Additionally, we need to handle these cases as failure in rebalance crawl.
No test case provided, as we cannot determine the read child in afr.
Change-Id: If07508b4f92dacc17e0f695b48a866c7c66004be
BUG: 859387
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4300
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When subvols-per-directory option is used, with bricks addition/removal
the layouts might get distributed to other subvols, which were not part
of the layout before. We need to clean up layouts on old subvolumes, to
prevent overlaps.
Also, we need to make sure if layout-cnt is never less
than subvolume-cnt.
Change-Id: I00994a092ca0c99aedcc41bd9412d43460f88a04
BUG: 884455
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4281
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many people have asked for behavior like the old NUFA, which builds and
seems to run but was previously impossible to enable/configure in a
standard way. This change allows NUFA to be enabled instead of DHT from
the command line, with automatic selection of the local subvolume on each
host.
Change-Id: I0065938db3922361fd450a6c1919a4cbbf6f202e
BUG: 882278
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4234
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* write-behind: free the inode context in wb_forget
* distribute: in readdirp callback put the allocated context to the inode
* distribute: check if the layout is NULL before accessing it in layout_unref
Change-Id: I7698f81b85b99d06bf6b01fc1a6e51e1593b5e27
BUG: 790709
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4250
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If any subvolume is down, and a layout is re-written and hash
values change, entry names in the downed subvol can be reused
in the other subvol which got the same hash range. when the
downed subvol is brought back up, duplicate entried might appear
Also separated handling of ENOSPC and ENOTCONN error.
Change-Id: I5ed93990425a4cee70df2dab7c7c119fdc87ad56
BUG: 860663
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4000
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Identify mismatching uid/gid in lookup, and trigger a syncop
heal. uid/gid of subvol with latest ctime is trusted (local->prebuf).
Change-Id: Ib5c4bc438e7f4b1f33080e73593f40f400e997f0
BUG: 862967
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/3964
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I4f518a969bbe3a11075e7c9ae10bd21bf059d5f3
BUG: 867253
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4240
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
lk is sent to only cached subvolume. Hence there is no point in
sending LOCKINFO to other children (even in case of directories).
Change-Id: Ia20fc358dfa84cee9a52d1f613564ff6f25aa0c9
BUG: 808400
Signed-off-by: Raghavendra <raghavendra@gluster.com>
Reviewed-on: http://review.gluster.org/4123
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia7ea63471f0bbd74686873f5f6f183475880f1a0
BUG: 839595
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.org/4162
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
testcase:
The changes are for removing gf_log from statedump related sections in dht and
using pthread_mutex_trylock in statedump sections. Changes are internal. So
tests were done by attaching gdb to the process and executing by manually
changing the values of some of the pointers.
Change-Id: I41fa76c1812b462cb76f5bbf2fd14de080e73895
BUG: 843822
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4117
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
->hashed_subvol is not valid (== NULL) when the subvolume
the entity hashes to is down. For directories, we need not
rely on ->hashed_subvol as we aggregate information from all
subvolumes. So, during lookup, NULL ->hashed_subvol is ingored
but logged.
Change-Id: I306e4e274fe29d60ff028add4a6c3bcd67b2f314
Signed-off-by: Venky Shankar <vshankar@redhat.com>
BUG: 856459
Reviewed-on: http://review.gluster.org/4046
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
save the a/c/mtime in inode_ctx, and dht_inode_ctx_update
checks the passed iatte, and updates the stat's time,
and inode_ctx's time accordingly. For preparent times, only
the iatt stat to be returned is updated, not the ctx.
With this, update, WIPE is removed, as we would always be passing
back the latest mtime, and hence cache times will be relevant.
TODO-handle rename WIPE calls
Change-Id: I8e4c738cd830f3fafeef789c9181f9c242ac96a2
BUG: 857791
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/3737
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Automake provides a separate variable for preprocessor flags
(*_CPPFLAGS). They are already uses in a few places, so make it
consistent and use it everywhere. Note that cflags obtained from
pkg-config often are cppflags, which is why LIBXML2_CFLAGS moves with
into AM_CPPFLAGS, for example.
Change-Id: I15feed1d18b2ca497371271c4b5876d5ec6289dd
BUG: 862082
Original-author: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4029
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CFLAGS
libtool will automatically add "-fPIC" to the compiler command line as
needed, so there is no need to specify it separately.
"-shared" is normally a linker flag and has an odd effect when used with
libtool --mode=compile, namely that it inhibits production of static
objects. For that however, using AC_DISABLE_STATIC is a lot simpler.
Change-Id: Ic4cba0fad18ffd985cf07f8d6951a976ae59a48f
BUG: 862082
Original-author: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4027
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "-nostartfiles" is a discouraged option and is documented to
potentially result in undesired behavior. Since I see no reason why it
should be in glusterfs, remove it.
Change-Id: I56f2b08874516ebad91447b2583ca2fb776bb7ab
BUG: 862082
Original-author: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4018
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some -D flags are present in all files, so collect them.
This adds -D${GF_HOST_OS} to some compiler command lines,
but this should not be a problem.
Change-Id: I1aeb346143d4984c9cc4f2750c465ce09af1e6ca
BUG: 862082
Original-author: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4013
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Eg: changed recieved to received
Change-Id: I360fcb99c97c8a0222e373fee20ea2fccfb938db
BUG: 860543
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/3998
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jeff Darcy wrote:
> AFAICT, the fix-layout code doesn't do the same rotation that the
> new-directory code does. Therefore, the new bricks always claim
> completely predictable hash ranges for every directory, leading to
> either a 0-1-2-3 pattern or a 1-0-2-3 pattern. In other words, a
> file whose hash falls into the second quarter of the range will always
> be assigned to brick 2, and a file whose hash falls into the fourth
> quarter will always be assigned to brick 3. The rest will be split
> according to the original pattern. Put still another way, instead of
> same-named files in different directories being spread across N bricks,
> they might be spread across only two bricks (bad) or totally
> concentrated on one brick (worse) regardless of N.
The current dht_fix_layout_of_directory() code, in an attempt to
maximize overlap of new layout with existing layout (to minimize
movement of data) fails to do a good job of randomizing new assignment
even when it could do a better job. In an example where we expand
from 2 nodes to 4 nodes, the current possibilities are limited in the
following way -
(theoretical hash range: 00 - 99)
OLD 1
-----
server1: 00 - 49
server2: 50 - 99
NEW 1
-----
server1: 00 - 24
server2: 50 - 74
server3: 25 - 49
server4: 75 - 99
OLD 2
-----
server1: 50 - 99
server2: 00 - 49
NEW 2
------
server1: 50 - 74
server2: 00 - 24
server3: 25 - 49
server4: 75 - 99
The above shows that when add-brick from 2 bricks to 4 bricks, server3
and server4 always get the _same_ hash range no matter what the original
hash range assignment was.
The fix in this patch is first do the standard new directory assignment
to a directory (with rotation etc.) and then do the reassignment to
maximize overlap. This way newly added servers still get random ranges
and existing servers have a probability of getting either of the quarters
which were part of its half previously. The same principles hold for
all add-brick from M to M+N.
Change-Id: I0cbbf3bfa334645728072d66aaaa80120d0b295f
BUG: 853258
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/3883
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* with the init option cleanups, setting of 'conf->disk_unit'
was reset, which made it not set the '%' in the option.
* bring a global check, which makes the option assume its
percent, as long as value is < 100.
Change-Id: I00bd1395a309cdc596a2b2b80304c6d98696a24a
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 852889
Reviewed-on: http://review.gluster.org/3918
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I83cccab6819d6a74e96c2717ca539fa1568cac89
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 843822
Reviewed-on: http://review.gluster.org/3912
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* ie, don't dereference dict_t pointer, instead use APIs everywhere
* other than dict_t only 'data_t' should be the valid export from dict.h
* added 'dict_foreach_fnmatch()' API
* changed dict_lookup() to use data_t, instead of data_pair_t
Change-Id: I400bb0dd55519a7c5d2a107e67c8e7a7207228dc
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 850917
Reviewed-on: http://review.gluster.org/3829
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|