| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shard's writev implementation, as part of identifying
presence of participant shards that aren't in memory,
first sends an MKNOD on these shards, and upon EEXIST error,
looks up the shards before proceeding with the writes.
The VM corruption was caused when the following happened:
1. DHT had n subvolumes initially.
2. Upon add-brick + fix-layout, the layout of .shard changed
although the existing shards under it were yet to be migrated
to their new hashed subvolumes.
3. During this time, there were writes on the VM falling in regions
of the file whose corresponding shards were already existing under
.shard.
4. Sharding xl sent MKNOD on these shards, now creating them in their
new hashed subvolumes although there already exist shard blocks for
this region with valid data.
5. All subsequent writes were wound on these newly created copies.
The net outcome is that both copies of the shard didn't have the correct
data. This caused the affected VMs to be unbootable.
FIX:
For want of better alternatives in DHT, the fix changes shard fops to do
a LOOKUP before the MKNOD and upon EEXIST error, perform another lookup.
Change-Id: I8a2e97d91ba3275fbc7174a008c7234fa5295d36
BUG: 1440051
RCA'd-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
Reported-by: Mahdi Adnan <mahdi.adnan@outlook.com>
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: https://review.gluster.org/17010
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DHT seems to link inode during lookup even before initializing
inode ctx with layout information, which comes after
directory healing.
Consider two parallel writes. As part of the first write,
shard sends lookup on .shard which in its return path would
cause DHT to link .shard inode. Now at this point, when a
second write is wound, inode_find() of .shard succeeds and
as a result of this, shard goes to create the participant
shards by issuing MKNODs under .shard. Since the layout is
yet to be initialized, mknod fails in dht call path with EIO,
leading to VM pauses.
The fix involves shard maintaining a flag to denote whether
a fresh lookup on .shard completed one network trip. If it
didn't, all inode_find()s in fop path will be followed by a
lookup before proceeding with the next stage of the fop.
Big thanks to Raghavendra G and Pranith Kumar K for the RCA
and subsequent inputs and feedback on the patch.
Change-Id: I9383ec7e3f24b34cd097a1b01ad34e4eeecc621f
BUG: 1420623
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: https://review.gluster.org/14419
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... as opposed to adding checks in "common" functions to choose the inode
to resolve based local->fop, which is rather ugly and prone to errors.
Change-Id: Ia46cc59992baa2979516369cb72d8991452c0274
BUG: 1420623
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: https://review.gluster.org/16709
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GEO-REP INTEROP WITH SHARD FEATURE
shard xlator updates size of the file using FXATTROP
or XATTROP. Hence record the same in changelog.
Change-Id: Ie0c21e9326da05ea78dc1ef3fd32a90ef38b4bb9
BUG: 1265148
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/12225
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
xattr value has changed
Change-Id: Ia3225a523287f6689b966ba4f893fc1b1fa54817
BUG: 1272986
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12400
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
objects
Shard translator will now maintain an lru list of inodes associated with
individual shards of constant size, and will make sure that at no point the
number of these inodes will exceed the configured limit.
This is to keep the memory consumption by the thousands of shards of every large
file from exploding.
Change-Id: I5e60eea5dcf3130257fb431ca70cfaba53cae7f3
BUG: 1252263
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12254
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iac01e6a89a0d0c37a12a5e47f17f7ced85a31590
BUG: 1265516
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12217
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch 1/2 of the performance improvement work
for sharding in the IO path.
What this patch does:
Since the primary use-case where sharding is targeted -
VM store - is a single-writer workload, instead of
performing lookup on the base file everytime to gather the
size and block count from the backend in reads, writes and
truncate, now the size and block count is also cached and
kept up-to-date after every inode write in the inode ctx.
TO-DO:
Make changes in rename, link, unlink, [f]setattr and [f]stat
to keep the relevant iatt members up-to-date in the inode ctx.
Change-Id: Ica87d020dabc3a3dbccec814b26b01d6a629ff4d
BUG: 1258905
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12126
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I40e4a5dbd13d6c3d777e7e01f93dabc83e52b137
BUG: 1260637
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12121
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch does the following:
* reverts commit b467af0e99b39ef708420d3f7f6696b0ca618512
* changes ownership on shards under /.shard to be root:root
* makes readv, writev, [f]truncate, rename, and unlink fops
to perform operations on files under /.shard with
frame->root->{uid,gid} as 0.
This would ensure that a [f]setattr on a sharded file
does not need to be called on all the shards associated with it.
Change-Id: Idcfb8c0dd354b0baab6b2356d2ab83ce51caa20e
BUG: 1251824
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/11992
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iceccef8f3f466c7ffb9991f8eb248b81e7b80efb
BUG: 1256580
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12020
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
original file
Change-Id: Id759af8f3ff5fd8bfa9f8121bab25722709d42b7
BUG: 1251824
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/11874
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Icd8984976812bb47ae7129426f6c1aa9393b3ab9
BUG: 1232391
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/11467
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to get_lowest_block() being a macro, what needs to be passed
to it is the evaluation of the expression (local->offset - 1), without
which its substitution can cause junk values to be assigned to
local->first_block.
This patch also fixes calls to get_highest_block() where if offset and
size are both equal to zero, it could return negative values.
Change-Id: I3ae918a0a3251ffd9ce8d2294bc5f9b681447627
BUG: 1200082
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10804
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When readdir(p) is performed on '/' and ".shard" happens to be
the last of the entries read in a given iteration of dht_readdir(p)
(in other words the entry with the highest offset in the dirent list
sorted in ascending order of d_offs), shard xlator would delete this
entry as part of handling the call so as to avoid exposing its presence
to the application. This would cause xlators above (like fuse,
readdir-ahead etc) to wind the next readdirp as part of the same req
at an offset which is (now) the highest d_off (post deletion of .shard)
from the previously unwound list of entries. This offset would be less
than that of ".shard" and therefore cause /.shard to be read once again.
If by any chance this happens to be the only entry until end-of-directory,
shard xlator would delete this entry and unwind with 0 entries, causing the
xlator(s) above to think there is nothing more to readdir and the fop is
complete. This would prevent DHT from gathering entries from the rest of
its subvolumes, causing some entries to disappear.
Fix:
At the level of shard xlator, if ".shard" happens to be the last entry,
make shard xlator wind another readdirp at offset equal to d_off of
".shard". That way, if ".shard" happens to be the only other entry under '/'
until end-of-directory, DHT would receive an op_ret=0. This would enable it
to wind readdir(p) on the rest of its subvols and gather the complete picture.
Also, fixed a bug in shard_lookup_cbk() wherein file_size should be fetched
unconditionally in cbk since it is set unconditionally in the wind path, failing
which, lookup would be unwound with ia_size and ia_blocks only equal to that of
the base file.
Change-Id: I6c2bc770f1bcdad51c273c777ae0b42c88c53f61
BUG: 1222379
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10809
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of including config.h in each file, and have the additional
config.h included from the compiler commandline (-include option).
When a .c file tests for a certain #define, and config.h was not
included, incorrect assumtions were made. With this change, it can not
happen again.
BUG: 1222319
Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10808
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To-Do:
* Make ftruncate work even in the absence of path
* Aggregate and update ia_blocks appropriately when a file is
truncated to a lower size.
Change-Id: Ifd24c2f5e80d2c3bc921261f5481251df8948126
BUG: 1207615
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10631
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I4cc060710482de8633141170dd35f669f01f639b
BUG: 1207615
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10528
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I1a90ad6669c1cb79aaae6b4bd9621c75d9985c8a
BUG: 1207615
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10446
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia39fa0f29eac84c18d13a94f704b1703565b504e
BUG: 1207615
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10373
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I5cdd805821a4f3657f490223b97f42c724ee588f
BUG: 1207615
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10249
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Metadata read fops like lookup, stat etc will now fetch the xattr that
holds the size and block count information, extract the size and block
count fields and set them in respective stbuf before unwinding the
resultant iatt to the parent xlator.
Change-Id: I881be8955092fa6b75f8b0e4f3deb01344cb638e
BUG: 1207603
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10098
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With each inode write FOP, the size and block count of the file will be
updated within the xattr. There are two 64 byte fields that are
intentionally left blank for now for future use when consistency
guarantee is introduced later in sharding.
Change-Id: I40a2e700150c1f199a6bf87909f063c84ab7bb43
BUG: 1207603
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10097
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Renamed shard_writev_create_write_shards() to shard_common_resolve_shards()
to appropriately reflect its functionality and for reuse in other fops too.
* Move code common to MKNOD and CREATE into a macro.
* Cut down on if nesting in shard_lookup_cbk()
Change-Id: I488255499673accd426390c6d42f2b39bab3d637
BUG: 1205661
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10096
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reusing local->xattr_req for the several calls and callbacks per xlator fop
would cause keys set from previous call/cbk (sometimes even by the xlators
below) to remain which in some cases can lead to errors. For instance, the
presence of "trusted.glusterfs.dht.*" keys (which are remnants of the previous
call/cbk), can cause the GF_IF_INTERNAL_XATTR_GOTO() check in DHT to fail when
the same dict is used to wind [f]setxattr.
Change-Id: I8612d020f83f3dc55e4a34d10ccbdaf11d7b4fdd
BUG: 1205661
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10095
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Return number of bytes written in writev cbk on success
* Eliminate separate inode table for sharding xlator.
* Fix appearance of "shard" as an option within the
volfile for subvolume of type features/shard.
* Fix values of min and max allowed shard block size
* Return @new as opposed to NULL in shard_create_gfid_dict() on success
Change-Id: I6319d377a196d1c5ceed1f65d337ff8eabcb21f8
BUG: 1205661
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10003
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
Based on the high-level design by Anand V. Avati which can be found @
https://gist.github.com/avati/af04f1030dcf52e16535#sharding-xlator-stripe-20
Still to-do:
* complete implementation of inode write fops - [f]truncate,
zerofill, fallocate, discard
* introduce transaction mechanism in inode write fops
* complete readv
* Handle open with O_TRUNC
* Handle unlinking of all shards during unlink/rename
* Compute total ia_size and ia_blocks in lookup, readdirp, etc
* wind fsync/flush on all shards
Note: Most of the items above are related. Once we come up
with a clean way to determine the last shard/shard count for
a file/file size and the mgmt of sparse regions of the file,
implementing them becomes trivial.
Change-Id: Id871379b53a4a916e4baa2e06f197dd8c0043b0f
BUG: 1200082
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/9841
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|