| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces multithreaded filesystem scrubber based
on throttling option configured for a particular volume. The
implementation "logically" breaks scanning and scrubbing with
the number of scrubber threads auto-configured depending upon
the throttle configuration. Scanning (crawling) is left single
threaded (per brick) with entries scrubbed in bulk. On reaching
this "bulk" watermark, scanner waits until entries are scrubbed.
Bricks for a particular volume have a set of thread(s) assigned
for scrubbing, with entries for each brick scrubbed in a round
robin fashion to avoid scrub "stalls" when a brick (out of N
bricks) is under active scrubbing.
This mechanism helps us implement "pause/resume" with ease: all
one need to do is to cleanup scrubber threads and let the main
scanner thread "wait" untill scrubbing is resumed (where the
scrubber thread(s) are spawned again), therefore continuing
where we left off (unless we restart the deamons, where crawl
initiates from root directory again, but I guess that's OK).
[
NOTE:
Throttling is optional for the signer daemon, without which
it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER"
predefined in CFLAGS enables CPU throttling (during checksum
calculation) thereby avoiding high CPU usage.
]
Subsequent patches would introduce CPU throttling during hash
calculation for scrubber.
Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f
BUG: 1207020
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10511
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a WIP.
Change-Id: Ia36f77d158a370f77cb866a32308b27e10d39b5e
BUG: 1218638
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10656
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because of tiering feature is recommended only for testing purpose in
this release, attaching tier should get a confirmation from the user
before proceeding with the command.
Change-Id: I141bbb1d0439f0a28eb51d17f7800908e35c75ad
BUG: 1219032
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/10610
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To-Do:
* Make ftruncate work even in the absence of path
* Aggregate and update ia_blocks appropriately when a file is
truncated to a lower size.
Change-Id: Ifd24c2f5e80d2c3bc921261f5481251df8948126
BUG: 1207615
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10631
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems possible that auth_cache->cache_dict is not always allocated
before it is accessed. Instead of allocating the dict upon the 1st
access, just create it in auth_cache_init().
Change-Id: I00e60522478b433cb0aae0c1f0948eac544dfd2b
URL: http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10710
BUG: 1143880
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10600
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
Tested-by: NetBSD Build System
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) The glupy.so xlator should embed the runtime search path for
the python libraries. Unfortunately, python-config does not
gives the appprioate flags, therefore we need to also use
pkg-config to obtain them
2) Fix the glupy python module directory layout so that python
can import the module without problem
That two fixes seems to let glupy.t pass on NetBSD again.
BUG: 1129939
Change-Id: I397aa726ab8bf7d91fa0d6d870a30910a5f4a5d9
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10616
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change broke the build on NetBSD, FreeBSD, and MacOS X:
http://review.gluster.org/10526/
We restore the build with two fixes:
- Use POSIX-compliant sysconf(_SC_NPROCESSORS_ONLN) to get the
number of processors, instead of Linux specific get_nprocs().
That let us remove Linux-specific #include <sys/sysinfo.h>
- Only define MAX() if it is not already defined. NetBSD defines
it in <sys/param.h> which is already included
BUG: 1129939
Change-Id: I62341c670598670e47ea2f69ab94864f96588b18
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10652
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The global config file should not be emptied when
tear down is called. Improving the sed expression
to delete the ".conf" files only
Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com>
Change-Id: Ida3d303f629cb512c02dadacce1ec7e5f07db018
BUG: 1218854
Reviewed-on: http://review.gluster.org/10630
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: soumya k <skoduri@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BitRot daemons (signer & scrubber) are disk/cpu hoggers when left
running full throttle. Checksum calculations (especially SHA family
of hash routines) can be quite CPU intensive. Moreover periodic
disk scans performed by scrubber followed by reading data blocks
for hash calculation (which is also done by signer) generate lot
of heavy IO request(s). This causes interference with actual client
operations (be it a regular client or filesystems daemons such as
self-heal, etc..) and results in degraded system performance.
This patch introduces throttling based on Token Bucket Filtering[1].
It's a well known algorithm for checking (and ensuring) that data
transmission conform to defined limits and generally used in packet
switched networks. Linux control groups (Cgroups) uses a variant[2]
of this algorithm to provide block device IO throttling (cgroup
subsys "blkio": blk-iothrottle).
So, why not just live with Cgroups?
Cgroups is linux specific. We need to have a throttling mechanism
for other supported UNIXes. Moreover, having our own implementation
gives much more finer control in terms of tuning it for our needs
(plus the simplicity of the alogorithm itself).
Ideally, throttling should be a part of server stack (either as a
separate translator or integrated with io-threads) since that's
the point of entry for IO request(s) from *all* client(s). That
way one could selectively throttle IO request(s) based on client
PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc..
(*actual* clients can run full throttle). This implementation
avoids that deliberately (there needs to be a much more smarter
queueing mechanism) and throttles CPU usage for hash calculations.
This patch is just the infrastructure part with no interfaces
exposed to set various throttling values. The tunable selected
here (basically hardcoded) avoids 100% CPU usage during hash
calculation (with some bursts cycles). We'd need much more
intensive test(s) to assign values for various throttling
options (lazy/normal/aggressive).
[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket
Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8
BUG: 1207020
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10307
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Linux, kernel cache can be flushed using
echo 3 > /proc/sys/vm/drop_caches
This non-portable approach can be replaced by an on-purpose
failed attempt to unmount: if the mount point is the current
directory and umount is called, the kernel will flush inodes
until it realize it cannot complete the operation because
root of filesystem is busy:
( cd $M0 ; umount $M0 )
Unfortunately this does not flush everything. Entries may
still be present in the kenrel FUSE cache. Using $GFS to
mount the filesystem ensure --entry-timeout=0 and clears
this problem.
Some stall information may also remain in glusterfs caches,
and that may have to be adressed by appropriate volume option.
For instance tests/bugs/rpc/bug-954057.t needs to disable
performance.stat-prefetch. Qtherwise, root's new credentials
are not evaluated after root-quash is enabled. The test could
also be done with performance.stat-prefetch enabled using
various tricks: copying the file to read, creating a hard link
on it, or just waiting long enough for metadata cache to expire.
BUG: 1129939
Change-Id: I54929e899d55c04dcd9d947809133549f01fd0e1
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10411
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Default Values for last_synced, checkpoint_time and
checkpoint_completion_time was zero instead of 'N/A'
BUG: 1212410
Change-Id: Ie775508f8dcb9ba6f311946a2039739e4336d9a6
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/10580
Reviewed-by: darshan n <dnarayan@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Argparse Python library is available as standard library in Python 2.7
In rhel, we need to install python-argparse.
Also added pyxattr dependency to the spec file.
Change-Id: Ie50278eddf81e7cbf98d1e8d27e411b46676b432
Signed-off-by: Aravinda VK <avishwan@redhat.com>
BUG: 1209138
Reviewed-on: http://review.gluster.org/10321
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When rsync is executed using Python subprocess, by default
stdout of subprocess will be None. With the log rsync performance
patch stdout is assigned to PIPE. Rsync writes to that PIPE
whenever it syncs files. If log_rsync_performance is disabled
then nobody will consume stdout and that gets full. Rsync hangs
if PIPE is full.
log_rsync_performance option is introduced with patch 10070
With this patch stdout=PIPE only if log_rsync_performance is
enabled. Also removed -v option from Rsync.
Thanks Venky and Kotresh for RCA.
BUG: 1218552
Change-Id: I4ebcfb6999358c8e2c147f7964255bd836ed7499
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/10556
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NFS-Ganesha related config files have to be copied over to the new node
and NFS-Ganesha service has to be started.
Similary NFS-Ganesha service has to be stopped when a node is
deleted from the HA cluster.
Change-Id: Ia38e72cac86713fe23b7d1b829a256637a9ca796
BUG: 1212816
Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com>
Reviewed-on: http://review.gluster.org/10596
Reviewed-by: soumya k <skoduri@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a snap is activated or deactivated, when a node is down,
it is not retrieving the data properly during the handshake
of glusterd
With this patch, a version check will made when a glusterd
is started running. If there is a mismach in version, then
peers will exchange the healed data.
Change-Id: I8bd2a347723db2194d3fa73295878b4dd2e9be5d
BUG: 1122377
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9664
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaushal M <kaushal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modified the calls to open in fops-sanity.c to pass in
the mode as well if flags includes O_CREAT (as per man page).
The missing mode randomly caused T files to be created causing DHT
to treat them as linkto files and fail the fop.
Modified 2 other files where the mode was not being provided.
Change-Id: I047573d43655b4957d0703f7df36238f7e729c1f
BUG: 1218951
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10590
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NFS-server sets EOF only in the READ reply when op_errno is set to
ENOENT. Xlators are expected to set op_errno to ENOENT when EOF is
reached, op_ret will contain the number of bytes returned by the READ.
When an NFS-client (like VMware ESXi) do a READ that exceeds the size of
the file, errno should be set to EOF and the return value contains the
number of bytes that are read (from the requested offset, until the end
of the file). Not setting EOF on a correct short READ, can result in
errors on the NFS-client.
This is not an issue with the Linux NFS-client (or VFS). Linux is smart
enough to not try to read more bytes than the file contains.
BUG: 1209298
Change-Id: Ib15538744908a6001d729288d3e18a432d19050b
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10142
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Throttle value will be "normal" by default. For throttling down,
a thread will be put in to sleep. And for throttling up,
gf_defrag_process_dir will wake up the sleeping threads.
Change-Id: I74d530e3effd6e60e6eec81ccc8ff65789fa9c13
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10526
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Provided setfattr command to set timeout for split-brain
choice.
2) If split-brain inspection/resolution is being done
from the mount for a file, ref the inode when
split-brain-choice is set.
This inode will be unconditionally unref-ed after timeout
seconds set by the user/default otherwise.
3) Updated the doc and testcase to reflect the changes.
Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1
BUG: 1209104
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/10134
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bitrot scrubber paused/resume command should give proper error messages if
scrubber already pause/resume and user again try to perform same
operation on a volume.
Change-Id: I01ad69c80f03b177535a4e5f1c95ab7709a804b0
BUG: 1210684
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/10209
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The generation number for each peerinfo object is unique. It can be used
to find the exact peerinfo object, which is required for peer RPC
notifications.
Using hostname and uuid matching to find peerinfos can return incorrect
peerinfos to be returned in certain cases like multi network peer probe.
This could cause updates to happen to incorrect peerinfos.
Change-Id: Ia0aada8214fd6d43381e5afd282e08d53a277251
BUG: 1215018
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/10495
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace-brick operation with data migration support have been
deprecated from gluster.
With this fix replace brick command will support only one commad
gluster volume replace-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> {commit force}
Change-Id: Ib81d49e5d8e7eaa4ccb5830cfec2bc081191b43b
BUG: 1094119
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/10101
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As optional feature, during unlink, full path will be recorded.
Changelog Version number to be bumped up to 1.2.
With this patch, parser checks the version number before parsing
and handles accordingly.
Change-Id: Ic1ad98259c39e417029a08e26a1d4b467817e65a
BUG: 1214561
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/10166
Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adding 64 bits in "version" key of extended attributes. First 64 bits (Left)
represents Data version. Last 64 bits (right) represents Meta Data version.
Note: 3.7 and 3.6 version ec can't co-exist with this change because xattrop in
3.6 will fail with ERANGE as the buffer passed to it will be '8' bytes where as
the value will be 16 bytes in 3.7. Where as 3.7 version clients can work with
old version files. For upgrades we need to tell users to complete heals and
then upgrade
BUG: 1215265
Change-Id: Ib85114680cb7e75b8371c984d9f7b6401c1ffb93
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/10312
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
inode quota is a new feature implemented in glusterfs-3.7
if quota is enabled in the older version and is upgraded
to a new version, we can hit setxattr spike during self-heal
of inode quotas. So, when a quota is enabled, turn off
inode-quotas with a xlator option.
With this patch, we still account for inode quotas but only
when a write operation is performed for a particular file.
User will be able to query inode quotas once the Inode-quota
xlator option is enabled.
Change-Id: I52fb28bf7024989ce7bb08ac63a303bf3ec1ec9a
BUG: 1209430
Signed-off-by: vmallika <vmallika@redhat.com>
Signed-off-by: Sachin Pandit <spandit@redhat.com>
Reviewed-on: http://review.gluster.org/10152
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The global config file will need new blocks to support client lock recovery.
Current cleanup function empties the entire file. Deleting only "include"
lines in the config file.
Change-Id: I21f09e30a738d2ba01861ce480ecf906667d887b
BUG: 1218854
Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com>
Reviewed-on: http://review.gluster.org/10592
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: soumya k <skoduri@redhat.com>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Global option gluster features.ganesha enable
writes into the global 'option' file. The snapshot
feature also writes into the same file.
To handle concurrent multiple transactions correctly,
a new lock has to be introduced on this file.
Every operation using this file needs
to contest for the new lock type.
Change-Id: Ia8a324d2a466717b39f2700599edd9f345b939a9
BUG: 1200254
Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com>
Reviewed-on: http://review.gluster.org/10130
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: soumya k <skoduri@redhat.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ic3c9da566673f4d9b1d0974c5aa15121da087fba
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10607
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Tested-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: The CTR xlator records file meta (heat/hardlinks)
into the data. This works fine for files which are created
after ctr xlator is switched ON. But for files which were
created before CTR xlator is ON, CTR xlator is not able to
record either of the meta i.e heat or hardlinks. Thus making
those files immune to promotions/demotions.
Solution: The solution that is implemented in this patch is
do ctr-db heal of all those pre-existent files, using named lookup.
For this purpose we use the inode-xlator context variable option
in gluster.
The inode-xlator context variable for ctr xlator will have the
following,
a. A Lock for the context variable
b. A hardlink list: This list represents the successful looked
up hardlinks.
These are the scenarios when the hardlink list is updated:
1) Named-Lookup: Whenever a named lookup happens on a file, in the
wind path we copy all required hardlink and inode information to
ctr_db_record structure, which resides in the frame->local variable.
We dont update the database in wind. During the unwind, we read the
information from the ctr_db_record and ,
Check if the inode context variable is created, if not we create it.
Check if the hard link is there in the hardlink list.
If its not there we add it to the list and send a update to the
database using libgfdb.
Please note: The database transaction can fail(and we ignore) as there
already might be a record in the db. This update to the db is to heal
if its not there.
If its there in the list we ignore it.
2) Inode Forget: Whenever an inode forget hits we clear the hardlink list in
the inode context variable and delete the inode context variable.
Please note: An inode forget may happen for two reason,
a. when the inode is delete.
b. the in-memory inode is evicted from the inode table due to cache limits.
3) create: whenever a create happens we create the inode context variable and
add the hardlink. The database updation is done as usual by ctr.
4) link: whenever a hardlink is created for the inode, we create the inode context
variable, if not present, and add the hardlink to the list.
5) unlink: whenever a unlink happens we delete the hardlink from the list.
6) mknod: same as create.
7) rename: whenever a rename happens we update the hardlink in list. if the hardlink
was not present for updation, we add the hardlink to the list.
What is pending:
1) This solution will only work for named lookups.
2) We dont track afr-self-heal/dht-rebalancer traffic for healing.
Change-Id: Ia4bbaf84128ad6ce8c3ddd70bcfa82894c79585f
BUG: 1212037
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10370
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Idaae234b9e81c40040393e748db1f61363a48ed0
BUG: 1211913
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: http://review.gluster.org/10250
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*"
NOTE:
With this patch, data on existing volumes would be resigned
(which should be OK as of now since we do not expect many
users as of now :-))
Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10181
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: In case a defrag is null, going to out section will crash rebalance.
Change-Id: I8b3ee1ad85dc23ef0e2f2dd6f912d07216bd619f
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10582
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I4cc060710482de8633141170dd35f669f01f639b
BUG: 1207615
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10528
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Anonymous fd's are floating fd assigned to a glusterfs client
without a explicit file open. Here either it will create a new
anonymous fd or existing anonymous fd in the client stack for
requested file.The anonymous fd's are mainly used for IO's.
This patch introduces two api's glfs_h_anonymous_read and
glfs_h_anonymous_write which performs read and write respectively
Change-Id: Id646f2220e8387b2f8bb244c848dc1db6761444f
BUG: 1204651
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/9971
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Multi-Head NFS-Ganesha servers need upcall (cache-invalidation)
support to notify them in case of any changes to the files in the backend.
Hence, upcall xlator option "features.cache-invalidation" needs to be enabled
when ganesha.enable is set to 'on'. Similarly, this feature needs
to be disabled when ganesha.enable is set to 'off'
Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com>
Change-Id: Ifdd1d50e48a2bd2a388f73c0b9e318c6092ac190
BUG: 1213752
Reviewed-on: http://review.gluster.org/10581
Reviewed-by: soumya k <skoduri@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
snapd feature require statfs fops and thereby
handle of statfs
Change-Id: I037ee846aee3971826a73c592e15f1be27796c64
BUG: 1176837
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/10356
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
building libvirt fails
Change-Id: I355879e6c6f56e49c4ee78e1a9763598acacd074
BUG: 1218625
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/10579
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I851b4bd74b595d739f7bf9eea920d07e31fa5fbc
BUG: 1202244
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10540
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a read IO occurs against a file that has reached rebalance
phase 2, we redirect the IO to the destination. For tiered
volumes, when we try to reopen the file (on the destination),
the lower level DHT receives the open call and fails; it does
not have a "cached subvol". Fix is to "teach" the lower level
DHT of the new location by sending a locate before the open.
Change-Id: Ia4acb0035ff1da15f6a8f9ed54f43c76e8b98f5f
BUG: 1214048
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: root <root@gprfs018.sbu.lab.eng.bos.redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10324
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Background:
Glusterfs changelogs are stored in each brick, which records the changes
happened in that brick. Georep will run in all the nodes of master and
processes changelogs "independently".
Processing changelogs is in brick level, but all the fops will be replayed
on "slave mount" point.
Problem:
With a DHT volume, in changelog "internal fops" are NOT recorded.
For Rename case, Rename is recorded in "hashed" brick changelog.
(DHT's internal fops like creating linkto file, unlink is NOT recorded).
This lead us to inconsistent rename operations.
For example,
Distribute volume created with Two bricks B1, B2.
//Consider master volume mounted @ /mnt/master
and following operations executed:
cd /mnt/master
touch f1 // f1 falls on B1 Hash
mv f1 f2 // f2 falls on B2 Hash
// Here, Changelogs are recorded as below:
@B1
CREATE f1
@B2
RENAME f1 f2
Here, race exists between Brick B1 and B2, say B2 will get executed first.
Source file f1 itself is "NOT PRESENT", so it will go ahead and create
f2 (Current implementation).
We have this problem When rename falls in another brick and
file is unlinked in Master.
Similar kind of issue exists in following case too(multiple rename):
CREATE f1
RENAME f1 f2
RENAME f2 f1
Solution:
Instead of carrying out "changelogging" at "HASHED volume",
carry out at the "CACHED volume".
This way we have rename operations carried out where actual files are present.
So,Changelog recorded as :
@B1
CREATE f1
RENAME f1 f2
credit: sarumuga@redhat.com
PS: Some of the races as the one below are _NOT_ fixed by this patch
* f1 and f2 exist. B1 and B2 are their respective cached subvols. For
both files hashed-subvol == cached-subvol
* mv f1 f2 on master.
* B1 has change-log entry of rename f1 f2
* rebalance migrates f2 from B1 and B2
* mv f2 f1 on master.
* B2 has change-log entry of rename f2 f1
Since changelog entries (rename f1 f2) and (rename f2 f1) are processed
independently by gsyncds, which of either f1 and f2 survives on slave
is subject to race. Note that on master its file f1 with name f1 which
survived. On slave it can be either file f1 with name f1 or file f2
with name f2 based on who wins the race of processing changelog.
Change-Id: Iebc222f582613924c3a7cba37fb6d3e2d8332eda
BUG: 1141379
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10410
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the patch http://review.gluster.org/#/c/9657
the client pid set by tiering migration was getting over-
written in dht_start_rebalance_task(). Just corrected it
in dht_setxattr() before calling dht_start_rebalance_task()
and removed it from dht_start_rebalance_task().
Change-Id: I37cfa111f83a4e5d498042575c93799f60b49870
BUG: 1217937
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10502
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
After attaching tier, we have to start tier rebalance process.
This patch is to trigger tier start along with attch-tier.
Change-Id: I39380f95123f0087a82213ef263f9f33adcc5adc
BUG: 1214222
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/10363
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not allow attach or detach tier to work if any of the nodes
is running gluster code < 3.7.
Change-Id: Id9af8f4057f6fad9cb703ec7645bc01eccb11fc1
BUG: 1218287
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10531
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we attach a tier, the hot tier becomes the hashed
subvolume. But directories may not yet have been replicated by
the fix layout process. Hence lookups to those directories
will fail on the hot subvolume. We should only go to the hashed
subvolume once the layout has been fixed. This is known if the
layout for the parent directory does not have an error. If
there is an error, the cold tier is considered the hashed
subvolume. The exception to this rules is ENOCON, in which
case we do not know where the file is and must abort.
Note we may revalidate a lookup for a directory even if the
inode has not yet been populated by FUSE. This case can
happen in tiering (where one tier has completed a lookup
but the other has not, in which case we revalidate one tier
when we call lookup the second time). Such inodes are
still invalid and should not be consulted for validation.
Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523
BUG: 1214289
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10435
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The volume reset network.compression operation cause brick processes to
be restarted. If the volume is already started, a brick process is already
there and the restart will fail, as the brick TCP port is already in use.
Because the new brick process is not started, the volume is left with
no brick online, and the volume stop operation will timeout waiting
for bricks to stop.
Obviosuly we have two bugs here
- If volume reset network.compression needs to restart the bricks, it
should first make sure the previous brick process is terminated
- volume stop should not wait forever for bricks to come back online
This change does not fix the bugs but just makes sure the volume
is stoped before volume reset network.compression, so that the failure
oes not happen.
BUG: 1129939
Change-Id: I9cd5cdc767ef6ee9dd31f2121d672dc3bfdce45f
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10553
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Have added support to send attributes of both entries and
its parent (include oldparent in case of RENAME fop) in the
same notification request to avoid multiple rpc requests.
Also, made changes in gfapi to send parent object and its
attributes changed in a single upcall event.
Change-Id: I92833da3bcec38d65216921c2ce4d10367c32ef1
BUG: 1200262
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-on: http://review.gluster.org/10460
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While doing RMDIR worker gets ENOTEMPTY because same directory will
have files from other bricks which are not deleted since that worker
is slow processing. So geo-rep does recursive_delete.
Recursive delete was done using shutil.rmtree. once started, it will
not check disk_gfid in between. So it ends up deleting the new files
created by other workers. Also if other worker creates files after one
worker gets list of files to be deleted, then first worker will again
get ENOTEMPTY again.
To fix these races, retry is added when it gets ENOTEMPTY/ESTALE/ENODATA.
And disk_gfid check added for original path for which recursive_delete is
called. This disk gfid check executed before every Unlink/Rmdir. If disk
gfid is not matching with GFID from Changelog, that means other worker
deleted the directory. Even if the subdir/file present, it belongs to
different parent. Exit without performing further deletes.
Retry on ENOENT during create is ignored, since if CREATE/MKNOD/MKDIR
failed with ENOENT will not succeed unless parent directory is created
again.
Rsync errors handling was handling unlinked_gfids_list only for one
Changelog, but when processed in batch it fails to detect unlinked_gfids
and retries again. Finally skips the entire Changelogs in that batch.
Fixed this issue by moving self.unlinked_gfids reset logic before batch
start and after batch end.
Most of the Geo-rep races with rm -rf is eliminated with this patch,
but in some cases stale directories left in some bricks and in mount
point we get ENOTEMPTY.(DHT issue, Error will be logged in Slave log)
BUG: 1211037
Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/10204
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are registering iobuf_pool with rdma. When rdma transport is
unloaded, we need to deregister all the buffers registered with
rdma. Otherwise iobuf_arena destroy will fail.
Also if rdma.so is loaded again, then register iobuf_pool with
rdma
Change-Id: Ic197721a44ba11dce41e03058e0a73901248c541
BUG: 1200704
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9854
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I99028fe6ff4547f837b0997543234a9020b75607
BUG: 1216067
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/10429
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CID: 1293504 (Calling xlator_set_option without checking return value )
CID: 1293502 (Dereferencing a pointer that might be null xl when calling
xlator_set_option)
CID: 1293500 (Assigning value from dict_get_int32(dict, "type", &type)
to ret here, but that stored value is overwritten before
it can be used.)
Change-Id: I5314fb399480df70bd77bc374e3b573f2efd5710
BUG: 1093692
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/10201
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaushal M <kaushal@redhat.com>
|