| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are three kinds of inline functions: plain inline, extern inline,
and static inline. All three have been removed from .c files, except
those in "contrib" which aren't our problem. Inlines in .h files, which
are overwhelmingly "static inline" already, have generally been left
alone. Over time we should be able to "lower" these into .c files, but
that has to be done in a case-by-case fashion requiring more manual
effort. This part was easy to do automatically without (as far as I can
tell) any ill effect.
In the process, several pieces of dead code were flagged by the
compiler, and were removed.
Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155
BUG: 1245331
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/11769
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When bitrot is configured on multiple volumes
in a cluster and scrubber-frequency is changed
for one volume, it is resetting frequency for
all other volumes w.r.t to its scrubber-frequency.
This should not happen. Changing scrubber-frequency
should affect only that volume on which it is set.
This patch fixes the issue.
Also restricted the logs to the configure volume.
Change-Id: I90d6e864b131e3d8dd4010079a00f924032f2098
BUG: 1252825
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/11897
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If bad file detected by scrubber then scrubber should log that bad
file as a ALERT message in scrubber log.
Change-Id: I410429e78fd3768655230ac028fa66f7fc24b938
BUG: 1240218
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/11965
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While rescheduling scrub frequency, boot time of
the brick was considered where it is not required
and also delta is calculated using unsigned int
resulting in the loss of fractional part leading to
wrong scrub frequency. Boot time is completely
removed and delta calculation is simplified.
Change-Id: If54697389f663afc86408dc8a01a3ea07e00f2dc
BUG: 1251042
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/11853
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In stub, for fops like readv, writev etc, if the the object is bad, then the fop
is denied. But for checking if the object is bad inode context should be
checked. Now, if the inode context is not there, then the fop is allowed to
continue. This patch fixes it and the fop is unwound with an error, if the inode
context is not found.
Change-Id: I5ea4d4fc1a91387f7f9d13ca8cb43c88429f02b0
BUG: 1243391
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/11449
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia8706ec9b66d78c4e33e7b7faf69f0d113ba68a4
BUG: 1245981
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/11729
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dict_set_bin() is handling the pointer that it passed inconsistently.
Depending on the errors that can occur, the pointer passed to the dict
can be free'd, but there is no guarantee.
It is cleaner to have the caller free the pointer that allocated it and
dict_set_bin() returned an error. When dict_set_bin() returned success,
the given pointer will be free'd when dict_unref() calls data_destroy().
Many callers of dict_set_bin() already take care of free'ing the pointer
on error. The ones that did not, are corrected with this change too.
Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966
BUG: 1242280
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/11638
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass -DDBR_RATE_LIMIT_SIGNER CFLAGS to enable fixed value throttling
based on TBF to rate limit signer. The following messags is dumped
in bitd log file that this change introduces.
[
[Rate Limit Info] "tokens/sec (rate): 131072, maxlimit: 524288"
]
Bug: 1242809
Change-Id: I063e41d4c7bcddd7a940cc175e89536cd4fe2804
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11641
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Which was done at half the set expiry time resulting in actual
IOs incrementing the object version. Now this is done just at
the last moment with re-notification now cut-shorting into
checksum calculation without waiting in the timer-wheel.
BUG: 1242317
Change-Id: If655b77d822ebf7b2a4f65e1b5583dd3609306e7
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11461
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I4937a578185ebacd2558cb8e22f130cd10193188
BUG: 1240219
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11547
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
* setxattr and {f}removexattr of versioning, signature and bad-file xattrs are
returned with error.
Change-Id: Ib423466195d1d8e4c6f80c2906a574e21ed624fb
BUG: 1210689
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/11389
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Access to bad objects (especially operations such as open, readv, writev)
should be denied to prevent applications from getting wrong data.
* Do not allow anyone apart from scrubber to set bad object xattr.
* Do not allow bad object xattr to be removed.
Change-Id: Ia9185a067233a9f26e3d41d41d11d9a4eb0da827
BUG: 1210689
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/11126
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Idfd245327b485459ccbda503510b8ca0127bb66c
BUG: 1231619
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11396
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A bunch of command line options for scrubber tempted the use of
state machine to track current state of scrubber under various
circumstances where the options could be in effect.
Change-Id: Id614bb2e6af30a90d2391ea31ae0a3edeb4e0d69
BUG: 1231619
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11149
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch uses "cleanup, v1" infrastrcuture to cleanup scrubber
(data structures, threads, timers, etc..) on brick disconnection.
Signer is not cleaned up yet: probably would be done as part of
another patch.
Change-Id: I78a92b8a7f02b2f39078aa9a5a6b101fc499fd70
BUG: 1231619
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11148
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a short series of patches (with other cleanups) aimed at
cleaning up some of the incorrect assumptions taken in reconfigure()
leading to crashes when subvolumes are not fully initialized (as
reported here[1] on gluster-devel@). Furthermore, there is some
amount of code cleanup to handle disconnection and cleanup up data
structure (as part of subsequent patch).
[1] http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html
Change-Id: I68ac4bccfbac4bf02fcc31615bd7d2d191021132
BUG: 1231617
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11147
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I83c494f2bb60d29495cd643659774d430325af0a
BUG: 1194640
Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com>
Reviewed-on: http://review.gluster.org/10297
Tested-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I56d5236c37a413046b5766320184047a908f2c8d
BUG: 1231620
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11190
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The need to perform object versioning in the truncate() code path
required an fd to reuse existing versioning infrastructure that's
used by fd based operations (such as writev(), ftruncate(), etc..).
This tempted the use of anonymous fd which was never ever unref()'d
after use resulting in fd and/or memory leak depending on the code
path taken. Versioning resulted in a dangling file descriptor left
open in the filesystem effecting the signing process of a given
object (no release() would be trigerred, hence no signing would be
performed). On the other hand, cases where the object need not be
versioned, the anonymous fd in still ref()'d resulting in memory
leak (NOTE: there's no "dangling" file descriptor in this case).
Change-Id: I29c3d2af9bbc5cd4b8ddf38954080e3c7a44ba61
BUG: 1227996
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11077
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Let bit-rot stub check both on disk ongoing version, signed version xattrs and
the in memory flags in the inode and then decide whether the inode is stale or
not. This information is used by one shot crawler in BitD to decide whether to
trigger the sign for the object or skip it.
NOTE: The above check should be done only for BitD. For scrubber its still the
old way of comparing on disk ongoing version with signed version.
* BitD's one shot crawler should not sign zero byte objects if they do not contain
signature. (Means the object was just created and nothing was written to it).
Change-Id: I6941aefc2981bf79a6aeb476e660f79908e165a8
BUG: 1224611
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/10947
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently bitrot using 120 second waiting time for object to be signed
after all fop's released. This signing waiting time value should be tunable.
Command for changing the signing waiting time will be
#gluster volume bitrot <VOLNAME> signing-time <waiting time value in second>
Change-Id: I89f3121564c1bbd0825f60aae6147413a2fbd798
BUG: 1228680
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/11105
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the absence of mknod() fop implementation in bitrot stub,
further operations that trigger versioning resulted in crashes
as they expect the inode context to be valid. Therefore, this
patch implements mknod() following similar simantics to fops
such as create().
Furthermore, bitrot stub test C program is fixed to stop lying
and validate obj versions according to the versioning protocol.
Change-Id: If76f252577445d1851d6c13c7e969e864e2183ef
BUG: 1221914
Original-Author: Raghavendra Bhat <raghavendra@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10790
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current signing interface (fsetxattr()) had couple of issues:
One, a signing request (by bitrot daemon) is denied if the version
against which an object is to be signed is unequal to the current
version of the object (cases where another subsequent modification
increments the version). Such request(s) are rejected with EINVAL
sent back to the signer resulting in a bunch of errors (in logs)
reported by bitrot daemon. Although, the object would be eventaully
signed with the version matching the current version, the "lagging"
request should be correctly handled.
Two, more than one signing request could race against each other
with the object getting signed with a version depending on which
request ended up last in the race. Although harmless to some extent,
such a case could end up marking the object's signature as stale
for infinity (if the object is *never* touched) thereby resulting
in scrubber skipping the object during verification.
This patch fixes these issues by ordering signing request(s) and
fixing version comparison checks at the time of signing.
Change-Id: I9fa83dfa3be664ba4db61d7f2edc408f4bde77dd
BUG: 1221938
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10832
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of including config.h in each file, and have the additional
config.h included from the compiler commandline (-include option).
When a .c file tests for a certain #define, and config.h was not
included, incorrect assumtions were made. With this change, it can not
happen again.
BUG: 1222319
Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10808
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Brick connection was bloated (and not implemented efficiently) with
calls which were not required to be called under lock. This resulted
in starvation of lock by critical code paths. This eventally did not
scale when the number of bricks per volume increases (add-brick and
the likes).
Also, this patch cleans up some of the weird reconnection logic that
added more to the starvation of resources and cleans up uncontrolled
growing of log files.
Change-Id: I05e737f2a9742944a4a543327d167de2489236a4
BUG: 1207134
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/10763
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: NetBSD Build System
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reimplments existing scrub-frequency mechanism used
to schedule scrubber runs. Existing mechanism uses periodic
sleeps (waking up periodically on minimum granularity) and
performing a number of tracking checks based on counters and
sleep times. This patch does away with all the nifty counters
and uses timer-wheel to schedule scrub runs.
Scheduling changes are peformed by merely calculating the new
expiry time and calling mod_timer() [mod_timer_pending() in
some cases] making the code more debuggable and easier to
follow. This also introduces "hourly" scrubbing tunable as an
aid for testing scrubbing during development/testing cycle.
One could also implement on-demand scrubbing with ease: by
invoking mod_timer() with an expiry of one (1) second, thereby
scheduling a scrub run the very next second.
Change-Id: I6c7c5f0c6c9f886bf574d88c04cde14b76e60a8b
BUG: 1224596
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10893
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactors the signing trigger mechanism used by bitrot
daemon as a "catch up" meachanism to sign files which _missed_
signing on the last run either due to bitrot being disabled and
enabled again or if bitrot is enabled for a volume with existing
data.
Existing implementation relies on overloading writev() to trigger
signing which just by the looks sounded dangerous and I hated it
to the core. This change moves all that business to the setxattr
interface thereby keeping the writev path strictly for client
IO.
Why not use IPC fop to trigger signing?
There's a need to access the object's inode to perform various
maintainance operations. inode is not _directly_ accessible in
the IPC fop (although, it can be found via inode_grep() for the
object's GFID - the inode just needs to be pinned in memory,
which is the case if there's an active fd on the inode). This
patch relies on good old technique of overloading fsetxattr()
to do the job instead of using IPC fop.
There are some pretty nice cleanups along the lines of memory
deallocations, unncessary allocations and redundant ref()ing
of structures (such as fd's) provided by this patch. All in
all - much improved code navigation.
Change-Id: Id93fe90b1618802d1a95a5072517dac342b96cb8
BUG: 1224600
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10942
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently scrubber is crawling all the files continuously. It should
crawl files based on the scrubber frequency which user have set.
By default scrubber crawling frequency value will be biweekly.
Change-Id: I5762a92c1e700134cfe4283d1f631904adbfe31d
BUG: 1208131
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/10602
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of open
* This patch brings in the changes where object versioning is done in write and
truncate fops instead of tracking them in open and create fops. This model
works for both regular and anonymous fds. It also removes the race associated
with open calls, create and lookups.
This patch follows the below method for object versioning and notifications:
Before sending writev on the fd, increase the ongoing
version first. This makes anonymous fd write similar to the regular
fd write by having the ongoing version increased before doing the
write.
Do following steps to do versioning:
1) For anonymous fds set the fd context (so that release is invoked) and add
the fd context to the list maintained in the inode context.
For regular fds the above think would have been done in open itself.
2) Increase the on-disk ongoing version
3) Increase the in memory ongoing version and mark inode as non-dirty
3) Once versioning is successfully done send write operation. If
versioning fails, then fail the write fop.
5) In writev_cbk mark inode as modified.
Change-Id: I7104391bbe076d8fc49b68745d2ec29a6e92476c
BUG: 1207979
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/10233
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With logical scan/scrub split, pausing filesystem scrubber is an
override to the thread throttling mechanism, which effectively
throttles "down" number of scrubber threads to zero. This causes
scanner to wait until threads are spawned again (when resumed)
thereby continuing where it left off (since the file tree walk
stack is effectively preserved when the main scanner thread
is waiting for scrubbers to consume scanned entries).
The only catch is when scrubber daemon restarts: file tree walk
stack is lost and scrubbing initiates from root. This is probably
OK for now (can be changed later to persist parent directory
information before entering pause state).
Change-Id: I5109a749b7fccd0f5367765078f46e6522dd32a1
BUG: 1208131
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10521
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces multithreaded filesystem scrubber based
on throttling option configured for a particular volume. The
implementation "logically" breaks scanning and scrubbing with
the number of scrubber threads auto-configured depending upon
the throttle configuration. Scanning (crawling) is left single
threaded (per brick) with entries scrubbed in bulk. On reaching
this "bulk" watermark, scanner waits until entries are scrubbed.
Bricks for a particular volume have a set of thread(s) assigned
for scrubbing, with entries for each brick scrubbed in a round
robin fashion to avoid scrub "stalls" when a brick (out of N
bricks) is under active scrubbing.
This mechanism helps us implement "pause/resume" with ease: all
one need to do is to cleanup scrubber threads and let the main
scanner thread "wait" untill scrubbing is resumed (where the
scrubber thread(s) are spawned again), therefore continuing
where we left off (unless we restart the deamons, where crawl
initiates from root directory again, but I guess that's OK).
[
NOTE:
Throttling is optional for the signer daemon, without which
it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER"
predefined in CFLAGS enables CPU throttling (during checksum
calculation) thereby avoiding high CPU usage.
]
Subsequent patches would introduce CPU throttling during hash
calculation for scrubber.
Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f
BUG: 1207020
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10511
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BitRot daemons (signer & scrubber) are disk/cpu hoggers when left
running full throttle. Checksum calculations (especially SHA family
of hash routines) can be quite CPU intensive. Moreover periodic
disk scans performed by scrubber followed by reading data blocks
for hash calculation (which is also done by signer) generate lot
of heavy IO request(s). This causes interference with actual client
operations (be it a regular client or filesystems daemons such as
self-heal, etc..) and results in degraded system performance.
This patch introduces throttling based on Token Bucket Filtering[1].
It's a well known algorithm for checking (and ensuring) that data
transmission conform to defined limits and generally used in packet
switched networks. Linux control groups (Cgroups) uses a variant[2]
of this algorithm to provide block device IO throttling (cgroup
subsys "blkio": blk-iothrottle).
So, why not just live with Cgroups?
Cgroups is linux specific. We need to have a throttling mechanism
for other supported UNIXes. Moreover, having our own implementation
gives much more finer control in terms of tuning it for our needs
(plus the simplicity of the alogorithm itself).
Ideally, throttling should be a part of server stack (either as a
separate translator or integrated with io-threads) since that's
the point of entry for IO request(s) from *all* client(s). That
way one could selectively throttle IO request(s) based on client
PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc..
(*actual* clients can run full throttle). This implementation
avoids that deliberately (there needs to be a much more smarter
queueing mechanism) and throttles CPU usage for hash calculations.
This patch is just the infrastructure part with no interfaces
exposed to set various throttling values. The tunable selected
here (basically hardcoded) avoids 100% CPU usage during hash
calculation (with some bursts cycles). We'd need much more
intensive test(s) to assign values for various throttling
options (lazy/normal/aggressive).
[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket
Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8
BUG: 1207020
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10307
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*"
NOTE:
With this patch, data on existing volumes would be resigned
(which should be OK as of now since we do not expect many
users as of now :-))
Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10181
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently whatever bitrot/scrubber tunable value user set for one
volume that value is considering for all other volumes also.
Each volume should act on their respective bitrot/scrubber tunable
value.
For handling bitrot/scrubber tunable value independently with respect
to all the volume bitrot and scrubber translator should run seperatly
for each volume.
Change-Id: I1d9379508afe6cfd2f78e3ebf29c829c362d84a9
BUG: 1170075
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/10352
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I761927ea263b4144b851881f25791fda5b794f59
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10381
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If scrubber detect any bad object by mismatching of checksum of scrubber
and signer then log messages shold come as a Alert instead of warning.
Change-Id: I075d80700cbe6182e525a04419a80ab18419ff91
BUG: 1210687
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/10226
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Failing to reset scanning counter causes "incorrect" delay of around
50 seconds per directory entry. This causes scrubber to run extremely
slowly.
[
NOTE: This is a temporary fix. With the introduction of token
bucket based throttling, inducing throttle via sleep()
call would be unneeded.
]
Also, fix logging messages in scrubber to log brick and full path
of the object which is identified/marked as corrupted.
Change-Id: Id501bd15dcdbd8a09613f80f9d84050304740027
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10375
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changelog xlator was capturing bitrot-stub's fsetxattr sent
for versioning. Since it was using the same frame as of the
create fop, there was inconsistency in fop number and gfid
of capturing metadata. So fix is to mark fsetxattr used for
versioning as internal and add internal fop filter in
changelog_fsetxattr.
Change-Id: I51ff468995139838b22bf293a59a0713a92ee7a5
BUG: 1170075
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/10148
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using __attribute__ ((__packed__)) for object signature xattr
saves some bytes (7 bytes to be particular) occupied by the
extended attribute on-disk as compared to the unpacked format.
Change-Id: I91a6a0a54aa60e6fd8c357d72f7601b6ed213f2d
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10161
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for xdata in both the
request and response path of syncops.
Few calls like lookup already had the support;
have renamed variables in few places to maintain
uniformity.
xdata passed downwards is known as xdata_in
and xdata passed upwards is known as xdata_out.
There is an old patch by Jeff Darcy at
http://review.gluster.org/#/c/8769/3 which does the
same for some selected calls. It also brings in
xdata support at gfapi level.
xdata support at gfapi level would be introduced
in subsequent patches.
Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc
BUG: 1158621
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/9859
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Missing "bit-rot-object-version.h" causing devrpm failures.
Change-Id: I5af326c5871cc468a10dece4772b29eda06c4fa9
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10160
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a handful of problem with scrubber which
are detailed below.
Scrubber used to skip objects for verification due to missing
fd iterface to fetch versioning extended attributes. Similar
to the inode interface, an fd based interface in POSIX is now
introduced.
Moreover, this patch also fixes potential false reporting by
scrubber due to:
An object gets dirtied and signed when scrubber is busy
calculatingobject checksum. This is fixed by caching the
signed version when an object is first inspected for
stalenes, i.e., during pre-compute stage. This version is
used to verify checksum in the post-compute stage when the
signatures are compared for possible corruption.
Side effect of _not_ sending signature length during signing
resulted in "truncated" signature to be set for an object.
Now, at the time of signing, the signature length is sent
and is used in place of invoking strlen() to get signature
length (which could have possible 00s). The signature length
itself is not persisted in the signature xattr, but is
calculated on-the-fly by substracting the xattr length by
the "structure" header size.
Some of the log entries are made more meaningful (as and aid
for debugging).
Change-Id: I938bee5aea6688d5d99eb2640053613af86d6269
BUG: 1207624
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10118
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces basic object versioning test(s) which
is required for bitrot detection to work correctly. Basic
test(s) such as opening a file in read-only mode, single
open, multiple open()s are covered on FUSE mount _only_ as
stub does not support anonymous fds yet. For this reason,
the test case disables open-behind.
Actual verification is implemented as a C source which
makes use of the same on-disk data structures as used by
the stub code. The data structures are moved to separate
header file which is included by the test script. Such
modularization helps in future enhancements to keep the
version "data type" opaque and provide handful of APIs
version checking (equal/greater/etc..).
[
This is just a start and should grow over time as stub
is enhanced and codebase matures.
]
Change-Id: Ibee20e65a15b56bbdd59fd2703f9305b115aec7a
BUG: 1201724
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10140
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
.. and potential bug fixes / memleak.
While assigning initial version to an object, both extended attributes
(namely, ongoing version and the default signing version) were persisted.
This is optimized to just persist the ongoing version along with safe
handling of xattr request(s) in it's absence. This is better than the
earlier approach as the two xattr sets were not atomic anyway (allowing
a request to sneak in between between two set operations). This also
allows to perform sanity checks on objects during lookup()/getxattr():
objects with missing ongoing version but presence of signature are
possible candidates of tampering (and catching implementation bugs).
There were couple of instances in the code where versioning xattrs
were incorrectly removed before in-memory versions were initialized,
which have been fixed with this patch. A memory leak in the IPC code
path is also fixed.
Change-Id: I01c690ccfe7156a883582275f40f79a7c10c0900
BUG: 1207054
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10117
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterfs relies on Linux uuid implementation, which
API is incompatible with most other systems's uuid. As
a result, libglusterfs has to embed contrib/uuid,
which is the Linux implementation, on non Linux systems.
This implementation is incompatible with systtem's
built in, but the symbols have the same names.
Usually this is not a problem because when we link
with -lglusterfs, libc's symbols are trumped. However
there is a problem when a program not linked with
-lglusterfs will dlopen() glusterfs component. In
such a case, libc's uuid implementation is already
loaded in the calling program, and it will be used
instead of libglusterfs's implementation, causing
crashes.
A possible workaround is to use pre-load libglusterfs
in the calling program (using LD_PRELOAD on NetBSD for
instance), but such a mechanism is not portable, nor
is it flexible. A much better approach is to rename
libglusterfs's uuid_* functions to gf_uuid_* to avoid
any possible conflict. This is what this change attempts.
BUG: 1206587
Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10017
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity fixes:
CID 1124725
CID 1291742
The problem is that gf_tw_cleanup_timers() frees the handed
in priv->timer_wheel but it can not set the pointer to NULL,
so subsequent checks for priv->timer_wheel show it as not NULL
and allow for access after free.
The proper change might be to change gf_tw_cleanup_timers() to
take a reference to the pointer and set it to NULL after free,
but since it is under contrib/, I did not want to change that
function. Instead this patch uses the function's return code
which was not used previously. (Maybe this should even be done
in a wrapper macro or function?)
Change-Id: I31d80d3df2e4dc7503d62c7819429e1a388fdfdd
BUG: 789278
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-on: http://review.gluster.org/10056
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes Coverity CIDs 1291728, 1291723, 1291732.
Change-Id: I62f3d540cac0f555fe2839b8418e59691c3ff4fd
BUG: 789278
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-on: http://review.gluster.org/10055
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scrubber performs signature verification for objects that were
signed by signer. This is done by recalculating the signature
(using the hash algorithm the object was signed with) and
verifying it aginst the objects persisted signature. Since the
object could be undergoing IO opretaion at the time of hash
calculation, the signature may not match objects persisted
signature. Bitrot stub provides additional information about
the stalesness of an objects signature (determinted by it's
versioning mechanism). This additional bit of information is
used by scrubber to determine the staleness of the signature,
and in such cases the object is skipped verification (although
signature staleness is performed twice: once before initiation
of hash calculation and another after it (an object could be
modified after staleness checks).
The implmentation is a part of the bitrot xlator (signer) which
acts as a signer or scrubber based on a translator option. As
of now the scrub process is ever running (but has some form of
weak throttling mechanism during filesystem scan). Going forward,
there needs to be some form of scrub scheduling and IO throttling
(during hash calculation) tunables (via CLI).
Change-Id: I665ce90208f6074b98c5a1dd841ce776627cc6f9
BUG: 1170075
Original-Author: Raghavendra Bhat <rabhat@redhat.com>
Original-Author: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9914
Tested-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the "Signer" -- responsible for signing files with their
checksums upon last file descriptor close (last release()).
The event notification facility provided by the changelog xlator
is made use of.
Moreover, checksums are as of now SHA256 hash of the object data
and is the only available hash at this point of time. Therefore,
there is no special "what hash to use" type check, although it's
does not take much to add various hashing algorithms to sign
objects with. Signatures are stored in extended attributes of the
objects along with the the type of hashing used to calculate the
signature. This makes thing future proof when other hash types
are added. The signature infrastructure is provided by bitrot
stub: a little piece of code that sits over the POSIX xlator
providing interfaces to "get or set" objects signature and it's
staleness.
Since objects are signed upon receiving release() notification,
pre-existing data which are "never" modified would never be
signed. To counter this, an initial crawler thread is spawned
The crawler scans the entire brick for objects that are unsigned
or "missed" signing due to the server going offline (node reboots,
crashes, etc..) and triggers an explicit sign. This would also
sign objects when bit-rot is enabled for a volume and/or after
upgrade.
Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b
BUG: 1170075
Original-Author: Raghavendra Bhat <raghavendra@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9711
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Implement the skeleton of bit-rot xlator.
Original-Author: Raghavendra Bhat <raghavendra@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Signed-off-by: Anand Nekkunti <anekkunt@redhat.com>
Change-Id: If33218bdc694f5f09cb7b8097c4fdb74d7a23b2d
BUG: 1170075
Reviewed-on: http://review.gluster.org/9710
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|