| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements syncop equivalent for cluster of xlators. The xlators on
which the fop needs to be performed is taken in input arguments to the
functions and the responses are gathered and provided as the output.
This idea is taken from afr-v2 self-heal implementation by Avati.
Change-Id: I2b568f4340cf921a65054b8ab0df7edc4478b5ca
BUG: 1213358
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10240
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I056c9777b242a11af7f576ad19b2db93dbdf82d4
BUG: 1215117
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10367
Reviewed-by: Poornima G <pgurusid@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use global_xlator for allocations so that we don't try to free objects
belonging to an already-deleted translator (which will crash).
Change-Id: Ie72a546e7770cf5cb8a8370e22448c8d09e3ab37
BUG: 1212660
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/10319
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with the inode quota feature, quota size is now
increased from 64bit to 192bits which contains
values of 'file size', 'file count' and 'dir count'
This change in quota size xattr needs to be handled
in disperse xattr aggregation
Signed-off-by: vmallika <vmallika@redhat.com>
Change-Id: I5fd28aa9f5b8b6cba83a98360236417a97ac16ee
BUG: 1207967
Reviewed-on: http://review.gluster.org/10112
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Sachin Pandit <spandit@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Transaction peer lists were used in GlusterD to peers belonging to a
transaction. This was needed to prevent newly added peers performing
partial transactions, which could be incorrect.
This was accomplished by creating a seperate transaction peers list at
the beginning of every transaction. A transaction peers list referenced
the peerinfo data structures of the peers which were present at the
beginning of the transaction. RCU protection of peerinfos referenced by
the transaction peers list is a hard problem and difficult to do
correctly.
To have proper RCU protection of peerinfos, the transaction peers lists
have been replaced by an alternative method to identify peers that
belong to a transaction. The alternative method is to the global peers
list along with generation numbers to identify peers that should belong
to a transaction.
This change introduces a global peer list generation number, and a
generation number for each peerinfo object. Whenever a peerinfo object
is created, the global generation number is bumped, and the peerinfos
generation number is set to the bumped global generation.
With the above changes, the algorithm to identify peers belonging to a
transaction with RCU protection is as follows,
- At the beginning of a transaction, the current global generation
number is saved
- To identify if a peers belonging to the transaction,
- Start a RCU read critical section
- For each peer in the global peers list,
- If the peers generation number is not greater than the saved
generation number, continue with the action on the peer
- End the RCU read critical section
The above algorithm guarantees that,
- The peer list is not modified when a transaction is iterating through
it
- The transaction actions are only done on peers that were present when
the transaction started
But, as a transaction could iterate over the peers list multiple times,
the algorithm cannot guarantee that same set of peers will be selected
every time. A peer could get deleted between two iterations of the list
within a transaction. This problem existed with transaction peers list
as well, but unlike before now it will not lead to invalid memory access
and potential crashes. This problem will be addressed seprately.
This change was developed on the git branch at [1]. This commit is a
combination of the following commits on the development branch.
52ded5b Add timespec_cmp
44aedd8 Add create timestamp to peerinfo
7bcbea5 Fix some silly mistakes
13e3241 Add start time to opinfo
17a6727 Use timestamp comparisions to identify xaction peers instead
of a xaction peer list
3be05b6 Correct check for peerinfo age
70d5b58 Use read-critical sections for peer list iteration
ba4dbca Use peerinfo timestamp checks in op-sm instead of xaction peer
list
d63f811 Add more peer status checks when iterating peers list in
glusterd-syncop
1998a2a Timestamp based peer list traversal of mgmtv3 xactions
f3c1a42 Remove transaction peer lists
b8b08ee Remove unused labels
32e5f5b Remove 'npeers' usage
a075fb7 Remove 'npeers' from mgmt-v3 framework
12c9df2 Use generation number instead of timestamps.
9723021 Remove timespec_cmp
80ae2c6 Remove timespec.h include
a9479b0 Address review comments on 10147/4
[1]: https://github.com/kshlm/glusterfs/tree/urcu
Change-Id: I9be1033525c0a89276f5b5d83dc2eb061918b97f
BUG: 1205186
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/10147
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Linux systems we should use the libuuid from the distribution and not
bundle and statically link the contrib/uuid/ bits.
libglusterfs/src/compat-uuid.h has been introduced and should become an
abstraction layer for different UUID APIs. Non-Linux operating systems
should implement their compatibility layer there.
Once all operating systems have an implementation in compat-uuid.h, we
can remove contrib/uuid/ from the repository completely.
Change-Id: I345e5357644be2521685e00358bb8c83c4ea0577
BUG: 1206587
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10129
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quota hard-limit is supported only upto: 9223372036854775807 (int 64)
In CLI, it is allowed to set the value upto 16384PB (unsigned int 64),
this is not a valid value as the xattrop for quota accounting and
the quota enforcer operates on a signed int64 limit value.
This patches fixes the problem in CLI and allows user to set
the hard-limit value only from range 0 - 9223372036854775807
Change-Id: Ifce6e509e1832ef21d3278bacfa5bd71040c8cba
BUG: 1206432
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/10022
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ctx is passed to gf_log_inject_timer_event() and pass through to
__gf_log_inject_timer_event() where the struct members are getting
dereferenced, and can cause crash if the passed ctx is null. This patch
avoids the issue.
Change-Id: I153dbb5d3744898429139e3d40bb4f0e9093632a
BUG: 1208118
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/10102
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for xdata in both the
request and response path of syncops.
Few calls like lookup already had the support;
have renamed variables in few places to maintain
uniformity.
xdata passed downwards is known as xdata_in
and xdata passed upwards is known as xdata_out.
There is an old patch by Jeff Darcy at
http://review.gluster.org/#/c/8769/3 which does the
same for some selected calls. It also brings in
xdata support at gfapi level.
xdata support at gfapi level would be introduced
in subsequent patches.
Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc
BUG: 1158621
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/9859
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I3a47cdd06595c87da8e822d11683d68b43c11cda
BUG: 1194640
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-on: http://review.gluster.org/9945
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix will solve the heating of the files during the promotion
or demotion.
Promotion:
~~~~~~~~~
When a file gets promoted it get the current time stamp
during creation only, but following writes or reads during the
migration wont heat the file.
Demotion:
~~~~~~~~
When a file gets demoted it get the wind/unwind time stamp is set to
zero. The following writes or reads during the migration wont heat
the file.
What is remaining ?
~~~~~~~~~~~~~~~~~
Bug 1209129 ( https://bugzilla.redhat.com/show_bug.cgi?id=1209129 )
Inspite of this fix there is still a issue remaining, i.e the heat of
the file is not keep intact during a internal rebalance activity i.e
a rebalance within a tier.
Change-Id: I01e82dc226355599732d40e699062cee7960b0a5
BUG: 1207867
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10080
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a handful of problem with scrubber which
are detailed below.
Scrubber used to skip objects for verification due to missing
fd iterface to fetch versioning extended attributes. Similar
to the inode interface, an fd based interface in POSIX is now
introduced.
Moreover, this patch also fixes potential false reporting by
scrubber due to:
An object gets dirtied and signed when scrubber is busy
calculatingobject checksum. This is fixed by caching the
signed version when an object is first inspected for
stalenes, i.e., during pre-compute stage. This version is
used to verify checksum in the post-compute stage when the
signatures are compared for possible corruption.
Side effect of _not_ sending signature length during signing
resulted in "truncated" signature to be set for an object.
Now, at the time of signing, the signature length is sent
and is used in place of invoking strlen() to get signature
length (which could have possible 00s). The signature length
itself is not persisted in the signature xattr, but is
calculated on-the-fly by substracting the xattr length by
the "structure" header size.
Some of the log entries are made more meaningful (as and aid
for debugging).
Change-Id: I938bee5aea6688d5d99eb2640053613af86d6269
BUG: 1207624
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/10118
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch resolves tiering translator issues taken from the list
in bug 1203776. These issues have been selected to be fixed
first. The rest will be fixed in a subsequent patch (or are not a
problem).
3. Replace hardcoded #defines of promote/demote file names
6. Use loc_wipe() in migrate_using_query_file()
9. Only promote/demote files on the same node on which they reside.
14. Replace calloc with GF_CALLOC in tier.c and ensure freeing done
properly.
15. Handle if parse_query_str fails
22. Only load gfdb library on server side, remove SQL references
from client.
Change-Id: I6563b11e58ab2e4c6b1ce44db755781ad6d930fb
BUG: 1203776
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9987
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterfs relies on Linux uuid implementation, which
API is incompatible with most other systems's uuid. As
a result, libglusterfs has to embed contrib/uuid,
which is the Linux implementation, on non Linux systems.
This implementation is incompatible with systtem's
built in, but the symbols have the same names.
Usually this is not a problem because when we link
with -lglusterfs, libc's symbols are trumped. However
there is a problem when a program not linked with
-lglusterfs will dlopen() glusterfs component. In
such a case, libc's uuid implementation is already
loaded in the calling program, and it will be used
instead of libglusterfs's implementation, causing
crashes.
A possible workaround is to use pre-load libglusterfs
in the calling program (using LD_PRELOAD on NetBSD for
instance), but such a mechanism is not portable, nor
is it flexible. A much better approach is to rename
libglusterfs's uuid_* functions to gf_uuid_* to avoid
any possible conflict. This is what this change attempts.
BUG: 1206587
Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10017
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
iobuf_get will be creating the iobuf and
hence lock is not necessary to increment ref.
However, it is a good practice to call
iobuf_ref instead of __iobuf_ref so that
we have a single point to get refs and
this can be used later to do mem
accounting etc.
Change-Id: I1fd328c3c463c23fd5f6df505ccb5c86f6207f28
BUG: 1199075
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/9812
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity CID 1288820
strncpy executed with a limit equal to the target array
size potentially leaves the target string not null terminated.
Make sure the copied string is a valid 0 terminated string.
Change-Id: I39ff6a64ca5b9e30562226dd34c5b06267b75b87
BUG: 789278
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-on: http://review.gluster.org/10063
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Poornima G <pgurusid@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Joseph Fernandes <josferna@redhat.com>
Tested-by: Joseph Fernandes <josferna@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I208289aae2423e4bb015cf33bafd2a961e1c3fc6
BUG: 1197593
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/9779
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Sachin Pandit <spandit@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CID : 1124884
Change-Id: I3332e844a01c1432f1d80a6acda7a87e8b01801c
BUG: 789278
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-on: http://review.gluster.org/9677
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This also lists the files that are on-going I/O, which
will be fixed later.
Change-Id: Ib3f60a8b7e8798d068658cf38eaef2a904f9e327
BUG: 1203581
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10020
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Query fix in find_changed_with_freq()
2) Volume option typo fix for write_freq_threshold
and read_freq_threshold
Change-Id: I38e154818178aab412b2d7b2914cd29acef66ffb
BUG: 1207343
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10050
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
We've observed that glusterd was OOM killed after some minutes when volume set
command was run in a loop.
Analysis:
Initially the suspection was in glusterd code, but a deep dive into the codebase
revealed that while validating all the options as part of graph reconfiguration
at the time of freeing up the xlator object its one of the member mem_acct is
left over which causes memory leak.
Solution:
Free up xlator's mem_acct.rec in xlator_destroy ()
Change-Id: Ie9e7267e1ac4ab7b8af6e4d7c6660dfe99b4d641
BUG: 1201203
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/9862
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Debugging where memory gets free'd with help from overwriting the memory
before it is free'd with some structures (repeatedly). The struct
mem_invalid starts with a magic value (0xdeadc0de), followed by a
pointer to the xlator, the mem-type. the size of the GF_?ALLOC()
requested area and the baseaddr pointer to what GF_?ALLOC() returned.
With these details, and the 'struct mem_header' that is placed when
calling GF_?ALLOC(), it is possible to identify overruns and possible
use-after-free. A memory dump (core) or running with a debugger is
needed to read the surrounding memory of corrupt structures.
This additional memory invalidation/poisoning needs to be enabled by
passing --enable-debug to ./configure.
Change-Id: I9f5f37dc4b5b59142adefc90897d32e89be67b82
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10019
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current glfs_new() function is not flexible enough to error out
and destroy the struct members or objects initialized just before the
error path/condition. This make the structs or objects to continue or
left out with partially recorded data in fs and ctx structs and cause
crashes/issues later in the code path. This patch avoid the issue.
Change-Id: Ie4514b82b24723a46681cc7832a08870afc0cb28
BUG: 1202492
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/9903
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Poornima G <pgurusid@redhat.com>
Reviewed-by: soumya k <skoduri@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
http://review.gluster.org/9269 addresses maintaining local xaction_peers in
syncop and mgmt_v3 framework. This patch is to maintain local xaction_peers list
for op-sm framework as well.
Change-Id: Idd8484463fed196b3b18c2df7f550a3302c6e138
BUG: 1204727
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/9972
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scrubber performs signature verification for objects that were
signed by signer. This is done by recalculating the signature
(using the hash algorithm the object was signed with) and
verifying it aginst the objects persisted signature. Since the
object could be undergoing IO opretaion at the time of hash
calculation, the signature may not match objects persisted
signature. Bitrot stub provides additional information about
the stalesness of an objects signature (determinted by it's
versioning mechanism). This additional bit of information is
used by scrubber to determine the staleness of the signature,
and in such cases the object is skipped verification (although
signature staleness is performed twice: once before initiation
of hash calculation and another after it (an object could be
modified after staleness checks).
The implmentation is a part of the bitrot xlator (signer) which
acts as a signer or scrubber based on a translator option. As
of now the scrub process is ever running (but has some form of
weak throttling mechanism during filesystem scan). Going forward,
there needs to be some form of scrub scheduling and IO throttling
(during hash calculation) tunables (via CLI).
Change-Id: I665ce90208f6074b98c5a1dd841ce776627cc6f9
BUG: 1170075
Original-Author: Raghavendra Bhat <rabhat@redhat.com>
Original-Author: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9914
Tested-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the "Signer" -- responsible for signing files with their
checksums upon last file descriptor close (last release()).
The event notification facility provided by the changelog xlator
is made use of.
Moreover, checksums are as of now SHA256 hash of the object data
and is the only available hash at this point of time. Therefore,
there is no special "what hash to use" type check, although it's
does not take much to add various hashing algorithms to sign
objects with. Signatures are stored in extended attributes of the
objects along with the the type of hashing used to calculate the
signature. This makes thing future proof when other hash types
are added. The signature infrastructure is provided by bitrot
stub: a little piece of code that sits over the POSIX xlator
providing interfaces to "get or set" objects signature and it's
staleness.
Since objects are signed upon receiving release() notification,
pre-existing data which are "never" modified would never be
signed. To counter this, an initial crawler thread is spawned
The crawler scans the entire brick for objects that are unsigned
or "missed" signing due to the server going offline (node reboots,
crashes, etc..) and triggers an explicit sign. This would also
sign objects when bit-rot is enabled for a volume and/or after
upgrade.
Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b
BUG: 1170075
Original-Author: Raghavendra Bhat <raghavendra@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9711
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: If54f4be7db8b6f98e65570b09c07251e21ebae15
BUG: 1194640
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/9837
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bitrot stub implements object versioning required for identifying
signature freshness. More details about versioning is explained
as a part of the "bitrot feature documentation" patch.
Change-Id: I2ad70d9eb109ba4a12148ab8d81336afda529ad9
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9709
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Certain translators may require to update the inode context
of an already linked inode before unwinding the call to the
client. Normally, such a case in encountered during parallel
operations when a fresh inode is chosen at call (wind) time.
In the callback path, one of inodes is successfully linked
in the inode table, thereby the other inodes being thrown
away (and the inode pointers for these calls being pointed
to the linked inode).
Translators which may have strict dependency on the correct
value in the inode context would get stale values in inode
context. This patch introduces a new callback which provides
gives translators an opportunity to "patch" their respective
inode contexts. Note that, as of now, this callback is only
invoked during create()s unwind path. Although this might
needed to be done for all dentry fops and lookup, but let
that be done as an when required (bitrot stub requires
this *only* for create()).
Change-Id: I6cd91c2af473c44d1511208060d3978e580c67a6
BUG: 1170075
Original-Author: Raghavendra Bhat <rabhat@redhat.com>
Original-Author: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9913
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Translators which wish to send event notifications can send
"down" an IPC FOP with op_type as GF_IPC_TARGET_CHANGELOG
and xdata carrying event structures (changelog_event_t).
Change-Id: I0e5f8c9170161c186f0e58d07105813e34e18786
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9775
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tier translator shares most of DHT's code. It differs in how
subvolumes are chosen for I/Os, and how file migration (cache promotion
and demotion) is managed. That different functionality is split to either
DHT or tier logic according to the "tier_methods" structure.
A cache promotion and demotion thread is created in a manner
similar to the rebalance daemon. The thread operates a timing
wheel which periodically checks for promotion and demotion candidates
(files). Candidates are queued and then migrated. Candidates must exist on
the same node as the daemon and meet other critera per caching policies.
This patch has two authors (Dan Lambright and Joseph Fernandes). Dan
did the DHT changes and Joe wrote the cache policies. The fix depends on
DHT readidr changes and the database library which have been submitted
separately. Header files in libglusterfs/src/gfdb should be reviewed in
patch 9683.
For more background and design see the feature page [1].
[1]
http://www.gluster.org/community/documentation/index.php/Features/data-classification
Change-Id: Icc26c517ccecf5c42aef039f5b9c6f7afe83e46c
BUG: 1194753
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9724
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Id9a7d0f457d9759ab7d0a52a4000b5ae36d211f8
BUG: 1194753
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/9946
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Part 2/2 patch to enable users analyze and resolve
split-brain.
This patch enables :
1) Users to inspect the files in data and metadata split-brain.
2) Resolve the split-brain.
Both using a series of setfattr commands.
Consider a volume "test" with 2 bricks.
1) To inspect a file f1:
setfattr -n replica.split-brain-choice -v test-client-0 f1
After the execution of this command, if no read_subvol
is found, reads will be served from test-client-0 (corresponding
to brick-0).
2) To resolve split-brain :
setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1
Execution of this command will lead to the resolution
of data and metadata split-brain with subvol mentioned in the
command (test-client-0 here) as the source and the rest as sink.
Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179
BUG: 1191396
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/9743
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
**********************************************************************
ChangeTimeRecorder(CTR) Xlator |
**********************************************************************
ChangeTimeRecorder(CTR) is server side xlator(translator) which sits
just above posix xlator. The main role of this xlator is to record the
access/write patterns on a file residing the brick. It records the
read(only data) and write(data and metadata) times and also count on
how many times a file is read or written. This xlator also captures
the hard links to a file(as its required by data tiering to move
files).
CTR Xlator is the consumer of libgfdb.
To Enable/Disable CTR Xlator:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gluster volume set <volume-name> features.ctr-enabled {on/off}
To Enable/Disable Frequency Counter Recording in CTR Xlator:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gluster volume set <volume-name> features.record-counters {on/off}
Change-Id: I5d3cf056af61ac8e3f8250321a27cb240a214ac2
BUG: 1194753
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/9935
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NFS now has the ability to use a separate file for "netgroups" and
"exports". An administrator should have the ability to check the
validity of the files before applying the configuration.
The "glusterfsd" command now has the following additional arguments that
can be used to check the configuration:
--print-netgroups: Validate the netgroups file and print it out
--print-exports: Validate the exports file and print it out
BUG: 1143880
Change-Id: I24c40d50110d49d8290f9fd916742f7e4d0df85f
URL: http://www.gluster.org/community/documentation/index.php/Features/Exports_Netgroups_Authentication
Original-author: Shreyas Siravara <shreyas.siravara@gmail.com>
CC: Richard Wareing <rwareing@fb.com>
CC: Jiffin Tony Thottan <jthottan@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/9365
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch imports timer-wheel[1] algorithm from the linux
kernel (~/kernel/time/timer.c) with some modifications.
Timer-wheel is an efficent way to track millions of timers for
expiry. This is a variant of the simple but RAM heavy approach
of having a list (timer bucket) for every future second.
Timer-wheel categorizes every future second into a logarithmic
array of arrays. This is done by splitting the 32 bit "timeout"
value into fixed "sliced" bits, thereby each category has a
fixed size array to which buckets are assigned.
A classic split would be 8+6+6+6 (used in this patch) which
results in 256+64+64+64 == 512 buckets. Therefore, the entire
32 bit futuristic timeouts have been mapped into 512 buckets.
[
NOTE:
There are other possible splits, such as "8+8+8+8", but
this patch sticks to the widely used and tested default.
]
Therfore, the first category "holds" timers whose expiry range
is between 1..256, the next cateogry holds 257..16384, third
category 16385..1048576 and so on. When timers are added,
unless it's in the first category, timers with different
timeouts could end up in the same bucket. This means that the
timers are "partially sorted" -- sorted in their highest bits.
The expiry code walks the first array of buckets and exprires
any pending timers (1..256). Next, at time value 257, timers
in the first bucket of the second array is "cascaded" onto
the first category and timers are placed into respective
buckets according to the thier timeout values. Cascading
"brings down" the timers timeout to the coorect bucket
of their respective category. Therefore, timers are sorted
by their highest bits of the timeout value and then by the
lower bits too.
[1] https://lwn.net/Articles/152436/
Change-Id: I1219abf69290961ae9a3d483e11c107c5f49c4e3
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9707
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces rotational buffers aiming at the classic
multiple producer and multiple consumer problem. A fixed set
of buffer list is allocated during initialization, where each
list consist of a list of buffers. Each buffer is an iovec
pointing to a memory region of fixed allocation size. Multiple
producers write data to these buffers. A buffer list starts with
a single buffer (iovec) and allocates more when required (although
this can be preallocatd in multiples of k).
rot-buffs allow multiple producers to write data parallely with
a bit of extra cost of taking locks. Therefore, it's much suited
for large writes. Multiple producers are allowed to write in the
buffer parallely by "reserving" write space for selected number
of bytes and returning pointer to the start of the reserved area.
The write size is selected by the producer before it starts the
write (which is often known). Therefore, the write itself need not
be serialized -- just the space reservation needs to be done safely.
The other part is when a consumer kicks in to consume what has
been produced. At this point, a buffer list switch is performed.
The "current" buffer list pointer is safely pointed to the next
available buffer list. New writes are now directed to the just
switched buffer list (the old buffer list is now considered out
of rotation). Note that the old buffer still may have producers
in progress (pending writes), so the consumer has to wait till
the writers are drained. Currently this is the slow path for
producers (write completion) and needs to be improved.
Currently, there is special handling for cases where the number
of consumers match (or exceed) the number of producers, which
could result in writer starvation. In this scenario, when a
consumers requests a buffer list for consumption, a check is
performed for writer starvation and consumption is denied
until at least another buffer list is ready of the producer
for writes, i.e., one (or more) consumer(s) completed, thereby
putting the buffer list back in rotation.
[
NOTE:
I've not performance tested this producer-consumer model
yet. It's being used in changelog for event notification.
The list of buffers (iovecs) are directly passed to RPC
layer.
]
Change-Id: I88d235522b05ab82509aba861374a2312bff57f2
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9706
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
==========================================================================
Inode quota
==========================================================================
= Currently, the only way to retrieve the number of files/objects in a =
= directory or volume is to do a crawl of the entire directory/volume. =
= This is expensive and is not scalable. =
= =
= The proposed mechanism will provide an easier alternative to determine =
= the count of files/objects in a directory or volume. =
= =
= The new mechanism proposes to store count of objects/files as part of =
= an extended attribute of a directory. Each directory's extended =
= attribute value will indicate the number of files/objects present =
= in a tree with the directory being considered as the root of the tree. =
= =
= The count value can be accessed by performing a getxattr(). =
= Cluster translators like afr, dht and stripe will perform aggregation =
= of count values from various bricks when getxattr() happens on the key =
= associated with file/object count. =
A new interface is introduced:
------------------------------
limit-objects : limit the number of inodes at directory level
list-objects : list the directories where the limit is set
remove-objects : remove the limit from the directory
==========================================================================
CLI COMMAND:
gluster volume quota <volname> limit-objects <path> <number> [<percent>]
* <number> is a hard-limit for number of objects limitation for path "<path>"
If hard-limit is exceeded, creation of file/directory is no longer
permitted.
* <percent> is a soft-limit for number of objects creation for path "<path>"
If soft-limit is exceeded, a warning is issued for each creation.
CLI COMMAND:
gluster volume quota <volname> remove-objects [path]
==========================================================================
CLI COMMAND:
gluster volume quota <volname> list-objects [path] ...
Sample output:
------------------
Path Hard-limit Soft-limit Used Available
Soft-limit exceeded?
Hard-limit exceeded?
------------------------------------------------------------------------
--------------------------------------
/dir 10 80% 10 0
Yes
Yes
==========================================================================
[root@snapshot-28 dir]# ls
a b file11 file12 file13 file14 file15 file16 file17
[root@snapshot-28 dir]# touch a1
touch: cannot touch `a1': Disk quota exceeded
* Nine files are created in directory "dir" and directory is included in
* the
count too. Hence the limit "10" is reached and further file creation
fails
==========================================================================
Note: We have also done some re-factoring in cli for volume name
validation. New function cli_validate_volname is created
==========================================================================
Change-Id: I1823497de4f790a2a20ebb1770293472ea33ee2b
BUG: 1190108
Signed-off-by: Sachin Pandit <spandit@redhat.com>
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/9769
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
*************************************************************************
Libgfdb |
*************************************************************************
Libgfdb provides abstract mechanism to record extra/rich metadata
required for data maintenance, such as data tiering/classification.
It provides consumer with API for recording and querying, keeping
the consumer abstracted from the data store used beneath for storing data.
It works in a plug-and-play model, where data stores can be plugged-in.
Presently we have plugin for Sqlite3. In the future will provide recording
and querying performance optimizer. In the current implementation the schema
of metadata is fixed.
Schema:
~~~~~~
GF_FILE_TB Table:
~~~~~~~~~~~~~~~~~
This table has one entry per file inode. It holds the metadata required to
make decisions in data maintenance.
GF_ID (Primary key) : File GFID (Universal Unique IDentifier in the namespace)
W_SEC, W_MSEC : Write wind time in sec & micro-sec
UW_SEC, UW_MSEC : Write un-wind time in sec & micro-sec
W_READ_SEC, W_READ_MSEC : Read wind time in sec & micro-sec
UW_READ_SEC, UW_READ_MSEC : Read un-wind time in sec & micro-sec
WRITE_FREQ_CNTR INTEGER : Write Frequency Counter
READ_FREQ_CNTR INTEGER : Read Frequency Counter
GF_FLINK_TABLE:
~~~~~~~~~~~~~~
This table has all the hardlinks to a file inode.
GF_ID : File GFID (Composite Primary Key)``|
GF_PID : Parent Directory GFID (Composite Primary Key) |-> Primary Key
FNAME : File Base Name (Composite Primary Key)__|
FPATH : File Full Path (Its redundant for now, this will go)
W_DEL_FLAG : This Flag is used for crash consistancy, when a link is unlinked.
i.e Set to 1 during unlink wind and during unwind this record
is deleted
LINK_UPDATE : This Flag is used when a link is changed i.e rename.
Set to 1 when rename wind and set to 0 in rename unwind
Libgfdb API:
~~~~~~~~~~~
Refer libglusterfs/src/gfdb/gfdb_data_store.h
Change-Id: I2e9fbab3878ce630a7f41221ef61017dc43db11f
BUG: 1194753
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/9683
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
position in the graph rather than relative (local) to a particular
translator.
Encoding the volume in this way allows a single translator to manage
which brick is currently being scanned for directory entries. Using a
single translator minimizes allocated bits in the d_off. It also allows
multiple DHT translators in the same graph to have a common frame of
reference (the graph position) for which brick is being read. Multiple
DHT translators are needed for the Tiering feature.
The fix builds off a previous change (9332) which removed subvolume
encoding from AFR. The fix makes an equivalent change to the EC
translator.
More background can be found in fix 9332 and gluster-dev discussions [1].
DHT and AFR/EC are responsibile (as before) for choosing which brick to
enumerate directory entries in over the readdir lifecycle.
The client translator receiving the readdir fop encodes the dht_t. It
is referred to as the "leaf node" in the graph and corresponds to the
brick being scanned.
When DHT decodes the d_off, it translates the leaf node to a local
subvolume, which represents the next node in the graph leading to
the brick.
Tracking of leaf nodes is done in common utility functions. Leaf nodes
counts and positional information are updated on a graph switch.
[1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html
Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8
BUG: 1190734
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9688
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The crash is seen when, glfs_init failed for some reason
and glfs_fini was called for cleaning up the partial initialization.
The fix is in two folds:
1. In timer store and restore the THIS, previously
it was being overwritten.
2. In glfs_free_from_ctx() and glfs_fini() check for
NULL before destroying.
Change-Id: If40bf69936b873a1da8e348c9d92c66f2f07994b
BUG: 1202290
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/9895
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the only way to retrieve the number of files/objects in a
directory or volume is to do a crawl of the entire directory/volume.
This is expensive and is not scalable.
The new mechanism proposes to store count of objects/files as part of
an extended attribute of a directory. Each directory's extended
attribute value will indicate the number of files/objects present
in a tree with the directory being considered as the root of the tree.
Currently file usage is accounted in marker by doing multiple FOPs
like setting and getting xattrs. Doing this with STACK WIND and
UNWIND can be harder to debug as involves multiple callbacks.
In this code we are replacing current mechanism with syncop approach
as syncop code is much simpler to follow and help us implement inode
quota in an organized way.
Change-Id: Ibf366fbe07037284e89a241ddaff7750fc8771b4
BUG: 1188636
Signed-off-by: vmallika <vmallika@redhat.com>
Signed-off-by: Sachin Pandit <spandit@redhat.com>
Reviewed-on: http://review.gluster.org/9567
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Several features - e.g. encryption, erasure codes, or NSR - involve
multiple cooperating translators which sometimes need a "private" means
of communication amongst themselves. Historically we've used virtual or
synthetic xattrs, but that's not very elegant and clutters up the
getxattr/setxattr path which must also handle real xattr requests. This
new fop should address that.
The only argument is an int32_t "op" which should be recognized by the
target translator. It is recommended that translators using these
feature follow some convention regarding the ops that they define, to
avoid conflicts. Using a hash of the target translator's type string as
a base for a series of ops would probably be a good start. Any other
information can be passed in both directions using xdata.
The default behavior for this fop, as with any other, is to pass through
to FIRST_CHILD. That makes use of this fop "transparent" to other
translators that were written before it existed, but it also means that
it only really works with pass-through translators. If a routing
translator (such as DHT) or a fan-out translator (such as AFR) is
involved, the IPC might not reach its intended destination unless those
translators are modified to forward IPC fops along all paths.
If an IPC gets all the way to storage/posix it is considered an error,
much like an uncaught exception. We don't actually *do* anything in
that case, but we do log it send back an EOPNOTSUPP error. This makes
the "unrecognized opcode" condition distinguishable from the "no IPC
support" condition (which would yield an RPC error instead) so clients
can probe for the presence of a handler for their own favorite opcode
and either use that or use old-school xattrs depending on the result.
BUG: 1158628
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Change-Id: I84af1b17babe5b30ec03ecf027ae37d09b873968
Reviewed-on: http://review.gluster.org/8812
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Framework on the server-side, to handle certain state of the files
accessed and send notifications to the clients connected.
A generic and extensible framework, used to maintain states in
the glusterfsd process for each of the files accessed
(including the clients info doing the fops) and send
notifications to the respective glusterfs clients incase of
any change in that state.
This patch handles "Inode Update/Invalidation" upcall event.
Feature page:
URL: http://www.gluster.org/community/documentation/index.php/Features/Upcall-infrastructure
Below link has a writeup which explains the code changes done -
URL: https://soumyakoduri.wordpress.com/2015/02/25/glusterfs-understanding-upcall-infrastructure-and-cache-invalidation-support/
Change-Id: Ie3d724be9a3419fcf18901a753e8ec2df2ac802f
BUG: 1200262
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-on: http://review.gluster.org/9535
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
CC libglusterfs_la-inode.lo
inode.c: In function 'inode_table_destroy':
inode.c:1630:19: warning: variable 'this' set
but not used [-Wunused-but-set-variable]
xlator_t *this = NULL;
Change-Id: If4b37ab896ee0a309826d4be48c6599d6ec2710b
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/9846
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anoop C S <achiraya@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Poornima G <pgurusid@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I41acd9970bef04bb16cd4d8532a84a95d5fb642a
BUG: 1199003
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>.
Reviewed-on: http://review.gluster.org/9810
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Parses linux style export file/netgroups file into a structure that
can be lookedup.
* This parser turns each line into a structure called an "export
directory". Each of these has a dictionary of hosts and netgroups
which can be looked up during the mount authentication process.
(See Change-Id Ic060aac and I7e6aa6bc)
* A string beginning withan '@' is treated as a netgroup and a string
beginning without an @ is a host.
(See Change-Id Ie04800d)
* This parser does not currently support all the options in the man page
('man exports'), but we can easily add them.
BUG: 1143880
URL: http://www.gluster.org/community/documentation/index.php/Features/Exports_Netgroups_Authentication
Change-Id: I181e8c1814d6ef3cae5b4d88353622734f0c0f0b
Original-author: Shreyas Siravara <shreyas.siravara@gmail.com>
CC: Richard Wareing <rwareing@fb.com>
CC: Jiffin Tony Thottan <jthottan@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/8758
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The /etc/exports format for NFS-exports (see Change-Id I7e6aa6b) allows
a more fine grained control over the authentication. This change adds
the functions and structures that will be used in by Change-Id I181e8c1.
BUG: 1143880
Change-Id: Ic060aac7c52d91e08519b222ba46383c94665ce7
Original-author: Shreyas Siravara <shreyas.siravara@gmail.com>
CC: Richard Wareing <rwareing@fb.com>
CC: Jiffin Tony Thottan <jthottan@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/9362
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
syncop_inodelk doesn't work properly as lk_owner is not set
in the frame created by 'synctask_create'.
There is a possibility that more than one thread can acquire inode lock
with syncop_inodelk
Change-Id: I8193edb0d24b3a6e3a3f6a0c5d7ab5a1be8e7daf
BUG: 1188636
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/9858
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pipe2() doesn't works on Linux kernel version < 2.6.27 and
glibc < version 2.9. Hence replacing it with pipe(),
so that the build will not fail on Centos5.
Change-Id: If17aed0d51466cd7528cf8dde0edfa28b68139e5
BUG: 1200255
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/9844
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
|