| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the initial version of the Changelog Translator.
What is it
-----------
Goal is to capture changes performed on a GlusterFS volume.
The translator needs to be loaded on the server (bricks) and
captures changes in a plain text file inside a configured
directory path (controlled by "changelog-dir", should be
somewhere in <export>/.glusterfs/changelog by default).
Changes are classified into 3 types:
- Data: : TYPE-I
- Metadata : TYPE-II
- Entry : TYPE-III
Changelog file is rolled over after a certain time interval
(defauls to 60 seconds) after which a changelog is started.
The thing to be noted here is that for a time interval
(time slice) multiple changes for an inode are recorded only
once (ie. say for 100+ writes on an inode that happens within
the time slice has only a single corresponding entry in the
changelog file). That way we do not bloat up the changelog
and also save lots of writes.
Changelog Format
-----------------
TYPE-I and TYPE-II changes have the gfid on the entity on
which the operation happened. TYPE-III being a entry op
requires the parent gfid and the basename. Changelog format
has been kept to a minimal and it's upto the consumers to
do the heavy loading of figuring out deletes, renames etc..
A single changelog file records all three types of changes,
with each change starting with an identifier ("D": DATA,
"M": METADATA and "E": ENTRY). Option is provided for the
encoding type (See TUNABLES).
Consumers
----------
The only consumer as of today would be geo-replication, although
backup utilities, self-heal, bit-rot detection could be possible
consumers in the future.
CLI
----
By default, change-logging is disabled (the translator is present
in the server graph but does nothing). When enabled (via cli) each
brick starts to log the changes. There are a set of tunable that
can be used to change the translators behaviour:
- enable/disable changelog (disabled by default)
gluster volume set <volume> changelog {on|off}
- set the logging directory (<brick>/.glusterfs/changelogs is the
default)
gluster volume set <volume> changelog-dir /path/to/dir
- select encoding type (binary (default) or ascii)
gluster volume set <volume> encoding {binary|ascii}
- change the rollover time for the logs (60 secs by default)
gluster volume set <volume> rollover-time <secs>
- when secs > 0, changelog file is not open()'d with O_SYNC flag
- and fsync is trigerred periodically every <secs> seconds.
gluster volume set <volume> fsync-interval <secs>
features/changelog: changelog consumer library (libgfchangelog)
A shared library is provided for the consumer of the changelogs
for easy acess via APIs. Application can link against this library
and request for changelog updates. Conversion of binary logs to
human-readable ascii format is also taken care by the library which
keeps a copy of the changelog in application provided working
directory.
Change-Id: I75575fb7f1c53d2bec3dba1a329ea7bb3c628497
BUG: 847839
Original Author: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5127
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* files can be accessed directly through their gfid and not just
through their paths. For eg., if the gfid of a file is
f3142503-c75e-45b1-b92a-463cf4c01f99, that file can be accessed
using <gluster-mount>/.gfid/f3142503-c75e-45b1-b92a-463cf4c01f99
.gfid is a virtual directory used to seperate out the namespace
for accessing files through gfid. This way, we do not conflict with
filenames which can be qualified as uuids.
* A new file/directory/symlink can be created with a pre-specified
gfid. A setxattr done on parent directory with fuse_auxgfid_newfile_args_t
initialized with appropriate fields as value to key "glusterfs.gfid.newfile"
results in the entry <parent>/bname whose gfid is set to args.gfid. The
contents of the structure should be in network byte order.
struct auxfuse_symlink_in {
char linkpath[]; /* linkpath is a null terminated string */
} __attribute__ ((__packed__));
struct auxfuse_mknod_in {
unsigned int mode;
unsigned int rdev;
unsigned int umask;
} __attribute__ ((__packed__));
struct auxfuse_mkdir_in {
unsigned int mode;
unsigned int umask;
} __attribute__ ((__packed__));
typedef struct {
unsigned int uid;
unsigned int gid;
char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid string
* in canonical form.
*/
unsigned int st_mode;
char bname[]; /* bname is a null terminated string */
union {
struct auxfuse_mkdir_in mkdir;
struct auxfuse_mknod_in mknod;
struct auxfuse_symlink_in symlink;
} __attribute__ ((__packed__)) args;
} __attribute__ ((__packed__)) fuse_auxgfid_newfile_args_t;
An initial consumer of this feature would be geo-replication to
create files on slave mount with same gfids as that on master.
It will also help gsyncd to access files directly through their
gfids. gsyncd in its newer version will be consuming a changelog
(of master) containing operations on gfids and sync corresponding
files to slave.
* Also, bring in support to heal gfids with a specific value.
fuse-bridge sends across a gfid during a lookup, which storage
translators assign to an inode (file/directory etc) if there is
no gfid associated it. This patch brings in support
to specify that gfid value from an application, instead of relying
on random gfid generated by fuse-bridge.
gfids can be healed through setxattr interface. setxattr should be
done on parent directory. The key used is "glusterfs.gfid.heal"
and the value should be the following structure whose contents
should be in network byte order.
typedef struct {
char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid
* string in canonical form
*/
char bname[]; /* a null terminated basename */
} __attribute__((__packed__)) fuse_auxgfid_heal_args_t;
This feature can be used for upgrading older geo-rep setups where gfids
of files are different on master and slave to newer setups where they
should be same. One can delete gfids on slave using setxattr -x and
.glusterfs and issue stat on all the files with gfids from master.
Thanks to "Amar Tumballi" <amarts@redhat.com> and "Csaba Henk"
<csaba@redhat.com> for their inputs.
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Change-Id: Ie8ddc0fb3732732315c7ec49eab850c16d905e4e
BUG: 952029
Reviewed-on: http://review.gluster.com/#/c/4702
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.org/4702
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an inode/dentry is linked via a client and removed via a
separate client, the inode/dentry mapping in the initial client
remains. A lookup of the removed name on the initial client
typically returns ENOENT once the associated caches expire. If the
initial client has multiple dentries linked to the same inode,
however, lookups on the non-removed dentry create windows of time
where lookups on the stale/removed name return successfully. This
occurs because the stale mapping resolves to the still valid inode
and tricks md-cache into returning valid lookup data.
To correct this situation, unlink the stale inode mapping on a
failed (ENOENT) revalidation lookup (i.e., when fuse has resolved
the inode but a lookup returns ENOENT). Note that with this change,
the state still occurs until an md-cache window has expired,
allowed a lookup to pass through to the server and given the fuse
translator an opportunity to clean up.
Change-Id: I47dde2f11e2ef5b8dd51e9ac8be0f36cdb5081a3
BUG: 985074
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/5337
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Following is the semantics of the 'cmd':
1) If @domain is NULL - returns no. of locks blocked/granted in all domains
2) If @domain is non-NULL- returns no. of locks blocked/granted in that
domain
3) If @domain is non-existent - returns '0'; This is important since
locks xlator creates a domain in a lazy manner.
where @domain - a string representing the domain.
Change-Id: I5e609772343acc157ca650300618c1161efbe72d
BUG: 951195
Original-author: Krishnan Parthasarathi <kparthas@redhat.com>
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4889
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with the sequence of operations are like below, we have issues
with current code (MP == mountpoint):
T0,MP1# mkdir /abcd (Succeeds on hash_subvol)
T1,MP2# mkdir /abcd (Gets EEXIST as dir exists in hash_subvol)
T2,MP2# mkdir /.gfid/<abcd's gfid>/xyz (lookup happens on abcd's
gfid, calls dht_discover)
T3,MP1# (Completes mkdir(), goes to dir_selfheal to set the layouts).
T4,MP2# (dht_discover_cbk gets success for lookup as the entry
existed, as layout is not yet written, it says normalize
done, found holes).
T5,MP2# (as layout anomaly is not considered an issue in this patch,
dht_layout_set happens on inode, with all xlators pointing
to 0s)
T6,MP1# (completes mkdir call, inode has proper layouts)
T7,MP2# mkdir /.gfid/<abcd's gfid>/xyz fails with ENOENT
(with log saying no subvol found for hash value of xyz.
Porting Amar's fix from down-stream beta branch.
Change-Id: Ibdc37ee614c96158a1330af19cad81a39bee2651
BUG: 982913
Original-author: Amar Tumballi <amarts@redhat.com>
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5302
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If access fails with ENOTCONN, do not wind to same subvol.
We wind to first-up-subvol if access fails with ENOTCONN.
In few cases, if dht has only 1 subvolume, and access fails with
ENOTCONN, we go into a infinite loop of winding to same subvol
The fix is to check if we previously wound to same subvol, and
fail if first-up-subvol is same.
Change-Id: Ib5d3ce7d33e8ea09147905a7df1ed280874fa549
BUG: 983431
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5319
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using the new 'pluggable policies' API of libxlator.
Change-Id: Ie7528182dff8fb42c6e8287a106d3057944df775
BUG: 847839
Original Author: Csaba Henk <csaba@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4904
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The API is described in libxlator.h.
Behavior remains the same for this commit; this
is a preparatory step for per-translator customization
of aggregation.
Change-Id: I5d42923af59b2fd78e1ff59c12763875b57c5190
BUG: 847839
Original Author: Csaba Henk <csaba@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4903
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this works similar to pathinfo now except that the request is sent
to all subvolumes of dht. Underlying replica selects it's subvolume
in a round-robin fashion till one of them returns successfully.
Change-Id: Ie46c5f7090d04d8c2e487b209916ae6791e94624
BUG: 847839
Original Author: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5225
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* in both distribute and replicate (ignoring stripe for now),
add logic to calculate the min() of stime values.
* What is a 'stime' ? Why is this required:
- stime means 'slave xtime', mainly used to keep track of slave
node's sync status when distributed geo-replication is used.
Logic of calculating 'min()' for this stime is very important as
in case of crashes/reboots/shutdown, we will have to 'restart'
with crawling from stime time value from the mount point, which
gives the 'min()' of all the bricks, which means, we don't miss
syncing any files in the above cases.
Change-Id: I2be8d434326572be9d4986db665570a6181db1ee
BUG: 847839
Original Author: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4893
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default fuse kernel readdirp usage in fuse xlator is off.
When mount option use-readdirp=yes is provided it starts using
fuse-kernel's readdirp.
Change-Id: Id37edc53b1adc1638186d956c2f74c1e4e48aa59
BUG: 983477
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5322
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a rename happens during migration, unlink fails. This leads
to stale xattrs, and Sticky bits still being set.
By removing the xattrs and Sticky bits from dst (through fd ops),
the stale link file would be cleaned up eventually (even if unlink
fails on src).
Change-Id: Iec537d021905438327a20e1d811aa06e74034364
BUG: 983399
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5316
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently if linkfile fails with ENOENT, we do not fail. We also
need to treat failures with ENOTCONN as success, as if cached subvol
is up, rm of a file should succeed. A stale linkfile will get removed
later
Change-Id: I71d136847933351ed9e2c939bda4a69bc96a3cfc
BUG: 983416
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5317
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to prevent the possibility of a deadlock when
rpc_connection_cleanup being called in the same thread as rpc_clnt_unref
Change-Id: Ia4dcc0a8a6e6158d4ddec68b780fccbc4cd64adb
BUG: 962619
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5321
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New options being introduced in the master branch should now have
op-version set to the GD_OP_VERSION_MAX (3). Some of the options have
been backported to release-3.3 branch and hence should have their
op-version reduced. Some other options had op-version incorrectly set as
1.
Change-Id: If40325b7b2da7aa36f90261024117cd18cf51ef0
BUG: 981278
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5318
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
currently two keys are exposed:
'glusterfs.gfid' : output is 16byte binary gfid
'glusterfs.gfid.string' : output is 36 byte canonical format of gfid
e.g.
[root@supernova glusterfs]# getfattr -n glusterfs.gfid -e hex f0
glusterfs.gfid=0x68305acb73e541719804fcf36a4857e8
[root@supernova glusterfs]# getfattr -n glusterfs.gfid.string f0
glusterfs.gfid.string="68305acb-73e5-4171-9804-fcf36a4857e8"
early consumers for this key would be geo-replication
(as it has being designed to do namespace operations on
gfid from the mount point, thereby needing the GFID for
entry operations on the slave).
Change-Id: I10b23dbd11628566ad6924334253f5d85d01a519
BUG: 847839
Original Author: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5129
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A new non-linked inode is added to lru list. Hence it might be possible
that gfid might be NULL when inode_dump is called. To pass asserts in
inode_path, we've to check for non-null gfid before invoking that
procedure.
Signed-off-by: Raghavendra G <raghavendra@gluster.com>
Change-Id: Iff14efc6d6e2faa33b9f7a81e0a66f6a947b77ed
BUG: 976189
Reviewed-on: http://review.gluster.org/5241
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently when selecting a alternative subvolume when hashed
subvol has exceeded min-free-disk/inodes, we do not check if
layouts have errors (including decommissioning). This leads
to data being written to those subvolumes, and in case of
decommissioning, will lead to data loss.
Change-Id: Ie0c6cf4a29d7c53d8a6d8a8c1bd595cf58a0012a
BUG: 982919
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5299
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: NFS allows exporting subdirectories but there is not support for
providing AUTH on per directory basis.
Fix: Modified nfs.export-dir to include AUTH parameters
e.g. nfs.export-dir "/dir1(10.1.1.2),/dir2(10.1.1.0/24|host1)
During mount operation NFS will check if the IP from where the connection is made
is configured in the AUTH parameter, else the mount operation will fail with
EACCES error.
Updated admin-guide and volume set help message.
Change-Id: I5c6d22edb168b4f46376d1cd6878cd065fc081cc
BUG: 968227
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/5124
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the logging levels from WARNING to DEBUG in the lookup path to
minimize incessant logging in case of gfid mismatch errors.
Change-Id: I631b16df3249cf826606f547531f985dac696088
BUG: 959083
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/4939
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, nufa wouldn't work on volume topologies such as
distribute-replicate or distribute-stripe.
Change-Id: Ia89ed4412a00601022c1fc94f046056ce4820fe8
BUG: 980838
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5262
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: If47e209cb61ea0eb74ee2d6ef9e9342b2d6ee13a
BUG: 980838
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5261
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Goal of this health-checker is to detect fatal issues of the underlying
storage that is used for exporting a brick. The current implementation
requires the filesystem to detect the storage error, after which it will
notify the parent xlators and exit the glusterfsd (brick) process to
prevent further troubles.
The interval the health-check runs can be configured per volume with the
storage.health-check-interval option. The default interval is 30
seconds.
It is not trivial to write an automated test-case with the current
prove-framework. These are the manual steps that can be done to verify
the functionality:
- setup a Logical Volume (/dev/bz970960/xfs) and format is as XFS for
brick usage
- create a volume with the one brick
# gluster volume create failing_xfs glufs1:/bricks/failing_xfs/data
# gluster volume start failing_xfs
- mount the volume and verify the functionality
- make the storage fail (use device-mapper, or pull disks)
# dmsetup table
..
bz970960-xfs: 0 196608 linear 7:0 2048
# echo 0 196608 error > dmsetup-error-target
# dmsetup load bz970960-xfs dmsetup-error-target
# dmsetup resume bz970960-xfs
# dmsetup table
...
bz970960-xfs: 0 196608 error
- notice the errors caught by syslog:
Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 512
Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): I/O Error Detected. Shutting down filesystem
Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Jun 24 11:31:49 vm130-32 kernel: VFS:Filesystem freeze failed
Jun 24 11:31:50 vm130-32 GlusterFS[1969]: [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down
Jun 24 11:32:09 vm130-32 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jun 24 11:32:20 vm130-32 GlusterFS[1969]: [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM
- these errors are in the log of the brick as well:
[2013-06-24 10:31:50.500607] W [posix-helpers.c:1102:posix_health_check_thread_proc] 0-failing_xfs-posix: stat() on /bricks/failing_xfs/data returned: Input/output error
[2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down
[2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM
- the glusterfsd process has exited correctly:
# gluster volume status
Status of volume: failing_xfs
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick glufs1:/bricks/failing_xfs/data N/A N N/A
NFS Server on localhost 2049 Y 1897
Change-Id: Ic247fbefb97f7e861307a5998a9a7a3ecc80aa07
BUG: 971774
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/5176
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
At the moment data-self-heal acquires locks in following
pattern. It takes full file lock then gets xattrs on files on both
replicas. Decides sources/sinks based on the xattrs. Now it acquires
lock from 0-128k then unlocks the full file lock. Syncs 0-128k range
from source to sink now acquires lock 128k+1 till 256k then unlocks
0-128k, syncs 128k+1 till 256k block... so on finally it takes full file
lock again then unlocks the final small range block.
It decrements pending counts and then unlocks the full file lock.
This pattern of locks is chosen to avoid more than 1 self-heal
to be in progress. BUT if another self-heal tries to take a full
file lock while a self-heal is already in progress it will be put in
blocked queue, further inodelks from writes by the application will
also be put in blocked queue because of the way locks xlator grants
inodelks. So until the self-heal is complete writes are blocked.
Here is the code:
xlators/features/locks/src/inodelk.c - line 225
if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
ret = -EAGAIN;
if (can_block == 0)
goto out;
gettimeofday (&lock->blkd_time, NULL);
list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);
}
This leads to hangs in applications.
Fix:
Since we want to prevent two parallel self-heals. We let them compete
in a separate "domain". Lets call the domain on which the locks have
been taken on in previous approach as "data-domain".
In the new approach When a self-heal is triggered,
it acquires a full lock in the new domain "self-heal-domain".
After this it performs data-self-heal using the locks in
"data-domain" as before.
unlock the full file lock in "self-heal-domain"
With this approach, application's writevs don't have to wait
in pending queue when more than 1 self-heal is triggered.
Change-Id: Id79aef3dfa888945977fb9758374ac41c320d0d5
BUG: 967717
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5100
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- afr_local_copy should not be memduping locked nodes, that would
mean that lock is taken in self-heal on those nodes even before
it actually takes the lock. So removed memdup code. Even entry
lock related copying (lockee info) is also not necessary for
self-heal functionality, so removing that as well. Since it is
not local_copy anymore changed its name.
- My editor changed tabs to spaces.
Change-Id: I8dfb92cb8338e9a967c06907a8e29a8404782d61
BUG: 967717
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5099
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I40eec20ca6b3f857245a2438883822e251077ee9
BUG: 979365
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5269
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
At the moment afr-flush makes sure that a delayed post-op
is woken up but it does not wait for it to complete the
post-op before flush unwinds.
These are the steps that are happening:
1) flush fop comes on an fd which wakes up a delayed post-op
and continues with the flush fop.
2) post-op sends fsync on the wire.
3) flush completes and unwinds to fuse.
4) graph switch happens on the fuse mount disconnecting the
old graph's client connections to bricks.
5) xattrop after fsync fails with ENOTCONN because the
connections from old graph are taken down now.
Fix:
Wait for post-op to complete before starting to flush.
We could make flush act similar to fsync (i.e.) wind
flush as is but wait for post-op to complete before unwinding
flush, but it is better to send flush as the final fop. So
wind of flush will start after post-op is complete. Had to
change fsync to accommodate this change.
Change-Id: I93aa642647751969511718b0e137afbd067b388a
BUG: 980548
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5274
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check if a previous remove-brick operation has been committed before
starting a new rebalance/remove-brick task.
Change-Id: I553e5ba64a6a352ca91032ab1a17997051a4494e
BUG: 963541
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5019
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Currently whenever there is metadata split-brain, a variable
sh->op_failed is set to 1 to denote that self heal got failed.
But if we proceed for data self heal, even code-path of data
self heal also relies on the sh->op_failed variable. So if will
check for sh->op_failed variable and will eventually fails to
do data self heal. So needed a mechanism to allow data self heal
even if metadata is in split brain.
Fix:
Some data structure revamp is done in
http://review.gluster.com/#/c/5106/ fix and this patch is
based on the above fix. Now we can store which particular self-heal
got failed i.e GFID_OR_MISSING_ENTRY_SELF_HEAL, METADATA, DATA,
ENTRY. And we can do two types of self heal failure check.
1. Individual type check: We can check which among all four
(Metadata, Data, Gfid or missing entry, entry self heal)
got failed.
2. In afr_self_heal_completion_cbk, we need to make check
based on the fact that if any specific self heal got failed treat
the complete self heal as failure so that it will populate
corresponding circular buffer of event history accordingly.
Change-Id: Icb91e513bcc752386fc8a78812405cfabe5cac2d
BUG: 977797
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5253
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of triggering 4-5 error logs, when nfs is
disabled for all volumes, exit the process.
Change-Id: Ib286f143c4f74ba22f502aca0e7dcd0907db6563
BUG: 976750
Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
Reviewed-on: http://review.gluster.org/5245
Reviewed-by: Santosh Pradhan <spradhan@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If fdctx is NULL in afr_fsync, process crashes because
of NULL dereference.
Fix:
if fdctx is NULL, always say witnessed unstable write so
that fsyncs are done properly. Handled fdctx being null
in afr_delayed_changelog_post_op otherwise fsync stub is
never resumed and the mount was hanging.
Change-Id: Icacc900e9be63c29db3325cb0e19cc250adebaac
BUG: 978794
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I46badd812e9b936911ddd2793cef7ce30ec220a6
BUG: 979237
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5266
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bd-xlator can not be built successfully on certain Debian
distributions due to a missing declaration of lvm_lv_from_name(). This
function is available for linking, but it does not exist in the header
file.
This change adds a detection for lvm_lv_from_name() in both the library
for linking, and the declaration in the header file. If the 1st is
missing, the bd-xlator can not be built, and if only the 2nd one is
missing, we'll declare lvm_lv_from_name() ourselves. This makes it
possible to build the bd-xlator on the affected Debian distributions
too.
Change-Id: I0c823a7861b02bb5d9c1abb76ebfff92f272f9eb
BUG: 976946
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/5250
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We added this code as an interim fix until afr can
handle split-brains even when opens are not issued.
Afr code has matured to reject fd based fops when
there are split-brains so we can remove it.
Change-Id: Ib337f78eccee86469a5eaabed1a547a2cea2bdcf
BUG: 974972
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5227
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I764883811e30ca9d9c249ad00b6762101083a2fe
BUG: 976800
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5248
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If no decommissioned nodes options are set in the options, then
clear the conf->decommissioned_bricks.
Change-Id: I426d2bcc874aab21b2eba0b16a580b9a26672ea2
Signed-off-by: shishir gowda <sgowda@redhat.com>
BUG: 973073
Reviewed-on: http://review.gluster.org/5199
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Duplicate request cache provides a mechanism for detecting
duplicate rpc requests from clients. DRC caches replies
and on duplicate requests, sends the cached reply instead of
re-processing the request.
Change-Id: I3d62a6c4aa86c92bf61f1038ca62a1a46bf1c303
BUG: 847624
Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
Reviewed-on: http://review.gluster.org/4049
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Making the glusterd_store_* functions re-usable will help with future
changes that need to read/write lists of items.
BUG: 904065
Change-Id: I99fb8eced76d12d5a254567eccff9790b43d8da3
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/4676
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Nfs xlator never does open on a file for performing writes,
afr does not perform changelog wakeup for this fd so operations
which do metadata operations as soon as the data operations are
completed perceive a delay of 'post-op-delay-secs'.
Fix:
Perform changelog wakeup on anon-fd if the fd with same pid is
not present in inode-list.
Note:
This approach is a short-term fix. A proper fix needs a new domain
for taking metadata locks so that data/metadata locks don't compete
with each other.
Change-Id: I253afb289eadf30c7951e56fb2c4840d7132f5e4
BUG: 966018
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5066
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* __mnt3_resolve_export_subdir_comp: if nfs_entry_loc_fill fails,
mres->resolveloc does not contain valid data
* gf_log should use 'gfid' instead of mres->resolveloc.inode->gfid
* fixes a crash if program flow gets to this line
Change-Id: Idb0d6f97ea73eaf9056d28267ad7a42aa8cf6579
Signed-off-by: Michael Brown <michael@netdirect.ca>
Reviewed-on: http://review.gluster.org/4948
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we have tuple (server-xlator-name, callid) for identifying
a call. However it does not uniquely identify the operation when there
are multiple clients (since operations from all clients go through same
server). Adding connection-id resolves this ambiguity.
Also printing connection-id helps diagnose failures associated with
connection state (like fds, locks).
Change-Id: I13563bd06ee9b72fc1a10d239f77db5183658573
BUG: 963540
Signed-off-by: Raghavendra G <raghavendra@gluster.com>
Reviewed-on: http://review.gluster.org/5011
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia8e1af082078f2f791708ba4faa4992bf291dd6e
BUG: 961339
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/5023
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
rpc_transport object, which is part of rpc_clnt, is destroyed
prematurely. This is because, rpc_transport object is ref'd by socket
layer and rpc layer. These ref's, until the synctask'izing of
operations, were unref'd sequentially in the epoll thread.
With more threads at play, the sequential unref guarantee is off.
Fix:
Shutting down the transport before proceeding with cleaning up of
rpc_clnt object would serialize the unref's on the rpc_transport object
and thus eliminating the race.
Also, we don't store the address of brickinfo in brick's rpc notify
function, to avoid the possibility of referring a freed brickinfo.
Instead we use a string based id to 'reach' the corresponding brickinfo.
Change-Id: If2739e2eeaee1e8b071ab2b6754b7ea0f81cfceb
BUG: 962619
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5000
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
log the state mount errors only occasionally so
as not to fill log file with too many of them.
Change-Id: Ib5a2485dc2ce3a181cff34bbb6d7aba17a2e4d4d
BUG: 804301
Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
Reviewed-on: http://review.gluster.org/5229
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While looking at the newly introduced procedures FALLOCATE and DISCARD,
it seems that these were added with already existing procedure numbers.
This makes the protocol incompatible with existing roll-outs.
It is very confusing when new procedures are added somewhere in the
middle of the array. This will cause the number of existing procedures
to change. It is much preferred to add new procedures at the end of the
array. This changes not only corrects the enum that generates the
procedure numbers, but also the ordering in the client and server
fops-array for clarity.
Correcting this greatly simplifies adding support for these new
procedures in Wireshark and will prevent confusion to the people reading
network traces (with or without Wireshark).
Change-Id: Ib9e7978531d016c7230d756b855cb94cb0793b0f
BUG: 974976
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/5215
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Option to disable or enable acl with nfs.acl boolean
option.
2. Deregister the acl service with the portmapper service
when no longer required.
Change-Id: I6562b6b40138d040aa2bf1e5641f4c0e0e9f9d09
BUG: 970070
Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
Reviewed-on: http://review.gluster.org/5136
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
store being glusterd's persistent store under /var/lib/glusterd/
Change-Id: I1c01a09a8ce4a73ea612f05e7f14d4ab39ad1628
BUG: 971796
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5177
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
As the end of the self heal, message logged by
"afr_self_heal_completion_cbk" is inadequate to determine what exactly failed
during the course of afr self heal. It is worth to have knowledge of what all
types of self heal got triggered for an entity and whether the status is success
or failure.
Fix:
At the end of self heal, it will log information about out of 4 types of self
heal (gfid or missing entry self heal, metadata, data and entry self heal),
who all got triggered and who all got failed or successful at the end.
Change-Id: I5360762fbd7d391ac4c6af6706b4835c5801835a
BUG: 968301
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5106
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for the DISCARD file operation. Discard punches a hole
in a file in the provided range. Block de-allocation is implemented
via fallocate() (as requested via fuse and passed on to the brick
fs) but a separate fop is created within gluster to emphasize the
fact that discard changes file data (the discarded region is
replaced with zeroes) and must invalidate caches where appropriate.
BUG: 963678
Change-Id: I34633a0bfff2187afeab4292a15f3cc9adf261af
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/5090
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement support for the fallocate file operation. fallocate
allocates blocks for a particular inode such that future writes
to the associated region of the file are guaranteed not to fail
with ENOSPC.
This patch adds fallocate support to the following areas:
- libglusterfs
- mount/fuse
- io-stats
- performance/md-cache,open-behind
- quota
- cluster/afr,dht,stripe
- rpc/xdr
- protocol/client,server
- io-threads
- marker
- storage/posix
- libgfapi
BUG: 949242
Change-Id: Ice8e61351f9d6115c5df68768bc844abbf0ce8bd
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4969
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|