| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libgfchangelog was not respecting the log_level configured
in Geo-replication. With this patch Libgfchangelog log level
can be configured using `config changelog_log_level TRACE`.
Default Changelog log level is INFO
BUG: 1363965
Change-Id: Ida714931129f6a1331b9d0815da77efcb2b898e3
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/15078
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Event Type defined in #15351 to avoid merge conflicts
Add geo-rep events applicable to changes in
geo-rep session in the server side.
Change-Id: Ia66574d2abccad7fce6a96667efbc7c6c8903fc6
BUG: 1370445
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/15328
Tested-by: Aravinda VK <avishwan@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch, Data and Meta GFIDs are post processed. If Changelog has
UNLINK entry then remove from Data and Meta GFIDs list(If stat on GFID is
ENOENT in Master).
While processing Changelogs,
- Collect all the data and meta operations in a temporary database
- Delete all Data and Meta GFIDs which are already unlinked as per Changelogs
(unlink only if stat on GFID is ENOENT)
- Process all Entry operations as usual
- Process data and meta operations in batch(Fetch from Db in batch)
- Data sync is again batched based on number of changelogs(Default 1day
changelogs). Once the sync is complete, Update last Changelog's time as last_synced
time as usual.
Additionally maintain entry_stime on Brick root, ignore Entry ops if changelog
suffix time is less than entry_stime. If data stime is more than entry_stime,
this can happen only when passive worker updates stime by itself by getting
mount point stime. Use entry_stime = data_stime in this case.
New configurations:
max-rsync-retries - Default Value is 10
max-data-changelogs-in-batch - Max number of changelogs to be considered in a
batch for syncing. Default value is 5760(4 changelogs per min * 60 min *
24 hours)
max-history-changelogs-in-batch - Max number of history changelogs to be
processed at once. Default value 86400(4 changelogs per min * 60 min * 24
hours * 15 days)
BUG: 1364420
Change-Id: I7b665895bf4806035c2a8573d361257cbadbea17
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/15110
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add error logs if gf_history_changelog fails. If requested
changelog range is not available, log the error and exit
instead of continuing the loop and exiting in readdir
without logging. Also fixed the duplicate MSGID number in
'changelog-lib-messages.h'
Change-Id: Icd71b89ae23b48a71380657ba5649029c32fabfd
BUG: 1362151
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/15064
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If ssh returns 127 that means the remote gsyncd path is wrong
or push-pem failed during create. Existing error message was
pointing old documentation.
Change-Id: Ifbbb4a604fc0ae0fd5cb2746df6363bf28cde1e9
BUG: 1343943
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/14673
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handle ESTALE returned by lstat gracefully
by retrying it. Do not crash the worker.
Change-Id: I2527cd8bd1f7d2428cb4fa3f20782bebaf2df12a
BUG: 1247529
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/11772
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While doing RMDIR worker gets ENOTEMPTY because same directory will
have files from other bricks which are not deleted since that worker
is slow processing. So geo-rep does recursive_delete.
Recursive delete was done using shutil.rmtree. once started, it will
not check disk_gfid in between. So it ends up deleting the new files
created by other workers. Also if other worker creates files after one
worker gets list of files to be deleted, then first worker will again
get ENOTEMPTY again.
To fix these races, retry is added when it gets ENOTEMPTY/ESTALE/ENODATA.
And disk_gfid check added for original path for which recursive_delete is
called. This disk gfid check executed before every Unlink/Rmdir. If disk
gfid is not matching with GFID from Changelog, that means other worker
deleted the directory. Even if the subdir/file present, it belongs to
different parent. Exit without performing further deletes.
Retry on ENOENT during create is ignored, since if CREATE/MKNOD/MKDIR
failed with ENOENT will not succeed unless parent directory is created
again.
Rsync errors handling was handling unlinked_gfids_list only for one
Changelog, but when processed in batch it fails to detect unlinked_gfids
and retries again. Finally skips the entire Changelogs in that batch.
Fixed this issue by moving self.unlinked_gfids reset logic before batch
start and after batch end.
Most of the Geo-rep races with rm -rf is eliminated with this patch,
but in some cases stale directories left in some bricks and in mount
point we get ENOTEMPTY.(DHT issue, Error will be logged in Slave log)
BUG: 1211037
Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/10204
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ENTRY operations failures on slave left no trace for debugging purposes.
This patch captures such failures on slave cluster and forwards them to
the master and logs them. Failures of specific interest are the ones
which return code EEXIST on the failing operations.
Change-Id: Iecab876f16593c746d53f4b7ec2e0783367856bb
BUG: 1207115
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/10048
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Making geo-rep use the common storage shared by nfs,
snapshot and geo-rep. The meta volume should be named
as gluster_shared_storage, and it should be mounted
at "/var/run/gluster/shared_storage/".
geo-rep will have create a directory called 'geo-rep'
in the meta-volume and all the lock files are created
inside it.
Change-Id: I82d0bff9be191f75f643606a9a21d53559047ac4
BUG: 1210344
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/10196
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CURRENT DESIGN AND ITS LIMITATIONS:
-----------------------------------
Geo-replication syncs changes across geography using changelogs captured
by changelog translator. Changelog translator sits on server side just
above posix translator. Hence, in distributed replicated setup, both
replica pairs collect changelogs w.r.t their bricks. Geo-replication
syncs the changes using only one brick among the replica pair at a time,
calling it as "ACTIVE" and other non syncing brick as "PASSIVE".
Let's consider below example of distributed replicated setup where
NODE-1 as b1 and its replicated brick b1r is in NODE-2
NODE-1 NODE-2
b1 b1r
At the beginning, geo-replication chooses to sync changes from NODE-1:b1
and NODE-2:b1r will be "PASSIVE". The logic depends on virtual getxattr
'trusted.glusterfs.node-uuid' which always returns first up subvolume
i.e., NODE-1. When NODE-1 goes down, the above xattr returns NODE-2 and
that is made 'ACTIVE'. But when NODE-1 comes back again, the above xattr
returns NODE-1 and it is made 'ACTIVE' again. So for a brief interval of
time, if NODE-2 had not finished processing the changelog, both NODE-2
and NODE-1 will be ACTIVE causing rename race as mentioned in the bug.
SOLUTION:
---------
1. Have a shared replicated storage, a glusterfs management volume specific
to geo-replication.
2. Geo-rep creates a file per replica set on management volume.
3. fcntl lock on the above said file is used for synchronization
between geo-rep workers belonging to same replica set.
4. If management volume is not configured, geo-replication will back
to previous logic of using first up sub volume.
Each worker tries to lock the file on shared storage, who ever wins will
be ACTIVE. With this, we are able to solve the problem but there is an
issue when the shared replicated storage goes down (when all replicas
goes down). In that case, the lock state is lost. So AFR needs to rebuild the
lock state after brick comes up.
NOTE:
-----
This patch brings in the, pre-requisite step of setting up management volume
for geo-replication during creation.
1. Create mgmt-vol for geo-replicatoin and start it. Management volume should
be part of master cluster and recommended to be three way replicated
volume having each brick in different nodes for availability.
2. Create geo-rep session.
3. Configure mgmt-vol created with geo-replication session as follows.
gluster vol geo-rep <mastervol> slavenode::<slavevol> config meta_volume \
<meta-vol-name>
4. Start geo-rep session.
Backward Compatiability:
-----------------------
If management volume is not configured, it falls back to previous logic of
using node-uuid virtual xattr. But it is not recommended.
Change-Id: I7319d2289516f534b69edd00c9d0db5a3725661a
BUG: 1196632
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/9759
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shutil.rmtree was failing to remove file if file was not
exists. Added error handling function to ignore ENOENT if
a file/dir not present.
BUG: 1198101
Change-Id: I1796db2642f81d9e2b5e52c6be34b4ad6f1c9786
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/9792
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iacc67e4ba9ac45e0858f3befe84ffb8fccf7e1c3
BUG: 1075417
Signed-off-by: arao <arao@redhat.com>
Reviewed-on: http://review.gluster.org/9502
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changelog consumption/processing now happens in seperate process
group than monitor. When monitor process group gets SIGSTOP all
worker process, ssh, rsync will be paused except the changelog
processing. When it gets SIGCONT it resumes its operation.
Changelog agent runs as RepceServer, geo-rep worker communicates
with changelog agent using RepceClient.
Change-Id: I35c333e4d8b13d03a7808aed601960eef23cfa04
BUG: 1093602
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/7322
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In source install, libgfchangelog is installed in /usr/local/lib
When glusterd runs /usr/local/libexec/glusterfs/python/gsyncd --version
it fails to find library without LD_LIBRARY_PATH.
This patch avoids loading library when it is run from glusterd
during start.
BUG: 1096026
Change-Id: I59912227ac27ff4877d947a7c8f1fe2e8c5be06e
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/7713
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every time when geo-rep restarts it first does FS crawl using
XCrawl and then switches to Changelog Mode. This is because changelog
only had live API, that is we can get changes only after registering.
Now this(http://review.gluster.org/#/c/6930/) patch introduces History
API for changelogs. If history is available then geo-rep will use it
instead of FS Crawl.
History API returns TS till what time history is available for
given start and end time. If TS < endtime then switch to FS Crawl.
(History => FS Crawl => Live Changelog)
If TS >= endtime, then switch directly to Changelog mode
(History => Live Changelog)
Change-Id: I4922f62b9f899c40643bd35720e0c81c36b2f255
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/6938
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pep8 is a style guide for python.
http://legacy.python.org/dev/peps/pep-0008/
pep8 can be installed using, `pip install pep8`
Usage: `pep8 <python file>`, For example, `pep8 master.py`
will display all the coding standard errors.
flake8 is used to identify unused imports and other issues
in code.
pip install flake8
cd $GLUSTER_REPO/geo-replication/
flake8 syncdaemon
Updated license headers to each source file.
Change-Id: I01c7d0a6091d21bfa48720e9fb5624b77fa3db4a
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/7311
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-> "threaded" hybrid crawl.
-> Enabling metatadata synchronization.
-> Handling EINVAL/ESTALE gracefully while syncing metadata.
-> Improvments to changelog crawl code.
-> Initial crawl changelog generation format.
-> No gsyncd restart when checkpoint updated.
-> Fix symlink handling in hybrid crawl.
-> Slave's xtime key is 'stime'.
-> tar+ssh as data synchronization.
-> Instead of 'raise', just log in warning level for xtime missing cases.
-> Fix for JSON object load failure
-> Get new config value after config value reset.
-> Skip already processed changelogs.
-> Saving status of each individual worker thread.
-> GFID fetch on slave for purges.
-> Add tar ssh keys and config options.
-> Fix nlink count when using backend.
-> Include "data" operation for hardlink.
-> Use changelog time prefix as slave's time.
-> Process changelogs in parallel.
Change-Id: I09fcbb2e2e418149a6d8435abd2ac6b2f015bb06
BUG: 1036539
Signed-off-by: Ajeet Jha <ajha@redhat.com>
Reviewed-on: http://review.gluster.org/6404
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I8961633a7371c941a3feee44c949d5c934eca998
Original-Author: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 847839
Reviewed-on: http://review.gluster.org/5933
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch extends the persistent instrumentation work done by
Aravinda (@avishwa), by introducing a handfull of instrumentation
variables for crawl. These variables are "pulled up" by glusterd
in the event of a geo-replication status cli command and looks
something like below:
"Uptime=00:21:10;FilesSyned=2982;FilesPending=0;BytesPending=0;DeletesPending=0;"
"FilesPending", "BytesPending" and "DeletesPending" are short-lived
variables that are non-zero when a changelog is being processes (ie.
when an active sync in ongoing). After a successfull changelog process
"FilesPending" is summed up into "FilesSynced". The three short-lived
variabled are then reset to zero and the data is persisted
Additionally this patch also reverts some of the changes made for
BZ #986929 (those were not needed).
Change-Id: I948f1a0884ca71bc5e5bcfdc017d16c8c54fc30b
BUG: 990420
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/5441
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A hostname fqdn can be of length 255 according to RFC1123
------------------------->
/usr/include/bits/posix1_lim.h:#define _POSIX_HOST_NAME_MAX 255
<-------------------------
On linux this length is 64
------------------------->
/usr/include/bits/local_lim.h:#define HOST_NAME_MAX 64
<-------------------------
When a given hostname is > 45 (characters) - SSH fails with
-------------------------->
"ControlPath too long for Unix domain socket".
<--------------------------
Indicating that the total length of ControlPath which is
on linux should be 108
------------------------->
/usr/include/linux/un.h:#define UNIX_PATH_MAX 108
<-------------------------
This leads to "faulty" geo-replication status.
This patch brings in a new file called manifest which carries
given a geo-rep session some unique information - with which
a unique `md5` is generated in a 32length digest, this ensures
that we don't exceed UNIX_PATH_MAX limitations instead we use
a conservative approach and still be able to provide a unique
socket path.
Change-Id: I3a6a27d605d751a86e7c82eace4561d9b0134fe1
BUG: 990330
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-on: http://review.gluster.org/5681
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* also consume changelog for change detection.
* Status fixes
* Use new libgfchangelog done API
* process (and sync) one changelog at a time
Change-Id: I24891615bb762e0741b1819ddfdef8802326cb16
BUG: 847839
Original Author: Csaba Henk <csaba@redhat.com>
Original Author: Aravinda VK <avishwan@redhat.com>
Original Author: Venky Shankar <vshankar@redhat.com>
Original Author: Amar Tumballi <amarts@redhat.com>
Original Author: Avra Sengupta <asengupt@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5131
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
Change-Id: Ibd0faefecc15b6713eda28bc96794ae58aff45aa
BUG: 847839
Original Author: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5133
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|