| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Users are still using geo-rep with the old, deprecated, insecure, unsupported
ssh setup. Not their fault -- the implementation of the new method had the
following characteristics:
- the old method is possible, but it does not work with default settings
- it can be made operational by fiddling with the "remote-gsyncd" tunable
- with default settings, an unhelpful, actually misleading error message is
produced
- the UI gave no hint of the changes in the ssh setup
http://review.gluster.org/4392 tried to fix these; what it accomplished was
unrestricted support for the bad practice (by making the old setup
operational by default).
From now on:
- we disable the old method by reserving the "remote-gsyncd" tunable
- if the old method is attempted, we give a hint about what to do
Change-Id: Icade94725d8d8d2d4c89cab992d4226351637b86
BUG: 895656
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: http://review.gluster.org/4602
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
http://www.python.org/dev/peps/pep-0352/ explains that the .message
property of BaseException is being removed. Most of the other exception
handlers access <Exception>.args[] which should be suitable for this
case too.
Change-Id: I1810450b78d2b3d7f8bd07f2beb02cbe9e2adecb
BUG: 888346
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/4328
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
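A minimal sketch of the pattern this commit message describes, reading the reason from .args instead of the removed .message attribute. The helper name describe() and the sample error text are hypothetical and are not taken from gsyncd.

```python
def describe(exc):
    """Return a printable reason without touching the removed .message attribute."""
    # BaseException always has an .args tuple; it may be empty.
    if exc.args:
        return str(exc.args[0])
    return exc.__class__.__name__


try:
    raise ValueError("bad slave url")
except ValueError as e:
    reason = describe(e)
```

Falling back to the class name when .args is empty is a choice made for this sketch only.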
| |
- gluster vol geo-rep M S conf checkpoint <LABEL|now>
sets a checkpoint with LABEL (the keyword "now" is special,
it's rendered to the label "as of <timestamp of current time>")
that's used to refer to the checkpoint in the sequel.
(Technically, gsyncd makes a note of the xtime of master's root
as of setting the checkpoint, called the "checkpoint target".)
- gluster vol geo-rep M S conf \!checkpoint
deletes the checkpoint.
- gluster vol geo-rep M S stat
if status is OK, and there is a checkpoint configured, the checkpoint
info is appended to status (either "not yet reached", or
"completed at <timestamp of completion>").
(Technically, the worker runs a thread that monitors / serializes /
verifies checkpoint status, and answers checkpoint status requests
through a UNIX socket; monitoring boils down to querying the xtime
of slave's root and comparing with the target.)
- gluster vol geo-rep M S conf log-file | xargs grep checkpoint
displays the checkpoint history. Set, delete and completion events
are logged properly.
Change-Id: I4398e0819f1504e6e496b4209e91a0e156e1a0f8
BUG: 826512
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: http://review.gluster.com/3491
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
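The checkpoint logic described above can be sketched roughly as follows. This assumes an xtime can be modelled as a (seconds, nanoseconds) tuple; the function names and the timestamp format are hypothetical, and the real worker answers status requests through a UNIX socket rather than through a plain function call.

```python
import time

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, not taken from gsyncd


def render_label(label, now=None):
    """The keyword "now" becomes "as of <timestamp>"; other labels pass through."""
    if label == "now":
        return "as of " + time.strftime(TIME_FORMAT, time.localtime(now))
    return label


def checkpoint_reached(target_xtime, slave_root_xtime):
    """The checkpoint is reached once the slave root's xtime catches up with the target."""
    # Tuples compare element by element: seconds first, then nanoseconds.
    return slave_root_xtime >= target_xtime


def status_suffix(target_xtime, slave_root_xtime, completed_at=None):
    """Text appended to the status output when a checkpoint is configured."""
    if checkpoint_reached(target_xtime, slave_root_xtime):
        return "completed at " + time.strftime(TIME_FORMAT, time.localtime(completed_at))
    return "not yet reached"
```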
| |
- Regarding issue of leftover ssh control dirs:
If master side worker is stuck in the connection establishment
phase, have the monitor kill it softly (i.e. first by SIGTERM,
to let it clean up). This is trickier than it sounds at first,
because if the worker is stuck waiting for a RePCe answer
(in threading.Condition().wait()), then SIGTERM is ignored
(more precisely, Python holds it back for the wait and resends it to
itself when the wait is over).
So instead of signalling the worker only, we send TERM to the
whole process group -- that brings down the ssh connection, which
wakes up the waiting worker, which can then clean up. The only problem
is that the monitor is also in the process group and it should not commit
suicide. That is taken care of by setting up a one-time SIGTERM
handler in the monitor.
- Regarding slave gsyncd stuck in chdir:
Slave gsyncd is usually well behaved: if master does not send
keepalives, it takes care to exit. However, if a hang occurs
in the early phase, when the slave is to change to the gluster mountpoint,
no timeout is set up for that (and unlike on the master side, there is
no external actor like the monitor to do that either).
So, to manage this scenario, we do the chdir in a (supposedly)
short-lived thread, and in the main thread we wait for the termination
of this thread. If that does not happen within the time limit, the main
thread calls for cleanup and exit. (This logic explicitly takes the
appropriate action in the cases when chdir succeeds or when it hangs;
but what about the remaining case, when chdir fails? Well, in that case
the chdir thread's exception handler will put the process on the
cleanup and exit route.)
Change-Id: I6ad6faa9c7b1c37084d171d1e1a756abaff9eba8
BUG: 786291
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: http://review.gluster.com/3376
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
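The slave-side chdir timeout described above could look roughly like this. It is a sketch under assumptions: the function name and the returned status strings are hypothetical, and where the real code proceeds to cleanup and exit, this version only reports the outcome to the caller.

```python
import os
import threading


def chdir_with_timeout(path, timeout):
    """Try os.chdir(path) in a helper thread; report 'ok', 'failed' or 'timeout'."""
    outcome = {}

    def worker():
        try:
            os.chdir(path)
            outcome["status"] = "ok"
        except OSError as e:
            # The commit handles this case in the thread's exception handler.
            outcome["status"] = "failed"
            outcome["error"] = e

    t = threading.Thread(target=worker)
    t.daemon = True  # a hung chdir must not keep the process alive
    t.start()
    t.join(timeout)  # the main thread waits, but only up to the limit
    if t.is_alive():
        return "timeout"
    return outcome["status"]
```

A caller would treat both "failed" and "timeout" as a reason to clean up and exit.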
| |
Some of the bugs to fix were found by the following stress-test:
make "glusterfs --client-pid=-1" exit immediately on slave
side.
Also fix eintr_wrap which should not "adopt" exceptions generated
by the wrapped call, by re-raising them as GsyncdError.
Change-Id: Ia0d39e0635975ebbbf98d86e1e26f3122e1ed6ff
BUG: 764678
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: http://review.gluster.com/3258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
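One reading of the eintr_wrap remark above is that the wrapper should retry only on EINTR and let every other exception through unchanged, rather than converting it to a different type. The sketch below follows that reading; the signature is an assumption and not the one used in gsyncd.

```python
import errno


def eintr_wrap(func, *args, **kwargs):
    """Call func, retrying while it fails with EINTR; pass other errors through."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise  # re-raised unchanged, not wrapped in another exception type
```

Modern Python retries most interrupted system calls by itself, so a wrapper like this mainly matters on old interpreters.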
| |
Don't dump stack, rather log the "glusterfs session went down" message.
If the aux glusterfs is already dead when we try to do some file
operation, we get a failure with ENOTCONN, which is already handled
as above. However, it's also possible that glusterfs dies while we
are in a syscall into it -- in that case we get ECONNABORTED, and
so far we have ended up with an ugly stack trace. From now on we
take ECONNABORTED into consideration as well.
Nb. wrt. testing: it's not easy to synthetically force the aux glusterfs
to end this way; for that we have to provoke gsyncd into intensive
synchronization. I succeeded in that with the following ruby oneliner:
ruby -rcgi -e '
Dir.chdir($*[0])
a=[]
Thread.new { loop { while a.size >= 100; File.delete a.shift; end; sleep 1 }}
loop { a<<CGI.escape(STDIN.read 10); open(a[-1], "w") {}}' MTPT < /dev/urandom
where the geo-rep master is mounted at MTPT. With this going on, deliver a
SIGKILL to the geo-rep session's aux glusterfs. (It is giving ECONNABORTED
non-deterministically, actually in the minority of cases.)
Change-Id: I24fd8d0295cdba91d8b994057a1255ca8e2d1a67
BUG: 764510
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: http://review.gluster.com/3078
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
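The error handling described in this commit can be sketched as below. The helper names are hypothetical, and returning None stands in for whatever the real code does after logging (presumably shutting the session down).

```python
import errno
import logging

# Both errnos mean that the auxiliary glusterfs mount is gone.
SESSION_DOWN_ERRNOS = (errno.ENOTCONN, errno.ECONNABORTED)


def is_session_down(exc):
    """True if an OSError indicates that the glusterfs session went away."""
    return isinstance(exc, OSError) and exc.errno in SESSION_DOWN_ERRNOS


def run_file_op(op, *args):
    """Run a file operation; log one line instead of dumping a stack trace."""
    try:
        return op(*args)
    except OSError as e:
        if is_session_down(e):
            logging.error("glusterfs session went down [%s]",
                          errno.errorcode[e.errno])
            return None
        raise
```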
| |
Change-Id: I91fe16d7e5e4c21f138eab4ee0b9334aec40e41b
BUG: 765433
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: http://review.gluster.com/2838
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
| |
Change-Id: I2eef82faab3eed1189e3786a5dca296773e1caa0
BUG: 784498
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.com/2690
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Csaba Henk <csaba@redhat.com>
| |
Rotate geo-replication master/monitor log files from the cli.
On invocation, the log file for a given master-slave session
is backed up with the current timestamp suffixed to the file
name and a signal is sent to gsyncd to start logging to a new
log file.
Sample commands:
* Rotate log file for this <master>:<slave> session:
gluster volume geo-replication <master> <slave> log-rotate
* Rotate log files for all sessions for master volume <master>:
gluster volume geo-replication <master> log-rotate
* Rotate log files for all sessions:
gluster volume geo-replication log-rotate
Change-Id: I75f641b4e082a04d5373c18583ca4a1d9651d27a
BUG: 3519
Reviewed-on: http://review.gluster.com/529
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Csaba Henk <csaba@gluster.com>
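The backup step of the rotation described above can be sketched as follows. The function name, the epoch-seconds suffix and the file name are assumptions made for illustration; the commit message does not say which signal tells gsyncd to reopen its log, so that part is left as a comment.

```python
import os
import tempfile
import time


def rotate_log(path, now=None):
    """Rename path to path.<timestamp> and return the backup name."""
    stamp = int(time.time() if now is None else now)
    backup = "%s.%d" % (path, stamp)
    os.rename(path, backup)
    # At this point the real implementation signals gsyncd so that it
    # starts logging to a fresh file under the original name.
    return backup


# Small demonstration on a throwaway file.
workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "master.log")
with open(log, "w") as f:
    f.write("old entries\n")
backup = rotate_log(log, now=1300000000)
```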
| |
When this option is set, a file deleted on master will not trigger
a delete operation on the slave. Hence, the slave will remain as a
superset of the master and can be used to recover the master in case
of crash and/or accidental deletes.
This option is not enabled by default.
Change-Id: I9244d9dfa4f38f19436036f36bec0d9c3a1f7993
BUG: 3552
Reviewed-on: http://review.gluster.com/426
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Csaba Henk <csaba@gluster.com>
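The effect of the option can be illustrated with a deliberately simplified model in which both sides are plain sets of file names. This is not how gsyncd detects changes; plan_sync and its parameters are hypothetical.

```python
def plan_sync(master_entries, slave_entries, ignore_deletes=False):
    """Return (names to copy to the slave, names to delete on the slave)."""
    master, slave = set(master_entries), set(slave_entries)
    to_copy = sorted(master - slave)
    # With the option set, files missing on the master are left alone,
    # so the slave stays a superset of the master.
    to_delete = [] if ignore_deletes else sorted(slave - master)
    return to_copy, to_delete
```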
| |
gsyncd:
- mounting code is split to a direct and a mountbroker based backend
- option gluster-command gone
- new options: gluster-params, gluster-cli-options, mountbroker
- mountbroker mount backend is used if either a mountbroker label
is given through the mountbroker option, or if gsyncd is
unprivileged; in this case the username is used as label
- have gluster cli invocations log to stderr so that we don't
hit a permission issue with the logfiles
glusterd:
- do gsyncd pre-config with new options
- add option geo-replication-log-group, so that, if it is specified,
geo-rep logfile directories are given to that group (and
thus members of the given group can do logging there)
This is just WIP as geo-rep relies on trusted extended attributes
and those are not accessible for unprivileged users. Even if we
solved this issue, glusterd security settings are too coarse,
so that if we made it possible for an unprivileged gsyncd
to operate, we would open up too far.
Change-Id: Icd520b58cbadccea3fad7c0f437b99de1e22db14
BUG: 2825
Reviewed-on: http://review.gluster.com/399
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
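The backend selection rule stated in the gsyncd part of this message can be written down as a small decision function. The function name, its parameters and the returned tuples are hypothetical; only the rule itself comes from the text above.

```python
def pick_mount_backend(mountbroker_label=None, euid=0, username=None):
    """Return (backend, label) following the rule in the commit message."""
    if mountbroker_label:
        # An explicit label given through the mountbroker option wins.
        return ("mountbroker", mountbroker_label)
    if euid != 0:
        # Unprivileged gsyncd: the username serves as the label.
        return ("mountbroker", username)
    return ("direct", None)
```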
| |
Also add __codecheck script which can verify if source is OK at the
syntactical level with a given Python interpreter.
Change-Id: Ieff34bcd3efd1cdc0e8f9a510c05488f35897bbe
BUG: 1570
Reviewed-on: http://review.gluster.com/320
Reviewed-by: Kaushik BV <kaushikbv@gluster.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
| |
Change-Id: I559e6a0709b8064cfd54c693e289c741f9c4c4ab
BUG: 1570
Reviewed-on: http://review.gluster.com/319
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushik BV <kaushikbv@gluster.com>
| |
Use subprocess module instead of os.spawn* / ad-hoc fork/exec.
With this, we do now:
- close unneeded files in children
- watch children's stderr:
- have a thread which collects children's stderr into a ring buffer
(so that the stderr pipe doesn't fill up)
- on command failure show stderr
- distinguish between rsync exit values, tolerate only partial errors
- if the connection to the slave is broken, show ssh/slave gsyncd's stderr
Change-Id: Ia92f57b5bdfa47f8c44375c50cf279006a0bf69b
BUG: 2946
Reviewed-on: http://review.gluster.com/85
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: Kaushik BV <kaushikbv@gluster.com>
Reviewed-by: Kaushik BV <kaushikbv@gluster.com>
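The stderr handling listed above might be sketched like this. The function names and the buffer size are hypothetical. Exit codes 23 and 24 are rsync's documented "partial transfer" values; that these are exactly the ones the commit tolerates is an assumption.

```python
import collections
import subprocess
import sys
import threading


def run_collecting_stderr(argv, keep_lines=20):
    """Run argv; return (exit status, last keep_lines lines of its stderr)."""
    ring = collections.deque(maxlen=keep_lines)  # old lines fall off the front
    proc = subprocess.Popen(argv, stderr=subprocess.PIPE, close_fds=True)

    def drain():
        # Reading continuously keeps the pipe from filling up and blocking the child.
        for line in proc.stderr:
            ring.append(line.decode(errors="replace").rstrip())

    t = threading.Thread(target=drain)
    t.start()
    status = proc.wait()
    t.join()
    proc.stderr.close()
    return status, list(ring)


RSYNC_PARTIAL = (23, 24)  # partial transfer / vanished source files


def rsync_tolerable(status):
    """Treat success and partial-transfer exit codes as acceptable."""
    return status == 0 or status in RSYNC_PARTIAL


# Demonstration with a child that writes five lines to stderr and fails.
child_code = "import sys\nfor i in range(5): sys.stderr.write('err %d\\n' % i)\nsys.exit(3)"
status, tail = run_collecting_stderr([sys.executable, "-c", child_code], keep_lines=3)
```

On failure a caller would print tail, which holds only the most recent lines.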
| |
- exceptions raised by us will be logged as single-line error messages
(full stack trace is shown only at DEBUG loglevel)
- common/well understood exceptions are mapped to "user-parsable" error logs
Change-Id: I75f1fb848483372364b2093878d9cfed576c9739
BUG: 2778
Reviewed-on: http://review.gluster.com/125
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
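The first point above can be sketched with the standard logging module. GsyncdError here is a local stand-in class, and log_own_error and capture are hypothetical helpers written for this illustration.

```python
import io
import logging
import traceback


class GsyncdError(Exception):
    """Stand-in for errors the program raises deliberately."""


def log_own_error(logger, exc):
    """Log a one-line error; add the full trace only at DEBUG level."""
    logger.error("%s", exc)
    if logger.isEnabledFor(logging.DEBUG):
        trace = traceback.format_exception(type(exc), exc, exc.__traceback__)
        logger.debug("".join(trace).rstrip())


def capture(level):
    """Raise and log a sample error at the given level; return the log text."""
    stream = io.StringIO()
    logger = logging.getLogger("sketch.%d" % level)
    logger.setLevel(level)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    try:
        raise GsyncdError("cannot reach slave")
    except GsyncdError as e:
        log_own_error(logger, e)
    return stream.getvalue()


info_output = capture(logging.INFO)
debug_output = capture(logging.DEBUG)
```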
| |
Change-Id: Iedd8c0ce9dec2d8dcb01e0e5b409cb53185b1716
BUG: 2778
Reviewed-on: http://review.gluster.com/82
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushik BV <kaushikbv@gluster.com>
| |
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2736 (gsyncd hangs if crash occurs in the non-main thread)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2736
| |
The final cleanup sequence + call to _exit, which was just done in the main
thread, now is called for in each thread when the thread crashes. Seems we
aren't left there hanging this way.
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2736 (gsyncd hangs if crash occurs in the non-main thread)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2736
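The idea of running the final cleanup and _exit from whichever thread crashes can be sketched as a wrapper around the thread's target. crash_guard and its parameters are hypothetical; exit_fn is injectable only so that the behaviour can be observed without ending the process.

```python
import os
import threading


def crash_guard(target, finalize, exit_fn=os._exit):
    """Wrap a thread target so that a crash triggers cleanup and process exit."""
    def runner(*args, **kwargs):
        try:
            target(*args, **kwargs)
        except Exception:
            finalize()   # the same cleanup the main thread would do
            exit_fn(1)   # os._exit ends the process from any thread
    return runner


# Demonstration with a fake exit function that only records the call.
events = []

def boom():
    raise RuntimeError("worker thread crashed")

t = threading.Thread(target=crash_guard(boom,
                                        lambda: events.append("cleanup"),
                                        lambda code: events.append(code)))
t.start()
t.join()
```

Without such a guard, an exception in a non-main thread ends only that thread and the rest of the process keeps waiting.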
| |
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2659 (gsync config-del option is not working properly)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2659
| |
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2736 (gsyncd hangs if crash occurs in the non-main thread)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2736
| |
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2659 (gsync config-del option is not working properly)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2659
| |
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2537 (gsync autorestart)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2537
| |
So updating the config file from multiple contexts won't mess it up.
This prepares the next commit, where we'll set options internally (which lacks
the serial nature of user actions).
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2537 (gsync autorestart)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2537
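The summary line of this commit is not preserved here, so the mechanism it uses is unknown. One common way to make concurrent config updates safe is an exclusive file lock around the read-modify-write cycle, sketched below for Unix systems. The function name, the "key = value" file layout, the file name and the option names are all assumptions.

```python
import fcntl
import os
import tempfile


def update_config(path, key, value):
    """Set key to value in a 'key = value' file under an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other writers are done
        entries = {}
        for line in f:
            if " = " in line:
                k, v = line.rstrip("\n").split(" = ", 1)
                entries[k] = v
        entries[key] = value
        f.seek(0)
        f.truncate()
        for k in sorted(entries):
            f.write("%s = %s\n" % (k, entries[k]))
    # Closing the file releases the lock.


# Demonstration on a throwaway file.
conf = os.path.join(tempfile.mkdtemp(), "gsyncd.conf")
update_config(conf, "log-level", "DEBUG")
update_config(conf, "timeout", "120")
update_config(conf, "log-level", "INFO")
with open(conf) as f:
    final = f.read()
```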
|
form
Signed-off-by: Kaushik BV <kaushikbv@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1570 (geosync related changes)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1570
|